Python Data Processing: 67 Pandas Functions Summarized for Immediate Use

  • 2021-12-12 09:13:31
  • OfStack

Contents: Importing Data · Exporting Data · Viewing Data · Selecting Data · Processing Data · Grouping, Sorting, and Pivoting · Merging Data

Whether you are doing business data analysis or data modeling, data processing is an extremely important step that strongly influences the final result.

Today I summarize the essential pandas data-processing functions for you, ready to use immediately.


# Import the pandas library before use
import pandas as pd

Importing Data

Here I summarize seven common usages for you.


pd.DataFrame() # Create a DataFrame from scratch for practice

pd.read_csv(filename) # Import data from a CSV file

pd.read_table(filename) # Import data from a delimited text file

pd.read_excel(filename) # Import data from an Excel file

pd.read_sql(query, connection_object) # Import data from a SQL table/database

pd.read_json(json_string) # Import data from a JSON-formatted string

pd.read_html(url) # Parse a URL, string, or HTML file and extract its tables
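A minimal sketch of the first two entries. Since `pd.read_csv` accepts any file-like object, an in-memory string buffer stands in here for a CSV file on disk (the file name and contents are made up for illustration):

```python
import io

import pandas as pd

# An in-memory buffer plays the role of a CSV file on disk
csv_text = "name,score\nAnn,90\nBob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['name', 'score']
```

For a real file you would simply pass the path, e.g. `pd.read_csv("scores.csv")`.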

Export data

Here are five common usages.


df.to_csv(filename) # Export data to a CSV file

df.to_excel(filename) # Export data to an Excel file

df.to_sql(table_name, connection_object) # Export data to a SQL table

df.to_json(filename) # Export data to a text file in JSON format

writer = pd.ExcelWriter('test.xlsx')
df1.to_excel(writer, sheet_name='Sheet1') # Write multiple DataFrames to separate sheets of the same workbook
df2.to_excel(writer, sheet_name='Sheet2')
writer.save() # Save the workbook (newer pandas versions use writer.close(); ExcelWriter itself does not accept index=False -- pass it to to_excel)

Viewing Data

Here are 11 common usages.


df.head(n) # View the first n rows of a DataFrame

df.tail(n) # View the last n rows of a DataFrame

df.shape # View the number of rows and columns (an attribute, not a method)

df.info() # View the index, data types, and memory information

df.columns # View the column (header-row) names (also an attribute)

df.describe() # View summary statistics for the numeric columns

s.value_counts(dropna=False) # View the unique values of a Series and their counts

df.apply(pd.Series.value_counts) # View the unique values and counts for every column of a DataFrame

df.isnull().any() # Check whether each column contains missing values

df[df[column_name].duplicated()] # View the rows where column_name holds duplicated values

df[df[column_name].duplicated()].count() # Count the duplicated values in column_name
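A quick sketch of a few of these inspection calls on a toy DataFrame (the column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"city": ["NY", "NY", "LA"], "pop": [8, 8, 4]})

print(df.shape)                      # (3, 2) -- attribute, no parentheses
print(df.head(2))                    # first two rows
print(df["city"].value_counts())     # unique values and their counts
print(df.isnull().any())             # per-column: any missing values?
print(df[df["city"].duplicated()])   # rows whose city repeats an earlier one
```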

Data selection

Here are 10 common usages.


df[col] # Select one column by name, returned as a Series

df[[col1, col2]] # Select multiple columns, returned as a DataFrame

s.iloc[0] # Select data by integer position

s.loc['index_one'] # Select data by index label

df.iloc[0, :] # Return the first row

df.iloc[0, 0] # Return the first element of the first column

df.loc[0, :] # Return the first row (same as df.iloc when the index is the default integers; note that loc selects by index label while iloc accepts only integer positions)

df.loc[:5, ["col1", "col2"]] # Return the rows up to index label 5 for the fields col1 and col2 (the old df.ix, which mixed loc and iloc behavior, was removed in pandas 1.0)

df.at[5, "col1"] # Select the value whose index label is 5 and whose column name is col1

df.iat[5, 0] # Select the value at integer row position 5 and column position 0
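The label-based and position-based selectors side by side, on a toy DataFrame with string index labels so the difference is visible (the data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col1": [10, 20, 30], "col2": ["a", "b", "c"]},
                  index=["x", "y", "z"])

print(df.iloc[0, 0])         # 10 -- by integer position
print(df.loc["y", "col2"])   # 'b' -- by index and column labels
print(df.at["z", "col1"])    # 30 -- fast scalar access by label
print(df.iat[1, 1])          # 'b' -- fast scalar access by position
print(df[["col1"]])          # col1 as a one-column DataFrame
```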

Data processing

Here are 16 common usages.


df.columns = ['a', 'b', 'c'] # Rename the columns (all column names must be listed, otherwise an error is raised)

pd.isnull(df) # Check for null values in a DataFrame and return a Boolean array

pd.notnull(df) # Check for non-null values in a DataFrame and return a Boolean array

df.dropna() # Drop all rows that contain null values

df.dropna(axis=1) # Drop all columns that contain null values

df.dropna(axis=1, thresh=n) # Drop the columns that have fewer than n non-null values

df.fillna(value=x) # Replace all null values in the DataFrame with x; a single column also works: df[column_name].fillna(x)

s.astype(float) # Convert the data type of a Series to float

s.replace(1, 'one') # Replace all values equal to 1 with 'one'

s.replace([1, 3], ['one', 'three']) # Replace 1 with 'one' and 3 with 'three'

df.rename(columns=lambda x: x + 1) # Rename the columns in bulk

df.rename(columns={'old_name': 'new_name'}) # Rename selected columns

df.set_index('column_one') # Set a column as the index; a list argument sets multiple index levels

df.reset_index("col1") # Move the index level col1 back into a column and reset the index to 0, 1, 2, ...

df.rename(index=lambda x: x + 1) # Rename the index in bulk

Data Grouping, Sorting, and Pivoting

Here are 13 common usages.


df.sort_index().loc[:5] # Sort by index, then select the rows up to index label 5

df.sort_values(col1) # Sort the data by column col1, ascending by default

df.sort_values(col2, ascending=False) # Sort the data by column col2 in descending order

df.sort_values([col1, col2], ascending=[True, False]) # Sort first by col1 ascending, then by col2 descending

df.groupby(col) # Return a GroupBy object grouped by column col

df.groupby([col1, col2]) # Return a GroupBy object grouped by multiple columns

df.groupby(col1)[col2].agg('mean') # Group by col1 and return the mean of col2; agg also accepts a list, e.g. agg([len, np.mean])

df.pivot_table(index=col1, values=[col2, col3], aggfunc={col2: 'max', col3: ['max', 'min']}) # Create a pivot table grouped by col1 with the max of col2 and the max and min of col3

df.groupby(col1).agg(np.mean) # Group by col1 and return the mean of every column; a single column also works: df.groupby(col1).col2.agg(['min', 'max'])

data.apply(np.mean) # Apply the function np.mean to every column of the DataFrame

data.apply(np.max, axis=1) # Apply the function np.max to every row of the DataFrame

df.groupby(col1).col2.transform("sum") # Usually chained after groupby to broadcast group results back to the original rows without changing the index
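A compact sketch of sorting, grouping, and the groupby/transform pattern (the team/points columns are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B"], "pts": [10, 20, 5]})

print(df.sort_values("pts", ascending=False))   # rows ordered by pts descending
print(df.groupby("team")["pts"].agg("mean"))    # A -> 15.0, B -> 5.0

# transform broadcasts each group's total back to the original rows,
# so the result has the same length and index as df
df["team_total"] = df.groupby("team")["pts"].transform("sum")
print(df["team_total"].tolist())                # [30, 30, 5]
```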

Merging Data

Here are five common usages.


df1.append(df2) # Append the rows of df2 to the end of df1 (removed in pandas 2.0; use pd.concat([df1, df2]) instead)

pd.concat([df1, df2], axis=1, join='inner') # Add the columns of df2 to the end of df1; with join='inner', rows missing on either side are dropped

df1.join(df2.set_index(col1), on=col1, how='inner') # SQL-style join of df1's columns with df2's columns; by default join merges on the index, and if df1 and df2 share a column name an error is raised unless lsuffix/rsuffix are set; to join on a shared column instead, set it as the index first with set_index(col1)

pd.merge(df1, df2, on='col1', how='outer') # Merge df1 and df2 on col1 with an outer join

pd.merge(df1, df2, left_index=True, right_index=True, how='outer') # Same effect as df1.join(df2, how='outer')
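A small sketch of the two most common combinations: an outer merge on a shared key, and row-wise concatenation as the modern replacement for `append` (the key values are made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["k1", "k2"], "x": [1, 2]})
df2 = pd.DataFrame({"col1": ["k2", "k3"], "y": [3, 4]})

# An outer merge on col1 keeps keys from both frames, filling gaps with NaN
merged = pd.merge(df1, df2, on="col1", how="outer")
print(merged)          # rows for k1, k2, k3; NaN where a side has no match

# Row-wise concatenation (the replacement for the removed df1.append(df2))
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.shape)   # (4, 3): columns col1, x, y
```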

This concludes the summary of 67 pandas functions for Python data processing. For more on pandas and data processing in Python, please see the other related articles on this site!
