Python data processing: 67 pandas functions summarized and ready to use
- 2021-12-12 09:13:31
- OfStack
Whether for business data analysis or data modeling, data processing is an extremely important step that has a large impact on the final result.
Today I summarize several important areas of pandas data processing for you, ready to use immediately:
Reading data, exporting data, viewing data, selecting data, processing data, grouping and sorting, and merging data.
# Import the pandas library before use
import pandas as pd
Reading Data
Here are seven common usages.
pd.DataFrame() # Build your own DataFrame for practice
pd.read_csv(filename) # Read data from a CSV file
pd.read_table(filename) # Read data from a delimited text file
pd.read_excel(filename) # Read data from an Excel file
pd.read_sql(query, connection_object) # Read data from a SQL table/database
pd.read_json(json_string) # Read data from a JSON-formatted string
pd.read_html(url) # Parse a URL, string, or HTML file and extract its tables
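As a quick check of the reading functions above, here is a minimal sketch that builds a practice DataFrame and reads it back with pd.read_csv (the column names and values are made up for illustration; a StringIO buffer stands in for a file on disk):

```python
import io

import pandas as pd

# Build a small practice DataFrame by hand
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# read_csv accepts any file-like object, so a StringIO buffer
# can replace a filename for this round trip
csv_text = df.to_csv(index=False)
df2 = pd.read_csv(io.StringIO(csv_text))

print(df2.shape)          # (2, 2)
print(list(df2.columns))  # ['name', 'score']
```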
Export data
Here are five common usages.
df.to_csv(filename) # Export data to a CSV file
df.to_excel(filename) # Export data to an Excel file
df.to_sql(table_name, connection_object) # Export data to a SQL table
df.to_json(filename) # Export data to a text file in JSON format
writer = pd.ExcelWriter('test.xlsx')
df1.to_excel(writer, sheet_name='Sheet1', index=False) # Together with writer.save(), writes multiple DataFrames to separate sheets of the same workbook (note: index=False belongs to to_excel, not to ExcelWriter)
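A sketch of the to_sql / read_sql pair, using Python's built-in sqlite3 with an in-memory database as the connection object (the table and column names here are invented for the example):

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Tokyo"], "pop": [2.1, 13.9]})

# An in-memory SQLite database stands in for a real connection object
conn = sqlite3.connect(":memory:")
df.to_sql("cities", conn, index=False)

# Read the table back to confirm the export worked
back = pd.read_sql("SELECT * FROM cities", conn)
print(back.shape)  # (2, 2)
conn.close()
```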
Viewing Data
Here are 11 common usages.
df.head(n) # View the first n rows of a DataFrame
df.tail(n) # View the last n rows of a DataFrame
df.shape # View the number of rows and columns (an attribute, not a method)
df.info() # View the index, data types, and memory information
df.columns # View the column (header) names (an attribute, not a method)
df.describe() # View summary statistics for numeric columns
s.value_counts(dropna=False) # View the unique values and counts of a Series
df.apply(pd.Series.value_counts) # View the unique values and counts of each column of a DataFrame
df.isnull().any() # Check whether each column contains missing values
df[df[column_name].duplicated()] # View the rows with duplicated values in column_name
df[df[column_name].duplicated()].count() # Count the duplicated values in column_name
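The viewing functions above can be tried on a toy frame with one missing value and one repeated score (data invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"grade": ["A", "B", "A", None],
                   "score": [90, 80, 90, 70]})

print(df.shape)    # (4, 2) -- shape is an attribute, no parentheses
print(df.head(2))  # first two rows
print(df["grade"].value_counts(dropna=False))  # counts, including NaN
print(df.isnull().any())             # which columns contain missing values
print(df[df["score"].duplicated()])  # rows whose score already appeared
```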
Data selection
Here are 10 common usages.
df[col] # Return one column as a Series, by column name
df[[col1,col2]] # Return multiple columns as a DataFrame
s.iloc[0] # Select data by position
s.loc['index_one'] # Select data by index label
df.iloc[0,:] # Return the first row
df.iloc[0,0] # Return the first element of the first column
df.loc[0,:] # Return the first row (same as df.iloc when the index is the default integers); note that loc selects by index label, while iloc only accepts integer positions
df.loc[:4, ["col1","col2"]] # Return the first 5 rows of columns col1 and col2 (a label slice like :4 is inclusive); this replaces the removed df.ix, which combined loc and iloc
df.at[5,"col1"] # Select the value at index label 5 in column col1
df.iat[5,0] # Select the value at row position 5, column position 0
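A minimal sketch of the selection accessors side by side, on a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"col1": [10, 20, 30], "col2": ["x", "y", "z"]})

print(df.iloc[0, 0])     # 10  -- first row, first column, by position
print(df.loc[0, "col2"]) # 'x' -- by index label and column name
sub = df.loc[:1, ["col1", "col2"]]  # label slices are inclusive: 2 rows
print(df.at[2, "col1"])  # 30  -- fast scalar access by label
print(df.iat[2, 1])      # 'z' -- fast scalar access by position
```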
Data processing
Here are 16 common usages.
df.columns = ['a','b','c'] # Rename the column names (all column names must be listed, otherwise an error is raised)
pd.isnull(df) # Check a DataFrame for null values and return a Boolean array
pd.notnull(df) # Check a DataFrame for non-null values and return a Boolean array
df.dropna() # Drop all rows containing null values
df.dropna(axis=1) # Drop all columns containing null values
df.dropna(axis=1, thresh=n) # Drop all columns with fewer than n non-null values
df.fillna(value=x) # Replace all null values in the DataFrame with x; a single column is also supported:
df[column_name].fillna(x)
s.astype(float) # Convert the data type of a Series to float
s.replace(1,'one') # Replace all values equal to 1 with 'one'
s.replace([1,3],['one','three']) # Replace 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x+1) # Rename column names in bulk
df.rename(columns={'old_name':'new_name'}) # Rename selected column names
df.set_index('column_one') # Set a column as the index; accepts a list to set multiple indexes
df.reset_index() # Reset the index to the default 0,1,2,... and turn the old index into a column
df.rename(index=lambda x: x+1) # Rename the index in bulk
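The processing functions above chain together naturally. A hedged sketch on invented data, filling a null, mapping values, and renaming and resetting the index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [1, 2, 1]})

df["a"] = df["a"].fillna(0)                        # fill nulls in one column
df["b"] = df["b"].replace([1, 2], ["one", "two"])  # map old values to new
df = df.rename(columns={"a": "score"})             # rename a single column
df = df.set_index("score")                         # a column becomes the index
df = df.reset_index()                              # back to the 0,1,2,... index

print(df.columns.tolist())  # ['score', 'b']
print(df["b"].tolist())     # ['one', 'two', 'one']
```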
Data grouping, sorting, and pivoting
Here are 13 common usages.
df.sort_index().loc[:5] # Sort by index, then select the rows with labels up to 5
df.sort_values(col1) # Sort data by column col1, ascending by default
df.sort_values(col2, ascending=False) # Sort data by column col2 in descending order
df.sort_values([col1,col2], ascending=[True,False]) # Sort data first by col1 in ascending order, then by col2 in descending order
df.groupby(col) # Return a GroupBy object grouped by column col
df.groupby([col1,col2]) # Return a GroupBy object grouped by multiple columns
df.groupby(col1)[col2].agg('mean') # Return the mean of column col2 after grouping by col1; agg also accepts a list, e.g. agg([len, np.mean])
df.pivot_table(index=col1, values=[col2,col3], aggfunc={col2:'max', col3:['max','min']}) # Create a pivot table grouped by col1, computing the max of col2 and the max and min of col3
df.groupby(col1).agg(np.mean) # Return the mean of all columns grouped by col1; also supported:
df.groupby(col1).col2.agg(['min','max'])
data.apply(np.mean) # Apply the function np.mean to every column of a DataFrame
data.apply(np.max, axis=1) # Apply the function np.max to every row of a DataFrame
df.groupby(col1).col2.transform("sum") # Usually used together with groupby to aggregate without changing the index
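A small sketch of groupby, transform, and pivot_table on invented team data, showing why transform is handy: it keeps the original index, so the result aligns row by row:

```python
import pandas as pd

df = pd.DataFrame({"team": ["red", "red", "blue"],
                   "pts":  [10, 20, 30]})

means = df.groupby("team")["pts"].agg("mean")  # mean pts per team
print(means["red"])  # 15.0

# transform keeps the original index, so the result aligns row by row
df["team_total"] = df.groupby("team")["pts"].transform("sum")
print(df["team_total"].tolist())  # [30, 30, 30]

# a small pivot table: max and min of pts for each team
pivot = df.pivot_table(index="team", values="pts", aggfunc=["max", "min"])
print(pivot.shape)  # (2, 2)
```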
Data merging
Here are five common usages.
df1.append(df2) # Append the rows of df2 to the end of df1 (deprecated in recent pandas versions; use pd.concat instead)
pd.concat([df1,df2], axis=1, join='inner') # Append the columns of df2 to df1, keeping only the rows present in both (rows with null values on either side are dropped)
df1.join(df2.set_index(col1), on=col1, how='inner') # SQL-style join of df1's columns with df2's columns; joins on the index by default. If df1 and df2 share a column name, an error is raised, which can be solved by setting lsuffix and rsuffix; to join on a common column, use set_index(col1) as shown
pd.merge(df1, df2, on='col1', how='outer') # Merge df1 and df2 on col1, using an outer join
pd.merge(df1, df2, left_index=True, right_index=True, how='outer') # Same effect as df1.join(df2, how='outer')
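The merging functions above can be compared on two tiny made-up frames that share the key column col1:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"col1": ["b", "c"], "y": [3, 4]})

# Outer merge keeps keys from both frames, filling gaps with NaN
merged = pd.merge(df1, df2, on="col1", how="outer")
print(merged.shape)  # (3, 3) -- keys a, b, c

# concat stacks rows; the modern replacement for df1.append(df2)
stacked = pd.concat([df1, df1], ignore_index=True)
print(len(stacked))  # 4

# join matches df1's col1 values against df2's index
joined = df1.join(df2.set_index("col1"), on="col1", how="inner")
print(joined["col1"].tolist())  # ['b']
```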
That is the full summary of 67 pandas functions for Python data processing. For more information about pandas and Python data processing, please pay attention to other related articles on this site!