Deleting and Adding Rows and Columns in pandas

  • 2021-07-10 20:06:40
  • OfStack

Create df:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))
>>> df
    A   B   C   D
1   0   1   2   3
2   4   5   6   7
3   8   9  10  11
4  12  13  14  15

1. Delete rows

1.1. drop

Delete by row label (the statements in this section are alternatives, each applied to the original df):

df = df.drop(['1', '2'])            # axis defaults to 0 (rows); returns a new DataFrame
df.drop(['1', '3'], inplace=True)   # inplace=True modifies df directly

Delete by row position:

df.drop(df.index[0], inplace=True)        # delete the first row
df.drop(df.index[0:3], inplace=True)      # delete the first three rows
df.drop(df.index[[0, 2]], inplace=True)   # delete the first and third rows
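The row deletions above can be tried side by side. This is a minimal sketch that uses the non-inplace form so every call starts from the same frame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# drop returns a new DataFrame, so df itself stays unchanged here
by_label = df.drop(['1', '2'])             # remove rows labeled '1' and '2'
by_position = df.drop(df.index[[0, 2]])    # positions 0 and 2 are labels '1' and '3'
first_three = df.drop(df.index[0:3])       # remove the first three rows

print(by_label.index.tolist())      # ['3', '4']
print(by_position.index.tolist())   # ['2', '4']
print(first_three.index.tolist())   # ['4']
```

Positional deletion works because df.index[...] translates positions into labels, which is what drop expects.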

1.2. Delete rows through various filtering methods

See the pandas note "Select Row Cell, Select Row and Column" for details.

Filtering can serve many purposes. For example, to drop rows whose value in one column duplicates an earlier row, take the index of the de-duplicated column and select those rows with loc:


>>> df.loc['2', 'B'] = 9
>>> df
    A   B   C   D
1   0   1   2   3
2   4   9   6   7
3   8   9  10  11
4  12  13  14  15
>>> chooses = df['B'].drop_duplicates().index
>>> df.loc[chooses]
    A   B   C   D
1   0   1   2   3
2   4   9   6   7
4  12  13  14  15
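The same filtering can likely be written more directly. The sketch below compares the index-based approach with a boolean mask built from duplicated() and with drop_duplicates(subset=...); these alternatives are not part of the original note.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))
df.loc['2', 'B'] = 9   # rows '2' and '3' now share the value 9 in column B

# Approach from the text: index of the de-duplicated column, then loc
via_index = df.loc[df['B'].drop_duplicates().index]

# Alternative: keep rows that are not marked as duplicates
via_mask = df[~df['B'].duplicated()]

# Alternative: let drop_duplicates look only at column B
via_subset = df.drop_duplicates(subset='B')

print(via_index.index.tolist())   # ['1', '2', '4']
```

All three keep the first occurrence of each value in B, which is the default behavior.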

2. Delete columns

2.1. del


del df['A']   # delete column A; df is modified in place

2.2. drop

Delete by column name:

df = df.drop(['B', 'C'], axis=1)            # drop does not modify df; it returns a new DataFrame
df.drop(['B', 'C'], axis=1, inplace=True)   # inplace=True modifies df directly

Delete by column position; the index into df.columns can be an int, a slice, or a list:

df.drop(df.columns[0], axis=1, inplace=True)        # delete the first column
df.drop(df.columns[0:3], axis=1, inplace=True)      # delete the first three columns
df.drop(df.columns[[0, 2]], axis=1, inplace=True)   # delete the first and third columns
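A short sketch of the column deletions, again using the non-inplace form so each call works on the same frame. The columns= keyword shown here is an alternative spelling of axis=1 that the original note does not use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

by_name = df.drop(['B', 'C'], axis=1)             # remove columns B and C
by_keyword = df.drop(columns=['B', 'C'])          # same result with the columns= keyword
by_position = df.drop(df.columns[[0, 2]], axis=1) # positions 0 and 2 are columns A and C

with_del = df.copy()
del with_del['A']                                 # del works in place on the copy

print(by_name.columns.tolist())       # ['A', 'D']
print(by_position.columns.tolist())   # ['B', 'D']
print(with_del.columns.tolist())      # ['B', 'C', 'D']
```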

2.3. Delete columns through various filtering methods

See the pandas note "Select Row Cell, Select Row and Column" for details.

3. Add rows

3.1. loc, at, set_value

To add a row labeled '5' with the values [16, 17, 18, 19]:

df.loc['5'] = [16, 17, 18, 19]   # the right-hand side can be any iterable of matching length
df.at['5'] = [16, 17, 18, 19]    # at is intended for single-cell access; this form may fail on recent pandas, so prefer loc
df.set_value('5', df.columns, [16, 17, 18, 19], takeable=False)   # set_value was deprecated in pandas 0.21 and removed in 1.0
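Of the three calls, loc is the one that can be relied on across versions. A minimal sketch of adding rows with it, where the second row uses a NumPy array only to show that the value need not be a list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# Assigning to a label that does not exist yet enlarges the frame
df.loc['5'] = [16, 17, 18, 19]
df.loc['6'] = np.array([20, 21, 22, 23])

print(df.index.tolist())        # ['1', '2', '3', '4', '5', '6']
print(df.loc['5'].tolist())
```

Assigning to a label that already exists overwrites that row instead of adding one.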

3.2. append

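Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; the calls below apply to earlier versions.

Append a named Series; its name becomes the row label:

s = pd.Series([16, 17, 18, 19], index=df.columns, name='5')
df = df.append(s)

To append a Series without a name, ignore_index=True is required:

s = pd.Series([16, 17, 18, 19], index=df.columns)
df = df.append(s, ignore_index=True)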
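On pandas 2.0 and later, where append no longer exists, pd.concat can play the same role. The sketch below is one possible translation, not the original author's code: the Series is turned into a one-row DataFrame with to_frame().T before concatenating.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# Named Series: the name '5' becomes the row label
s = pd.Series([16, 17, 18, 19], index=df.columns, name='5')
named = pd.concat([df, s.to_frame().T])

# Unnamed Series: ignore_index=True renumbers all rows from 0
u = pd.Series([16, 17, 18, 19], index=df.columns)
renumbered = pd.concat([df, u.to_frame().T], ignore_index=True)

print(named.index.tolist())        # ['1', '2', '3', '4', '5']
print(renumbered.index.tolist())   # [0, 1, 2, 3, 4]
```

As with append, ignore_index=True discards the existing row labels.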

append also accepts a list of dictionaries keyed by column name; pass ignore_index=True in that case as well.
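A minimal sketch of adding rows from a list of dictionaries, assuming each dictionary is keyed by column name. It uses pd.concat rather than append so that it also runs on pandas 2.0 and later; the variable name rows is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# Each dictionary describes one new row
rows = [
    {'A': 16, 'B': 17, 'C': 18, 'D': 19},
    {'A': 20, 'B': 21, 'C': 22, 'D': 23},
]

# Build a frame from the dictionaries, then stack it under df
out = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

print(out.shape)   # (6, 4)
```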

3.3. Add rows one at a time

Rows can also be added one at a time by assigning to df.loc[len(df)].

Note that len(df) is an int. If that int already exists as a label in the index, the data in that row is overwritten instead of a new row being added.
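The overwrite behavior is easy to miss, so here is a small sketch. It assumes one frame with the default 0-based index, where len(df) is always a fresh label, and a second frame whose index starts at 1, where it is not.

```python
import numpy as np
import pandas as pd

# Default index 0..3: len(df) == 4 is not yet a label, so a row is added
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'))
df.loc[len(df)] = [16, 17, 18, 19]

# Index 1..4: len(df2) == 4 is already a label, so row 4 is overwritten
df2 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   columns=list('ABCD'), index=[1, 2, 3, 4])
df2.loc[len(df2)] = [16, 17, 18, 19]

print(len(df))    # 5
print(len(df2))   # 4
```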

3.4. Insert a row

There is no row counterpart to the column method insert. A workaround is to reindex with the new label at the desired position and then assign values to the new row.
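A sketch of the reindex workaround. The label 'new' and its position between '2' and '3' are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# Reindex with the new label placed where the row should appear.
# The new row is filled with NaN, which upcasts integer columns to float.
df = df.reindex(['1', '2', 'new', '3', '4'])

# Then fill in the values of the new row
df.loc['new'] = [16, 17, 18, 19]

print(df.index.tolist())   # ['1', '2', 'new', '3', '4']
```

Passing fill_value to reindex is one way to avoid the NaN step if a placeholder value is acceptable.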

4. Add columns

Adding a column usually involves traversing the existing data to compute the new values, so this section discusses adding columns together with traversing a DataFrame.

For example, suppose you want to add a column 'E' whose values are the sum of the corresponding values in columns 'A' and 'C'.

4.1. Traverse the DataFrame to get the sequence


s = [a + c for a, c in zip(df['A'], df['C'])]            # iterate over two columns with zip; s is a list
s = [row['A'] + row['C'] for i, row in df.iterrows()]    # iterate over rows with iterrows(); s is a list
s = df.apply(lambda row: row['A'] + row['C'], axis=1)    # apply a function to each row; s is a Series
s = df['A'] + df['C']                                    # vectorized Series addition; s is a Series
s = df['A'].values + df['C'].values                      # vectorized NumPy addition; s is an ndarray
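The five approaches differ in speed and return type but should yield the same values. A quick sketch that checks this on the example frame (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

s_zip = [a + c for a, c in zip(df['A'], df['C'])]           # list
s_rows = [row['A'] + row['C'] for _, row in df.iterrows()]  # list
s_apply = df.apply(lambda row: row['A'] + row['C'], axis=1) # Series
s_series = df['A'] + df['C']                                # Series
s_numpy = df['A'].values + df['C'].values                   # ndarray

print(s_zip)   # [2, 10, 18, 26]
```

The vectorized forms avoid a Python-level loop, which generally makes them the faster choice on large frames.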

4.2. [], loc

Assign the sequence to a new column label with df['E'] = s or df.loc[:, 'E'] = s.
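A minimal sketch of both assignment forms. The second column is named 'F' here only so that the two results can be shown in one frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

s = df['A'] + df['C']

df['E'] = s          # bracket assignment creates column E
df.loc[:, 'F'] = s   # loc with a new column label creates column F

print(df.columns.tolist())   # ['A', 'B', 'C', 'D', 'E', 'F']
```

Both forms append the new column at the right-hand end of the frame.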

4.3. insert

The insert method lets you specify both the position at which the column is inserted and the name of the new column.
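A sketch of insert, assuming the new column should go in front of all existing ones (position 0). Note that insert modifies the frame in place and returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# insert(position, column name, values); works in place
df.insert(0, 'E', df['A'] + df['C'])

print(df.columns.tolist())   # ['E', 'A', 'B', 'C', 'D']
```

Inserting a name that already exists raises an error unless allow_duplicates=True is passed.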

4.4. concat

A column can also be added by concatenating a named Series to the DataFrame along axis=1 with pd.concat.
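A minimal sketch of adding a column with concat. The Series values and the name 'E' are assumptions for illustration; the Series shares the frame's index so that rows line up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# The Series name becomes the column name; its index must match df's rows
s = pd.Series([16, 17, 18, 19], index=df.index, name='E')

df = pd.concat([df, s], axis=1)

print(df.columns.tolist())   # ['A', 'B', 'C', 'D', 'E']
```

concat aligns on the index, so a Series with different labels would produce NaN rather than an error.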

4.5. Assigning column values while traversing with iloc and loc

This approach is relatively slow.

df['E'] is a Series taken from the DataFrame and acts as a reference to it. Writing to the DataFrame through that Series while looping can change the DataFrame, but pandas reports a warning at runtime (in older versions, typically SettingWithCopyWarning).

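No warning is reported when the assignment goes through the DataFrame's own iloc instead of through the Series:

df['E'] = None                       # create the column first
col_no = df.columns.get_loc('E')     # integer position of column E
for i in range(len(df)):
    df.iloc[i, col_no] = df['A'].iloc[i] + df['C'].iloc[i]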

With loc, the E column does not need to be initialized first, because loc creates the column on the first assignment.
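A sketch of the loc-based loop, assuming the goal is still E = A + C. The column E does not exist before the loop; the first assignment creates it and fills the remaining rows with NaN until they are set.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# Iterate over row labels and set one cell at a time
for label in df.index:
    df.loc[label, 'E'] = df.loc[label, 'A'] + df.loc[label, 'C']

print(df['E'].tolist())
```

Because the column passes through NaN on the way, it may end up with a float dtype even though every value is a whole number.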

4.6. Add columns one at a time

To simply add columns one at a time:

df[len(df)] = [16, 17, 18, 19]

Note that len(df) is an int (the number of rows). If a column with that label already exists, its data is overwritten instead of a new column being added.
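The pitfall shows up as soon as the statement is run twice: adding a column does not change the number of rows, so len(df) yields the same label again. The sketch below also shows len(df.columns) as a possible alternative, which is an assumption and not part of the original note.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

df[len(df)] = [16, 17, 18, 19]       # len(df) == 4: adds a column labeled 4
df[len(df)] = [0, 0, 0, 0]           # len(df) is still 4: column 4 is overwritten

# The column count does grow, so this yields a fresh label (5)
df[len(df.columns)] = [1, 1, 1, 1]

print(df.shape)   # (4, 6)
```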

4.7. Other methods

Add three columns E, F, and G, with values defaulting to np.nan:

df = pd.concat([df, pd.DataFrame(columns=list('EFG'))])   # the column order cannot be specified, and a later fillna would affect the whole df
df = df.reindex(columns=list('ABCDEFG'), fill_value=0)    # columns follow the order of the list, and fill_value applies only to the new columns; recommended
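A short sketch of the reindex variant, with and without fill_value, to show that only the new columns are affected:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  columns=list('ABCD'), index=list('1234'))

# Without fill_value the new columns hold NaN
with_nan = df.reindex(columns=list('ABCDEFG'))

# With fill_value=0 the new columns hold 0; existing columns are untouched
with_zero = df.reindex(columns=list('ABCDEFG'), fill_value=0)

print(with_zero.columns.tolist())   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

The list passed to columns also fixes the column order, so it can reorder existing columns at the same time.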

