Deleting and Adding Rows and Columns in pandas
- 2021-07-10 20:06:40
- OfStack
Create df:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))
>>> df
    A   B   C   D
1   0   1   2   3
2   4   5   6   7
3   8   9  10  11
4  12  13  14  15
1. Delete rows
1.1. drop
Delete by row label (each line below is an independent example that starts from the original df):
df = df.drop(['1', '2'])  # axis defaults to 0 (rows); returns a new DataFrame
df.drop(['1', '3'], inplace=True)  # inplace=True modifies df directly
Delete by row position:
df.drop(df.index[0], inplace=True)  # Delete the first row
df.drop(df.index[0:3], inplace=True)  # Delete the first three rows
df.drop(df.index[[0, 2]], inplace=True)  # Delete the first and third rows
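As a minimal sketch of the two approaches above, the following builds the sample df again and drops rows once by label and once by position. The names by_label and by_position are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# Drop by label: drop returns a new DataFrame and leaves df unchanged.
by_label = df.drop(['1', '2'])

# Drop by position: df.index[...] translates positions into labels first.
by_position = df.drop(df.index[[0, 2]])

print(by_label.index.tolist())     # ['3', '4']
print(by_position.index.tolist())  # ['2', '4']
print(df.shape)                    # (4, 4)
```

Without inplace=True the original df keeps all four rows, which makes it easy to try several variants side by side.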
1.2. Delete rows through various filtering methods
See the pandas note "Select Row Cell, Select Row and Column" for details.
Filtering covers many use cases. For example, to drop rows that repeat a value in one column, get the index of the deduplicated values and pass it to loc:
>>> df.loc['2', 'B'] = 9
>>> df
    A   B   C   D
1   0   1   2   3
2   4   9   6   7
3   8   9  10  11
4  12  13  14  15
>>> chooses = df['B'].drop_duplicates().index
>>> df.loc[chooses]
    A   B   C   D
1   0   1   2   3
2   4   9   6   7
4  12  13  14  15
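The same filtering idea can also be written with a boolean mask. The sketch below is one possible variant, not the only way to do it; deduped and filtered are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))
df.loc['2', 'B'] = 9

# duplicated() marks every repeat of an earlier B value; ~ keeps the rest.
deduped = df[~df['B'].duplicated()]

# Any boolean condition works the same way: keep only rows where A > 4.
filtered = df[df['A'] > 4]

print(deduped.index.tolist())   # ['1', '2', '4']
print(filtered.index.tolist())  # ['3', '4']
```

Selecting the rows to keep is often simpler than naming the rows to delete, because no label list has to be built first.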
2. Delete columns
2.1. del
del df['A']  # Delete column A; modifies df in place
2.2. drop
Delete by column name (each line below is an independent example):
df = df.drop(['B', 'C'], axis=1)  # Without inplace, drop leaves df untouched and returns a modified copy
df.drop(['B', 'C'], axis=1, inplace=True)  # inplace=True modifies df directly
Delete by column position; df.columns can be indexed with an int, a slice, or a list:
df.drop(df.columns[0], axis=1, inplace=True)  # Delete the first column
df.drop(df.columns[0:3], axis=1, inplace=True)  # Delete the first three columns
df.drop(df.columns[[0, 2]], axis=1, inplace=True)  # Delete the first and third columns
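A short sketch of column deletion on a fresh copy of the sample df follows. It uses the columns= keyword of drop, which is equivalent to passing axis=1; the result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# columns=[...] is the keyword form of drop([...], axis=1).
without_bc = df.drop(columns=['B', 'C'])

# Positions are converted to column labels through df.columns.
without_first_third = df.drop(df.columns[[0, 2]], axis=1)

print(without_bc.columns.tolist())           # ['A', 'D']
print(without_first_third.columns.tolist())  # ['B', 'D']
```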
2.3. Delete columns through various filtering methods
See the pandas note "Select Row Cell, Select Row and Column" for details.
3. Add rows
3.1. loc, at, set_value
To add a row labelled '5' with the values [16, 17, 18, 19]:
df.loc['5'] = [16, 17, 18, 19]  # The right-hand side can be any list-like of matching length
df.at['5'] = [16, 17, 18, 19]  # at is meant for single-cell access; row-only assignment depends on the pandas version, so prefer loc
df.set_value('5', df.columns, [16, 17, 18, 19], takeable=False)  # Deprecated in pandas 0.21 and removed in 1.0
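The loc form is the one that works across pandas versions. A minimal sketch, assuming the sample df from the top of the article:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# Assigning to a label that does not exist yet enlarges the frame by one row.
df.loc['5'] = [16, 17, 18, 19]

# at addresses exactly one cell, given as (row label, column label).
df.at['5', 'A'] = 100

print(df.shape)  # (5, 4)
print(df.loc['5'].tolist())
```

Here loc adds the whole row, while at is used only to change a single cell of it afterwards.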
3.2. append
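Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls in this section need an older version.
Append a named Series; its name becomes the row label:
s = pd.Series([16, 17, 18, 19], index=df.columns, name='5')
df = df.append(s)
Appending an unnamed Series requires ignore_index=True:
s = pd.Series([16, 17, 18, 19], index=df.columns)
df = df.append(s, ignore_index=True)
A list of dicts can also be appended; ignore_index=True is passed here as well: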
ls = [{'A': 16, 'B': 17, 'C': 18, 'D': 19}, {'A': 20, 'B': 21, 'C': 22, 'D': 23}]
df = df.append(ls, ignore_index=True)
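Because DataFrame.append is no longer available in pandas 2.0 and later, the same results can be obtained with pd.concat. The sketch below is one possible replacement, not code from the original note; with_named, rows, and with_dicts are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# Named Series: to_frame().T turns it into a one-row DataFrame labelled '5'.
s = pd.Series([16, 17, 18, 19], index=df.columns, name='5')
with_named = pd.concat([df, s.to_frame().T])

# List of dicts: build a DataFrame from it and renumber the rows.
rows = [{'A': 16, 'B': 17, 'C': 18, 'D': 19}, {'A': 20, 'B': 21, 'C': 22, 'D': 23}]
with_dicts = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

print(with_named.index.tolist())  # ['1', '2', '3', '4', '5']
print(with_dicts.shape)           # (6, 4)
```

With ignore_index=True the original string labels are replaced by 0 to 5, which matches what append did with the same flag.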
3.3. Add rows one by one
To simply add rows one at a time:
df.loc[len(df)] = [16, 17, 18, 19]
Note that len(df) yields an int. If that int already exists as a row label in df, the existing row is overwritten instead of a new row being added.
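The overwrite pitfall is easy to reproduce. The sketch below uses a small hypothetical frame whose integer labels are deliberately out of order, so that len(df) collides with an existing label; taking the maximum label plus one is shown as one possible safer choice for integer indexes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two rows labelled 2 and 0, so len(df) == 2 is already taken.
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('ABCD'), index=[2, 0])

df.loc[len(df)] = [16, 17, 18, 19]  # overwrites the row labelled 2
print(df.shape)  # (2, 4)

# One possible alternative for integer labels: use a label that cannot exist yet.
df.loc[df.index.max() + 1] = [20, 21, 22, 23]
print(df.index.tolist())  # [2, 0, 3]
```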
3.4. Insert a row
Rows have no counterpart to the insert method for columns. A workaround is to reindex first and then assign the values:
df = df.reindex(index=df.index.insert(2, '5'))  # Add an empty row labelled '5' at position 2
df.loc['5'] = [16, 17, 18, 19]
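A runnable sketch of the reindex workaround follows, together with a concat-based variant that is an alternative of my own choosing rather than part of the original note. Both place a row labelled '5' at position 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# Variant 1: reindex creates an all-NaN row at position 2, then loc fills it.
via_reindex = df.reindex(index=df.index.insert(2, '5'))
via_reindex.loc['5'] = [16, 17, 18, 19]

# Variant 2 (alternative): split the frame and concatenate around the new row.
new_row = pd.DataFrame([[16, 17, 18, 19]], columns=df.columns, index=['5'])
via_concat = pd.concat([df.iloc[:2], new_row, df.iloc[2:]])

print(via_reindex.index.tolist())  # ['1', '2', '5', '3', '4']
print(via_concat.index.tolist())   # ['1', '2', '5', '3', '4']
```

The reindex variant passes through NaN, so integer columns may come back as floats; the concat variant never introduces missing values.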
4. Add columns
Adding a column usually means traversing the existing data to compute the new column's values, so this article discusses adding columns together with traversing a DataFrame.
For example, suppose you want to add a column 'E' whose value is the sum of the corresponding values in columns 'A' and 'C'.
4.1. Traverse the DataFrame to get the sequence
s = [a + c for a, c in zip(df['A'], df['C'])]  # Traverse both columns with zip; s is a list
s = [row['A'] + row['C'] for i, row in df.iterrows()]  # Traverse with iterrows(); s is a list
s = df.apply(lambda row: row['A'] + row['C'], axis=1)  # Use apply row by row; s is a Series
s = df['A'] + df['C']  # Vectorized Series addition; s is a Series
s = df['A'].values + df['C'].values  # Vectorized NumPy addition; s is an ndarray
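The approaches above all yield the same values and differ only in the container they return. A brief check on the sample df, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

by_zip = [a + c for a, c in zip(df['A'], df['C'])]            # list
by_apply = df.apply(lambda row: row['A'] + row['C'], axis=1)  # Series
by_series = df['A'] + df['C']                                 # Series
by_numpy = df['A'].values + df['C'].values                    # ndarray

print(by_zip)             # [2, 10, 18, 26]
print(by_numpy.tolist())  # [2, 10, 18, 26]
```

The two vectorized forms avoid a Python-level loop, so they are usually the better choice on large frames.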
4.2. [], loc
Add the sequence through df[] or df.loc:
df['E'] = s
df.loc[:, 'E'] = s
4.3. insert
insert lets you specify the position and the name of the new column:
df.insert(0, 'E', s)  # Insert s as column 'E' at position 0
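A minimal sketch of insert on the sample df follows. It also shows that inserting a column name that already exists raises a ValueError by default; duplicate_error is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))
s = df['A'] + df['C']

# insert works in place: position 0 puts the new column first.
df.insert(0, 'E', s)
print(df.columns.tolist())  # ['E', 'A', 'B', 'C', 'D']

# A second insert under the same name is rejected unless allow_duplicates=True.
duplicate_error = None
try:
    df.insert(1, 'E', s)
except ValueError as exc:
    duplicate_error = exc
print(duplicate_error is not None)  # True
```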
4.4. concat
s = pd.Series([16, 17, 18, 19], name='E', index=df.index)
df = pd.concat([df, s], axis=1)
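With axis=1, concat aligns on the row index, so the index of the new Series matters. The sketch below contrasts a matching index with a partly mismatched one; aligned, shifted, and mismatched are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# Matching index: values land in the expected rows.
s = pd.Series([16, 17, 18, 19], name='E', index=df.index)
aligned = pd.concat([df, s], axis=1)

# Partly different index: the result is the union of labels, padded with NaN.
shifted = pd.Series([16, 17, 18, 19], name='E', index=['1', '2', '3', '9'])
mismatched = pd.concat([df, shifted], axis=1)

print(aligned['E'].tolist())  # [16, 17, 18, 19]
print(mismatched.shape)       # (5, 5)
```

Passing index=df.index when building the Series avoids the extra row and the NaN values.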
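4.5. Assigning column values while traversing with iloc and loc
This approach is relatively inefficient.
df['E'] returns a Series of the DataFrame by reference, so modifying that Series can also change the DataFrame, but pandas emits a warning at runtime. This is chained assignment; whether the DataFrame is actually updated depends on the pandas version, so it is best avoided:
df['E'] = None  # Column E must exist before traversing with iloc
for i in range(len(df)):
    df['E'].iloc[i] = df['A'].iloc[i] + df['C'].iloc[i]  # Chained assignment: triggers the warning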
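No warning is raised when the value is assigned through the DataFrame instead of through the Series:
df['E'] = None  # Column E must exist before traversing with iloc
col_no = df.columns.get_loc('E')  # Integer position of column E
for i in range(len(df)):
    df.iloc[i, col_no] = df['A'].iloc[i] + df['C'].iloc[i]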
With loc, column E does not need to be created first:
for i in df.index:
    df.loc[i, 'E'] = df.loc[i, 'A'] + df.loc[i, 'C']
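To make the loc traversal concrete, the sketch below fills column E label by label and then confirms that the result matches the vectorized sum. It is a small illustration, not a benchmark.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# The first assignment creates column E; later ones fill the remaining cells.
for label in df.index:
    df.loc[label, 'E'] = df.loc[label, 'A'] + df.loc[label, 'C']

# The vectorized form gives the same values without a Python-level loop.
matches = (df['E'] == df['A'] + df['C']).all()
print(matches)  # True
```

Since both give the same values, the loop is only worth using when the per-row logic cannot be expressed as a column operation.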
4.6. Add columns one by one
To simply add columns one at a time:
df[len(df)] = [16, 17, 18, 19]
Note that len(df) is the row count, used here only as an int column label. If that int already exists as a column label in df, the existing column is overwritten instead of a new one being added.
4.7. Other methods
Add three columns E, F, and G whose values default to np.nan (the two lines below are independent alternatives):
df = pd.concat([df, pd.DataFrame(columns=list('EFG'))])  # The column order cannot be specified, and a later fillna would affect the whole df
df = df.reindex(columns=list('ABCDEFG'), fill_value=0)  # Columns follow the order of the list, and fill_value applies only to the new columns; recommended!
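A short sketch of the reindex approach on the sample df, showing both the default NaN fill and an explicit fill_value. The names with_nan and with_zero are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('1234'))

# Without fill_value the new columns are filled with NaN.
with_nan = df.reindex(columns=list('ABCDEFG'))

# fill_value only affects the columns that reindex has to create.
with_zero = df.reindex(columns=list('ABCDEFG'), fill_value=0)

print(with_nan[['E', 'F', 'G']].isna().all().all())  # True
print(with_zero['E'].tolist())                       # [0, 0, 0, 0]
```

The existing columns A to D keep their values in both results, and the column order follows the list passed to reindex.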