DataFrame Data Deletion Details in Pandas
- 2021-12-04 10:44:02
- OfStack
This article introduces data deletion for a DataFrame in Pandas, mainly using the drop and del approaches.
# Parameters of the drop function
drop(
    self,
    labels=None, # labels of the rows or columns to delete; a single label or a list
    axis=0, # which axis the labels refer to: 0 for rows (default), 1 for columns
    index=None, # row label(s) to delete, one or more
    columns=None, # column label(s) to delete, one or more
    level=None, # the level to drop from, for a multi-level index
    inplace=False, # whether to modify the original DataFrame instead of returning a new one
    errors="raise", # "raise" raises an error for missing labels; "ignore" skips them
)
The target can be specified either with labels plus axis, or with index / columns. Only one of the two forms is needed.
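The parameter list above can be checked with a small sketch. The frame, its labels ("a", "x", "outer", "inner", and so on) and the variable names are illustrative assumptions, not part of the original article; the calls themselves are the standard `DataFrame.drop` keywords.

```python
import numpy as np
import pandas as pd

# Small deterministic frame (illustrative data, not from the article)
df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=["a", "b", "c"], columns=["x", "y", "z"])

# labels + axis and the index / columns keywords are two spellings of the same request
rows_by_axis = df.drop(labels=["a"], axis=0)
rows_by_index = df.drop(index=["a"])
cols_by_axis = df.drop(labels=["y"], axis=1)
cols_by_columns = df.drop(columns=["y"])

# errors="ignore" skips labels that do not exist instead of raising KeyError
ignored = df.drop(columns=["missing"], errors="ignore")

# level selects which level of a multi-level index the labels refer to
mi = pd.DataFrame(
    {"v": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("a", "one"), ("a", "two"), ("b", "one")], names=["outer", "inner"]
    ),
)
by_level = mi.drop(index="one", level="inner")

print(rows_by_axis.equals(rows_by_index))
print(cols_by_axis.equals(cols_by_columns))
print(df.shape)  # inplace defaults to False, so df itself is unchanged
print(ignored.equals(df))
print(list(by_level.index))
```

Because inplace defaults to False, every call above returns a new DataFrame and leaves df as it was.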
1. Operating on the default row and column index
Sample data
import numpy as np
import pandas as pd
# Generate a random array with 5 rows and 5 columns
df = pd.DataFrame(np.random.rand(5,5))
print(df)
Data display
0 1 2 3 4
0 0.760489 0.074633 0.788416 0.087612 0.560539
1 0.758450 0.599777 0.384075 0.525483 0.628910
2 0.386808 0.148106 0.742207 0.452627 0.775963
3 0.662909 0.134640 0.186186 0.735429 0.459556
4 0.328694 0.269088 0.331404 0.835388 0.899107
1.1 Row deletion
[1] Delete a single row
# Delete a single row: the 2nd row
df.drop(df.index[1],inplace=True) # inplace=True modifies df in place
print(df)
Implementation results:
0 1 2 3 4
0 0.605764 0.234973 0.566346 0.598105 0.478153
2 0.383230 0.822174 0.228855 0.743258 0.076701
3 0.875287 0.576668 0.176982 0.341827 0.112582
4 0.205425 0.898544 0.799174 0.000905 0.377990
[2] Delete non-contiguous rows
# Delete non-contiguous rows: the 2nd and 4th rows
df.drop(df.index[[1,3]],inplace=True)
print(df)
Implementation results:
0 1 2 3 4
0 0.978612 0.556539 0.781362 0.547527 0.706686
2 0.845822 0.321716 0.444176 0.053915 0.296631
4 0.617735 0.040859 0.129235 0.525116 0.005357
[3] Delete multiple consecutive rows
# Delete multiple consecutive rows
df.drop(df.index[1:3],inplace=True) # Half-open interval: the end position is not included
print(df)
Implementation results:
0 1 2 3 4
0 0.072891 0.926297 0.882265 0.971368 0.567840
3 0.163212 0.546069 0.360990 0.494274 0.065744
4 0.752917 0.242112 0.526675 0.918713 0.320725
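One detail worth spelling out for the row examples above: df.index[...] selects by position and hands the labels found there to drop. The sketch below uses a deterministic frame (an assumption made for reproducibility; the article uses random data) and non-in-place drops to show what happens when the calls are chained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4))  # default row labels 0..4

# df.index[1] is the label stored at position 1, which is 1 on a fresh frame
first = df.drop(df.index[1])

# After that drop, position 1 holds label 2, so the same expression removes label 2
second = first.drop(first.index[1])

# Slices of df.index are positional and exclude the end position
sliced = df.drop(df.index[1:3])

print(list(first.index))   # [0, 2, 3, 4]
print(list(second.index))  # [0, 3, 4]
print(list(sliced.index))  # [0, 3, 4]
```

This is why each result shown above starts from a freshly generated 5-row frame: running the in-place examples one after another on the same df would keep shrinking it.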
1.2 Column deletion
Columns can be deleted with either del or drop. del df[1] deletes the second column, and the deletion happens in place. This article explains deletion with the drop function in detail.
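As a minimal sketch of the difference between the two approaches, assuming a small deterministic frame with default column labels (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4))  # columns labelled 0..3

# drop returns a new frame unless inplace=True is passed; df keeps all 4 columns
without_col = df.drop(columns=[1])

# del removes the column from df itself and takes exactly one column label
del df[2]

print(list(without_col.columns))  # [0, 2, 3]
print(list(df.columns))           # [0, 1, 3]
```

del is convenient for a single column, while drop accepts several labels at once and can leave the original frame untouched.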
[1] Delete specified column
df.drop([1,3],axis=1,inplace=True) # Specify an axis as a column
# df.drop(columns=[1,3],inplace=True) # Specify columns directly
Implementation results:
0 2 4
0 0.592869 0.123369 0.815126
1 0.127064 0.093994 0.332790
2 0.411560 0.118753 0.143854
3 0.965317 0.267740 0.349927
4 0.688604 0.699658 0.932645
[2] Delete contiguous columns
df.drop(df.columns[1:3],axis=1,inplace=True) # Specify axis
# df.drop(columns=df.columns[1:3],inplace = True) # Specify column
print(df)
Implementation results:
0 3 4
0 0.309674 0.974694 0.660285
1 0.677328 0.969440 0.953452
2 0.954114 0.953569 0.959771
3 0.365643 0.417065 0.951372
4 0.733081 0.880914 0.804032
2. Operating on a custom row and column index
Sample data
df = pd.DataFrame(data=np.random.rand(5,5))
df.index = list('abcde')
df.columns = list('12345')
print(df)
Data display
1 2 3 4 5
a 0.188495 0.574422 0.530326 0.842489 0.474946
b 0.912522 0.982093 0.964031 0.498638 0.826693
c 0.580789 0.013957 0.515229 0.795052 0.859267
d 0.540641 0.865602 0.305256 0.552566 0.754791
e 0.375407 0.236118 0.129210 0.711744 0.067356
2.1 Row Deletion
[1] Delete a single row
df.drop('b',inplace=True) # Delete the row labeled 'b'
# df.drop(index='b',inplace=True) # Same deletion, using the index keyword
print(df)
Implementation results:
1 2 3 4 5
a 0.306350 0.622067 0.030573 0.490563 0.009987
c 0.672423 0.071661 0.274529 0.400086 0.263024
d 0.654204 0.809087 0.066099 0.167290 0.534452
e 0.628917 0.232629 0.070167 0.469962 0.957898
[2] Delete multiple rows
df.drop(['b','d'],inplace=True) # Delete the rows labeled 'b' and 'd'
# df.drop(index=['b','d'],inplace=True) # Same deletion, using the index keyword
print(df)
Implementation results:
1 2 3 4 5
a 0.391583 0.509862 0.924634 0.466563 0.058414
c 0.802016 0.621347 0.659215 0.575728 0.935811
e 0.223372 0.286116 0.130587 0.113544 0.910859
2.2 Column deletion
[1] Delete a single column
df.drop('2',axis=1,inplace=True) # Delete the column labeled '2'
# df.drop(columns='2',inplace=True) # Same deletion, using the columns keyword
print(df)
Implementation results:
1 3 4 5
a 0.276147 0.797404 0.184472 0.081162
b 0.630190 0.328055 0.428668 0.168491
c 0.979958 0.029032 0.934626 0.106805
d 0.762995 0.003134 0.136252 0.317423
e 0.137211 0.116607 0.367742 0.840080
[2] Delete multiple columns
df.drop(['2','4'],axis=1,inplace=True) # Delete multiple columns
# df.drop(columns=['2','4'],inplace=True) # Delete multiple columns
print(df)
Implementation results:
1 3 5
a 0.665647 0.709243 0.019711
b 0.920729 0.995913 0.490998
c 0.352816 0.185802 0.406174
d 0.136414 0.563546 0.762806
e 0.259710 0.775422 0.794880
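The row and column deletions from this section can also be combined in a single call by passing index and columns together. The sketch below assumes a deterministic frame with the same labels as the sample data; note that the column labels are the strings '1' to '5', so an integer label is not found.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"), columns=list("12345"))

# Remove rows 'b' and 'd' and columns '2' and '4' in one call (returns a new frame)
trimmed = df.drop(index=["b", "d"], columns=["2", "4"])

# The column labels are strings, so the integer 2 does not match any of them
try:
    df.drop(columns=[2])
    int_label_found = True
except KeyError:
    int_label_found = False

print(list(trimmed.index))    # ['a', 'c', 'e']
print(list(trimmed.columns))  # ['1', '3', '5']
print(int_label_found)        # False
```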