DataFrame Data Deletion Details in Pandas

  • 2021-12-04 10:44:02
  • OfStack

Contents
1. Operating on the default row and column index
  1.1 Row deletion
  1.2 Column deletion
2. Operating on a custom row and column index
  2.1 Row deletion
  2.2 Column deletion

This article covers deleting data from a pandas DataFrame, mainly with the drop method and the del statement.


# Parameters of the drop function
drop(
        self,
        labels=None,  # label or list of labels of the rows or columns to delete
        axis=0,  # axis to delete from: 0 for rows (default), 1 for columns
        index=None,  # row label or list of row labels to delete
        columns=None,  # column label or list of column labels to delete
        level=None,  # index level to delete from, for a multi-level index
        inplace=False,  # if True, modify the original DataFrame instead of returning a new one
        errors="raise",  # "raise" raises an error for a missing label, "ignore" skips it
)
Specify what to delete in only one of two ways: either labels together with axis, or index / columns directly.
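As a minimal sketch of the note above, the following compares the two equivalent ways of calling drop. The small DataFrame and its labels are made up for illustration and do not come from the article.

```python
import pandas as pd

# Illustrative data (an assumption, not from the article): 3 rows, 3 columns.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Rows: labels + axis=0 gives the same result as index=...
rows_by_axis = df.drop(labels=["y"], axis=0)
rows_by_index = df.drop(index=["y"])

# Columns: labels + axis=1 gives the same result as columns=...
cols_by_axis = df.drop(labels=["b"], axis=1)
cols_by_columns = df.drop(columns=["b"])

# index= and columns= can also be combined in a single call.
both = df.drop(index=["y"], columns=["b"])

print(rows_by_axis.equals(rows_by_index))
print(cols_by_axis.equals(cols_by_columns))
print(both)
```

None of these calls uses inplace=True, so df itself keeps all three rows and columns.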

1. Operating on the default row and column index

Sample data


import numpy as np
import pandas as pd
# Generate a random array with 5 rows and 5 columns
df = pd.DataFrame(np.random.rand(5,5))
print(df)

Data display


          0         1         2         3         4
0  0.760489  0.074633  0.788416  0.087612  0.560539
1  0.758450  0.599777  0.384075  0.525483  0.628910
2  0.386808  0.148106  0.742207  0.452627  0.775963
3  0.662909  0.134640  0.186186  0.735429  0.459556
4  0.328694  0.269088  0.331404  0.835388  0.899107

1.1 Row deletion

[1] Delete a single row


# Delete a single row: the second row (position 1)
df.drop(df.index[1], inplace=True)  # inplace=True modifies df in place
print(df)

Result (the values differ from the sample data above because the random array is regenerated on each run):

          0         1         2         3         4
0  0.605764  0.234973  0.566346  0.598105  0.478153
2  0.383230  0.822174  0.228855  0.743258  0.076701
3  0.875287  0.576668  0.176982  0.341827  0.112582
4  0.205425  0.898544  0.799174  0.000905  0.377990

[2] Delete non-consecutive rows


# Delete non-consecutive rows: the second and fourth rows (positions 1 and 3)
df.drop(df.index[[1, 3]], inplace=True)
print(df)

Result:

          0         1         2         3         4
0  0.978612  0.556539  0.781362  0.547527  0.706686
2  0.845822  0.321716  0.444176  0.053915  0.296631
4  0.617735  0.040859  0.129235  0.525116  0.005357

[3] Delete multiple consecutive rows


# Delete the consecutive rows at positions 1 and 2
df.drop(df.index[1:3], inplace=True)  # half-open slice: the end position 3 is not included
print(df)

Result:

          0         1         2         3         4
0  0.072891  0.926297  0.882265  0.971368  0.567840
3  0.163212  0.546069  0.360990  0.494274  0.065744
4  0.752917  0.242112  0.526675  0.918713  0.320725
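
The half-open slice is easier to verify with fixed data than with random values. The sketch below uses np.arange instead of np.random.rand, which is a choice made here for reproducibility and is not part of the article's example.

```python
import numpy as np
import pandas as pd

# Fixed data so the result is reproducible (illustrative, not from the article).
df = pd.DataFrame(np.arange(25).reshape(5, 5))

# df.index[1:3] selects the labels at positions 1 and 2; position 3 is excluded.
to_drop = df.index[1:3]
remaining = df.drop(to_drop)  # without inplace=True a new DataFrame is returned

print(list(to_drop))          # labels that are removed
print(list(remaining.index))  # labels that are left
print(len(df))                # the original still has all its rows
```

Leaving out inplace=True keeps the original df available, which can be convenient when trying several deletions on the same data.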

1.2 Column deletion

Columns can be deleted in two ways, with del or with drop. For example, del df[1] deletes the second column in place. This article focuses on the drop function.
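
To make the difference between the two concrete, here is a small sketch on fixed data (chosen for illustration only). It shows that del changes the DataFrame itself, while drop without inplace=True returns a new object.

```python
import numpy as np
import pandas as pd

# Illustrative data: 3 rows, columns labeled 0, 1, 2, 3.
df = pd.DataFrame(np.arange(12).reshape(3, 4))

# del removes a single column from df itself.
del df[1]
print(list(df.columns))

# drop returns a new DataFrame by default and leaves df untouched.
smaller = df.drop(columns=[2])
print(list(smaller.columns))
print(list(df.columns))
```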

[1] Delete specified columns


df.drop([1, 3], axis=1, inplace=True)  # specify the axis as columns
# df.drop(columns=[1, 3], inplace=True)  # or specify the columns directly
print(df)

Result:

          0         2         4
0  0.592869  0.123369  0.815126
1  0.127064  0.093994  0.332790
2  0.411560  0.118753  0.143854
3  0.965317  0.267740  0.349927
4  0.688604  0.699658  0.932645

[2] Delete contiguous columns


df.drop(df.columns[1:3], axis=1, inplace=True)  # specify the axis
# df.drop(columns=df.columns[1:3], inplace=True)  # or specify the columns directly
print(df)

Result:

          0         3         4
0  0.309674  0.974694  0.660285
1  0.677328  0.969440  0.953452
2  0.954114  0.953569  0.959771
3  0.365643  0.417065  0.951372
4  0.733081  0.880914  0.804032

2. Operating on a custom row and column index

Sample data


df = pd.DataFrame(data=np.random.rand(5,5))
df.index = list('abcde')
df.columns = list('12345')
print(df)

Data display


          1         2         3         4         5
a  0.188495  0.574422  0.530326  0.842489  0.474946
b  0.912522  0.982093  0.964031  0.498638  0.826693
c  0.580789  0.013957  0.515229  0.795052  0.859267
d  0.540641  0.865602  0.305256  0.552566  0.754791
e  0.375407  0.236118  0.129210  0.711744  0.067356

2.1 Row deletion

[1] Delete a single row


df.drop(['b'], axis=0, inplace=True)  # delete the row labeled 'b'
# df.drop(index=['b'], inplace=True)  # or specify the row directly
print(df)

Result:

          1         2         3         4         5
a  0.306350  0.622067  0.030573  0.490563  0.009987
c  0.672423  0.071661  0.274529  0.400086  0.263024
d  0.654204  0.809087  0.066099  0.167290  0.534452
e  0.628917  0.232629  0.070167  0.469962  0.957898

[2] Delete multiple rows


df.drop(['b', 'd'], axis=0, inplace=True)  # delete the rows labeled 'b' and 'd'
# df.drop(index=['b', 'd'], inplace=True)  # or specify the rows directly
print(df)

Result:

          1         2         3         4         5
a  0.391583  0.509862  0.924634  0.466563  0.058414
c  0.802016  0.621347  0.659215  0.575728  0.935811
e  0.223372  0.286116  0.130587  0.113544  0.910859
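
When rows are deleted by label, a label that does not exist raises a KeyError by default. The sketch below shows how the errors parameter from the drop signature changes this. The label 'z' is a made-up label that is deliberately absent from the index, and the fixed data is used only so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Same labels as the article's custom index, but with fixed values.
df = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    index=list("abcde"),
    columns=list("12345"),
)

# 'z' is not a row label, so the default errors="raise" raises KeyError.
try:
    df.drop(["b", "z"])
    raised = False
except KeyError:
    raised = True

# With errors="ignore", existing labels are dropped and missing ones are skipped.
result = df.drop(["b", "z"], errors="ignore")

print(raised)
print(list(result.index))
```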

2.2 Column deletion

[1] Delete a single column


df.drop(['2'], axis=1, inplace=True)  # delete the column labeled '2'
# df.drop(columns=['2'], inplace=True)  # or specify the column directly
print(df)

Result:

          1         3         4         5
a  0.276147  0.797404  0.184472  0.081162
b  0.630190  0.328055  0.428668  0.168491
c  0.979958  0.029032  0.934626  0.106805
d  0.762995  0.003134  0.136252  0.317423
e  0.137211  0.116607  0.367742  0.840080

[2] Delete multiple columns


df.drop(['2', '4'], axis=1, inplace=True)  # delete the columns labeled '2' and '4'
# df.drop(columns=['2', '4'], inplace=True)  # or specify the columns directly
print(df)

Result:

          1         3         5
a  0.665647  0.709243  0.019711
b  0.920729  0.995913  0.490998
c  0.352816  0.185802  0.406174
d  0.136414  0.563546  0.762806
e  0.259710  0.775422  0.794880

