Master DataFrame data selection in ten minutes

2021-10-16 02:23:56
OfStack

Data initialization


import pandas as pd
import numpy as np

# Sample data: one row per city
a = np.array([[' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
              [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
              [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
              [' Yantai ', ' North ', '3 Line ', ' Coastal ']])
# The row labels are the strings '1' to '4', not integers
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ', ' Whether it is coastal or not '])

       City  Geography   Level  Whether it is coastal or not
1   Beijing      North  1 Line                   Non-coastal
2  Hangzhou      South  2 Line                   Non-coastal
3  Shenzhen      South  1 Line                       Coastal
4    Yantai      North  3 Line                       Coastal

Select a row

Select a row through loc

loc selects by axis label, that is, by the index names we assigned, and it is very simple to use


df.loc['2']

City                               Hangzhou
Geography                             South
Level                                2 Line
Whether it is coastal or not    Non-coastal
Name: 2, dtype: object

Select a row through iloc

iloc selects by integer position, just like indexing a list or tuple. For example, to select the second row we pass 1, because positions start at 0.


df.iloc[1]

City                               Hangzhou
Geography                             South
Level                                2 Line
Whether it is coastal or not    Non-coastal
Name: 2, dtype: object
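The difference between the two accessors can be checked directly. The following is a minimal sketch, assuming the same data as above but with the labels trimmed of their surrounding spaces and the last column shortened to 'Coastal' (both are assumptions made for readability). It shows that loc looks up the string label '2' while iloc counts positions from 0.

```python
import pandas as pd

# Same data as the article; trimmed labels and the short 'Coastal' name are assumptions
df = pd.DataFrame(
    [["Beijing", "North", "1 Line", "Non-coastal"],
     ["Hangzhou", "South", "2 Line", "Non-coastal"],
     ["Shenzhen", "South", "1 Line", "Coastal"],
     ["Yantai", "North", "3 Line", "Coastal"]],
    index=["1", "2", "3", "4"],
    columns=["City", "Geography", "Level", "Coastal"],
)

by_label = df.loc["2"]     # row whose index label is the string '2'
by_position = df.iloc[1]   # second row, counting from 0

# Both calls return the Hangzhou row
print(by_label.equals(by_position))

# The labels are strings, so the integer 2 is not a valid label for loc
try:
    df.loc[2]
    integer_label_found = True
except KeyError:
    integer_label_found = False
print(integer_label_found)
```

Because the index here holds strings, mixing up the two accessors fails loudly; with an integer index the same mistake can silently return the wrong row.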

Select a column

The easiest way to select a column

If we know the column name, selecting a column is very simple


df[' Level ']

1    1 Line
2    2 Line
3    1 Line
4    3 Line
Name: Level, dtype: object
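Square brackets behave differently depending on whether they receive one name or a list of names. The sketch below, which assumes the same data with trimmed labels and a shortened 'Coastal' column name, shows that a single name gives a Series while a list gives a DataFrame.

```python
import pandas as pd

# Same data as the article; trimmed labels and the short 'Coastal' name are assumptions
df = pd.DataFrame(
    [["Beijing", "North", "1 Line", "Non-coastal"],
     ["Hangzhou", "South", "2 Line", "Non-coastal"],
     ["Shenzhen", "South", "1 Line", "Coastal"],
     ["Yantai", "North", "3 Line", "Coastal"]],
    index=["1", "2", "3", "4"],
    columns=["City", "Geography", "Level", "Coastal"],
)

level_series = df["Level"]          # one name -> Series
level_frame = df[["Level"]]         # list with one name -> one-column DataFrame
two_columns = df[["City", "Level"]]  # list of names -> DataFrame with those columns

print(type(level_series).__name__)
print(level_frame.shape)
print(list(two_columns.columns))
```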

Select a column through iloc

As with the row example above, we simply pass in the position of the row or column. The brackets of iloc actually accept two selectors separated by a comma: the row selector comes first, followed by the column selector. (If the comma and the column selector are omitted, rows are selected by default.)

For example, to select the second column, we put a colon before the comma to take all rows, and then 1 after the comma for the second column


df.iloc[:, 1]

1 North
2 South
3 South
4 North
Name: Geography, dtype: object

Select a column through loc

This works like iloc, except that we select by label values instead of integer positions.


df.loc[:, ' Whether it is coastal or not ']

1 Non-coastal
2 Non-coastal
3 Coastal
4 Coastal
Name: Whether it is coastal or not, dtype: object
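To see that the position-based and label-based forms pick the same column, here is a small sketch under the same assumptions as the setup (trimmed labels, last column shortened to 'Coastal'). It uses `Index.get_loc` to translate a column name into its integer position.

```python
import pandas as pd

# Same data as the article; trimmed labels and the short 'Coastal' name are assumptions
df = pd.DataFrame(
    [["Beijing", "North", "1 Line", "Non-coastal"],
     ["Hangzhou", "South", "2 Line", "Non-coastal"],
     ["Shenzhen", "South", "1 Line", "Coastal"],
     ["Yantai", "North", "3 Line", "Coastal"]],
    index=["1", "2", "3", "4"],
    columns=["City", "Geography", "Level", "Coastal"],
)

position = df.columns.get_loc("Geography")  # integer position of the column name

col_by_position = df.iloc[:, position]      # all rows, column at that position
col_by_label = df.loc[:, "Geography"]       # all rows, column with that name

print(position)
print(col_by_position.equals(col_by_label))
```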

Select several columns of a row, or several rows of a column

loc and iloc are the most effective ways to select data in a DataFrame, and they are very powerful. Row and column selectors can be combined freely.

Select several columns of a row

For example, we now select the middle two columns of the second row


df.iloc[1,1:3]

Geography     South
Level        2 Line
Name: 2, dtype: object

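Of course, we can also use labels instead of integer positions. Note that this example takes every row from label '2' onward, and that a label slice includes both of its endpoints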


df.loc['2':,' Geography ':' Level ']

   Geography   Level
2      South  2 Line
3      South  1 Line
4      North  3 Line
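Slices behave differently in the two accessors: an iloc slice excludes its stop position, while a loc slice includes its end label. A minimal sketch, assuming the same data with trimmed labels and a shortened 'Coastal' column name:

```python
import pandas as pd

# Same data as the article; trimmed labels and the short 'Coastal' name are assumptions
df = pd.DataFrame(
    [["Beijing", "North", "1 Line", "Non-coastal"],
     ["Hangzhou", "South", "2 Line", "Non-coastal"],
     ["Shenzhen", "South", "1 Line", "Coastal"],
     ["Yantai", "North", "3 Line", "Coastal"]],
    index=["1", "2", "3", "4"],
    columns=["City", "Geography", "Level", "Coastal"],
)

# Positions 1 and 2 on each axis; the stop position 3 is excluded
by_position = df.iloc[1:3, 1:3]

# Labels '2' through '3' and 'Geography' through 'Level'; both ends are included
by_label = df.loc["2":"3", "Geography":"Level"]

print(by_position.equals(by_label))
print(by_label.shape)
```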

Select data through free combination of rows and columns

For example, we want to select the third and fourth columns of the third and fourth rows, which sit at positions 2 and 3 because counting starts at 0


df.iloc[2:4, 2:4]

    Level  Whether it is coastal or not
3  1 Line                       Coastal
4  3 Line                       Coastal

This is just as simple, and loc gives the same result when given labels, so we will not go through it again here
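For readers who want to see the loc form anyway, here is a hedged sketch under the same assumptions as before (trimmed labels, last column shortened to 'Coastal'). It also shows that both accessors accept lists, which is useful when the wanted rows or columns are not adjacent.

```python
import pandas as pd

# Same data as the article; trimmed labels and the short 'Coastal' name are assumptions
df = pd.DataFrame(
    [["Beijing", "North", "1 Line", "Non-coastal"],
     ["Hangzhou", "South", "2 Line", "Non-coastal"],
     ["Shenzhen", "South", "1 Line", "Coastal"],
     ["Yantai", "North", "3 Line", "Coastal"]],
    index=["1", "2", "3", "4"],
    columns=["City", "Geography", "Level", "Coastal"],
)

# Contiguous block: positions 2-3 on both axes, and the same block by label
block_by_position = df.iloc[2:4, 2:4]
block_by_label = df.loc["3":"4", "Level":"Coastal"]

# Non-adjacent rows and columns: pass lists instead of slices
picked_by_position = df.iloc[[0, 2], [0, 3]]
picked_by_label = df.loc[["1", "3"], ["City", "Coastal"]]

print(block_by_position.equals(block_by_label))
print(picked_by_position.equals(picked_by_label))
```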

Select certain columns or rows

Select a few columns


df.iloc[:,2:4]

    Level  Whether it is coastal or not
1  1 Line                   Non-coastal
2  2 Line                   Non-coastal
3  1 Line                       Coastal
4  3 Line                       Coastal

Select certain rows

       City  Geography   Level  Whether it is coastal or not
2  Hangzhou      South  2 Line                   Non-coastal
3  Shenzhen      South  1 Line                       Coastal
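The call that produced the two rows above is not shown. One plausible way to get them, sketched here with trimmed labels and a shortened 'Coastal' column name (both assumptions), is a row slice with either accessor; this is a reconstruction, not necessarily the original statement.

```python
import pandas as pd

# Same data as the article; trimmed labels and the short 'Coastal' name are assumptions
df = pd.DataFrame(
    [["Beijing", "North", "1 Line", "Non-coastal"],
     ["Hangzhou", "South", "2 Line", "Non-coastal"],
     ["Shenzhen", "South", "1 Line", "Coastal"],
     ["Yantai", "North", "3 Line", "Coastal"]],
    index=["1", "2", "3", "4"],
    columns=["City", "Geography", "Level", "Coastal"],
)

rows_by_position = df.iloc[1:3]     # positions 1 and 2, all columns
rows_by_label = df.loc["2":"3"]     # labels '2' through '3', all columns

print(rows_by_position.equals(rows_by_label))
print(list(rows_by_label["City"]))
```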

Get a single scalar value

If you think of a DataFrame as a table, this is like getting the value of a single cell in that table

Get it through iat

iat selects by integer position: the row position first, then the column position


df.iat[2, 2]

'1 Line '

Get it through at

at selects by label: the row label first, then the column name


df.at['3', ' Level ']

'1 Line '
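Scalar access can be checked against the general-purpose accessors. The sketch below assumes the same data with trimmed labels and a shortened 'Coastal' column name; it reads the cell in the row labeled '3' and the 'Level' column four ways and confirms that they agree.

```python
import pandas as pd

# Same data as the article; trimmed labels and the short 'Coastal' name are assumptions
df = pd.DataFrame(
    [["Beijing", "North", "1 Line", "Non-coastal"],
     ["Hangzhou", "South", "2 Line", "Non-coastal"],
     ["Shenzhen", "South", "1 Line", "Coastal"],
     ["Yantai", "North", "3 Line", "Coastal"]],
    index=["1", "2", "3", "4"],
    columns=["City", "Geography", "Level", "Coastal"],
)

via_iat = df.iat[2, 2]            # row position 2, column position 2
via_at = df.at["3", "Level"]      # row label '3', column name 'Level'
via_iloc = df.iloc[2, 2]          # same cell through iloc
via_loc = df.loc["3", "Level"]    # same cell through loc

print(via_iat, via_at, via_iloc, via_loc)
```

at and iat only accept a single row and a single column, which is what makes them a natural fit when exactly one cell is wanted.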