Master DataFrame data selection in ten minutes
- 2021-10-16 02:23:56
- OfStack
Data initialization
import pandas as pd
import numpy as np
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ', ' Whether it is coastal or not '])
       City  Geography   Level  Whether it is coastal or not
1   Beijing      North  1 Line                   Non-coastal
2  Hangzhou      South  2 Line                   Non-coastal
3  Shenzhen      South  1 Line                       Coastal
4    Yantai      North  3 Line                       Coastal
Select a row
Select a row with loc
loc selects by axis label, which here means our index names, and it is very simple to use
df.loc['2']
City                               Hangzhou
Geography                             South
Level                                2 Line
Whether it is coastal or not    Non-coastal
Name: 2, dtype: object
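Because loc works on labels, the type of the label matters. The sketch below rebuilds the article's frame and shows two things: an integer key fails when the index holds strings, and a list of labels returns a DataFrame rather than a single row. The names raised and two_rows are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

# The index labels are the strings '1'..'4', so the integer 2 is not a label.
try:
    df.loc[2]
    raised = False
except KeyError:
    raised = True

# A list of labels selects several rows and keeps the result two-dimensional.
two_rows = df.loc[['2', '4']]
print(raised, two_rows.shape)
```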
Select a row with iloc
iloc selects by integer position, like indexing a list or tuple. For example, to select the second row we pass 1, because positions start at 0.
df.iloc[1]
City                               Hangzhou
Geography                             South
Level                                2 Line
Whether it is coastal or not    Non-coastal
Name: 2, dtype: object
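A small check can make the label/position distinction concrete. This sketch, using the article's data, confirms that position 1 and label '2' refer to the same row, and that negative positions count from the end as they do for lists. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

by_position = df.iloc[1]   # second row, counted from 0
by_label = df.loc['2']     # row whose index label is '2'
last_row = df.iloc[-1]     # negative positions count from the end

print(by_position.equals(by_label))
print(last_row.name)
```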
Select a column
The easiest way to select a column
If we know the column name, selecting a column is very simple
df[' Level ']
1    1 Line
2    2 Line
3    1 Line
4    3 Line
Name: Level, dtype: object
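The same bracket syntax also accepts a list of column names. As a rough illustration with the article's data, a single name gives back a Series, while a list of names gives back a DataFrame. The names level and pair are made up for this example.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

level = df[' Level ']               # one name: a Series
pair = df[[' City ', ' Level ']]    # list of names: a DataFrame

print(type(level).__name__, type(pair).__name__, pair.shape)
```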
Select a column with iloc
As with the row example above, we simply pass iloc the position of the row or column. The brackets of iloc actually accept two arguments separated by a comma: the row selector first, then the column selector. (If the comma is omitted, the argument is treated as a row selector.)
For example, to select the second column, we put a colon before the comma to take all rows, then 1 after it for the second column
df.iloc[:, 1]
1 North
2 South
3 South
4 North
Name: Geography, dtype: object
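To see how the comma changes the meaning of the same integer, the sketch below compares df.iloc[1] with df.iloc[:, 1] on the article's data. The first result is indexed by column names (it is a row), the second by row labels (it is a column). The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

second_row = df.iloc[1]       # no comma: the argument selects a row
second_col = df.iloc[:, 1]    # colon for all rows, then column position 1

print(list(second_row.index))  # the column names
print(list(second_col.index))  # the row labels
```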
Select a column with loc
This works like iloc, except that we pass the actual index labels instead of integer positions.
df.loc[:, ' Whether it is coastal or not ']
1 Non-coastal
2 Non-coastal
3 Coastal
4 Coastal
Name: Whether it is coastal or not, dtype: object
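When only the column name is known but a position is needed (or the other way round), df.columns.get_loc can translate between the two. This is a minimal sketch on the article's data; col and pos are illustrative names.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

col = ' Whether it is coastal or not '
pos = df.columns.get_loc(col)    # integer position of the column label

by_label = df.loc[:, col]        # column selected by name
by_position = df.iloc[:, pos]    # same column selected by position

print(pos, by_label.equals(by_position))
```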
Select several columns of one row, or several rows of one column
loc and iloc are the main tools for selecting data in a DataFrame, and they are very powerful: row and column selectors can be combined freely.
Select several columns of a row
For example, we now select the middle two columns of row 2
df.iloc[1,1:3]
Geography     South
Level        2 Line
Name: 2, dtype: object
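Of course, we can also use labels instead of integer positions. Note that '2': selects every row from label 2 onward, and that loc slices include both endpoints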
df.loc['2':,' Geography ':' Level ']
  Geography   Level
2     South  2 Line
3     South  1 Line
4     North  3 Line
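The two calls above do not select exactly the same data, so it may help to line them up. In this sketch on the article's data, the single-row label form matches df.iloc[1, 1:3], while the '2': slice returns three rows. Note also that the label slice ' Geography ':' Level ' includes its end label, whereas the position slice 1:3 stops before position 3. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

# One row, two columns: label form and position form.
one_row_by_label = df.loc['2', ' Geography ':' Level ']
one_row_by_position = df.iloc[1, 1:3]

# Open-ended label slice: rows '2', '3' and '4'.
from_row_2 = df.loc['2':, ' Geography ':' Level ']

print(one_row_by_label.equals(one_row_by_position))
print(from_row_2.shape)
```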
Select data by freely combining rows and columns
For example, we want the rows at positions 2 to 3 and the columns at positions 2 to 3, that is, the third and fourth rows and the third and fourth columns
df.iloc[2:4, 2:4]
    Level  Whether it is coastal or not
3  1 Line                       Coastal
4  3 Line                       Coastal
This is just as simple. loc produces the same result, so it is not repeated here
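For readers who want to check the loc version, one possible label-based equivalent is sketched below on the article's data. The end labels are written out in full because loc slices include their endpoints. The names block_by_position and block_by_label are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

# Positions 2 and 3 on both axes (the end position 4 is excluded).
block_by_position = df.iloc[2:4, 2:4]

# The same block by label (both end labels are included).
block_by_label = df.loc['3':'4', ' Level ':' Whether it is coastal or not ']

print(block_by_position.equals(block_by_label))
```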
Select certain columns or rows
Select a few columns
df.iloc[:,2:4]
    Level  Whether it is coastal or not
1  1 Line                   Non-coastal
2  2 Line                   Non-coastal
3  1 Line                       Coastal
4  3 Line                       Coastal
Select certain rows
       City  Geography   Level  Whether it is coastal or not
2  Hangzhou      South  2 Line                   Non-coastal
3  Shenzhen      South  1 Line                       Coastal
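The call that produced the output above is not shown. As an assumption, any of the following would return rows 2 and 3 with all columns on the article's data. This is a sketch of the options, not necessarily the call the author used.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

rows_by_position = df.iloc[1:3]         # positions 1 and 2
rows_by_slice = df.loc['2':'3']         # labels '2' through '3', inclusive
rows_by_list = df.loc[['2', '3']]       # explicit list of labels

print(rows_by_position.equals(rows_by_slice))
print(rows_by_position.equals(rows_by_list))
```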
Get a single scalar value
If you think of dataframe as a table, you can think of it as getting the value of a cell in the table
Get it with iat
iat selects by integer position
df.iat[0, 2]
'1 Line '
Get it with at
at selects by label
df.at['1', ' Level ']
'1 Line '
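As a rough illustration of scalar access, the sketch below reads one cell of the article's frame in three ways: iat with a row and column position, at with a row and column label, and loc with the same labels. The choice of cell (first row, Level column) is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Same data as the article, including the spaces inside the labels.
a = np.array([
    [' Beijing ', ' North ', '1 Line ', ' Non-coastal '],
    [' Hangzhou ', ' South ', '2 Line ', ' Non-coastal '],
    [' Shenzhen ', ' South ', '1 Line ', ' Coastal '],
    [' Yantai ', ' North ', '3 Line ', ' Coastal '],
])
df = pd.DataFrame(a, index=['1', '2', '3', '4'],
                  columns=[' City ', ' Geography ', ' Level ',
                           ' Whether it is coastal or not '])

cell_by_position = df.iat[0, 2]           # row position 0, column position 2
cell_by_label = df.at['1', ' Level ']     # row label '1', column label ' Level '
cell_by_loc = df.loc['1', ' Level ']      # loc also returns a scalar here

print(repr(cell_by_position))
```

at and iat accept only a single row and a single column, so they are meant for reading or writing one cell, while loc and iloc also handle slices and lists.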