pandas DataFrame Row and Column Indexes and How to Get Values

  • 2021-07-09 08:27:00
  • OfStack

A pandas DataFrame is 2-dimensional, so it has both a row index and a column index.

Only column indexes were introduced in the previous article:


import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})
print(df)

#  Results :
   A  B
0  0  3
1  1  4
2  2  5

The row index is generated automatically as 0, 1, 2.
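
As a quick check, the automatically generated labels can be read back through the index and columns attributes. This is a minimal sketch that rebuilds the same small df; the names row_labels and col_labels are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})

# With no index argument, pandas creates a default integer index.
row_labels = list(df.index)
# The dictionary keys become the column labels.
col_labels = list(df.columns)

print(row_labels)
print(col_labels)
```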

If you want to specify your own row and column indexes, you can use the index and columns parameters:

The following data is the passenger flow at 5 stations over 10 days:


ridership_df = pd.DataFrame(
  data=[[  0,  0,  2,  5,  0],
     [1478, 3877, 3674, 2328, 2539],
     [1613, 4088, 3991, 6461, 2691],
     [1560, 3392, 3826, 4787, 2613],
     [1608, 4802, 3932, 4477, 2705],
     [1576, 3933, 3909, 4979, 2685],
     [ 95, 229, 255, 496, 201],
     [  2,  0,  1,  27,  0],
     [1438, 3785, 3589, 4174, 2215],
     [1342, 4043, 4009, 4665, 3033]],
  index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
      '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
  columns=['R003', 'R004', 'R005', 'R006', 'R007']
)

The data parameter holds the 2-dimensional data (here a nested list; a NumPy 2-dimensional array works the same way), the index parameter sets the row index, and the columns parameter sets the column index.
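
For illustration, the same constructor call can be sketched with an explicit NumPy array. The example below uses only the first three days and first two stations of the data above; the name small_df is hypothetical:

```python
import numpy as np
import pandas as pd

# A 3x2 slice of the ridership data, passed as a NumPy 2-dimensional array.
values = np.array([[0, 0],
                   [1478, 3877],
                   [1613, 4088]])

small_df = pd.DataFrame(
    data=values,
    index=['05-01-11', '05-02-11', '05-03-11'],  # row labels
    columns=['R003', 'R004'],                    # column labels
)

print(small_df.shape)  # (number of rows, number of columns)
print(small_df)
```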

The generated data is displayed in tabular form:


          R003  R004  R005  R006  R007
05-01-11     0     0     2     5     0
05-02-11  1478  3877  3674  2328  2539
05-03-11  1613  4088  3991  6461  2691
05-04-11  1560  3392  3826  4787  2613
05-05-11  1608  4802  3932  4477  2705
05-06-11  1576  3933  3909  4979  2685
05-07-11    95   229   255   496   201
05-08-11     2     0     1    27     0
05-09-11  1438  3785  3589  4174  2215
05-10-11  1342  4043  4009  4665  3033

Here is how to get values from a DataFrame:

1. Get a column: use ['key'] directly


print(ridership_df['R003'])

#  Results :
05-01-11       0
05-02-11    1478
05-03-11    1613
05-04-11    1560
05-05-11    1608
05-06-11    1576
05-07-11      95
05-08-11       2
05-09-11    1438
05-10-11    1342
Name: R003, dtype: int64
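
A single key returns a Series, while a list of keys returns a DataFrame. The sketch below shows the difference on a 3x3 slice of the ridership data; ridership_small is a hypothetical name used only here:

```python
import pandas as pd

ridership_small = pd.DataFrame(
    data=[[0, 0, 2],
          [1478, 3877, 3674],
          [1613, 4088, 3991]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['R003', 'R004', 'R005'],
)

# One key: the result is a 1-dimensional Series.
one_column = ridership_small['R003']

# A list of keys: the result is a DataFrame with just those columns.
two_columns = ridership_small[['R003', 'R005']]

print(type(one_column).__name__)
print(type(two_columns).__name__)
```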

2. Get a row: use .loc['key'] (by label) or .iloc[position] (by integer position)


print(ridership_df.loc['05-01-11'])
#  Or 
print(ridership_df.iloc[0])


#  Results :
R003  0
R004  0
R005  2
R006  5
R007  0
Name: 05-01-11, dtype: int64
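
Both accessors also accept slices, with one difference worth noting: a .loc label slice includes its end label, while an .iloc position slice excludes its end position. A minimal sketch on a 3x2 slice of the data (ridership_small is a hypothetical name):

```python
import pandas as pd

ridership_small = pd.DataFrame(
    data=[[0, 0],
          [1478, 3877],
          [1613, 4088]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['R003', 'R004'],
)

# Label slice: both '05-01-11' and '05-02-11' are included.
by_label = ridership_small.loc['05-01-11':'05-02-11']

# Position slice: rows 0 and 1; position 2 is excluded.
by_position = ridership_small.iloc[0:2]

print(by_label.equals(by_position))
```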

3. Get the value at a given row and column:


print(ridership_df.loc['05-05-11','R003'])
#  Or 
print(ridership_df.iloc[4,0])

#  Results :
1608
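
When only a single scalar is needed, pandas also provides the .at (label-based) and .iat (position-based) accessors. The sketch below is an optional alternative to the calls above, shown on a 3x2 slice of the data (ridership_small is a hypothetical name):

```python
import pandas as pd

ridership_small = pd.DataFrame(
    data=[[0, 0],
          [1478, 3877],
          [1613, 4088]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['R003', 'R004'],
)

# Scalar lookup by row label and column label.
value_by_label = ridership_small.at['05-02-11', 'R004']

# Scalar lookup by row position and column position.
value_by_position = ridership_small.iat[1, 1]

print(value_by_label, value_by_position)
```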

4. Get the underlying NumPy 2-dimensional array:


print(ridership_df.values)

#  Results :
[[   0    0    2    5    0]
 [1478 3877 3674 2328 2539]
 [1613 4088 3991 6461 2691]
 [1560 3392 3826 4787 2613]
 [1608 4802 3932 4477 2705]
 [1576 3933 3909 4979 2685]
 [  95  229  255  496  201]
 [   2    0    1   27    0]
 [1438 3785 3589 4174 2215]
 [1342 4043 4009 4665 3033]]

* Note that in this process, if the columns do not all share the same data type, the values are converted to a common type.
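
The conversion can be seen with a small frame that mixes an integer column and a float column. This is a sketch with made-up values; mixed_df is a hypothetical name, not part of the ridership data:

```python
import pandas as pd

# Illustrative data only: one integer column, one float column.
mixed_df = pd.DataFrame({'ints': [1, 2], 'floats': [0.5, 1.5]})

# Each column keeps its own dtype inside the DataFrame.
print(mixed_df.dtypes)

# A NumPy array has a single dtype, so the integers are upcast to float.
arr = mixed_df.values
print(arr.dtype)
print(arr)
```

In recent pandas versions, to_numpy() returns the same kind of array and is an alternative to .values.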

A comprehensive example:

From ridership_df, find the station with the highest passenger flow on the first day, then return the average daily passenger flow at that station together with the average daily passenger flow across all stations for comparison:


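def mean_riders_for_max_station(ridership):
  # Label of the station (column) with the most riders on the first day.
  # idxmax() returns the label; in recent pandas, argmax() returns a position.
  max_station = ridership.iloc[0].idxmax()
  # Average daily ridership at that station.
  mean_for_max = ridership[max_station].mean()
  # Average over every value in the table (all stations, all days).
  overall_mean = ridership.values.mean()
  return (overall_mean, mean_for_max)

print(mean_riders_for_max_station(ridership_df))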

#  Results :
(2342.6, 3239.9)
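
One detail matters when looking up the busiest station: Series.idxmax() returns the label of the maximum, whereas Series.argmax() returns its integer position (assuming pandas 1.0 or later; older versions behaved differently). A minimal sketch using the first-day row of the data above:

```python
import pandas as pd

# First-day ridership per station, taken from the table above.
first_day = pd.Series([0, 0, 2, 5, 0],
                      index=['R003', 'R004', 'R005', 'R006', 'R007'])

# Label of the largest value: usable directly as a column key.
max_label = first_day.idxmax()

# Integer position of the largest value (assumes pandas >= 1.0).
max_position = first_day.argmax()

print(max_label, max_position)
```

The label can be passed to ridership_df[...], while the position would need .iloc or a lookup through the column index.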
