Indexing Series and DataFrame Objects in Pandas

  • 2021-07-03 00:33:21
  • OfStack

Before indexing Series and DataFrame objects, one distinction must be clear: are we indexing by subscript (position) or by keyword (label)? A list, for example, is indexed by subscript, while a dict is indexed by keyword.

A subscript always starts from 0 and is always an integer. A keyword is one of the values in key, and it can be a number or a string.
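The distinction can be seen with plain Python containers before pandas enters the picture. This is only a small sketch; the variable names and sample values are made up for illustration.

```python
# Subscript (positional) indexing: a list is addressed by position, starting at 0
scores_list = [80.3, 88.2, 90.0]
first_by_position = scores_list[0]

# Keyword (label) indexing: a dict is addressed by its keys
scores_dict = {'ming': 80.3, 'hong': 88.2, 10: 90.0}
by_string_key = scores_dict['hong']
by_numeric_key = scores_dict[10]  # a key may be a number without being a position

print(first_by_position, by_string_key, by_numeric_key)
```

Note that `scores_dict[0]` would fail here, because 0 is not one of the keys even though the dict has a "first" entry.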

Introduction to the Series object:

A Series object consists of an index and a set of values, with each index entry corresponding to one value. index is a pandas Index object, and values is a numpy array.


import pandas as pd
s1 = pd.Series([2,3,4,5], index=['a', 'b', 'c', 'd'])
print(s1)
 Results: 
a  2
b  3
c  4
d  5
dtype: int64

print(s1.index)
 Results: 
Index(['a', 'b', 'c', 'd'], dtype='object')

print(s1.values)
 Results: 
[2 3 4 5]
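As a quick check of the structure described above, the following sketch inspects the types of the two parts and pairs each index value with its value. The name `pairs` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

# The index values live in a pandas Index; the data lives in a NumPy array
index_is_pandas_index = isinstance(s1.index, pd.Index)
values_is_numpy_array = isinstance(s1.values, np.ndarray)
print(index_is_pandas_index, values_is_numpy_array)

# Each index value is paired with exactly one value
pairs = [(label, int(value)) for label, value in s1.items()]
print(pairs)
```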

How do I index a Series object?

1: Indexing using values in index


print(s1['a'])
 Results: 
2

print(s1[['a','d']])
 Results: 
a  2
d  5
dtype: int64


print(s1['b':'d'])
 Results (note that a slice by index value includes the end value): 
b  3
c  4
d  5
dtype: int64

2: Indexing with subscripts


print(s1[0])
 Results: 
2

print(s1[[0,3]])
 Results: 
a  2
d  5
dtype: int64

print(s1[1:3])
 Results (note that, unlike the slice above, a slice by subscript excludes the end position, the same as ordinary list slicing): 
b  3
c  4
dtype: int64
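The two styles can also be written explicitly with the `.loc` and `.iloc` accessors, which removes any guessing about whether a key is an index value or a subscript. This is a sketch rather than the only way to do it; as far as I know, recent pandas releases warn about (and may stop supporting) plain `s1[0]` on a Series whose index is not made of integers, so `.iloc` is the safer spelling for subscripts.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

by_label = s1.loc['b':'d']    # slice by index value: the end value 'd' is included
by_position = s1.iloc[1:3]    # slice by subscript: the end position 3 is excluded
first_value = s1.iloc[0]      # explicit subscript lookup of the first element

print(by_label)
print(by_position)
print(first_value)
```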

3: Special circumstances:

The index above consists of strings. If index consists of integers, is indexing carried out by index value or by subscript?


s1 = pd.Series([2,3,4,5], index=[1,2,3,4])
print(s1[2])
 Results: 
3
print(s1[0])  # raises KeyError, because 0 is not a value in index

print(s1[[2,4]])
 Results: 
2  3
4  5
dtype: int64

print(s1[1:3])
 Results: 
2  3
3  4
dtype: int64

It can be seen that when index consists of integers, the first two forms (a single value and a list of values) are indexed by the values of index, while the slice form is indexed by subscript.
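Because plain `[]` behaves differently for single values and slices when index holds integers, one way to avoid the ambiguity is to say which meaning is intended. The sketch below uses `.loc` for index values and `.iloc` for subscripts; the variable names are illustrative only.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=[1, 2, 3, 4])

label_lookup = s1.loc[2]        # index value 2
position_lookup = s1.iloc[2]    # subscript 2, i.e. the third element
label_slice = s1.loc[1:3]       # index values 1 through 3, end included
position_slice = s1.iloc[1:3]   # subscripts 1 and 2, end excluded

print(label_lookup, position_lookup)
print(label_slice)
print(position_slice)
```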

4: Indexing with Boolean Series

When a Boolean Series is used for indexing, the Boolean Series must have the same index as the object being indexed.


s1 = pd.Series([2,3,4,5], index=['a', 'b', 'c', 'd'])
print(s1 > 3)
 The result (this is a bool Series): 
a  False
b  False
c   True
d   True
dtype: bool

print(s1[s1 > 3])
 The result (passing the bool Series to the Series selects the values where it is True): 
c  4
d  5
dtype: int64
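To illustrate what "the same index" means in practice, here is a small sketch with two hand-built masks (both hypothetical, not from the original example). A mask that carries the same index values in a different order is matched up by index value, while a mask that lacks some of the index values is expected to be rejected.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

# Same index values as s1, but listed in reverse order
mask = pd.Series([True, True, False, False], index=['d', 'c', 'b', 'a'])
selected = s1[mask]   # matched by index value, not by position
print(selected)

# A mask that covers only part of the index cannot be matched to s1
mismatch_error = None
try:
    s1[pd.Series([True, False], index=['a', 'b'])]
except Exception as exc:   # pandas raises an indexing error here
    mismatch_error = type(exc).__name__
print(mismatch_error)
```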

5: Use the Index object for indexing

Indexing with an Index object is essentially the same as indexing with a list of index values. An Index stores a sequence of values, so it can be treated like a list.
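A minimal sketch of this equivalence, assuming the string-indexed `s1` from earlier; the name `labels` is introduced here for illustration.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

labels = pd.Index(['a', 'd'])   # an Index holding the values to select
from_index = s1[labels]
from_list = s1[['a', 'd']]      # the same selection written as a list

print(from_index)
print(from_list)
```

Both selections return the entries for 'a' and 'd'.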

Introduction to the DataFrame object:

A DataFrame object is a table consisting of rows and columns. The rows are labeled by index and the columns are labeled by columns, both of which are Index objects. Its values are also a numpy array.


data = {'name':['ming', 'hong', 'gang', 'tian'], 'age':[12, 13, 14, 20], 'score':[80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])  # fix the column order explicitly

print(df1.index)
 Results: 
RangeIndex(start=0, stop=4, step=1)

print(df1.columns)
 Results: 
Index(['age', 'name', 'score'], dtype='object')

print(df1.values)
 Results: 
[[12 'ming' 80.3]
 [13 'hong' 88.2]
 [14 'gang' 90.0]
 [20 'tian' 99.9]]
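The three parts of a DataFrame can be checked the same way as for a Series. The sketch below rebuilds the example table; passing `columns=` to fix the column order is my own choice here, so that the layout does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])

# Row labels and column labels are both Index objects
rows_are_index = isinstance(df1.index, pd.Index)       # RangeIndex is a kind of Index
columns_are_index = isinstance(df1.columns, pd.Index)

# The cell values form a 2-D NumPy array: 4 rows by 3 columns
values_shape = df1.values.shape
print(rows_are_index, columns_are_index, values_shape)
```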

How to Index DataFrame Objects

1: Indexing columns with columns values

Indexing directly with values from columns returns one column (a Series) or several columns (a DataFrame):


print(df1['name'])
 Results: 
0  ming
1  hong
2  gang
3  tian
Name: name, dtype: object

print(df1[['name','age']])
 Results: 
   name  age
0  ming   12
1  hong   13
2  gang   14
3  tian   20

 Note: Columns cannot be indexed directly with subscripts unless columns itself contains that value. For example, the following operation is wrong: 
print(df1[0])
 Results: KeyError
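The following sketch puts the three cases side by side: a single column value, a list of column values, and a subscript that is not a column value. The second frame, `df2`, is a hypothetical example added to show the "unless" case, where the columns really do contain the value 0.

```python
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])

single = df1['name']              # one column value -> Series
multiple = df1[['name', 'age']]   # list of column values -> DataFrame

# 0 is not one of the column values, so this lookup fails
missing_key_error = False
try:
    df1[0]
except KeyError:
    missing_key_error = True

# With default integer column values, 0 is a valid column value
df2 = pd.DataFrame([[1, 2], [3, 4]])
first_column = df2[0]

print(type(single).__name__, type(multiple).__name__, missing_key_error)
print(first_column.tolist())
```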

2: Slicing or Boolean Series to index rows

Use a slice, or a Boolean Series, to index rows:


print(df1[0:3])
 Selecting with a slice gives: 
   age  name  score
0   12  ming   80.3
1   13  hong   88.2
2   14  gang   90.0

print(df1[ df1['age'] > 13 ])
 Indexing with a Boolean Series requires the Boolean Series and the DataFrame to have the same index. Results: 
   age  name  score
2   14  gang   90.0
3   20  tian   99.9
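Boolean Series built from the same DataFrame automatically share its index, so they can be combined before being used to select rows. A short sketch, with thresholds chosen only for illustration:

```python
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])

# Each comparison yields a Boolean Series with the same index as df1;
# & combines them element by element (the parentheses are required)
mask = (df1['age'] > 12) & (df1['score'] < 95)
selected = df1[mask]

print(mask)
print(selected)
```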

3: Indexing with loc and iloc

Essentially, loc indexes by the values of index and columns, while iloc ignores those values and always indexes by subscripts starting from 0. With that in mind, the following examples become very simple:


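# Rebuild df1 with a non-default index so that index values and subscripts differ
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'], index=[1, 3, 4, 5])
print(df1.loc[3])
 Results: 
age    13
name   hong
score  88.2
Name: 3, dtype: object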

print(df1.loc[:,'age'])
 Results: 
1  12
3  13
4  14
5  20
Name: age, dtype: int64

print(df1.iloc[3])
 Results: 
age    20
name   tian
score  99.9
Name: 5, dtype: object

print(df1.iloc[:,1])
 Results: 
1  ming
3  hong
4  gang
5  tian
Name: name, dtype: object
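Both accessors also take a row selector and a column selector together. The sketch below, assuming the same table with index values 1, 3, 4, 5, picks a single cell each way and then selects the same block of rows and columns once by value and once by subscript.

```python
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'], index=[1, 3, 4, 5])

cell_by_label = df1.loc[3, 'name']     # row with index value 3, column 'name'
cell_by_position = df1.iloc[3, 1]      # fourth row, second column

# Same block selected two ways: loc includes the end value 4,
# iloc excludes the end position 3
rows_by_label = df1.loc[3:4, ['name', 'score']]
rows_by_position = df1.iloc[1:3, [1, 2]]

print(cell_by_label, cell_by_position)
print(rows_by_label.equals(rows_by_position))
```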

