Indexing Series and DataFrame Objects in Pandas
- 2021-07-03 00:33:21
- OfStack
When indexing Series and DataFrame objects, first be clear about one distinction: whether you are indexing by subscript (position) or by keyword (label). A list, for example, is indexed by subscript, while a dict is indexed by keyword.
With subscript indexing, the subscript always starts from 0 and is always an integer. With keyword indexing, the keywords are the keys themselves, which can be numbers or strings.
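The distinction can be illustrated with plain Python containers before turning to pandas. This is a minimal sketch; the variable names and sample values are made up for illustration.

```python
# A list is indexed by subscript (position), starting from 0.
scores_list = [80.3, 88.2, 90.0]
first_score = scores_list[0]          # the element at position 0

# A dict is indexed by keyword; a key may be a string or a number.
scores_dict = {"ming": 80.3, "hong": 88.2, 7: 90.0}
ming_score = scores_dict["ming"]      # string key
seven_score = scores_dict[7]          # numeric key, looked up as a key, not a position

print(first_score, ming_score, seven_score)
```

Note that `scores_dict[7]` succeeds even though the dict has only three entries, because 7 is treated as a key rather than a position.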
Introduction to the Series object:
A Series object consists of an index (index) and values (values), with each index entry corresponding to one value. index is a pandas Index object, and values is a numpy array.
import pandas as pd
s1 = pd.Series([2,3,4,5], index=['a', 'b', 'c', 'd'])
print(s1)
Results:
a 2
b 3
c 4
d 5
dtype: int64
print(s1.index)
Results:
Index(['a', 'b', 'c', 'd'], dtype='object')
print(s1.values)
Results:
[2 3 4 5]
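As a quick check of the two parts described above, the following sketch inspects the types of index and values and pairs each index entry with its value. The helper names (`index_is_pandas_index`, `pairs`, and so on) are illustrative only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

# index is a pandas Index object; values is a numpy array.
index_is_pandas_index = isinstance(s1.index, pd.Index)
values_is_ndarray = isinstance(s1.values, np.ndarray)

# Each index entry corresponds to exactly one value, in order.
pairs = [(label, int(value)) for label, value in zip(s1.index, s1.values)]

print(index_is_pandas_index, values_is_ndarray)
print(pairs)
```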
How do I index a Series object?
1: Indexing using values in index
print(s1['a'])
Results:
2
print(s1[['a','d']])
Results:
a 2
d 5
dtype: int64
print(s1['b':'d'])
Results (note that a slice over index values includes the end value):
b 3
c 4
d 5
dtype: int64
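The same three lookups can also be written with `.loc`, which always indexes by the values in index. This is a sketch of an equivalent spelling, not something the examples above require.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

single = s1.loc['a']              # one index value -> a scalar
several = s1.loc[['a', 'd']]      # a list of index values -> a Series
label_slice = s1.loc['b':'d']     # a slice of index values includes the end value 'd'

print(single)
print(several.tolist())
print(label_slice.index.tolist())
```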
2: Indexing with subscripts
print(s1[0])
Results:
2
print(s1[[0,3]])
Results:
a 2
d 5
dtype: int64
print(s1[1:3])
Results (note: unlike the slice above, the end position is excluded, the same as ordinary Python slicing):
b 3
c 4
dtype: int64
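Depending on the pandas version, writing `s1[0]` on a string index may emit a deprecation warning or fail, because the plain brackets have to guess that 0 is a position. A hedged alternative is `.iloc`, which always indexes by subscript. The sketch below repeats the three subscript lookups that way.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

first = s1.iloc[0]                # position 0 -> a scalar
picked = s1.iloc[[0, 3]]          # positions 0 and 3 -> a Series
pos_slice = s1.iloc[1:3]          # positions 1 and 2; position 3 is excluded

print(first)
print(picked.index.tolist())
print(pos_slice.tolist())
```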
3: Special circumstances:
The index above consists of strings. If the index consists of integers, is indexing done by index value or by subscript?
s1 = pd.Series([2,3,4,5], index=[1,2,3,4])
print(s1[2])
Results:
3
print(s1[0])  # raises KeyError: 0 is not a value in index
print(s1[[2,4]])
Results:
2 3
4 5
dtype: int64
print(s1[1:3])
Results:
2 3
3 4
dtype: int64
As you can see, when index holds integers, the first two forms (a single value and a list of values) are looked up by index value, while the slice form is looked up by subscript.
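One way to avoid this ambiguity with an integer index is to state the intent explicitly: `.loc` for index values and `.iloc` for subscripts. The sketch below contrasts the two on the same Series.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=[1, 2, 3, 4])

by_label = s1.loc[2]              # index value 2 -> value 3
by_position = s1.iloc[2]          # position 2 (the third element) -> value 4

label_slice = s1.loc[1:3]         # index values 1, 2, 3 (end value included)
pos_slice = s1.iloc[1:3]          # positions 1 and 2 (end position excluded)

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```

The two slices look identical in the source code but select different elements, which is why spelling out `.loc` or `.iloc` is usually clearer here.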
4: Indexing with Boolean Series
When indexing with a Boolean Series, the Boolean Series must have the same index as the Series being indexed.
s1 = pd.Series([2,3,4,5], index=['a', 'b', 'c', 'd'])
print(s1 > 3)
Results (this is a Boolean Series):
a False
b False
c True
d True
dtype: bool
print(s1[s1 > 3])
Results (passing the Boolean Series to the Series performs the indexing):
c 4
d 5
dtype: int64
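The following sketch makes the shared-index requirement visible and shows one common extension, combining two conditions with `&`. The names `mask` and `middle` are illustrative.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

mask = s1 > 3                                 # a Boolean Series built from s1
same_index = mask.index.equals(s1.index)      # it carries the same index as s1
selected = s1[mask]                           # keeps the entries where mask is True

# Conditions are combined with & (and) or | (or); each one needs its own parentheses.
middle = s1[(s1 > 2) & (s1 < 5)]

print(same_index)
print(selected.index.tolist())
print(middle.tolist())
```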
5: Use the Index object for indexing
Indexing with an Index object is essentially the same as indexing with index values. An Index stores a sequence of values, so it can be treated like a list of values.
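A small sketch of this point: selecting with an Index object and selecting with a plain list of the same values should give the same result. The variable `wanted` is a made-up name.

```python
import pandas as pd

s1 = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

wanted = pd.Index(['a', 'd'])     # an Index holding the values to select
from_index = s1[wanted]           # index with the Index object
from_list = s1[['a', 'd']]        # index with an ordinary list of the same values

same_result = from_index.equals(from_list)
print(same_result)
print(from_index.tolist())
```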
Introduction to the DataFrame object:
A DataFrame object is a table made up of rows and columns. The row labels are stored in index and the column labels in columns, both of which are Index objects. Its values are also a numpy array.
data = {'name':['ming', 'hong', 'gang', 'tian'], 'age':[12, 13, 14, 20], 'score':[80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])  # fix the column order used in the outputs below
print(df1.index)
Results:
RangeIndex(start=0, stop=4, step=1)
print(df1.columns)
Results:
Index(['age', 'name', 'score'], dtype='object')
print(df1.values)
Results:
[[12 'ming' 80.3]
[13 'hong' 88.2]
[14 'gang' 90.0]
[20 'tian' 99.9]]
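The three parts of a DataFrame can be checked programmatically, as in the sketch below. Passing `columns=` to the constructor is an assumption made here so that the column order matches the output shown above regardless of the Python or pandas version.

```python
import numpy as np
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
# columns= pins the column order explicitly (an assumption, see the note above).
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])

row_labels = df1.index.tolist()          # labels of the rows
column_labels = df1.columns.tolist()     # labels of the columns
is_array = isinstance(df1.values, np.ndarray)
shape = df1.values.shape                 # (number of rows, number of columns)

print(row_labels, column_labels, is_array, shape)
```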
How to Index DataFrame Objects
1: Indexing columns with columns values
Indexing directly with values from columns returns one column (a Series) or several columns (a DataFrame):
print(df1['name'])
Results:
0 ming
1 hong
2 gang
3 tian
Name: name, dtype: object
print(df1[['name','age']])
Results:
   name  age
0  ming   12
1  hong   13
2  gang   14
3  tian   20
Note: columns cannot be indexed directly by subscript unless columns actually contains that value. For example, the following operation is an error:
print(df1[0])
Results: raises KeyError
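If a column really does need to be selected by position, two hedged options are to look up the column's label first or to use `.iloc`. The sketch below also confirms that 0 is not a value in columns, which is why `df1[0]` fails.

```python
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])

has_zero_column = 0 in df1.columns           # False: no column is labelled 0

try:
    df1[0]
    raised_key_error = False
except KeyError:
    raised_key_error = True                  # the lookup by label 0 fails

first_by_label = df1[df1.columns[0]]         # find the label at position 0, then index by it
first_by_position = df1.iloc[:, 0]           # or select the first column by subscript

print(has_zero_column, raised_key_error)
print(first_by_label.equals(first_by_position))
```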
2: Indexing rows with a slice or a Boolean Series
Use a slice or a Boolean Series to select rows:
print(df1[0:3])
Selecting with a slice gives:
   age  name  score
0   12  ming   80.3
1   13  hong   88.2
2   14  gang   90.0
print(df1[ df1['age'] > 13 ])
Indexing with a Boolean Series requires the Boolean Series and the DataFrame to have the same index. Results:
   age  name  score
2   14  gang   90.0
3   20  tian   99.9
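Both kinds of row selection can also be written in an explicit form, with `.iloc` for the slice and `.loc` for the Boolean Series. The sketch below additionally combines two conditions; the name `filtered` and its thresholds are made up for illustration.

```python
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'])

rows_slice = df1.iloc[0:3]                   # first three rows, by subscript

mask = df1['age'] > 13                       # Boolean Series with the same index as df1
rows_mask = df1.loc[mask]                    # rows where the mask is True

# Two conditions combined with &; each one is wrapped in parentheses.
filtered = df1.loc[(df1['age'] > 12) & (df1['score'] < 95)]

print(rows_slice.index.tolist())
print(rows_mask['name'].tolist())
print(filtered['name'].tolist())
```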
3: Indexing with loc and iloc
In essence, loc indexes by the values of index and columns, while iloc ignores those values and always indexes by subscripts starting from 0. Once this is understood, the following examples become straightforward:
df1.index = [1, 3, 4, 5]  # row labels assumed by the outputs below
print(df1.loc[3])
Results:
age 13
name hong
score 88.2
Name: 3, dtype: object
print(df1.loc[:,'age'])
Results:
1 12
3 13
4 14
5 20
Name: age, dtype: int64
print(df1.iloc[3])
Results:
age 20
name tian
score 99.9
Name: 5, dtype: object
print(df1.iloc[:,1])
Results:
1 ming
3 hong
4 gang
5 tian
Name: name, dtype: object
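The sketch below gathers the loc and iloc cases in one runnable example. It assumes the row labels 1, 3, 4, 5 that appear in the outputs above and pins the column order with `columns=`; both choices are assumptions made to reproduce those outputs.

```python
import pandas as pd

data = {'name': ['ming', 'hong', 'gang', 'tian'],
        'age': [12, 13, 14, 20],
        'score': [80.3, 88.2, 90, 99.9]}
# Assumed setup: explicit column order and non-default row labels.
df1 = pd.DataFrame(data, columns=['age', 'name', 'score'], index=[1, 3, 4, 5])

row_by_label = df1.loc[3]                    # the row whose index value is 3 (second row)
row_by_position = df1.iloc[3]                # the fourth row, whose index value is 5

cell_by_label = df1.loc[3, 'name']           # index value 3, column 'name'
cell_by_position = df1.iloc[3, 1]            # fourth row, second column

# A loc slice over index values includes both ends.
block = df1.loc[3:4, ['name', 'score']]

print(row_by_label['name'], row_by_position.name)
print(cell_by_label, cell_by_position)
print(block.index.tolist())
```

Because the row labels here are integers that do not start at 0, `loc[3]` and `iloc[3]` return different rows, which makes the difference between the two accessors easy to see.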