How to use the pandas Series data structure
- 2021-06-29 11:24:04
- OfStack
1. Series
A Series is a one-dimensional, array-like data structure whose values carry labels, known as its index.
1.1 The simplest Series object is created below. Because no index is specified, the default index (0 to N-1) is used.
# Import Series and DataFrame
In [16]: from pandas import Series,DataFrame
In [17]: import pandas as pd
In [18]: ser1 = Series([1,2,3,4])
In [19]: ser1
Out[19]:
0 1
1 2
2 3
3 4
dtype: int64
1.2 To create a Series with a specified index, pass the labels through the index argument:
# Pass a list as the index
In [23]: ser2 = Series(range(4),index = ["a","b","c","d"])
In [24]: ser2
Out[24]:
a 0
b 1
c 2
d 3
dtype: int64
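The two constructions in 1.1 and 1.2 can be compared side by side. This is a minimal sketch, assuming a recent pandas release; the names default_ser and labeled_ser are hypothetical and do not appear in the session above.

```python
import pandas as pd

# No index given: pandas assigns the default integer labels 0..N-1
default_ser = pd.Series([1, 2, 3, 4])

# Explicit labels: the index list must have the same length as the data
labeled_ser = pd.Series(range(4), index=["a", "b", "c", "d"])

print(list(default_ser.index))  # [0, 1, 2, 3]
print(list(labeled_ser.index))  # ['a', 'b', 'c', 'd']
```

In both cases the values are the same kind of array; only the labels attached to them differ.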
1.3 You can also create Series objects from a dictionary
In [45]: sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
In [46]: ser3 = Series(sdata)
# Note that the Series created from the dictionary is shown sorted by index (newer pandas versions keep the dictionary's insertion order instead)
In [47]: ser3
Out[47]:
Ohio 35000
Oregon 16000
Texas 71000
Utah 5000
dtype: int64
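The sorted order shown above depends on the pandas version. As a hedged sketch, assuming a recent pandas on Python 3.7 or later (where, to my knowledge, the dictionary's insertion order is kept), an explicit sort_index() call gives the same ordering regardless of version. The name sorted_ser3 is hypothetical.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
ser3 = pd.Series(sdata)

# Sort explicitly by label instead of relying on version-specific behavior
sorted_ser3 = ser3.sort_index()

print(list(sorted_ser3.index))  # ['Ohio', 'Oregon', 'Texas', 'Utah']
```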
An index can also be specified when a Series is created from a dictionary. If a label in the index has no matching key in the dictionary, its value is marked as missing (NaN). The functions pandas.isnull and pandas.notnull can then be used to find which labels have no corresponding value.
In [48]: states = ['California', 'Ohio', 'Oregon', 'Texas']
In [49]: ser4 = Series(sdata,index = states)
In [50]: ser4
Out[50]:
California NaN
Ohio 35000.0
Oregon 16000.0
Texas 71000.0
dtype: float64
# Determine which values are null
In [51]: pd.isnull(ser4)
Out[51]:
California True
Ohio False
Oregon False
Texas False
dtype: bool
In [52]: pd.notnull(ser4)
Out[52]:
California False
Ohio True
Oregon True
Texas True
dtype: bool
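Once the missing labels are known, they can be listed or filled. The following is only a sketch of one possible follow-up, not part of the original session; by_state, missing_labels and filled are hypothetical names, and filling with 0 is an arbitrary choice for illustration.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]
by_state = pd.Series(sdata, index=states)

# Labels whose value is missing (no matching dictionary key)
missing_labels = by_state[by_state.isnull()].index.tolist()

# Replace missing values with 0; the original Series is left unchanged
filled = by_state.fillna(0)

print(missing_labels)  # ['California']
```

Note that "Utah" is dropped entirely because it is not listed in the index, while "California" is kept with a missing value.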
1.4 Access elements and indexes in Series:
# Access the element whose index label is "a"
In [25]: ser2["a"]
Out[25]: 0
# Access the elements whose index labels are "a" and "c"
In [26]: ser2[["a","c"]]
Out[26]:
a 0
c 2
dtype: int64
# Get all values
In [27]: ser2.values
Out[27]: array([0, 1, 2, 3])
# Get all indexes
In [28]: ser2.index
Out[28]: Index([u'a', u'b', u'c', u'd'], dtype='object')
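Square brackets work for the lookups above, but pandas also offers explicit accessors that make the intent clearer. This is a minimal sketch, assuming a recent pandas version: .loc selects by label and .iloc by integer position.

```python
import pandas as pd

ser2 = pd.Series(range(4), index=["a", "b", "c", "d"])

first_by_label = ser2.loc["a"]      # label-based lookup
subset = ser2.loc[["a", "c"]]       # several labels at once
last_by_position = ser2.iloc[-1]    # position-based lookup
has_b = "b" in ser2                 # membership is tested against the index

print(first_by_label, last_by_position, has_b)  # 0 3 True
```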
1.5 Simple operations
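Series keeps the NumPy array operations (filtering with a Boolean array, scalar multiplication, and applying mathematical functions), and these operations preserve the link between each index label and its value.
# Note: the outputs below imply that ser2["a"] was set to 64 (ser2["a"] = 64) in a step not shown here
In [34]: ser2[ser2 > 2]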
Out[34]:
a 64
d 3
dtype: int64
In [35]: ser2 * 2
Out[35]:
a 128
b 2
c 4
d 6
dtype: int64
# np refers to NumPy (import numpy as np)
In [36]: np.exp(ser2)
Out[36]:
a 6.235149e+27
b 2.718282e+00
c 7.389056e+00
d 2.008554e+01
dtype: float64
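The outputs above are consistent with ser2["a"] holding 64 rather than 0. The sketch below reproduces them under that assumption (the assignment itself is not shown in the original session) and includes the NumPy import that np.exp needs.

```python
import numpy as np
import pandas as pd

ser2 = pd.Series(range(4), index=["a", "b", "c", "d"])
ser2["a"] = 64  # assumed step, inferred from the outputs above

large = ser2[ser2 > 2]      # Boolean filtering keeps the matching labels
doubled = ser2 * 2          # scalar multiplication is applied element-wise
exponent = np.exp(ser2)     # NumPy functions return a Series with the same index

print(list(large.index))    # ['a', 'd']
print(list(doubled))        # [128, 2, 4, 6]
```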
1.6 Series Auto-alignment
An important feature of Series is automatic alignment, which is easiest to see in an example: in arithmetic operations between different Series objects, values are matched by index label rather than by position.
# Contents of ser3
In [60]: ser3
Out[60]:
Ohio 35000
Oregon 16000
Texas 71000
Utah 5000
dtype: int64
# Contents of ser4
In [61]: ser4
Out[61]:
California NaN
Ohio 35000.0
Oregon 16000.0
Texas 71000.0
dtype: float64
# Elements with the same index label are added; a label that is missing from either Series yields NaN
In [62]: ser3 + ser4
Out[62]:
California NaN
Ohio 70000.0
Oregon 32000.0
Texas 142000.0
Utah NaN
dtype: float64
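When a label exists in only one operand, the + operator yields NaN. As a hedged sketch, the Series.add method with fill_value can treat a one-sided gap as 0 instead; left, right, total and total_filled are hypothetical names. A location that is missing on both sides still stays NaN.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
left = pd.Series(sdata)
right = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

# Plain addition: the result index is the union of both indexes
total = left + right

# fill_value=0 substitutes 0 where only one side has a value
total_filled = left.add(right, fill_value=0)

print(total["Ohio"])          # 70000.0
print(total_filled["Utah"])   # 5000.0
```

Here "California" remains NaN even with fill_value, because it is absent from left and already NaN in right.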
1.7 Naming
Both the Series object itself and its index have a name attribute:
In [64]: ser4.index.name = "state"
In [65]: ser4.name = "population"
In [66]: ser4
Out[66]:
state
California NaN
Ohio 35000.0
Oregon 16000.0
Texas 71000.0
Name: population, dtype: float64
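One place where these names become visible is when a Series is turned into a DataFrame. The following sketch, which goes slightly beyond the original session, assumes a recent pandas version; population and table are hypothetical names.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]

# The name can also be passed directly to the constructor
population = pd.Series(sdata, index=states, name="population")
population.index.name = "state"

# reset_index() moves the index into a column; both names become column labels
table = population.reset_index()

print(list(table.columns))  # ['state', 'population']
```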