A detailed introduction to several ways of creating category-type data in pandas
- 2021-10-16 02:11:41
- OfStack
T1: Create category-type data directly
In category-type data, the value of each element is either one of the preset categories or a null value (np.nan). Any value that is not among the preset categories is stored as NaN.
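As a minimal sketch of this behavior, the snippet below builds the same categorical as the practice code and inspects how pandas stores it internally. The checks on `codes` and `isna()` are added here for illustration and are not part of the original article.

```python
import pandas as pd

# 'thin' is not among the preset categories, so pandas stores it as NaN.
weight_mark = pd.Categorical(
    ['thin', 'medium', 'medium', 'fat'],
    categories=['medium', 'fat'],
)

print(weight_mark.categories.tolist())  # the preset categories
print(weight_mark.codes.tolist())       # integer code per element; -1 marks a missing value
print(weight_mark.isna().tolist())      # True where the element is NaN
```

Each element is stored as an integer code pointing into the category list, which is why a value outside that list has nowhere to point and ends up as a null.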
T2: Add category-type data dynamically using binning (combining min, mean, and max to produce two classes)
Output result
[NaN, 'medium', 'medium', 'fat']
Categories (2, object): ['medium', 'fat']
   name    ID  age  age02  ...  weight    test01    test02  age02_mark
0   Bob     1  NaN     14  ...   140.5  1.000000  1.000000      Minors
1  LiSa     2   28     26  ...   120.8  2.123457  2.123457      Adults
2  Mary         38     24  ...   169.4  3.123457  3.123457      Adults
3  Alan  None           6  ...   155.6  4.123457  4.123457      Minors
[4 rows x 12 columns]
Practice code
import pandas as pd
import numpy as np
contents = {
    "name": ['Bob', 'LiSa', 'Mary', 'Alan'],
    "ID": [1, 2, ' ', None],  # contains a blank string and None
    "age": [np.nan, 28, 38, ''],  # contains NaN and an empty string
    "age02": [14, 26, 24, 6],
    "born": [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ''],  # contains NaT and an empty string
    "sex": ['Male', 'Female', 'Female', None],  # contains None
    "hobbey": ['Play basketball', 'Play badminton', 'Play table tennis', ''],  # contains an empty string
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
    "test01": [1, 2.123456789, 3.123456781011126, 4.123456789109999],
    "test02": [1, 2.123456789, 3.123456781011126, 4.123456789109999],
}
data_frame = pd.DataFrame(contents)
# T1: create category-type data directly
weight_mark = pd.Categorical(['thin', 'medium', 'medium', 'fat'], categories=['medium', 'fat'])
print(weight_mark)  # 'thin' is not a preset category, so it becomes NaN
# T2: add category-type data dynamically using binning (min, mean, and max give two classes)
col_age_des = data_frame['age02'].describe()
# Bin edges: just below the minimum, the mean, and just above the maximum
age_ranges = [col_age_des['min'] - 1, col_age_des['mean'], col_age_des['max'] + 1]
age_labels = ['Minors', 'Adults']  # ages above the mean are labeled 'Adults'
data_frame['age02_mark'] = pd.cut(data_frame['age02'], age_ranges, labels=age_labels)
print(data_frame)
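The T2 binning step can also be tried in isolation. The following is a small sketch using only the age02 values from the practice code; the names `ages`, `edges`, and `marks` are illustrative and not from the original. By default pd.cut builds right-closed intervals, so the lowest edge itself is excluded, which is why the code subtracts 1 from the minimum.

```python
import pandas as pd

ages = pd.Series([14, 26, 24, 6], name='age02')

# Edges: just below the minimum, the mean, and just above the maximum.
# With the default right=True the bins are (5, 17.5] and (17.5, 27].
edges = [ages.min() - 1, ages.mean(), ages.max() + 1]
marks = pd.cut(ages, bins=edges, labels=['Minors', 'Adults'])

print(marks.tolist())
print(marks.dtype)  # a categorical dtype built from the two labels
```

If the first edge were exactly the minimum, the smallest age would fall outside every interval and be marked NaN, unless include_lowest=True is passed to pd.cut.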