Several methods for creating category-type data in pandas, explained in detail

  • 2021-10-16 02:11:41
  • OfStack

This article introduces several ways to create category-type data in pandas.

T1: create category-type data directly
As the output shows, every element of category-type data is either one of the preset categories or a null value (np.nan). A value that is not among the categories, such as 'thin' here, becomes NaN.
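A minimal sketch, assuming only that pandas is installed, of how the T1 result is stored internally: each element is kept as an integer code into `categories`, and 'thin', which is not in the preset list, gets code -1, which pandas reports as missing.

```python
import pandas as pd

# Same call as T1: 'thin' is not one of the preset categories
weight_mark = pd.Categorical(['thin', 'medium', 'medium', 'fat'],
                             categories=['medium', 'fat'])

# Each element is stored as an integer code into weight_mark.categories;
# the code -1 marks a missing value
print(weight_mark.codes)
print(list(weight_mark.categories))
print(pd.isna(weight_mark))  # True only for the former 'thin'
```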

T2: add category-type data dynamically by binning (using min, mean and max to split the values into two classes)

Output result
[NaN, 'medium', 'medium', 'fat']
Categories (2, object): ['medium', 'fat']
   name    ID  age  age02  ...  weight    test01    test02  age02_mark
0   Bob     1  NaN     14  ...   140.5  1.000000  1.000000      Minors
1  LiSa     2   28     26  ...   120.8  2.123457  2.123457      Adults
2  Mary         38     24  ...   169.4  3.123457  3.123457      Adults
3  Alan  None           6  ...   155.6  4.123457  4.123457      Minors

[4 rows x 12 columns]
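The T2 binning rule can be checked in isolation. This sketch uses only the age02 values from the sample data; the names `stats`, `edges` and `alt` are illustrative, not from the original code. It shows why the lower edge is the minimum minus 1: `pd.cut` builds right-closed intervals by default, so an edge exactly equal to the minimum would leave that value outside every interval (NaN). Passing `include_lowest=True` is shown as an alternative.

```python
import pandas as pd

# age02 values from the sample data
age02 = pd.Series([14, 26, 24, 6], name='age02')
stats = age02.describe()

# Edges: just below the minimum, the mean, just above the maximum
# (values here: 5.0, 17.5, 27.0)
edges = [stats['min'] - 1, stats['mean'], stats['max'] + 1]

# Default right=True gives the intervals (5.0, 17.5] and (17.5, 27.0]
age02_mark = pd.cut(age02, edges, labels=['Minors', 'Adults'])
print(age02_mark.tolist())

# Alternative: use the exact min/max and include the lowest value explicitly
alt = pd.cut(age02, [stats['min'], stats['mean'], stats['max']],
             labels=['Minors', 'Adults'], include_lowest=True)
print(alt.tolist())
```

Because the first interval is closed on the right, a value exactly equal to the mean would fall into the lower bin (Minors).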

Practice code


import pandas as pd
import numpy as np

contents = {
    "name":   ['Bob', 'LiSa', 'Mary', 'Alan'],
    "ID":     [1, 2, ' ', None],      # mixed types: object column, None is shown as None
    "age":    [np.nan, 28, 38, ''],   # np.nan is shown as NaN, '' is shown as blank
    "age02":  [14, 26, 24, 6],
    "born":   [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ''],  # pd.NaT is shown as NaT
    "sex":    ['Male', 'Female', 'Female', None],  # None is shown as None
    "hobby":  ['Play basketball', 'Play badminton', 'Play table tennis', ''],
    "money":  [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
    "test01": [1, 2.123456789, 3.123456781011126, 4.123456789109999],
    "test02": [1, 2.123456789, 3.123456781011126, 4.123456789109999],
}
data_frame = pd.DataFrame(contents)
 
 
 
# T1: create category-type data directly ('thin' is not a preset category, so it becomes NaN)
weight_mark = pd.Categorical(['thin', 'medium', 'medium', 'fat'], categories=['medium', 'fat'])
print(weight_mark)
 
 
 
# T2: add category-type data dynamically by binning (min, mean and max give two classes)
col_age_des = data_frame['age02'].describe()
age_ranges = [col_age_des['min'] - 1, col_age_des['mean'], col_age_des['max'] + 1]
age_labels = ['Minors', 'Adults']   # values above the mean are labeled Adults
data_frame['age02_mark'] = pd.cut(data_frame['age02'], age_ranges, labels=age_labels)
print(data_frame)
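Because `pd.cut` with `labels` returns categorical data, the new age02_mark column has dtype `category`. A short sketch that rebuilds only the age02 column (a subset of the practice data) and inspects the result through the `.cat` accessor; the name `marks` is illustrative. By default `pd.cut` marks its categories as ordered, which is what allows the comparison at the end.

```python
import pandas as pd

# Rebuild only the column needed for T2 (subset of the practice data)
data_frame = pd.DataFrame({'age02': [14, 26, 24, 6]})
col_age_des = data_frame['age02'].describe()
age_ranges = [col_age_des['min'] - 1, col_age_des['mean'], col_age_des['max'] + 1]
data_frame['age02_mark'] = pd.cut(data_frame['age02'], age_ranges,
                                  labels=['Minors', 'Adults'])

marks = data_frame['age02_mark']
print(marks.dtype)                 # the column is categorical
print(list(marks.cat.categories))  # label order follows the bin order
print(marks.cat.ordered)           # pd.cut marks the categories as ordered
print(list(marks.cat.codes))       # integer code of each row's bin

# Ordered categories support comparisons against a category value
print((marks > 'Minors').tolist())
```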
