Using groupby in Python data analysis: grouping by a dictionary or a Series

  • 2020-06-15 09:25:29
  • OfStack

In data analysis, you sometimes need to define your own grouping rules. Here is a brief introduction to one way of doing that: passing a dictionary or a Series as the grouping key.


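import numpy as np
from pandas import DataFrame, Series

people=DataFrame(
  np.random.randn(5,5),
  columns=['a','b','c','d','e'],
  index=['Joe','Steve','Wes','Jim','Travis']
)
mapping={'a':'red','b':'red','c':'blue','d':'blue','e':'red','f':'orange'}
# Group along the columns; axis=1 is deprecated since pandas 2.1,
# where people.T.groupby(mapping) is the transposed equivalent
by_column=people.groupby(mapping,axis=1)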

I don't know exactly how pandas handles this under the hood, so the easiest way to see what happened is to iterate over the groups and print them:


for i in by_column:
  print(i)

Results of traversal:


('blue',                c         d
Joe     0.218189 -0.228336
Steve   1.677264  0.630303
Wes     0.315320 -0.250787
Jim     3.343462  0.483021
Travis  0.854553 -0.760884)
('red',                 a         b         e
Joe     0.218164  0.823654 -1.425720
Steve   1.191175 -0.327735  1.926470
Wes    -1.418855  0.497466  0.110105
Jim    -1.157157  0.817122  0.749023
Travis -0.440583 -0.907922  1.374294)

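As the output shows, each item is a (group name, sub-frame) tuple. Columns a, b and e fall into the red group, and columns c and d into the blue group. The key 'f' in the mapping matches no column, so it is simply ignored:

a b e ---> red

c d ---> blue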
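
To check which columns land in which group without reading the full sub-frames, the loop can unpack each (name, group) pair. The following is a minimal sketch, not the author's code: it assumes the same column labels and mapping as above, uses made-up fixed values, and groups the transposed frame (people.T) rather than passing axis=1, since axis=1 is deprecated in recent pandas releases.

```python
import pandas as pd

# Illustrative frame: same column labels as above, but made-up fixed values
people = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0]],
    columns=['a', 'b', 'c', 'd', 'e'],
    index=['Joe'],
)
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f': 'orange'}

# After transposing, the column labels a..e form the row index,
# so an ordinary row-wise groupby splits them according to the mapping.
members = {}
for name, group in people.T.groupby(mapping):
    members[name] = list(group.index)
    print(name, members[name])
```

The loop visits the groups in sorted key order, and no 'orange' group appears because no column is labelled 'f'.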

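Next, run people.groupby(mapping,axis=1).mean(). The values below appear to come from a separate run of np.random.randn, so they do not match the random data printed above: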


            blue       red
Joe     0.241336 -0.182099
Steve   0.459773 -0.448336
Wes     0.205278  0.605721
Jim    -0.094838  1.254174
Travis  0.354140  0.142075

The result shows that after the aggregate function mean() is applied to the column-wise groups, the only column labels left are blue and red.

The whole process can be understood like this: columns a, b and e form one group labelled red, and columns c and d form another group labelled blue. The group labels red and blue then become the column index of the new DataFrame.
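
Because the frame above is filled with random numbers, the means are hard to verify by eye. Here is a small sketch with hand-picked values (these numbers are made up for illustration and are not from the original run) so the arithmetic can be checked directly. It assumes a pandas version where grouping the transposed frame works, which avoids the deprecated axis=1 argument.

```python
import pandas as pd

# Made-up fixed values so each group mean can be computed by hand
people = pd.DataFrame(
    [[1.0, 3.0, 10.0, 20.0, 5.0],
     [2.0, 4.0, 30.0, 40.0, 6.0]],
    columns=['a', 'b', 'c', 'd', 'e'],
    index=['Joe', 'Steve'],
)
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f': 'orange'}

# Transpose so the column labels become the row index, group by the mapping,
# take the mean of each group, then transpose back to the original orientation.
result = people.T.groupby(mapping).mean().T
print(result)
```

For Joe, red is the mean of a, b and e, that is (1 + 3 + 5) / 3 = 3.0, and blue is the mean of c and d, that is (10 + 20) / 2 = 15.0.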

A Series offers the same functionality and can be viewed as a fixed-size mapping. In the example above, if a Series is used as the grouping key, pandas aligns the Series index with the axis being grouped:

ser=Series(mapping)

a       red
b       red
c      blue
d      blue
e       red
f    orange

by_ser_group=people.groupby(ser,axis=1).mean()

            blue       red
Joe     0.241336 -0.182099
Steve   0.459773 -0.448336
Wes     0.205278  0.605721
Jim    -0.094838  1.254174
Travis  0.354140  0.142075

As the results show, grouping by a dictionary gives the same result as grouping by a Series. Both work the same way: the dictionary's keys (or the Series index) are matched against the DataFrame's column labels.

Labels that map to the same value are placed in one group, and each group is then aggregated.
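
The equivalence can also be checked in code instead of comparing printed tables. This is a rough sketch under a few assumptions: the seeded generator (np.random.default_rng(0)) is my own choice to make the run repeatable, and the frame is transposed before grouping rather than using axis=1, which recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption for repeatability, not in the original post)
rng = np.random.default_rng(0)
people = pd.DataFrame(
    rng.standard_normal((5, 5)),
    columns=['a', 'b', 'c', 'd', 'e'],
    index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'],
)
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f': 'orange'}
ser = pd.Series(mapping)

# Group the transposed frame once by the dict and once by the Series
by_dict = people.T.groupby(mapping).mean().T
by_ser = people.T.groupby(ser).mean().T

# DataFrame.equals compares labels and values of the two results
same = by_dict.equals(by_ser)
print(same)
```

Both calls resolve each column label to the same group name, so the two aggregated frames should contain identical values.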

groupby has many more uses, and I will update this blog later. If there are any mistakes here, you are welcome to point them out so that we can learn and improve together.

