Notes on index alignment when assigning to a pandas DataFrame

  • 2021-10-24 23:21:20
  • OfStack

1. Assigning a single column in a pandas DataFrame

Task: assign a column of b to a.

Case 1: a and b have the same index

The code is as follows


import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list('wxyz'))
b = pd.DataFrame(np.array([11,22,33,44]),index=list('abcd'),columns=['m'])
a['m'] = b['m']
print(a)

The result of the above code is as follows


    w   x   y   z   m
a   0   1   2   3  11
b   4   5   6   7  22
c   8   9  10  11  33
d  12  13  14  15  44

Case 1 is the most basic situation, and the result is as expected. It works because a and b share the same index, and column assignment aligns values by index label. What happens if b is not given an explicit index and falls back to the default one?

Case 2: index of b takes the default value

The code is as follows


import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list('wxyz'))
b = pd.DataFrame(np.array([11,22,33,44]),columns=['m'])
a['m'] = b['m']
print(a)

The results are as follows


    w   x   y   z   m
a   0   1   2   3 NaN
b   4   5   6   7 NaN
c   8   9  10  11 NaN
d  12  13  14  15 NaN

In case 2 the result is surprising. The index of b is the default 0, 1, 2, 3, which shares no labels with the index of a ('a', 'b', 'c', 'd'). During assignment, pandas looks up each index label of a in b and takes the value stored under that label. Since none of a's labels exist in b, every row of the new column is NaN.
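If the intent in case 2 is to copy the values row by row regardless of labels, one common way is to drop b's index before assigning. The following is a minimal sketch reusing a and b from case 2; it assumes both frames have the same number of rows.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('wxyz'))
b = pd.DataFrame(np.array([11, 22, 33, 44]), columns=['m'])

# to_numpy() returns a plain array with no index, so the values are
# assigned by position instead of being aligned by label.
a['m'] = b['m'].to_numpy()
print(a)
```

Because the array carries no labels, pandas has nothing to align on and only checks that the length matches the number of rows in a.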

Case 3: the index of b partly overlaps the index of a

The code is as follows


import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list('wxyz'))
b = pd.DataFrame(np.array([11,22,33,44]),index=list('arpb'),columns=['m'])
a['m'] = b['m']
print(a)

The results are as follows


    w   x   y   z     m
a   0   1   2   3  11.0
b   4   5   6   7  44.0
c   8   9  10  11   NaN
d  12  13  14  15   NaN

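The results of case 3 show that only rows whose index label appears in both a and b receive a value. The match is by label, not by position: 'b' is the fourth label in b, so row b of a gets 44. Because the column now contains NaN, it is stored as float, which is why 11 and 44 print as 11.0 and 44.0.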
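The alignment in case 3 can be made visible by reindexing b['m'] to a's index by hand. The sketch below reuses a and b from case 3; the variable name aligned is only illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('wxyz'))
b = pd.DataFrame(np.array([11, 22, 33, 44]), index=list('arpb'), columns=['m'])

# Reorder b['m'] to follow a's row labels; labels that b lacks become NaN.
aligned = b['m'].reindex(a.index)
print(aligned)

# Plain column assignment produces the same values as the reindexed Series.
a['m'] = b['m']
print(a['m'])
```

Both printed Series hold 11.0 and 44.0 for rows a and b, and NaN for rows c and d.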

Summary:

As the cases above show, assigning a Series to a DataFrame column aligns strictly on the index. Any row of the target whose label is missing from the source is set to NaN.

Supplement: what to do when the original data stays unchanged after modifying specific DataFrame cells in Python

Recently I took part in a competition that involved cleaning data, and I needed to modify some abnormal values. I used to do it like this


df[condition]['column'].iloc[0:3] = ......

Or


df[condition]['column'][0:3] = ......

Here condition is a boolean expression that selects the rows, and column is the column name.

This usually works, but it occasionally fails: the code runs, yet the cells keep their old values. The cause is chained indexing. df[condition] may return a copy of the data rather than a view, so the assignment can end up modifying that temporary copy instead of df. After trying many approaches, I settled on a single loc or iloc expression


df.loc[[row condition],['column']] = ......

For example:


NA.loc[[23,29,49],' Overall size of North America '] = ......
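As a rough illustration of replacing the chained form with a single loc call, the sketch below overwrites the first three rows that satisfy a condition. The data, the threshold of 100, and the replacement value 0 are all made up for the example, and the index is assumed to be unique.

```python
import pandas as pd

# Hypothetical data: values above 100 stand in for the abnormal values.
df = pd.DataFrame({'column': [1, 200, 3, 400, 500, 600]})
condition = df['column'] > 100

# Labels of the first three rows that satisfy the condition.
rows = df[condition].index[:3]

# One loc call selects rows and column together, so df itself is modified.
df.loc[rows, 'column'] = 0
print(df['column'].tolist())
```

Selecting the row labels first and then writing through one loc call avoids assigning into the intermediate object that df[condition] produces.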


Note that loc takes the column names themselves, while iloc takes the integer positions of those columns, so take care not to mix them up!
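The difference between the two accessors can be sketched as follows. The frame and the replacement values -1 and -2 are hypothetical; columns.get_loc is used here to translate a column name into the position that iloc expects, assuming the column names are unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('wxyz'))

# loc: rows by label, column by name.
df.loc[['a', 'c'], 'y'] = -1

# iloc: rows and column by integer position.
col_pos = df.columns.get_loc('z')  # position of column 'z'
df.iloc[[0, 2], col_pos] = -2
print(df)
```

Here rows 'a' and 'c' sit at positions 0 and 2, so both statements touch the same rows, once through labels and once through positions.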

