Notes on Index Alignment in pandas DataFrame Assignment
- 2021-10-24 23:21:20
- OfStack
1 Assigning a column to a pandas DataFrame
Description: assign column m of b to a
Case 1: a and b have the same index
The code is as follows
import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list('wxyz'))
b = pd.DataFrame(np.array([11,22,33,44]),index=list('abcd'),columns=['m'])
a['m'] = b['m']
print(a)
The result is as follows
    w   x   y   z   m
a   0   1   2   3  11
b   4   5   6   7  22
c   8   9  10  11  33
d  12  13  14  15  44
Case 1 is the most basic situation, and the result is as expected. It works because a and b share the same index, and pandas matches rows by index label during assignment. What if b does not set an index and uses the default one instead?
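Matching by label also means the row order of b should not matter. The sketch below is a small variation on case 1, not part of the original post: b lists the same labels in reverse order, and each value is expected to land on the row with the matching label.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('wxyz'))
# Same labels as a, but listed in reverse order (illustrative data).
b = pd.DataFrame(np.array([44, 33, 22, 11]), index=list('dcba'), columns=['m'])

# Values are matched by index label, not by row position.
a['m'] = b['m']
print(a)
```

Row 'a' receives the value stored under label 'a' in b (11), even though that value sits in the last row of b.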
Case 2: b uses the default index
The code is as follows
import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list('wxyz'))
b = pd.DataFrame(np.array([11,22,33,44]),columns=['m'])
a['m'] = b['m']
print(a)
The results are as follows
    w   x   y   z    m
a   0   1   2   3  NaN
b   4   5   6   7  NaN
c   8   9  10  11  NaN
d  12  13  14  15  NaN
In case 2 the result is unexpected. The index of b is 0, 1, 2, 3, which differs from the index of a ('a', 'b', 'c', 'd'). During assignment, pandas looks up each label of a's index in b and takes the value stored under that label. Because no label matches, every row of the new column is NaN.
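If the intent in case 2 is to copy the values row by row regardless of labels, one option is to hand pandas a plain NumPy array, which carries no index to align on. This is a minimal sketch that assumes a and b have the same number of rows; it is not from the original post.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('wxyz'))
b = pd.DataFrame(np.array([11, 22, 33, 44]), columns=['m'])

# to_numpy() drops b's index, so the values are assigned by position.
# Assumption: len(a) == len(b); otherwise pandas raises a ValueError.
a['m'] = b['m'].to_numpy()
print(a)
```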
Case 3: b shares only part of its index with a
The code is as follows
import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list('wxyz'))
b = pd.DataFrame(np.array([11,22,33,44]),index=list('arpb'),columns=['m'])
a['m'] = b['m']
print(a)
The results are as follows
    w   x   y   z     m
a   0   1   2   3  11.0
b   4   5   6   7  44.0
c   8   9  10  11   NaN
d  12  13  14  15   NaN
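Case 3 shows that only rows whose index labels appear in both a and b are assigned: 'a' receives 11 and 'b' receives 44, while 'c' and 'd' become NaN. The values print as 11.0 and 44.0 because a column that contains NaN is stored as float.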
Summary:
As the cases above show, assigning a column to a pandas DataFrame aligns strictly on index labels: rows whose label is missing from the source are set to NaN.
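One way to see this alignment explicitly is to compare the assignment with `reindex`. The sketch below reuses the data from case 3 and checks that `a['m'] = b['m']` gives the same column as `b['m'].reindex(a.index)`; the variable names `aligned` and `unmatched` are only for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('wxyz'))
b = pd.DataFrame(np.array([11, 22, 33, 44]), index=list('arpb'), columns=['m'])

# Align b's column to a's index by hand: labels missing from b become NaN.
aligned = b['m'].reindex(a.index)

# The plain assignment is expected to produce the same column.
a['m'] = b['m']
print(a['m'].equals(aligned))

# Labels of a that have no counterpart in b, i.e. the rows that end up as NaN.
unmatched = a.index.difference(b.index)
print(list(unmatched))
```

Checking `unmatched` before assigning is a cheap way to catch an index mismatch early.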
Supplement: what to do when the original data stays unchanged after modifying specific cells of a DataFrame
Recently I took part in a competition that involved data cleaning, and I needed to modify some outliers. I used to do it like this
df[condition]['column'].iloc[0:3] = ......
Or
df[condition]['column'][0:3] = ......
Here condition is a boolean expression that selects the rows, and column is the column name
This often seems to work, but it is unreliable: df[condition]['column'] is chained indexing, so the assignment may land on a temporary copy rather than on df, and the cells keep their old values after the code runs. After trying several approaches, I settled on a single loc or iloc expression
df.loc[[row condition],['column']] = ......
For example:
NA.loc[[23,29,49],' Overall size of North America '] = ......
Or, with iloc, assuming that column is at position 0:
NA.iloc[[23,29,49],[0]] = ......
Note that loc takes the column names themselves, while iloc takes the list of integer positions of those columns, so do not confuse the two!
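The difference can be sketched on a small made-up table. The DataFrame, the column names `size` and `region`, and the threshold below are all hypothetical, chosen only to illustrate the pattern. The chained form writes to a temporary copy, while a single loc or iloc call writes to df itself.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data; none of these names come from the original post.
df = pd.DataFrame({'size': [5, 120, 7, 300, 9], 'region': list('abcde')})
condition = df['size'] > 100

# Chained indexing: df[condition] is a copy, so df itself is not modified.
# pandas warns about this pattern; the warning is silenced here for the demo.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[condition]['size'].iloc[0:1] = -1
after_chained = df['size'].tolist()

# loc: boolean mask for the rows, column name for the column.
df.loc[condition, 'size'] = 0
after_loc = df['size'].tolist()

# iloc: integer positions for both rows and columns.
col_pos = df.columns.get_loc('size')
row_pos = np.flatnonzero(df['region'].isin(['a', 'c']).to_numpy())
df.iloc[row_pos, col_pos] = -1
after_iloc = df['size'].tolist()

print(after_chained, after_loc, after_iloc)
```

`columns.get_loc` converts a column name into the position that iloc expects, which avoids hard-coding position numbers.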