Vertically concatenating multiple DataFrames with pandas while ignoring row and column indexes
- 2021-10-15 11:04:17
- OfStack
After pulling a batch of stock data from Wind, I found that the reports all held the same kind of data, but because each quarter's report used different column names, merging them with pandas always produced a lot of NaN values. So I wrote a function specifically for tables like these.
The idea is:
Generate a batch of random words, replace the column names of every table with these words, call pd.concat() to merge all the DataFrames, and then restore the first table's column names on the result. You can also specify the column names manually.
See the last line of the code for how to use it: just pass in a list of DataFrames.
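To make the problem concrete, here is a minimal sketch of what happens when the column names differ. The two quarterly tables and their column names are invented for illustration and are not the actual Wind data.

```python
import pandas as pd

# Hypothetical quarterly reports: same kind of data, different column names.
q1 = pd.DataFrame({"revenue_2021Q1": [100, 110], "profit_2021Q1": [10, 12]})
q2 = pd.DataFrame({"revenue_2021Q2": [120, 130], "profit_2021Q2": [14, 15]})

# concat aligns on column labels, so the mismatched names give four columns,
# and every row holds NaN in the two columns that came from the other table.
naive = pd.concat([q1, q2], ignore_index=True)

print(naive.shape)                     # (4, 4)
print(int(naive.isna().sum().sum()))   # 8
```

Half of the cells in the result are NaN even though the two tables line up column by column.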
import pandas as pd
from random import Random


# Generate a batch of random words to use as shared column names
def random_list(random_str_count, random_length=6):
    result_list = []
    random = Random()
    chars = "qwertyuiopasdfghjklzxcvbnm"
    for _ in range(random_str_count):
        # Build one word of random_length lowercase letters
        ranstr = "".join(random.choice(chars) for _ in range(random_length))
        result_list.append(ranstr)
    return result_list


def combine_as_data_location(pd_list, columns=None):
    # Concatenate DataFrames by column position, ignoring their column names.
    # Note: the column names of the input DataFrames are overwritten in place.
    if not pd_list:
        return None
    old_columns = pd_list[0].columns
    if columns is not None:
        new_columns = list(columns)
    else:
        new_columns = random_list(pd_list[0].shape[1])
    for data_df in pd_list:
        # data_df is a pandas DataFrame
        data_df.columns = new_columns
    result_df = pd.concat(pd_list, ignore_index=True)
    if columns is None:
        # Restore the column names of the first DataFrame
        result_df.columns = old_columns
    return result_df


# Usage: pass in a list of DataFrames
result_df = combine_as_data_location([df1, df2, df3])
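For a concrete run, the sketch below applies the same idea to two made-up quarterly tables. It is a compact variant rather than the function above: `concat_by_position` and the column names are hypothetical, it skips the temporary random names and renames straight to the target names, and it works on copies so the input DataFrames keep their own column names.

```python
import pandas as pd


def concat_by_position(frames, columns=None):
    """Hypothetical helper: stack frames by column position, not by name."""
    if not frames:
        return None
    # Use the given names, or fall back to the first frame's column names.
    target = list(columns) if columns is not None else list(frames[0].columns)
    renamed = []
    for frame in frames:
        clone = frame.copy()     # leave the caller's DataFrame untouched
        clone.columns = target   # raises ValueError if the column counts differ
        renamed.append(clone)
    return pd.concat(renamed, ignore_index=True)


q1 = pd.DataFrame({"revenue_2021Q1": [100, 110], "profit_2021Q1": [10, 12]})
q2 = pd.DataFrame({"revenue_2021Q2": [120, 130], "profit_2021Q2": [14, 15]})

combined = concat_by_position([q1, q2])
print(combined.shape)                      # (4, 2)
print(list(combined.columns))              # ['revenue_2021Q1', 'profit_2021Q1']
print(int(combined.isna().sum().sum()))    # 0
print(list(q2.columns))                    # ['revenue_2021Q2', 'profit_2021Q2']
```

Copying costs extra memory on large tables, so renaming in place may still be preferable when the inputs are no longer needed afterwards.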
Supplement: vertical and horizontal concatenation of DataFrames with pandas.concat
1. Vertical concatenation with concat (the default, axis=0)
Here is an example:
import pandas as pd
df1=pd.DataFrame([10,12,13])
df2=pd.DataFrame([22,33,44,55])
df3=pd.DataFrame([90,94])
df1
|   | 0 |
|---|---|
| 0 | 10 |
| 1 | 12 |
| 2 | 13 |
df2
|   | 0 |
|---|---|
| 0 | 22 |
| 1 | 33 |
| 2 | 44 |
| 3 | 55 |
df3
|   | 0 |
|---|---|
| 0 | 90 |
| 1 | 94 |
res= pd.concat([df1,df2,df3])
res
|   | 0 |
|---|---|
| 0 | 10 |
| 1 | 12 |
| 2 | 13 |
| 0 | 22 |
| 1 | 33 |
| 2 | 44 |
| 3 | 55 |
| 0 | 90 |
| 1 | 94 |
What if you want to discard the original row labels and build a new index?
concat has a parameter ignore_index that defaults to False; set it to True:
res2= pd.concat([df1,df2,df3], ignore_index=True)
res2
|   | 0 |
|---|---|
| 0 | 10 |
| 1 | 12 |
| 2 | 13 |
| 3 | 22 |
| 4 | 33 |
| 5 | 44 |
| 6 | 55 |
| 7 | 90 |
| 8 | 94 |
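One practical reason to rebuild the index is label-based selection. The short sketch below reuses the three frames from the example and compares the two results; it is only an illustration of the difference, not part of the original code.

```python
import pandas as pd

df1 = pd.DataFrame([10, 12, 13])
df2 = pd.DataFrame([22, 33, 44, 55])
df3 = pd.DataFrame([90, 94])

res = pd.concat([df1, df2, df3])
res2 = pd.concat([df1, df2, df3], ignore_index=True)

# With the original labels kept, label 0 occurs once per input frame,
# so selecting it returns three rows.
print(res.index.is_unique)    # False
print(res.loc[0].shape)       # (3, 1)

# After ignore_index=True, each label identifies exactly one row.
print(res2.index.is_unique)   # True
print(res2.loc[3, 0])         # 22
```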
2. Horizontal concatenation with concat
Pass axis=1, as in this example:
res_heng= pd.concat([df1,df2,df3], axis=1)
res_heng
|   | 0 | 0 | 0 |
|---|---|---|---|
| 0 | 10.0 | 22 | 90.0 |
| 1 | 12.0 | 33 | 94.0 |
| 2 | 13.0 | 44 | NaN |
| 3 | NaN | 55 | NaN |
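Two details of this result are worth spelling out: all three columns are named 0, and the shorter frames are padded with NaN, which turns their columns into floats (hence 10.0 and 90.0). The sketch below, again using the same three frames, shows that ignore_index=True applies to the concatenation axis, so with axis=1 it renumbers the columns instead of the rows.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

df1 = pd.DataFrame([10, 12, 13])
df2 = pd.DataFrame([22, 33, 44, 55])
df3 = pd.DataFrame([90, 94])

# Rows are aligned by index label; ignore_index=True renumbers the columns.
wide = pd.concat([df1, df2, df3], axis=1, ignore_index=True)

print(list(wide.columns))            # [0, 1, 2]
print(wide.shape)                    # (4, 3)

# Columns that needed NaN padding become float; the full-length one stays integer.
print(is_float_dtype(wide[0]))       # True
print(is_integer_dtype(wide[1]))     # True
print(int(wide[2].isna().sum()))     # 2
```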