Vertically splicing multiple DataFrames with pandas while ignoring row and column indexes

  • 2021-10-15 11:04:17
  • OfStack

After downloading a batch of stock data from Wind, I found that the reports all held the same kind of data. However, because they came from different quarters, their column names differed, so merging several reports with pandas always produced a lot of NaN. I therefore wrote a function specifically for tables like these.
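A minimal sketch of the problem, using two hypothetical quarterly tables (the column names `code`, `revenue_q1` and `revenue_q2` are made up for illustration): `pd.concat` lines columns up by label, so each frame contributes NaN to the other frame's columns.

```python
import pandas as pd

# Two hypothetical quarterly reports: same kind of data, different column names
q1 = pd.DataFrame({"code": ["A", "B"], "revenue_q1": [1.0, 2.0]})
q2 = pd.DataFrame({"code": ["C", "D"], "revenue_q2": [3.0, 4.0]})

# concat aligns columns by label, so the result has three columns
# and every cell without a matching label is filled with NaN
merged = pd.concat([q1, q2], ignore_index=True)
print(merged)
```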

The idea is:

Generate a set of random words, use them to replace the column labels of every table, call pd.concat() to stack all the DataFrames, and then change the column labels back to those of the first table. You can also pass your own column labels instead, in which case the result keeps them.

See the last line of the code for how to use it: just pass in a list of DataFrames.


import pandas as pd
from random import Random

# Randomly generate a list of unique words to use as common column names
def random_list(random_str_count, random_length=6):
    result_list = []
    random = Random()
    chars = "qwertyuiopasdfghjklzxcvbnm"
    max_index = len(chars) - 1
    while len(result_list) < random_str_count:
        ranstr = ""
        for _ in range(random_length):
            ranstr += chars[random.randint(0, max_index)]
        # Skip duplicates so every temporary column name is unique
        if ranstr not in result_list:
            result_list.append(ranstr)
    return result_list

def combine_as_data_location(pd_list, columns=None):
    # Stack DataFrames vertically by column position, ignoring their column labels
    if not pd_list:
        return None
    old_columns = pd_list[0].columns
    if columns is not None:
        new_columns = columns
    else:
        new_columns = random_list(pd_list[0].shape[1])
    renamed = []
    for data_df in pd_list:
        # data_df is a pandas DataFrame; relabel a copy so the caller's frame is unchanged
        data_df = data_df.copy()
        data_df.columns = new_columns
        renamed.append(data_df)
    result_df = pd.concat(renamed, ignore_index=True)
    if columns is None:
        # Restore the column labels of the first DataFrame
        result_df.columns = old_columns
    return result_df

result_df = combine_as_data_location([df1, df2, df3])
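A minimal alternative sketch, not the author's code: instead of random words, relabel each frame's columns with their positions (0, 1, 2, ...), which can never collide, and work on copies so the input DataFrames keep their names. The function name `concat_by_position` and the sample data are hypothetical, and the sketch assumes every frame has the same number of columns.

```python
import pandas as pd

def concat_by_position(frames, columns=None):
    """Stack DataFrames vertically, matching columns by position, not by name."""
    if not frames:
        return None
    n_cols = frames[0].shape[1]
    relabeled = []
    for df in frames:
        if df.shape[1] != n_cols:
            raise ValueError("all DataFrames must have the same number of columns")
        df = df.copy()
        # Positional labels 0..n-1 are unique and identical across frames
        df.columns = range(n_cols)
        relabeled.append(df)
    result = pd.concat(relabeled, ignore_index=True)
    # Use the given names, or fall back to the first frame's column names
    result.columns = columns if columns is not None else frames[0].columns
    return result

# Usage: same layout, different column names per quarter (hypothetical data)
q1 = pd.DataFrame({"code": ["A", "B"], "revenue_q1": [1.0, 2.0]})
q2 = pd.DataFrame({"code": ["C", "D"], "revenue_q2": [3.0, 4.0]})
combined = concat_by_position([q1, q2])
print(combined)
```

Positional labels avoid the (small) chance of two random words coinciding and make the intent, "match by position", explicit.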

Supplement: vertical and horizontal splicing of DataFrames with pandas.concat

1. Vertical splicing with concat (the default, axis=0)

Without further ado, here is an example:


import pandas as pd
df1=pd.DataFrame([10,12,13])
df2=pd.DataFrame([22,33,44,55])
df3=pd.DataFrame([90,94])

df1
    0
0  10
1  12
2  13

df2
    0
0  22
1  33
2  44
3  55

df3
    0
0  90
1  94

res = pd.concat([df1, df2, df3])
res
    0
0  10
1  12
2  13
0  22
1  33
2  44
3  55
0  90
1  94
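Because the original row labels are kept, the label 0 now appears three times, so label-based lookups return several rows. A quick check (the block repeats the df1, df2 and df3 definitions above so it runs on its own):

```python
import pandas as pd

df1 = pd.DataFrame([10, 12, 13])
df2 = pd.DataFrame([22, 33, 44, 55])
df3 = pd.DataFrame([90, 94])

res = pd.concat([df1, df2, df3])
# The index is 0, 1, 2, 0, 1, 2, 3, 0, 1, so it is not unique
print(res.index.is_unique)
# Selecting label 0 returns one row from each input frame
print(res.loc[0])
```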

What if you want to build a new index and ignore the original index?

concat has a parameter ignore_index that defaults to False; set it to True:


res2 = pd.concat([df1, df2, df3], ignore_index=True)
res2
    0
0  10
1  12
2  13
3  22
4  33
5  44
6  55
7  90
8  94
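For a vertical concat, ignore_index=True gives the same result as concatenating first and then discarding the old labels with `reset_index(drop=True)`. A small check on the same data:

```python
import pandas as pd

df1 = pd.DataFrame([10, 12, 13])
df2 = pd.DataFrame([22, 33, 44, 55])
df3 = pd.DataFrame([90, 94])

res2 = pd.concat([df1, df2, df3], ignore_index=True)
# Same result in two steps: keep the old labels, then drop them
res3 = pd.concat([df1, df2, df3]).reset_index(drop=True)

print(res2.equals(res3))
```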

2. Horizontal splicing with concat

Pass axis=1, as in this example:


res_heng = pd.concat([df1, df2, df3], axis=1)
res_heng
      0   0     0
0  10.0  22  90.0
1  12.0  33  94.0
2  13.0  44   NaN
3   NaN  55   NaN
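With axis=1, concat lines the frames up by row label: df1 has no row 3 and df3 has no rows 2 and 3, so those cells become NaN (which is also why those two columns turn into floats). Two related options, shown on the same data: `join="inner"` keeps only the row labels present in every frame, and `ignore_index=True` along axis=1 replaces the repeated column labels 0, 0, 0 with 0, 1, 2.

```python
import pandas as pd

df1 = pd.DataFrame([10, 12, 13])
df2 = pd.DataFrame([22, 33, 44, 55])
df3 = pd.DataFrame([90, 94])

# Keep only the row labels that appear in all three frames (0 and 1)
inner = pd.concat([df1, df2, df3], axis=1, join="inner")

# Along axis=1, ignore_index relabels the columns as 0, 1, 2
relabeled = pd.concat([df1, df2, df3], axis=1, ignore_index=True)

print(inner)
print(relabeled)
```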
