How to remove the Unnamed: 0 column generated by pandas

2021-10-16 02:09:25
OfStack

When working with data, it is easy to be careless: pandas will "helpfully" add row and column labels on its own. I ran into exactly this problem.

This is the final data set, produced by splicing together several files that pandas wrote with to_csv using its default parameters (index=True, header=True):


   Unnamed: 0            ip  Unnamed: 0.1  ...        766        767  class
0           0    google.com             0  ...   0.376452   0.148091      0
1           1  facebook.com             1  ...  -0.044634  -0.180167      0
2           2   youtube.com             2  ...   0.172028   0.002102      0
3           3     yahoo.com             3  ...   0.286067  -0.269647      0
4           4     baidu.com             4  ...   0.034892   0.445554      0

We can see that the first column, Unnamed: 0, and the third column, Unnamed: 0.1, are data we don't want. They appear because the CSV files were generated with the default parameters, so the row index was written out as a column with no name. The following parameters solve this problem at the time the CSV is generated.

In to_csv(), set index=False. Alternatively, keep index=True and give the index column a name with index_label="id".

Some readers will say they don't want to redo the data processing and would rather clean up the CSV that has already been generated. That works too, and it is what I did here.
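To make the effect of these two options concrete, here is a minimal sketch. The small DataFrame and the variable names are made up for illustration, and the CSV text is kept in memory with io.StringIO instead of being written to disk.

```python
import io
import pandas as pd

# Hypothetical sample data, standing in for the real data set.
df = pd.DataFrame({"ip": ["google.com", "facebook.com"], "class": [0, 0]})

# Default settings: the index is written as a first column with an empty name.
default_csv = df.to_csv()
default_cols = pd.read_csv(io.StringIO(default_csv)).columns.tolist()

# index=False: the index is not written at all.
no_index_csv = df.to_csv(index=False)
no_index_cols = pd.read_csv(io.StringIO(no_index_csv)).columns.tolist()

# index_label="id": the index is written, but under a proper column name.
labeled_csv = df.to_csv(index=True, index_label="id")
labeled_cols = pd.read_csv(io.StringIO(labeled_csv)).columns.tolist()

print(default_cols)
print(no_index_cols)
print(labeled_cols)
```

Only the first variant should bring back an Unnamed: 0 column when the file is read again.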


import pandas as pd

# Load the CSV that contains the unwanted "Unnamed" columns
data = pd.read_csv('finalData.csv')
print('Total number of samples:', len(data))
print('First 5 rows of the data:')
print(data.head())
print('Additional details of the data set:')
data.info()  # info() prints its report itself and returns None
print('============================= Start processing: ==============================')
# Keep only the columns whose name does not start with "Unnamed"
newData = data.loc[:, ~data.columns.str.contains('^Unnamed')]
print(newData.head())
# index=False prevents a new unnamed index column from being written
newData.to_csv('myVecData.csv', index=False)

Don't forget index=False, or a new unwanted column will be generated. Column names are handled in the same way, with the parameter header=False, which will not be repeated here.

Final result:
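In to_csv, the parameter that suppresses the row of column names is header. The following is only a rough sketch with made-up sample data, showing what the output looks like and what reading it back then requires.

```python
import io
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"ip": ["google.com", "facebook.com"], "class": [0, 0]})

# header=False omits the column-name row; index=False omits the index column.
bare_csv = df.to_csv(index=False, header=False)
print(bare_csv)

# A file without a header row must be read with header=None, otherwise the
# first data row would be taken as the column names.
restored = pd.read_csv(io.StringIO(bare_csv), header=None, names=["ip", "class"])
print(restored)
```

Whether dropping the header is a good idea depends on who reads the file later, since the column names then have to be supplied by hand.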


============================= Start processing: ==============================
             ip         0          1  ...        766        767  class
0    google.com  0.282674  -0.359200  ...   0.376452   0.148091      0
1  facebook.com  0.542586  -0.390693  ...  -0.044634  -0.180167      0
2   youtube.com  0.598675  -0.679748  ...   0.172028   0.002102      0
3     yahoo.com  0.212740  -0.823602  ...   0.286067  -0.269647      0
4     baidu.com  0.017386  -0.355357  ...   0.034892   0.445554      0

Addendum: pandas generates an Unnamed column each time a row is appended using append

Every time you read the file, use append to add a row, and save it again, pandas adds one extra Unnamed column!

Solution:

Before appending the row, pass the index_col parameter to read_csv when reading the data, specifying which column is the index.

For example:


test = pd.read_csv(filename, index_col=0)
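The full read, append, save cycle can be sketched as follows. This is an illustration under assumptions: the sample rows are invented, the file is simulated in memory with io.StringIO, and pd.concat is used to add the row because DataFrame.append was removed in pandas 2.0.

```python
import io
import pandas as pd

# Hypothetical starting data, saved with the default index=True.
saved = pd.DataFrame({"ip": ["google.com"], "class": [0]}).to_csv()

# Reading without index_col turns the saved index into an "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(saved))

# Reading with index_col=0 restores it as the index instead.
test = pd.read_csv(io.StringIO(saved), index_col=0)

# Add one row; pd.concat replaces the removed DataFrame.append.
new_row = pd.DataFrame([{"ip": "youtube.com", "class": 0}])
test = pd.concat([test, new_row], ignore_index=True)

# Saving and reloading the same way keeps the columns clean.
reloaded = pd.read_csv(io.StringIO(test.to_csv()), index_col=0)

print(naive.columns.tolist())
print(reloaded.columns.tolist())
```

As long as every read uses index_col=0, the saved index never turns into a data column, so repeated cycles should not accumulate Unnamed columns.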