Simple methods for pandas.DataFrame (create, index, add, and delete)

  • 2020-05-26 09:38:36
  • OfStack

preface

I have recently spent a lot of time looking up how to use pandas.DataFrame online, and it took me quite a while to get DataFrame to do what I wanted. I am summarizing what I learned here, for your convenience and mine. Have a look if you are interested.

1. Simple ways to create a DataFrame:

1. Create from a dictionary:


In [1]: import pandas as pd
In [3]: aa={'one':[1,2,3],'two':[2,3,4],'three':[3,4,5]}
In [4]: bb=pd.DataFrame(aa)
In [5]: bb
Out[5]: 
   one  three  two
0    1      3    2
1    2      4    3
2    3      5    4

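The keys of the dictionary become the columns of the DataFrame. The dictionary carries no index values, so you need to set the index yourself; if you don't, it defaults to integers counting from zero. (The columns appear in alphabetical order in this output; newer pandas versions keep the dictionary's insertion order instead.)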


In [6]: bb=pd.DataFrame(aa,index=['first','second','third'])
In [7]: bb
Out[7]: 
        one  three  two
first     1      3    2
second    2      4    3
third     3      5    4

2. Create from a multidimensional array


In [8]: import numpy as np
In [9]: del aa
In [10]: aa=np.array([[1,2,3],[4,5,6],[7,8,9]])
In [11]: aa
Out[11]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [12]: bb=pd.DataFrame(aa)
In [13]: bb
Out[13]: 
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

When creating from a multidimensional array, you should pass columns and index to DataFrame yourself; otherwise both default to integer labels, which is ugly.


In [14]: bb=pd.DataFrame(aa,index=[22,33,44],columns=['one','two','three'])
In [15]: bb
Out[15]: 
    one  two  three
22    1    2      3
33    4    5      6
44    7    8      9

3. Create from another DataFrame


In [14]: bb=pd.DataFrame(aa,index=[22,33,44],columns=['one','two','three'])
In [15]: bb
Out[15]: 
    one  two  three
22    1    2      3
33    4    5      6
44    7    8      9
In [16]: cc=bb[['one','three']].copy()
In [17]: cc
Out[17]: 
    one  three
22    1      3
33    4      6
44    7      9

The copy here is a deep copy, and changing the value in cc does not change the value in bb.


In [18]: cc.loc[22,'three']=5
In [19]: bb
Out[19]: 
    one  two  three
22    1    2      3
33    4    5      6
44    7    8      9

In [20]: cc
Out[20]: 
    one  three
22    1      5
33    4      6
44    7      9

2. Indexing a DataFrame:

Indexing is the most annoying and error-prone part of working with a DataFrame.

1. Index one column or several columns, which is relatively simple:


In [21]: bb['one']
Out[21]: 
22    1
33    4
44    7
Name: one, dtype: int32

Multiple column names must be put into a single list and passed as one argument; otherwise an error is raised.


In [29]: bb[['one','three']]
Out[29]: 
    one  three
22    1      3
33    4      6
44    7      9
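
To see why the list is needed, here is a minimal sketch that rebuilds the same bb as above; the try/except wrapper and the variable names sub and raised are only there for illustration. Without the inner list, pandas treats the two names as one tuple key, finds no column with that label, and raises a KeyError.

```python
import numpy as np
import pandas as pd

# Same frame as in the article.
bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])

# One list argument selects several columns and returns a DataFrame.
sub = bb[['one', 'three']]

# Without the inner list the key is the tuple ('one', 'three'),
# which is not a column label, so the lookup fails.
try:
    bb['one', 'three']
    raised = False
except KeyError:
    raised = True

print(list(sub.columns), raised)
```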

2. Index one record or several records:


In [27]: bb[1:3]
Out[27]: 
    one  two  three
33    4    5      6
44    7    8      9
In [28]: bb[:1]
Out[28]: 
    one  two  three
22    1    2      3

Note that the colon is required here; without it, the expression is treated as a column lookup.
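
A short sketch of that difference, assuming the same bb as above (the names second_row and raised are illustrative only). A slice inside the brackets picks rows by position, while a bare integer is looked up as a column label and fails here because no column is named 1.

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])

# A slice inside [] selects rows by position.
second_row = bb[1:2]

# A bare integer inside [] is looked up as a column label; there is no
# column named 1 in this frame, so this raises KeyError.
try:
    bb[1]
    raised = False
except KeyError:
    raised = True

print(second_row.index.tolist(), raised)
```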

3. Index certain records of certain columns, which tormented me for a long time:

The first kind:


In [30]: bb.loc[[22,33]][['one','three']]
Out[30]: 
    one  three
22    1      3
33    4      6

You cannot change values this way: you can read them but not write them, which may be related to how loc() is used here.
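
A minimal sketch of this behaviour, assuming the same bb as above; it is an illustration rather than the exact commands used originally. It suggests the limitation comes from chaining two lookups, since the first lookup produces a separate object, rather than from loc itself: passing rows and columns to a single loc call does write into bb.

```python
import warnings

import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])

# Chained form: bb.loc[[22, 33]] builds a new object first, so the
# assignment lands on that temporary and bb itself is not modified.
# Depending on the pandas version this may emit a warning, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    bb.loc[[22, 33]][['one', 'three']] = 0
unchanged = bb.loc[22, 'one']

# Single-step form: rows and columns go into one loc call, and the
# assignment writes into bb directly.
bb.loc[[22, 33], ['one', 'three']] = 0
changed = bb.loc[22, 'one']

print(unchanged, changed)
```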



The second kind: this one is also read-only.
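
One plausible read-only form is sketched below. It assumes the second kind selects the columns first and the rows second, which is a guess made for illustration; the name picked is also hypothetical. Reading works, but a write through the same chain only reaches the intermediate object.

```python
import warnings

import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])

# Columns first, then rows: reading works as expected.
picked = bb[['one', 'three']].loc[[22, 33]]

# Writing through the same chain only touches the intermediate object,
# so bb keeps its original values. Any version-dependent warning is
# silenced here.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    bb[['one', 'three']].loc[[22, 33]] = 0

print(picked.values.tolist(), bb.loc[22, 'one'])
```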



If you try to change a value this way, you get an error.



The third kind: this one can change values in the data!

iloc indexes by the integer position of rows and columns in the data, ignoring the index and column labels.


In [36]: bb.iloc[2:3,2:3]
Out[36]: 
    three
44      9

In [37]: bb.iloc[1:3,1:3]
Out[37]: 
    two  three
33    5      6
44    8      9
In [38]: bb.iloc[0,0]
Out[38]: 1

Here's the proof:
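
A small sketch of writing through iloc, assuming the same bb as above. The values 100 and 9 are arbitrary choices for illustration; setting the last row to 9 merely mirrors the 9 9 9 row that shows up in the final output of this article.

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])

# Positional assignment to a single cell (first row, first column).
bb.iloc[0, 0] = 100

# Positional assignment to a whole row (the last one, labelled 44).
bb.iloc[2, :] = 9

print(bb)
```

Both assignments change bb itself, because each one is a single indexing step on the original object.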



3. Create one or several new columns on an existing DataFrame

1. Plain assignment, with no extra tools, can create only one column at a time. Creating several columns this way did not work when I tested it.



The assigned list is matched to the rows purely by position, but in general we want each value to go to its corresponding index label. For that more advanced kind of assignment, see the approach below.
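
The difference between positional and label-based assignment can be sketched as follows, assuming the same bb as above. The column names new and by_label and all the values are hypothetical. A plain list is laid down in row order, while a pandas Series is matched by index label regardless of the order it was written in.

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])

# A plain list is matched to rows by position, not by index label.
bb['new'] = [2, 3, 4]

# A Series is matched by index label, so the order it is written in
# does not matter; labels missing from the Series would become NaN.
bb['by_label'] = pd.Series({44: 'c', 22: 'a', 33: 'b'})

print(bb)
```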

2. Use a dictionary to assign multiple columns by index:



Here aa is a dictionary of lists in which each entry is equivalent to one record, and the keys are used as index labels rather than as column names, unlike in section 1. This achieves the goal of matching multiple columns by index. Because a dict does not necessarily keep its entries in the order you expect, it is worth noting that assigning from a dict without matching on the index can put records on the wrong rows.
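
One way to do this is sketched below, under stated assumptions: the names records and extra, the column names hi, hello and ok (borrowed from the final output of this article), and all values except the row for 44 are made up for illustration. It uses DataFrame.from_dict with orient='index' and then join, which may differ from the approach described above.

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])

# Hypothetical records: each key is an index label of bb, each list holds
# that row's values for the new columns. Keys are deliberately out of order.
records = {44: [657, 77, 77], 22: [1, 2, 3], 33: [4, 5, 6]}

# orient='index' turns the keys into row labels rather than column names.
extra = pd.DataFrame.from_dict(records, orient='index',
                               columns=['hi', 'hello', 'ok'])

# join aligns on the index, so each record ends up on its own row.
bb = bb.join(extra)

print(bb)
```

Because the alignment is done on the index labels, the order of the keys in records has no effect on which row each record lands in.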

4. Delete multiple columns or records:

Delete columns
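
A minimal sketch of two common ways to delete columns, assuming the same bb as earlier plus a hypothetical extra column called new; the name trimmed is also illustrative. drop with axis=1 returns a new DataFrame, whereas del removes a single column from bb in place.

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[22, 33, 44], columns=['one', 'two', 'three'])
bb['new'] = [2, 3, 4]

# drop returns a new DataFrame and leaves bb untouched unless the
# result is assigned back to bb.
trimmed = bb.drop(['one', 'new'], axis=1)

# del removes a single column from bb in place.
del bb['new']

print(list(trimmed.columns), list(bb.columns))
```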



Delete records


In [61]: bb.drop([22,33],axis=0)
Out[61]: 
    one  two  three  new   hi  hello  ok
44    9    9      9    4  657     77  77

conclusion

Many features of DataFrame have not been covered here. I will cover them in future posts once I have read through the API on the official website, and I will keep sharing what I learn.