Six Common Ways of Reading and Writing excel Data Files in python Summary of of
- 2021-10-27 07:56:14
- OfStack
3. Use numpy libraries (loadtxt, load, fromfile) 4. Use pandas libraries (read_csv, read_excel, etc.)
5. Read and write excel files (xlrd, xlwt, openpyxl, etc.)
6. Operational databases (pymysql, cx_Oracle, etc.)
What ways can python read data files?
1. python built-in methods (read, readline, readlines)
read (): Reads the entire file content once. The read (size) method is recommended, and the larger the size, the longer the runtime readline (): Read 1 line at a time. Use when there is not enough memory, 1 is not used very much readlines (): Read the whole file once and return to list in line, which is convenient for us to traverse
2. Built-in Module (csv)
python has built-in csv module for reading and writing csv files. csv is a comma separator file, which is one of the most common data storage formats in data science.
csv module can easily complete the reading and writing operations of various volume data. Of course, large data volume needs optimization at the code level.
csv module reads files
# Read csv Documents
import csv
with open('test.csv','r') as myFile:
lines=csv.reader(myFile)
for line in lines:
print (line)
csv module writes to file
import csv
with open('test.csv','w+') as myFile:
myWriter=csv.writer(myFile)
# writerrow1 Row 1 Row writing
myWriter.writerow([7,8,9])
myWriter.writerow([8,'h','f'])
# writerow Multiline write
myList=[[1,2,3],[4,5,6]]
myWriter.writerows(myList)
3. Use the numpy library (loadtxt, load, fromfile)
loadtxt method
loadtxt is used to read text files (including txt, csv, etc.) and compressed files in. gz or. bz2 format, provided that the file data must have the same number of values every 1 line.
import numpy as np
# loadtxt() In dtype Parameter is set by default to float
# Here set to str String for easy display
np.loadtxt('test.csv',dtype=str)
# out : array(['1,2,3', '4,5,6', '7,8,9'], dtype='<U5')
load method
load is used to read. npy,. npz, or pickled persistence files dedicated to numpy.
import numpy as np
# Mr. Cheng npy Documents
np.save('test.npy', np.array([[1, 2, 3], [4, 5, 6]]))
# Use load Loading npy Documents
np.load('test.npy')
'''
out:array([[1, 2, 3],
[4, 5, 6]])
'''
fromfile method
The fromfile method can read simple text data or binary data, and the data comes from binary data saved by tofile method. When reading data, the user needs to specify the element type and modify the shape of the array appropriately.
import numpy as np
x = np.arange(9).reshape(3,3)
x.tofile('test.bin')
np.fromfile('test.bin',dtype=np.int)
# out:array([0, 1, 2, 3, 4, 5, 6, 7, 8])
4. Use the pandas library (read_csv, read_excel, etc.)
pandas is one of the most commonly used analysis libraries for data processing. It can read data files in various formats and output dataframe format.
Such as: txt, csv, excel, json, clipboard, database, html, hdf, parquet, pickled file, sas, stata, etc
read_csv method
The read_csv method is used to read the csv format file and output the dataframe format.
import pandas as pd
pd.read_csv('test.csv')
read_excel method
Read excel files, including xlsx, xls, xlsm formats
import pandas as pd
pd.read_excel('test.xlsx')
read_table method
Read any text file by controlling sep parameters (separators)
read_json method
Read json format file
df = pd.DataFrame([['a', 'b'], ['c', 'd']],index=['row 1', 'row 2'],columns=['col 1', 'col 2'])
j = df.to_json(orient='split')
pd.read_json(j,orient='split')
read_html method
Read html form
read_clipboard method
Read the contents of the clipboard
read_pickle method
Read plckled persistence file
read_sql method
Read the database data, connect the database, and pass in sql statement
read_dhf method
Read hdf5 file, suitable for large file reading
read_parquet method
Read the parquet file
read_sas method
Read the sas file
read_stata method
Read the stata file
read_gbq method
Read google bigquery data
5. Read and write excel files (xlrd, xlwt, openpyxl, etc.)
There are many libraries for python to read and write excel files, including xlrd, xlwt, openpyxl, xlwings and so on.
Main modules:
xlrd Library: Read data from excel, support xls, xlsx xlwt Library: Modify excel. Modification of xlsx format is not supported xlutils Library: Modify 1 existing file in xlw and xlrd openpyxl: Mainly for reading and editing excel in xlsx format xlwings: Read, write and modify xlsx, xls and xlsm format files xlsxwriter: Used to generate excel tables, insert data, insert icons and other table operations, does not support reading Microsoft Excel API: pywin32 needs to be installed, communicates directly with Excel process, and can do anything that can be done in Excel, but it is slow
6. Operational databases (pymysql, cx_Oracle, etc.)
python almost supports the interaction of all databases. After connecting to the database, you can use sql statements for addition, deletion and modification.
Main modules: