How do I write data to a CSV file with Python

  • 2020-11-03 22:31:10
  • OfStack

Preface

After crawling data from the web, the last step is deciding how to store it. When the amount of data is small, it is often stored in files rather than in a database: text files, CSV files, xls files, and so on. Files are easy to carry around and can be opened directly.

As a glue language, Python handles this without difficulty. However, writing the data often raises errors when the data source contains Chinese characters. Encoding problems are the most hair-raising part.

Let me briefly explain encoding. There are many encodings: UTF-8, GBK, ASCII, and so on.

ASCII is a character encoding developed in the United States in the 1960s. It defines the mapping between English characters and binary bits. The English alphabet is simple, consisting of 26 letters, so one byte is enough to represent one letter. Together with digits, punctuation, and control characters, 128 characters are enough to meet the encoding requirements.
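As a small illustration of the one-byte-per-character point, the sketch below encodes an English string as ASCII and shows that a Chinese character cannot be encoded this way. The sample strings are chosen only for the demonstration.

```python
# Encode a plain English string as ASCII: one byte per character.
text = "CSV"
ascii_bytes = text.encode("ascii")

print(list(ascii_bytes))   # the numeric code of each character
print(len(ascii_bytes))    # same as the number of characters

# Every ASCII code is below 128, so it fits in 7 bits.
all_below_128 = all(b < 128 for b in ascii_bytes)

# A Chinese character has no ASCII code, so encoding it fails.
try:
    "中".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(all_below_128, ascii_failed)
```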

Different countries have different languages, and many writing systems have far more characters than the English alphabet. According to incomplete statistics, there are approximately 100,000 Chinese characters, of which about 3,000 are in daily use. ASCII is obviously not sufficient. Therefore, GBK encoding is used for Chinese text, with two bytes representing one Chinese character. GBK is an extension of GB2312, the encoding for Simplified Chinese.
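The two-byte claim can be checked with Python's built-in codecs. This is a minimal sketch; the sample character is an arbitrary choice.

```python
# One Chinese character occupies two bytes in GBK and in GB2312.
ch = "中"
gbk_bytes = ch.encode("gbk")
gb2312_bytes = ch.encode("gb2312")

print(gbk_bytes, len(gbk_bytes))
print(gb2312_bytes, len(gb2312_bytes))

# English letters keep their single-byte ASCII codes in GBK.
letter_bytes = "A".encode("gbk")
print(letter_bytes, len(letter_bytes))
```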

So what is UTF-8? Let's start with Unicode. Unicode is intended to unify the various encodings, because each country has its own. If text is encoded with one encoding and decoded with another, the result is garbled text. Unicode itself, however, is only a character set: it specifies the binary code of each symbol, not how that code should be stored. UTF-8 is one of the most widely used Unicode implementations on the Internet.
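The difference between a Unicode code point and its stored bytes, and the garbling caused by mismatched encodings, can be sketched as follows. The sample text is chosen only for illustration.

```python
ch = "中"

# Unicode assigns the character a number (its code point)...
code_point = ord(ch)
print(hex(code_point))

# ...while an encoding decides how that number is stored as bytes.
utf8_bytes = ch.encode("utf-8")   # three bytes in UTF-8
gbk_bytes = ch.encode("gbk")      # two bytes in GBK
print(len(utf8_bytes), len(gbk_bytes))

# Encoding with one codec and decoding with another garbles the text.
text = "中文"
garbled = text.encode("utf-8").decode("gbk", errors="replace")
print(garbled)

# In strict mode the mismatch can also raise an error instead.
try:
    text.encode("gbk").decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Using the same codec in both directions restores the original.
round_trip = text.encode("utf-8").decode("utf-8")
```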

Therefore, when writing data to a file, it is best to specify the encoding as UTF-8.
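In Python 3 the encoding can be passed directly to the built-in open function. The sketch below assumes Python 3 and uses a hypothetical file name inside a temporary directory.

```python
import os
import tempfile

# Hypothetical file name in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
text = "中文,数据"

# State the encoding explicitly instead of relying on the platform default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding.
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

# The bytes on disk are exactly the UTF-8 encoding of the text.
with open(path, "rb") as f:
    raw_bytes = f.read()

print(read_back == text)
```

Without the encoding argument, Python uses a platform-dependent default, which is one common source of the errors described above.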

Python has a standard library module called csv that is dedicated to reading and writing CSV files.

The csv module wraps the common operations. Simple examples of its use are as follows:


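# read a csv file (Python 3)
import csv
# Python 2 opened the file with 'rb'; in Python 3, use text mode with newline='' to avoid a lot of problems
with open('some.csv', 'r', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something with row, such as row[0], row[1]
        print(row)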


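# write a csv file (Python 3)
import csv
# placeholder rows; any iterable of rows works
someiterable = [['title', 'author'], ['The Legendary Swordsman', 'Jin Yong']]
# Python 2 opened the file with 'wb'; in Python 3, use text mode with newline='' to avoid a lot of problems
with open('some.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)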
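One of the details the csv module takes care of is quoting fields that themselves contain commas. The sketch below writes to an in-memory buffer instead of a file so that the output is easy to inspect. The sample rows are made up for the demonstration.

```python
import csv
import io

# Write two rows into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "author"])
writer.writerow(["Hello, World", "Jin Yong"])  # first field contains a comma

csv_text = buffer.getvalue()
print(csv_text)

# Parsing the text again restores the original fields.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```

The field containing a comma is wrapped in double quotes on output, so the reader still sees two fields in that row.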

Specific examples are as follows:


import csv
import codecs
# codecs is the standard library module for encoding and decoding text

fileName = 'PythonBook.csv'

# Specify the encoding as utf-8 to avoid garbled Chinese characters in the csv file
with codecs.open(fileName, 'w', 'utf-8') as csvfile:
    # Column names shown in the header row of the csv file
    fieldnames = ['Title', 'Author']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    books = []
    book = {
        'title': 'The Legendary Swordsman',
        'author': 'Jin Yong',
    }
    books.append(book)

    writer.writeheader()
    for book in books:
        try:
            writer.writerow({'Title': book['title'], 'Author': book['author']})
        except UnicodeEncodeError:
            print("Encoding error: the row could not be written to the file and was skipped")
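To check the result, a file written this way can be read back with csv.DictReader. The following is a minimal sketch under a few assumptions: it uses Python 3's built-in open with encoding='utf-8' and newline='' rather than codecs.open, writes to a temporary directory, and uses the header names Title and Author.

```python
import csv
import os
import tempfile

# Hypothetical location: a temporary directory, so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "PythonBook.csv")
books = [{"title": "The Legendary Swordsman", "author": "Jin Yong"}]

# Write the header and the rows with DictWriter.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Title", "Author"])
    writer.writeheader()
    for book in books:
        writer.writerow({"Title": book["title"], "Author": book["author"]})

# Read the file back; each row becomes a dict keyed by the header names.
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(rows)
```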

This approach writes the data into the CSV file row by row, so it is less efficient. To write data to a CSV file in batches, you can use the pandas library.

pandas is a third-party library, so you need to install it before you can use it. pip is the easiest and most convenient way to install it.
pip install pandas

Use pandas to batch write data as follows:


import pandas as pd

fileName = 'PythonBook.csv'
number = 1

books = []
book = {
    'title': 'The Legendary Swordsman',
    'author': 'Jin Yong',
}
# With many books, you would collect them and write to the file in batches, for example 50 records at a time.
books.append(book)

data = pd.DataFrame(books)
# Write the csv file; 'a' is append mode
try:
    if number == 1:
        # First batch: write the header row with display names
        csv_headers = ['Title', 'Author']
        data.to_csv(fileName, header=csv_headers, index=False, mode='a', encoding='utf-8')
    else:
        # Later batches: append rows without repeating the header
        data.to_csv(fileName, header=False, index=False, mode='a', encoding='utf-8')
    number = number + 1
except UnicodeEncodeError:
    print("Encoding error: the data could not be written to the file and was skipped")
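The example above writes a single batch. One way the batching idea could be extended to a longer list is sketched below. The helper name write_in_batches, the generated sample data, and the temporary output path are assumptions made for illustration. The batch size of 50 is taken from the comment in the example.

```python
import os
import tempfile

import pandas as pd

BATCH_SIZE = 50  # batch size mentioned in the example's comment


def write_in_batches(books, path, batch_size=BATCH_SIZE):
    """Append books to a CSV file batch by batch (hypothetical helper)."""
    for start in range(0, len(books), batch_size):
        batch = pd.DataFrame(books[start:start + batch_size])
        # Only the first batch writes the header row.
        header = ["Title", "Author"] if start == 0 else False
        batch.to_csv(path, header=header, index=False, mode="a", encoding="utf-8")


# Generated sample data, used only to exercise the helper.
books = [{"title": f"Book {i}", "author": f"Author {i}"} for i in range(120)]
path = os.path.join(tempfile.mkdtemp(), "PythonBook.csv")

write_in_batches(books, path)

# Read the file back to confirm that all rows were written under one header.
result = pd.read_csv(path, encoding="utf-8")
print(len(result), list(result.columns))
```

Because the file is opened in append mode, running the helper twice against the same path would append a second header row. The sketch avoids this by writing to a fresh temporary path.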


