Reading and writing CSV, XML, and JSON files in Python: the handy tricks at a glance

  • 2021-07-09 08:50:39
  • OfStack

Python's superior flexibility and ease of use make it one of the most popular programming languages, especially for data scientists. This is largely because working with large datasets using Python is a straightforward task.

Nowadays, nearly every technology company is formulating a data strategy. They all realize that having the right data (clean, and as much of it as possible) gives them a key competitive advantage. Data, if used effectively, can reveal insights that are hidden beneath the surface.

The possible formats for data storage have increased significantly over the years, but CSV, JSON, and XML still dominate in everyday use. In this article, I'll share with you the easiest way to use and convert these three popular data formats in Python!

CSV data

CSV files are one of the most common ways to store data, and you will find that most datasets in Kaggle competitions are stored this way. We can use the csv module built into Python to read and write CSV files. Usually, we read the data into a list in which each element is itself a list representing one row of data.

In the following code, calling csv.reader() gives us a reader over the CSV file we opened. Calling next(csvreader) reads one row from the file and moves on to the next row each time it is called (in Python 2 this was written csvreader.next()). We can also iterate over every row with a for loop: for row in csvreader. In addition, it is best to make sure every row has the same number of columns; otherwise, you may run into errors when processing the lists.


    import csv

    filename = "my_data.csv"
    fields = []
    rows = []

    # Read the CSV file (newline='' lets the csv module handle line endings)
    with open(filename, 'r', newline='') as csvfile:
        # Create a csv reader object
        csvreader = csv.reader(csvfile)

        # Read the column names from the first row.
        # Python 3 uses next(csvreader); csvreader.next() only existed in Python 2.
        fields = next(csvreader)

        # Then read the data row by row
        for row in csvreader:
            rows.append(row)

    # Print the first 5 rows
    for row in rows[:5]:
        print(row)
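The advice about keeping the column count consistent can be checked in code. The following is a minimal sketch, not part of the original article: the sample data and the helper name find_ragged_rows are made up for illustration, and an in-memory io.StringIO stands in for a real file.

```python
import csv
import io

# Hypothetical sample data; the "Katie" line is missing a column on purpose.
SAMPLE = "Name,Goals,Assists\nEmily,12,18\nKatie,8\nJohn,16,9\n"


def find_ragged_rows(csvfile):
    """Split rows into well-formed ones and ones whose length differs from the header."""
    reader = csv.reader(csvfile)
    header = next(reader)
    rows, problems = [], []
    for row in reader:
        if len(row) == len(header):
            rows.append(row)
        else:
            # reader.line_num is the number of source lines read so far
            problems.append((reader.line_num, row))
    return header, rows, problems


header, rows, problems = find_ragged_rows(io.StringIO(SAMPLE))
print(header)
print(rows)
print(problems)
```

Collecting the problem rows instead of raising immediately makes it possible to report every bad line in one pass; whether to skip, repair, or reject them depends on the dataset.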

Writing data from Python to CSV is just as easy. We put the column names in one list and the rows to be written in another. This time, we create a csv.writer object and use it to write the data to the file, much like we did when reading.


    import csv

    # Column names
    fields = ['Name', 'Goals', 'Assists', 'Shots']

    # Each row of the CSV file is one list
    rows = [['Emily', '12', '18', '112'],
            ['Katie', '8', '24', '96'],
            ['John', '16', '9', '101'],
            ['Mike', '3', '14', '82']]

    filename = "soccer.csv"

    # Write the data to the CSV file
    # (newline='' avoids blank lines between rows on Windows)
    with open(filename, 'w', newline='') as csvfile:
        # Create a csv writer object
        csvwriter = csv.writer(csvfile)

        # Write the column names
        csvwriter.writerow(fields)

        # Write the data rows
        csvwriter.writerows(rows)
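One reason to prefer csv.writer over joining strings with commas by hand is that it quotes fields containing the delimiter or quote character. Here is a small sketch of that behavior, assuming made-up player names and writing to an in-memory buffer rather than a file:

```python
import csv
import io

# Hypothetical rows with awkward values
rows = [
    ["Emily Stone, Jr.", "12"],   # contains the delimiter
    ['Katie "K" Lee', "8"],       # contains the quote character
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Goals"])
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back yields the original values unchanged.
round_trip = list(csv.reader(io.StringIO(text)))
print(round_trip)
```

With the default dialect, the awkward fields are wrapped in double quotes and embedded quotes are doubled, so the reader can recover them exactly.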

Of course, the powerful pandas library makes processing data much easier. Reading a CSV file takes one line of code, and so does writing one!


    import pandas as pd

    filename = "my_data.csv"

    # Read the CSV file
    data = pd.read_csv(filename)

    # Print the first 5 rows
    print(data.head(5))

    # Write the data to a CSV file
    data.to_csv("new_data.csv", sep=",", index=False)

We can even use pandas to convert tabular data into a list of dictionaries in one line of code. Once we have a list of dictionaries, we can use the third-party dicttoxml library to convert it to XML, and we can also save it as a JSON file!


    import pandas as pd
    from dicttoxml import dicttoxml
    import json

    # Create a DataFrame
    data = {'Name': ['Emily', 'Katie', 'John', 'Mike'],
            'Goals': [12, 8, 16, 3],
            'Assists': [18, 24, 9, 14],
            'Shots': [112, 96, 101, 82]}

    df = pd.DataFrame(data, columns=data.keys())

    # Convert the DataFrame to a list of dictionaries and store it in a JSON file
    data_dict = df.to_dict(orient="records")
    with open('output.json', "w") as f:
        json.dump(data_dict, f, indent=4)

    # Convert the list of dictionaries to XML and store it in an XML file
    xml_data = dicttoxml(data_dict).decode()
    with open("output.xml", "w") as f:
        f.write(xml_data)
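If pandas is not available, the same CSV-to-list-of-dictionaries step can be approximated with the standard library. The sketch below is an illustration rather than the article's method: the helper csv_to_records and its int_fields parameter are hypothetical names, and the sample text stands in for a real file.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a CSV file on disk
SAMPLE = "Name,Goals,Assists,Shots\nEmily,12,18,112\nKatie,8,24,96\n"


def csv_to_records(csvfile, int_fields=()):
    """Read CSV rows as dictionaries, converting the named fields to int."""
    records = []
    for row in csv.DictReader(csvfile):
        record = dict(row)
        for field in int_fields:
            record[field] = int(record[field])
        records.append(record)
    return records


records = csv_to_records(io.StringIO(SAMPLE),
                         int_fields=("Goals", "Assists", "Shots"))
json_text = json.dumps(records, indent=4)
print(json_text)
```

Unlike pandas, csv.DictReader returns every value as a string, which is why the numeric columns are converted explicitly here.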

JSON data

JSON provides a clean and easy-to-read format because it keeps a dictionary-style structure. As with CSV, Python has a built-in json module that makes reading and writing super easy! As the example above shows, once the data is held as dictionaries, we can write it straight to a file.


    import json
    import pandas as pd

    # Use the json module to read data from a JSON file
    # and store it as a list of dictionaries
    with open('data.json') as f:
        data_listofdict = json.load(f)

    # You can also read the JSON file directly with pandas
    data_df = pd.read_json('data.json', orient='records')

    # Save the dictionary data as a JSON file,
    # using 'indent' and 'sort_keys' to format the output
    with open('new_data.json', 'w') as json_file:
        json.dump(data_listofdict, json_file, indent=4, sort_keys=True)

    # You can also save the DataFrame as a JSON file with pandas
    # (to_json returns None when given a path, so there is nothing to assign)
    data_df.to_json('new_data.json', orient='records')
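To see what indent and sort_keys actually change, it can help to serialize the same data both ways in memory. This is a small sketch with made-up records; json.dumps is used instead of json.dump so that no files are needed.

```python
import json

# Hypothetical records
records = [{"Name": "Emily", "Goals": 12}, {"Name": "Katie", "Goals": 8}]

# Compact form: no extra whitespace, keys in insertion order
compact = json.dumps(records, separators=(",", ":"))

# Readable form: 4-space indentation, keys sorted alphabetically
pretty = json.dumps(records, indent=4, sort_keys=True)

print(compact)
print(pretty)
```

Both strings decode to the same Python objects; the options only affect layout and key order, so the compact form suits transmission and the indented form suits files people will read.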

As we saw earlier, we can easily store our data as a CSV file through pandas or with Python's built-in csv module, while the dicttoxml library handles conversion to XML. The following example reads a JSON file and writes it out as CSV.


    import json
    import csv

    # Read data from the JSON file;
    # the data is stored as a list of dictionaries
    with open('data.json') as f:
        data_listofdict = json.load(f)

    # Write the dictionaries in the list to a CSV file,
    # using the keys of the first dictionary as column names
    keys = data_listofdict[0].keys()
    with open('saved_data.csv', 'w', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(data_listofdict)
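Taking the column names from the first dictionary works when every record has the same keys. With its default settings, csv.DictWriter raises ValueError if a later record contains a key that is not among the field names. A sketch of one way to handle uneven records, using made-up data and an in-memory buffer, is to collect the union of keys first:

```python
import csv
import io

# Hypothetical records with differing keys
records = [
    {"Name": "Emily", "Goals": 12},
    {"Name": "Katie", "Assists": 24},   # no "Goals", extra "Assists"
]

# Collect field names in first-seen order across all records.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
# restval is the value written for keys a record does not have
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

Missing values come out as empty cells, which most CSV consumers (including pandas) read back as missing data.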

XML data

XML is somewhat different from CSV and JSON. CSV and JSON are widely used because of their simplicity: they are quick to read, write, and interpret without extra work, and parsing them is very lightweight.

XML, on the other hand, tends to be more verbose, so the same information takes up more data. Sending more data means you need more bandwidth, more storage space, and more processing time. However, compared with JSON and CSV, XML does offer some additional features: you can use namespaces to build and share standard structures, it represents inheritance better, and it has industry-standard ways of describing data, such as XML Schema and DTD.
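As a brief illustration of namespaces, here is a minimal sketch using Python's built-in xml.etree.ElementTree module. The document, the prefix s, and the URI http://example.com/stats are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document
XML_TEXT = """<league xmlns:s="http://example.com/stats">
  <s:player><s:name>Emily</s:name><s:goals>12</s:goals></s:player>
  <s:player><s:name>Katie</s:name><s:goals>8</s:goals></s:player>
</league>"""

# Map the prefix used in our queries to the namespace URI
NAMESPACES = {"s": "http://example.com/stats"}

root = ET.fromstring(XML_TEXT)
players = [
    (p.findtext("s:name", namespaces=NAMESPACES),
     int(p.findtext("s:goals", namespaces=NAMESPACES)))
    for p in root.findall("s:player", NAMESPACES)
]
print(players)
```

The prefix in the query only has to match the mapping passed to findall and findtext; what identifies the elements is the namespace URI, which is how two vocabularies can share tag names without clashing.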

To read XML data, we will use ElementTree, part of the xml package built into Python. We serialize the parsed tree back to a string and hand it to the third-party xmltodict library, which converts it into a dictionary. Once we have a dictionary, we can convert it to CSV, JSON, or a pandas DataFrame as above!


    import xml.etree.ElementTree as ET
    import xmltodict
    import json

    # Parse the XML file and get its root element
    tree = ET.parse('output.xml')
    xml_data = tree.getroot()

    # Serialize the tree back to an XML string
    xmlstr = ET.tostring(xml_data, encoding='utf8', method='xml')

    # Convert the XML string to a dictionary
    data_dict = dict(xmltodict.parse(xmlstr))
    print(data_dict)

    # Save the dictionary as a JSON file
    with open('new_data_2.json', 'w') as json_file:
        json.dump(data_dict, json_file, indent=4, sort_keys=True)
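For simple, flat XML, the conversion can also be sketched with ElementTree alone. The example below assumes a document shaped like a root element containing repeated item elements with one child per field. That layout and the helper name xml_to_records are assumptions for illustration, so check the structure of your own file first.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document: one <item> per record, one child element per field
XML_TEXT = """<root>
  <item><Name>Emily</Name><Goals>12</Goals></item>
  <item><Name>Katie</Name><Goals>8</Goals></item>
</root>"""


def xml_to_records(xml_text, row_tag="item"):
    """Turn each <row_tag> element into a dict of child tag -> child text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in row}
            for row in root.iter(row_tag)]


records = xml_to_records(XML_TEXT)
json_text = json.dumps(records, indent=4)
print(json_text)
```

All values come back as strings, and attributes and nested elements are ignored, so a library such as xmltodict remains the better fit for anything beyond this flat shape.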
