Detailed tutorial on reading table data and splicing single row data using pandas
- 2021-09-12 01:36:12
- OfStack
Business requirements
An Excel table with several hundred thousand pieces of data now needs to splice all the data in one column into one string, such as the following short rows of table data:
id | code | price | num |
---|---|---|---|
11 | 22 | 33 | 44 |
22 | 33 | 44 | 55 |
33 | 44 | 55 | 66 |
44 | 55 | 66 | 77 |
55 | 66 | 77 | 88 |
66 | 77 | 88 | 99 |
Now we need to splice this 1 column of code into a string with commas, and each cell data is enclosed in single quotation marks, which needs to be spliced into strings' 22 ',' 33 ',' 44 ',' 55 ',' 66 'and' 77 '. What do we need to do with this situation? Of course, there are many ways …
Multi-line text batch processing
Sometimes, we will encounter the situation that we need to deal with multiple lines of text at the same time. Many text editors support batch operation of multiple lines of text. Here I mainly talk about Sublime Text. The following are shortcut keys for operation. If necessary, you can try to use it once, which is really convenient.
Select multiple rows to be operated on, and press Ctr+Shift+L to edit these rows at the same time Select the text with the mouse, and press CTRL+D repeatedly to continue to select the next 1 same text to edit at the same time Select the text with the mouse and press Alt+F3 to select all the same text at one time and edit it at the same time
How to save efficiency
At work, There may be 1 case of table data processing, For example, the operation gives you a form, There is something similar in the table: Order number, product ID, product SKU, etc., need your assistance to export the detailed data in these data for their analysis. Once or twice, we can quickly deal with it in the above way, but this way may have stuck in the processing of large texts, and the operation efficiency is low. If the text is small, it is very convenient.
If you encounter this situation many times, do you want to make a tool to process it quickly? That is, we can write a small tool to realize this batch splicing of data in the same format, which is fast and convenient, and can greatly reduce repeated work consumption.
pandas reads the table data and processes it
We use pandas module of Python to read the data of a certain column specified in the table, and then carry out circular processing according to our splicing format, and finally write the spliced string into the text file, which is convenient to retain and use the spliced data.
sheet = pandas.read_excel(io=file_name, usecols=[line_num])
data = sheet.values.tolist()
str_data = ''
# Cyclic processing of data
print_msg(' Number of columns of data obtained [' + str(len(data)) + '] , start processing data... ')
for x in range(len(data)):
if str(data[x][0]) != 'nan':
str_data += "'" + str(data[x][0]) + "',"
Complete source code
Because scripts need to be used multiple times, And for different columns of different files, Therefore, we adopt the form of accepting key parameters, and can directly use this script to complete our data splicing without changing any code. At the same time, we can also use pyinstaller module to package the script into window executable file of exe, so that it can also be used in the running environment without Python. The packaging command is:
pyinstaller -F -i favicon.ico join_excel_data.py
I have a package uploaded to the dating site Github, if you are interested, you can click to view oh, make a friend address: github.com/gxcuizy
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Splice Excel Table single row data and write text
author: gxcuizy
time: 2021-03-01
"""
import pandas
import random
import os
import time
def print_msg(msg=''):
""" Print information """
now_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
print('[' + now_time + '] ' + msg)
# Program main entry
if __name__ == "__main__":
# Get the passed-in parameter
file_name = input(' Please enter the table file name in the current directory (for example, " 01.xlsx "): ')
line_num = input(' Please enter which column of data to assemble (for example, " 1 "): ')
# Determine whether the file exists
if os.path.exists(file_name) == False:
print_msg(' File does not exist ')
os.system("pause")
exit(0)
# Determining whether the number of inputted rows is a number
if line_num.isdigit() == False:
print_msg(' Please enter a number for the number of columns ')
os.system("pause")
exit(0)
try:
# Get tabular data
print_msg(' Start fetching files [' + file_name + '] The first part of [' + str(line_num) + '] Column data ')
line_num = int(line_num) - 1
sheet = pandas.read_excel(io=file_name, usecols=[line_num])
data = sheet.values.tolist()
str_data = ''
# Cyclic processing of data
print_msg(' Number of columns of data obtained [' + str(len(data)) + '] , start processing data... ')
for x in range(len(data)):
if str(data[x][0]) != 'nan':
str_data += "'" + str(data[x][0]) + "',"
# Write to a text file
print_msg(' After data processing, start writing... ')
random_num = random.randint(1000, 9999)
with open('str_' + str(random_num) + '.txt', 'w') as f:
f.write(str_data.strip(','))
print_msg(' Data write complete .')
except Exception as err_info:
# Exception information
print_msg(str(err_info))
# Prevent exe Flash back at the end of program execution
os.system("pause")