Detailed tutorial on reading table data and splicing single row data using pandas

2021-09-12 01:36:12
OfStack

Business requirements

An Excel table with several hundred thousand pieces of data now needs to splice all the data in one column into one string, such as the following short rows of table data:

id	code	price	num
11	22	33	44
22	33	44	55
33	44	55	66
44	55	66	77
55	66	77	88
66	77	88	99

Now we need to splice this 1 column of code into a string with commas, and each cell data is enclosed in single quotation marks, which needs to be spliced into strings' 22 ',' 33 ',' 44 ',' 55 ',' 66 'and' 77 '. What do we need to do with this situation? Of course, there are many ways …

Multi-line text batch processing

Sometimes, we will encounter the situation that we need to deal with multiple lines of text at the same time. Many text editors support batch operation of multiple lines of text. Here I mainly talk about Sublime Text. The following are shortcut keys for operation. If necessary, you can try to use it once, which is really convenient.

Select multiple rows to be operated on, and press Ctr+Shift+L to edit these rows at the same time Select the text with the mouse, and press CTRL+D repeatedly to continue to select the next 1 same text to edit at the same time Select the text with the mouse and press Alt+F3 to select all the same text at one time and edit it at the same time

How to save efficiency

At work, There may be 1 case of table data processing, For example, the operation gives you a form, There is something similar in the table: Order number, product ID, product SKU, etc., need your assistance to export the detailed data in these data for their analysis. Once or twice, we can quickly deal with it in the above way, but this way may have stuck in the processing of large texts, and the operation efficiency is low. If the text is small, it is very convenient.
If you encounter this situation many times, do you want to make a tool to process it quickly? That is, we can write a small tool to realize this batch splicing of data in the same format, which is fast and convenient, and can greatly reduce repeated work consumption.

pandas reads the table data and processes it

We use pandas module of Python to read the data of a certain column specified in the table, and then carry out circular processing according to our splicing format, and finally write the spliced string into the text file, which is convenient to retain and use the spliced data.


sheet = pandas.read_excel(io=file_name, usecols=[line_num])
data = sheet.values.tolist()
str_data = ''
#  Cyclic processing of data 
print_msg(' Number of columns of data obtained [' + str(len(data)) + '] , start processing data... ')
for x in range(len(data)):
  if str(data[x][0]) != 'nan':
    str_data += "'" + str(data[x][0]) + "',"

Complete source code

Because scripts need to be used multiple times, And for different columns of different files, Therefore, we adopt the form of accepting key parameters, and can directly use this script to complete our data splicing without changing any code. At the same time, we can also use pyinstaller module to package the script into window executable file of exe, so that it can also be used in the running environment without Python. The packaging command is: pyinstaller -F -i favicon.ico join_excel_data.py I have a package uploaded to the dating site Github, if you are interested, you can click to view oh, make a friend address: github.com/gxcuizy


#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
 Splice Excel Table single row data and write text 
author: gxcuizy
time: 2021-03-01
"""

import pandas
import random
import os
import time


def print_msg(msg=''):
  """ Print information """
  now_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
  print('[' + now_time + '] ' + msg)


#  Program main entry 
if __name__ == "__main__":
  #  Get the passed-in parameter 
  file_name = input(' Please enter the table file name in the current directory (for example, " 01.xlsx "): ')
  line_num = input(' Please enter which column of data to assemble (for example, " 1 "): ')
  #  Determine whether the file exists 
  if os.path.exists(file_name) == False:
    print_msg(' File does not exist ')
    os.system("pause")
    exit(0)
  #  Determining whether the number of inputted rows is a number 
  if line_num.isdigit() == False:
    print_msg(' Please enter a number for the number of columns ')
    os.system("pause")
    exit(0)
  try:
    #  Get tabular data 
    print_msg(' Start fetching files [' + file_name + '] The first part of [' + str(line_num) + '] Column data ')
    line_num = int(line_num) - 1
    sheet = pandas.read_excel(io=file_name, usecols=[line_num])
    data = sheet.values.tolist()
    str_data = ''
    #  Cyclic processing of data 
    print_msg(' Number of columns of data obtained [' + str(len(data)) + '] , start processing data... ')
    for x in range(len(data)):
      if str(data[x][0]) != 'nan':
        str_data += "'" + str(data[x][0]) + "',"
    #  Write to a text file 
    print_msg(' After data processing, start writing... ')
    random_num = random.randint(1000, 9999)
    with open('str_' + str(random_num) + '.txt', 'w') as f:
      f.write(str_data.strip(','))
    print_msg(' Data write complete .')
  except Exception as err_info:
    #  Exception information 
    print_msg(str(err_info))
  #  Prevent exe Flash back at the end of program execution 
  os.system("pause")