Reading space-separated files in Python

2021-10-27 08:09:22
OfStack

When searching for data sets, you will find that not all of them are stored in CSV format: the feature columns are not always separated by commas, and some formats separate them with spaces instead.

For example, the .data format.

Next, read data in the .data format:

(The data comes from the Boston house price prediction data set; the file name is "housing.data".)


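import pandas as pd
# delim_whitespace=True splits columns on any run of whitespace
# (newer pandas versions deprecate it in favor of sep=r'\s+')
data = pd.read_csv('./housing.data', delim_whitespace=True)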

The code above splits the columns on whitespace. I have not tested whether this also works for a .csv file whose rows sit in a single column with spaces as the separator, although pd.read_csv parses a file by its content rather than its extension, so it should.
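One way to check the question above is to write a small space-separated file with a .csv suffix and read it back. This is only a sketch: the sample rows and the file name sample.csv are made up for illustration, and sep=r"\s+" is used in place of delim_whitespace=True because it works across pandas versions.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows, not taken from housing.data
sample = "0.1 18.0 2.31\n0.2 0.0 7.07\n"

with tempfile.TemporaryDirectory() as tmp:
    # The suffix is .csv, but the content is separated by spaces
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w") as f:
        f.write(sample)
    # sep=r"\s+" splits on any run of whitespace;
    # header=None keeps the first line as data
    df_space_csv = pd.read_csv(path, sep=r"\s+", header=None)

print(df_space_csv.shape)
```

If the shape shows three columns rather than one, the whitespace separator was applied regardless of the file suffix.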

In addition, the parameters sep and delimiter of pd.read_csv() mean the same thing (delimiter is an alias for sep). I am not yet clear on the details of using them, so I do not use them much at present.
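A minimal sketch of the sep/delimiter relationship, assuming a small in-memory sample rather than a real file: both calls below pass a single space as the separator, once through each parameter, and the resulting frames are compared.

```python
import io

import pandas as pd

# Hypothetical sample data with exactly one space between values
text = "1 2 3\n4 5 6\n"

# Same separator, passed once as sep and once as delimiter
df_sep = pd.read_csv(io.StringIO(text), sep=" ", header=None)
df_delimiter = pd.read_csv(io.StringIO(text), delimiter=" ", header=None)

# True when both frames hold the same values, dtypes and labels
print(df_sep.equals(df_delimiter))
```

Note that a single-space separator only fits files with exactly one space between values; for runs of spaces or tabs, a whitespace regex such as sep=r"\s+" is the safer choice.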

If you are unsure how to read a particular file, consult the official documentation or search online.

pd.read_csv Official Document

In addition, you can turn the original data set file into a CSV file simply by giving it the .csv suffix, provided the data in the file is already properly delimited.

When pd.read_csv() reads a file, it treats the first line as the column names by default. Sometimes the first line is itself data that we need; in that case, set header=None, or supply the column names up front with names=['column0', 'column1', ...].
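The effect of these options can be sketched on a small in-memory sample (the data and the column names here are illustrative only):

```python
import io

import pandas as pd

# Hypothetical sample: two data rows, no header line
text = "1 2 3\n4 5 6\n"

# Default: the first line is consumed as column names, leaving one data row
df_default = pd.read_csv(io.StringIO(text), sep=r"\s+")

# header=None: both lines are data, columns are numbered 0, 1, 2
df_no_header = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# names=...: both lines are data, columns get the given names
df_named = pd.read_csv(
    io.StringIO(text), sep=r"\s+", names=["column0", "column1", "column2"]
)

print(len(df_default), len(df_no_header), len(df_named))
print(list(df_named.columns))
```

With the default settings one row of data is silently lost to the header, which is why header=None or names matters for files such as housing.data that have no header line.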

Supplement: reading and writing space-separated files in Python, and binary search on a 2-D array by column

Recently I have been doing a lot of file reading and writing, and each time I ended up writing a separate function to fit the file format, so I wrote a class to handle these file operations.

(For reading files, pandas.read_csv is the better choice.)


class DealData:
    # Load a whitespace-separated text file into a list of rows
    # (each row is a list of strings)
    def load(self, filename):
        data = []
        with open(filename, 'r') as file:
            for line in file:
                # split() with no argument splits on any run of whitespace and
                # ignores leading/trailing whitespace, including the line break
                data.append(line.split())
        return data

    # Binary search on column `lie` of the 2-D list `array`, which must be
    # sorted in ascending order by that column.
    # Returns the index of a row whose value equals target; if there is none,
    # returns the index of the last row whose value is below target (-1 if none).
    # Values returned by load() are strings, so convert them to numbers first
    # if a numeric comparison is needed.
    def search(self, array, lie, target):
        low = 0
        high = len(array) - 1
        while low <= high:
            mid = (low + high) // 2
            midval = array[mid][lie]
            if midval < target:
                low = mid + 1
            elif midval > target:
                high = mid - 1
            else:
                return mid
        return high

    # Save a 2-D list to a file, one row per line, values separated by spaces
    def save(self, data, filename):
        with open(filename, 'w') as file:
            for row in data:
                file.write(" ".join(str(value) for value in row))
                file.write("\n")

You can put the DealData class in its own Python file, named DealData.py, and call it as follows:


from DealData import DealData
deal = DealData()
# Raw string so the backslash in the Windows path is not treated as an escape
totaldata = deal.load(r"E:\low_data.txt")

First import the class: from DealData import DealData, where the first DealData is the name of the Python file (module) being imported and the second DealData is the name of the class.
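One point to keep in mind when searching loaded data: load returns every field as a string, so comparisons are lexicographic unless the values are converted first. The sketch below is an alternative to the class's search method, not the author's code: it converts hypothetical rows to floats and uses the standard-library bisect module to find the last row whose column value does not exceed the target. The helper name search_column and the sample rows are made up for illustration.

```python
import bisect

# Hypothetical rows as load() would return them: lists of strings
rows = [["1", "10.5"], ["2", "20.0"], ["3", "35.2"], ["4", "50.1"]]

# Convert to floats so comparisons are numeric;
# as strings, "10.5" < "9" would be True
numeric_rows = [[float(value) for value in row] for row in rows]


def search_column(array, column, target):
    """Return the index of the last row whose `column` value is <= target, or -1.

    The array must be sorted in ascending order by that column.
    """
    keys = [row[column] for row in array]
    return bisect.bisect_right(keys, target) - 1


print(search_column(numeric_rows, 1, 20.0))
print(search_column(numeric_rows, 1, 30.0))
print(search_column(numeric_rows, 1, 5.0))
```

Building the key list costs a full pass over the rows, so for repeated searches on the same column it would make sense to build it once and reuse it.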