Reading space-separated files in Python
- 2021-10-27 08:09:22
- OfStack
When searching for data sets, you will find that not all of them are stored in csv format. The columns of feature data are not always separated by commas; some formats separate them with spaces, for example the .data format.
The following reads data in .data format.
(The data comes from the Boston house price data set, and the file name is "housing.data".)
import pandas as pd
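# delim_whitespace=True splits on any run of whitespace
# (deprecated since pandas 2.2 in favor of sep=r'\s+')
data = pd.read_csv('./housing.data', delim_whitespace=True)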
The code above splits each line on whitespace. I have not checked whether the same approach still works for a csv file whose rows are stored in a single column with spaces separating the values.
In addition, the parameters sep and delimiter of pd.read_csv() mean the same thing (delimiter is an alias for sep). I am not yet clear on how best to use them, so I have not used them much so far.
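As a minimal sketch of how these two parameters behave: passing the regular expression r"\s+" to sep splits on any run of whitespace, and delimiter accepts the same value. The sample rows below are made up for illustration and are not taken from housing.data.

```python
import io

import pandas as pd

# Made-up sample rows (not real housing.data values), separated by
# irregular runs of spaces.
text = "1.0  2.5 3\n4.0 5.5   6\n"

# sep=r"\s+" treats any run of whitespace as one separator.
by_sep = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# delimiter is an alias for sep, so this call parses the text the same way.
by_delimiter = pd.read_csv(io.StringIO(text), delimiter=r"\s+", header=None)

print(by_sep.shape)
print(by_sep.equals(by_delimiter))
```

Reading from io.StringIO keeps the example self-contained; with a real file, pass the path instead.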
If you are unsure how to read a file, consult the official documentation or search Google.
pd.read_csv official documentation
In addition, you can turn the original data set file into a csv file simply by adding the .csv suffix to its name, provided the data in it is already delimited.
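Renaming changes only the extension, not the contents. If the file is still space-separated, one option is to parse it and write it back out with commas. A minimal sketch, using a made-up in-memory sample in place of a real file:

```python
import io

import pandas as pd

# Made-up space-separated sample standing in for a real file.
text = "1 2 3\n4 5 6\n"

df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# With no path argument, to_csv returns the CSV text as a string;
# pass a filename instead to write a real .csv file.
csv_text = df.to_csv(index=False, header=False)
print(csv_text)
```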
When pd.read_csv() reads a file, it treats the first line as the column names by default, but sometimes the first line is also data we need. In that case, specify header=None, or set a name for each column in advance with names=['column0', 'column1', ...].
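The difference can be seen in a small sketch. The sample text is made up, and the column names are the placeholder names from the sentence above.

```python
import io

import pandas as pd

# Made-up sample with no header row.
text = "1 2 3\n4 5 6\n"

# Default: the first line is consumed as column names, leaving one data row.
default = pd.read_csv(io.StringIO(text), sep=r"\s+")

# header=None keeps every line as data; columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# names= supplies the column names; header=None states explicitly
# that the first line is data rather than a header.
named = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None,
                    names=["column0", "column1", "column2"])

print(len(default), len(no_header), list(named.columns))
```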
Supplement: reading and writing space-separated files in Python, and binary search on a 2-dimensional array by column
Recently I have been doing a lot of file reading and writing, and each time I ended up writing a separate function to suit the file format, so I wrote a class to handle the files.
(It is better to read files with pandas.read_csv.)
class DealData:
    # Load a space-separated file into a list of rows.
    # Every value is returned as a string; convert before numeric comparisons.
    def load(self, filename):
        data = []
        with open(filename, 'r') as file:
            for line in file:
                # split() with no argument splits on any run of whitespace and
                # drops the trailing newline, so no empty strings are produced
                data.append(line.split())
        return data

    # Binary search. array is a 2-dimensional array sorted in ascending order
    # by column lie. Returns the index of a row whose value in column lie
    # equals target; if there is none, returns the index of the last row whose
    # value is less than target (-1 if every value is greater).
    def search(self, array, lie, target):
        low = 0
        high = len(array) - 1
        while low <= high:
            mid = (low + high) // 2
            midval = array[mid][lie]
            if midval < target:
                low = mid + 1
            elif midval > target:
                high = mid - 1
            else:
                return mid
        return high

    # Save data to a file, one row per line, values separated by spaces
    def save(self, data, filename):
        with open(filename, 'w') as file:
            for row in data:
                file.write(" ".join(str(value) for value in row))
                file.write("\n")
You can put the DealData class in its own Python file, named DealData.py, and call it as follows:
from DealData import DealData
deal = DealData()
totaldata = deal.load(r"E:\low_data.txt")  # raw string keeps the backslash literal
First import the class: from DealData import DealData, where the first DealData is the name of the Python file (module) and the second DealData is the name of the class.
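Since pandas.read_csv is the better way to read such files, here is a rough pandas equivalent of the same load, search, and save steps. It is a sketch under assumptions: the rows and the file name sample.txt are made up, and the search column is assumed to be sorted in ascending order.

```python
import os
import tempfile

import pandas as pd

# Made-up rows, already sorted by column 0.
rows = [[1, 10], [3, 30], [5, 50], [7, 70]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Save: one row per line, values separated by a single space.
    pd.DataFrame(rows).to_csv(path, sep=" ", header=False, index=False)

    # Load: split on whitespace and treat every line as data.
    df = pd.read_csv(path, sep=r"\s+", header=None)

# Binary search on column 0 (the column must be sorted ascending).
# searchsorted returns the position where the target would be inserted.
pos = int(df[0].searchsorted(5))
found = pos < len(df) and df.iloc[pos, 0] == 5
print(pos, found)
```

Unlike the load method of the class, read_csv infers numeric types, so the search compares numbers rather than strings.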