A simple way to divide a large file into smaller files by paragraph in Python

  • 2020-05-30 20:29:52
  • OfStack

This article shows a simple Python implementation that divides a large file into several smaller files by paragraph. It is shared here for your reference:

Today I helped a classmate process a corpus. The corpus file was fairly large, and its paragraphs were separated by two consecutive newline characters. He wanted to divide it into several small files by paragraph, with every 3 paragraphs forming a new file. I had not come across this kind of task before, and the similar methods I found on the Internet all seemed a little complicated. So, after some experimenting, I wrote a short piece of code that solved the problem.

The basic idea is to read the contents of the original file first and use a regular expression to split it on \n\n. The result is a list in which each element holds the contents of one paragraph. Then create a file handle for writing. Next, traverse the paragraph list: write the current paragraph to the current file, then check whether three paragraphs have been written. If not, continue with the next paragraph. If three have been written, close the current file handle and create a new one with a different file name. That ends the current iteration, and the loop moves on to the next paragraph.
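To make the slicing step concrete, here is a small sketch of what splitting on two consecutive newlines returns. The sample string is made up for illustration and is not taken from the corpus.

```python
import re

# Hypothetical sample text: three paragraphs separated by blank lines.
sample = "first paragraph\n\nsecond paragraph\nstill second\n\nthird paragraph"

# Split wherever two newline characters appear in a row.
paragraphs = re.split('\n\n', sample)
print(paragraphs)
# ['first paragraph', 'second paragraph\nstill second', 'third paragraph']
print(len(paragraphs))  # 3
```

A single newline inside a paragraph is left alone. Only the two-newline separator is consumed by the split, so it no longer appears in any of the list elements.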


# -*- coding:utf8 -*-
import re

p = re.compile('\n\n')  # Paragraphs are separated by two consecutive newlines
with open('files/ The office .txt', 'r', encoding='utf8') as f:
    fileContent = f.read()  # Read the whole file
paraList = p.split(fileContent)  # Slice the text into a list of paragraphs
fileWriter = open('files/0.txt', 'a', encoding='utf8')  # Create a handle for the first output file
for paraIndex in range(len(paraList)):  # Traverse the list of paragraphs
    # split() removes the '\n\n' separator, so add it back after each paragraph
    fileWriter.write(paraList[paraIndex] + '\n\n')
    if (paraIndex + 1) % 3 == 0:  # Three paragraphs have been written to this file
        fileWriter.close()  # Close the current handle
        # Create a new handle for the next paragraphs. Note the file name:
        # integer division (//) gives 1.txt, 2.txt, ... instead of 1.0.txt, 2.0.txt, ...
        fileWriter = open('files/' + str((paraIndex + 1) // 3) + '.txt', 'a', encoding='utf8')
fileWriter.close()  # Close the last handle that was created
print('finished')
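The same idea can also be sketched as two small functions, which makes the grouping step easy to try out without touching any files. This is only one possible arrangement, not the author's code. The names group_paragraphs, write_chunks, per_file and out_dir are hypothetical and chosen for illustration. The sketch joins the paragraphs of each chunk with a blank line, skips slices that are empty or contain only whitespace, opens each output file in 'w' mode, and only creates a file when there is something to write.

```python
import os
import re


def group_paragraphs(text, per_file=3):
    """Split text on '\\n\\n' and join every `per_file` paragraphs into one chunk."""
    # Skip slices that are empty or whitespace only, e.g. when the text ends with '\n\n'.
    paragraphs = [p for p in re.split('\n\n', text) if p.strip()]
    # Step through the list per_file paragraphs at a time.
    return ['\n\n'.join(paragraphs[i:i + per_file])
            for i in range(0, len(paragraphs), per_file)]


def write_chunks(chunks, out_dir):
    """Write each chunk to out_dir/0.txt, out_dir/1.txt, ... and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunks):
        path = os.path.join(out_dir, str(index) + '.txt')
        # 'w' overwrites an existing file, so running the script twice does not duplicate text.
        with open(path, 'w', encoding='utf8') as writer:
            writer.write(chunk)
        paths.append(path)
    return paths


# Hypothetical four-paragraph input: the first chunk gets three paragraphs, the second gets one.
demo = 'p1\n\np2\n\np3\n\np4'
print(group_paragraphs(demo))
# ['p1\n\np2\n\np3', 'p4']
```

To write the files, pass the result to write_chunks together with an output directory, for example write_chunks(group_paragraphs(fileContent), 'files').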

I hope this article helps you with your Python programming.
