Splitting large files with split and merging them with cat in Linux

  • 2020-06-12 11:36:40
  • OfStack

Preface

When a large amount of data needs to be uploaded to a server, or a large log file needs to be downloaded from one, the transfer is often interrupted by network problems or other causes and has to be started over. In this case, you can split the large file into smaller pieces, transfer them in batches, and merge the pieces once the transfer is complete.

1. Split files

Files can be split with the split command, which handles both text files and binary files. The pieces can then be merged with the cat command.

1.1 Splitting text files

Text files can be split by file size or by the number of lines of text.

Split by file size

To split a file by size, use the -C option to specify the maximum size of each piece. -C does not break lines across pieces:


$ split -C 100M large_file.txt stxt

As shown above, large_file.txt is split into pieces of at most 100M each, using the prefix stxt, so the pieces are named stxtaa, stxtab, stxtac, and so on. If no prefix is given, split uses the default prefix x, producing xaa, xab, and so on.
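To make the behaviour of -C concrete, here is a small sketch that splits a generated text file and checks that every piece ends on a line boundary. It assumes GNU coreutils; the scratch directory, the generated sample data and the 1K size are illustrative choices, not part of the original example.

```shell
# Sketch: split with -C and confirm no line is cut in half.
# Assumes GNU coreutils; the sample data and the 1K size are illustrative only.
set -eu

workdir=$(mktemp -d)          # scratch directory (left in place for inspection)
cd "$workdir"

# Hypothetical test data: 1000 short numbered lines.
seq 1 1000 | sed 's/^/log line /' > large_file.txt

# Pieces of at most 1 KiB each, made only of whole lines.
split -C 1K large_file.txt stxt

for piece in stxt*; do
    # tail -c 1 prints the last byte; $(...) strips a trailing newline,
    # so an empty result means the piece ends with "\n".
    if [ -n "$(tail -c 1 "$piece")" ]; then
        echo "$piece does not end on a line boundary" >&2
        exit 1
    fi
done

# Merging in glob order restores the original byte for byte.
cat stxt* > new_file.txt
cmp large_file.txt new_file.txt && echo "merge OK"
```

With the real 100M limit the idea is the same; the small size just makes several pieces from a small file.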

Split by number of lines

Text files can also be split by lines. In this case the file size is ignored, and the number of lines in each piece is specified with the -l option:


$ split -l 1000 large_file.txt stxt
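A quick way to see how -l distributes lines is to split a file whose line count is known and then count the lines in each piece. The sketch below assumes GNU coreutils; the 2500-line sample file is hypothetical.

```shell
# Sketch: split by line count and inspect how many lines each piece got.
# Assumes GNU coreutils; the 2500-line sample is illustrative only.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

seq 1 2500 > large_file.txt          # hypothetical sample: 2500 lines

# Expect stxtaa and stxtab with 1000 lines each, and stxtac with the last 500.
split -l 1000 large_file.txt stxt

wc -l stxt*                          # per-piece line counts plus a total

cat stxt* > new_file.txt
cmp large_file.txt new_file.txt && echo "merge OK"
```

Only the last piece can be shorter than the requested line count.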

1.2 Splitting binary files

Splitting a binary file is similar to splitting a text file by size, except that the size is specified with the -b option, which cuts at exact byte counts without regard to line boundaries:


$ split -b 100M data.bak sdata
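The following sketch splits a generated binary file with -b and checks that the merged copy is identical to the original. It assumes GNU coreutils and /dev/urandom; the 5M and 2M sizes are illustrative, and data.bak here is random test data rather than a real backup.

```shell
# Sketch: binary split with -b, then merge and compare byte for byte.
# Assumes GNU coreutils and /dev/urandom; sizes are illustrative only.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

head -c 5M /dev/urandom > data.bak   # 5 MiB of random bytes (test data)

# Expect sdataaa (2 MiB), sdataab (2 MiB) and sdataac (1 MiB).
split -b 2M data.bak sdata

ls -l sdata*

cat sdata* > restored.bak
cmp data.bak restored.bak && echo "binary merge OK"
```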

2. File merge

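Files are merged with the cat command. Pieces produced by any of the methods above can be merged this way.

Merge the pieces with cat. The shell expands stxt* in alphabetical order, which matches the order in which split assigns suffixes (aa, ab, ac, ...), so the pieces are joined in the correct sequence: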


$ cat stxt* > new_file.txt
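Since the point of splitting is usually to move the pieces to another machine, it can help to record a checksum before splitting and verify it after merging. The sketch below assumes GNU coreutils (sha256sum); the file names are illustrative, and a local copy into a subdirectory stands in for the actual transfer.

```shell
# Sketch: checksum before splitting, verify after merging.
# Assumes GNU coreutils; the cp into received/ stands in for a real transfer.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

seq 1 100000 > large_file.txt              # hypothetical stand-in for the real file

# Sender side: record the checksum, then split.
sha256sum large_file.txt > large_file.txt.sha256
split -C 100K large_file.txt stxt

# Simulated transfer of the pieces and the checksum file.
mkdir received
cp stxt* large_file.txt.sha256 received/
cd received

# Receiver side: merge under the original name and verify.
cat stxt* > large_file.txt
sha256sum -c large_file.txt.sha256         # prints "large_file.txt: OK" on success
```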

3. Command format

3.1 split command description

The split command has the following format:

split [OPTION]... [INPUT_FILE [OUTPUT_PREFIX]]

Command options:

-a, --suffix-length=N  use suffixes of length N (default 2)

-b, --bytes=SIZE  put SIZE bytes in each output file; SIZE may carry a unit suffix such as K, M or G

-C, --line-bytes=SIZE  put at most SIZE bytes of lines in each output file; similar to -b, but tries to keep each line intact

-d, --numeric-suffixes  use numeric suffixes instead of letters

-l, --lines=NUMBER  put NUMBER lines in each output file

--help  display help information and exit

--version  output version information and exit
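To illustrate how -d and -a combine, this sketch splits a small file by lines using three-digit numeric suffixes. It assumes GNU coreutils split; the file name and the part_ prefix are illustrative.

```shell
# Sketch: numeric suffixes (-d) with suffix length 3 (-a 3).
# Assumes GNU coreutils split; names are illustrative only.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

seq 1 25 > large_file.txt

# 25 lines at 10 per piece: part_000, part_001 (10 lines each), part_002 (5 lines).
split -d -a 3 -l 10 large_file.txt part_

ls part_*

# Zero-padded numbers sort correctly, so the glob still merges in order.
cat part_* > new_file.txt
cmp large_file.txt new_file.txt && echo "merge OK"
```

Numeric suffixes start at zero, and the zero padding from -a keeps alphabetical and numeric order the same.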

3.2 cat command description

cat is one of the most frequently used commands on Linux. Its manual page describes it as:

cat - concatenate files and print on the standard output

Common usage scenarios for the cat command are:

Display file contents:


$ cat filename

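Create a file from keyboard input (type the contents, then press Ctrl+D to finish; pressing Ctrl+D immediately leaves the file empty):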


$ cat > filename

File merge:


$ cat file1 file2 > file
