Detailed Explanation of Five Methods for Decompressing and Compressing Archives with Python

  • 2021-07-10 20:09:48
  • OfStack

This article discusses using Python to extract the following five kinds of compressed files:

.gz .tar .tgz .zip .rar

Brief introduction

gz: gzip. It normally compresses only a single file. Combined with tar, files can be packaged first and then compressed.

tar: An archiving tool on Linux systems. It only packages files; it does not compress them.

tgz: Short for tar.gz. The files are packaged with tar and then compressed with gzip.

zip: Different from gzip. Although it uses a similar algorithm, it can package and compress multiple files. Because each file is compressed separately, the compression ratio is usually lower than that of tar.gz.

rar: A format that packages and compresses files. Originally used under DOS, it is mainly found on the Windows operating system.

Its compression ratio is higher than that of zip, but it is slower. Random access is also slow.

For a comparison of zip and rar, see:

http://www.comicer.com/stronghorse/water/software/ziprar.htm
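The formats listed above can also be told apart from Python without relying on the file extension. The following is only a minimal sketch: `sniff_archive` is a hypothetical helper name, and the checks use `tarfile.is_tarfile`, `zipfile.is_zipfile`, and the leading magic bytes of gzip and rar files.

```python
import tarfile
import zipfile


def sniff_archive(path):
    """Return a short label for the archive type of `path` (hypothetical helper)."""
    # Check tar first: is_tarfile also accepts compressed tarballs (.tar.gz / .tgz)
    if tarfile.is_tarfile(path):
        return "tar"
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as fh:
        magic = fh.read(4)
    if magic[:2] == b"\x1f\x8b":  # gzip magic number
        return "gz"
    if magic == b"Rar!":          # first four bytes of the rar signature
        return "rar"
    return "unknown"
```

A .tgz file is reported as "tar" here because the tar check runs before the gzip check.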

gz

Because gz generally compresses only one file, it is often used together with other packaging tools. For example, tar can first package files into XXX.tar, which is then compressed into XXX.tar.gz.

Extracting a gz file really just means reading out the single file inside it. The Python method is as follows:


import gzip
import os

def un_gz(file_name):
    """Decompress a .gz file."""
    # Get the output file name by removing the trailing ".gz" extension
    f_name = os.path.splitext(file_name)[0]
    # Create the gzip object
    g_file = gzip.GzipFile(file_name)
    # Read the decompressed bytes from the gzip object and write them
    # to the output file, which must be opened in binary mode
    with open(f_name, "wb") as out_file:
        out_file.write(g_file.read())
    # Close the gzip object
    g_file.close()
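Reading the whole file with read() keeps all of the decompressed data in memory at once. As a hedged alternative, the sketch below streams the data in chunks with `shutil.copyfileobj` and also shows the compression direction; `gz_compress` and `gz_decompress` are hypothetical helper names.

```python
import gzip
import shutil


def gz_compress(src_path, gz_path):
    """Compress a single file into gz_path (hypothetical helper)."""
    with open(src_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies chunk by chunk


def gz_decompress(gz_path, dst_path):
    """Decompress gz_path into dst_path (hypothetical helper)."""
    with gzip.open(gz_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

For example, gz_compress("data.txt", "data.txt.gz") followed by gz_decompress("data.txt.gz", "restored.txt") should reproduce the original bytes.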

tar

After decompressing XXX.tar.gz you get XXX.tar, which needs one further step to unpack.

* Note: tgz is the same format as tar.gz. Older versions of DOS limited file extensions to three characters, so tgz was used instead.

Because a tar file contains multiple files, we first read all the file names and then extract them, as follows:


import os
import tarfile

def un_tar(file_name):
    """Unpack a tar file into a "<file_name>_files" directory."""
    tar = tarfile.open(file_name)
    names = tar.getnames()
    # Because unpacking yields many files, create a directory
    # named after the archive in advance
    target_dir = file_name + "_files"
    os.makedirs(target_dir, exist_ok=True)
    for name in names:
        tar.extract(name, target_dir)
    tar.close()

* Note: tgz files are extracted in the same way as tar files.
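Since tgz and tar.gz are the same format, one function can handle .tar, .tar.gz, and .tgz alike. The sketch below assumes the read mode "r:*", which lets tarfile detect the compression by itself; `unpack_any_tar` is a hypothetical helper name.

```python
import os
import tarfile


def unpack_any_tar(archive_path, target_dir):
    """Unpack a .tar, .tar.gz or .tgz file into target_dir (hypothetical helper)."""
    os.makedirs(target_dir, exist_ok=True)
    # "r:*" opens the archive for reading with transparent compression
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(target_dir)
    return names
```

Only unpack archives from sources you trust, since member names inside an archive can point outside the target directory.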

zip

As with tar, the file names are read first and the files are then extracted, as follows:


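import os
import zipfile

def un_zip(file_name):
    """Extract a zip file into a "<file_name>_files" directory."""
    zip_file = zipfile.ZipFile(file_name)
    # Create the target directory if it does not exist yet
    target_dir = file_name + "_files"
    os.makedirs(target_dir, exist_ok=True)
    # Read the member names, then extract them one by one
    for name in zip_file.namelist():
        zip_file.extract(name, target_dir)
    zip_file.close()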
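The zipfile module can also create archives, and it records an original and a compressed size per member, which reflects the fact that zip compresses each file separately. The following is a small sketch; `make_zip` and `unzip_all` are hypothetical helper names, and the members are passed in as a dict of bytes for brevity.

```python
import os
import zipfile


def make_zip(zip_path, files):
    """Write {archive_name: bytes} pairs into a zip file (hypothetical helper)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, data in files.items():
            zf.writestr(arcname, data)


def unzip_all(zip_path, target_dir):
    """Extract all members; return (name, original size, compressed size) tuples."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
        return [(i.filename, i.file_size, i.compress_size) for i in zf.infolist()]
```

extractall() does in one call what the loop over namelist() does member by member.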

rar

Since rar is mostly used on Windows, an additional Python package, rarfile, is required.

Download address: http://sourceforge.net/projects/rarfile.berlios/files/rarfile-2.4.tar.gz/download

Unzip it into the /Scripts/ folder in the Python installation directory, open a command line in that folder, and run

python setup.py install

The installation is then complete.


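import os
import rarfile

def un_rar(file_name):
    """Extract a rar file into a "<file_name>_files" directory."""
    rar = rarfile.RarFile(file_name)
    target_dir = file_name + "_files"
    os.makedirs(target_dir, exist_ok=True)
    # Extract straight into the target directory instead of calling
    # os.chdir(), which would change the meaning of relative paths
    rar.extractall(target_dir)
    rar.close()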

tar Packaging

When you add a file with tar.add(), the file's own path is stored by default. Pass arcname to store the file in the tar package under a name of your own choosing.

Packaging code:


#!/usr/bin/env python3
# encoding: utf-8
import tarfile
import os
import time

start = time.time()
tar = tarfile.open('/path/to/your.tar', 'w')
for root, dirs, files in os.walk('/path/to/dir/'):
    for file in files:
        fullpath = os.path.join(root, file)
        # arcname is the name stored in the archive; using only the
        # file name flattens the directory structure
        tar.add(fullpath, arcname=file)
tar.close()
print(time.time() - start)
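Passing only the file name as arcname puts every file at the top level of the archive, so files with the same name in different subdirectories collide. One possible variation, sketched here under the assumption that the directory layout should be preserved, stores each path relative to the source directory; `pack_dir` is a hypothetical helper name.

```python
import os
import tarfile


def pack_dir(src_dir, tar_path, mode="w"):
    """Pack src_dir into tar_path, keeping paths relative to src_dir."""
    with tarfile.open(tar_path, mode) as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # arcname is the name stored inside the archive
                tar.add(full_path, arcname=os.path.relpath(full_path, src_dir))
```

The archive itself should be written outside src_dir, otherwise the walk may pick up the partially written tar file.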

You can also choose a compression format when packaging, for example gz compression:

tar=tarfile.open('/path/to/your.tar.gz','w:gz')

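Other modes are listed in the following table.

tarfile.open supports many kinds of mode:

mode            action
'r' or 'r:*'    Open for reading with transparent compression (recommended).
'r:'            Open for reading exclusively without compression.
'r:gz'          Open for reading with gzip compression.
'r:bz2'         Open for reading with bzip2 compression.
'a' or 'a:'     Open for appending with no compression. The file is created if it does not exist.
'w' or 'w:'     Open for uncompressed writing.
'w:gz'          Open for gzip compressed writing.
'w:bz2'         Open for bzip2 compressed writing.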
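To see the effect of the mode string, the same content can be written once uncompressed and once with gzip, then read back. This is only an illustrative sketch that builds the archives in memory with io.BytesIO; `tar_bytes` and the sample data are hypothetical.

```python
import io
import tarfile


def tar_bytes(files, mode):
    """Build a tar archive in memory from {name: bytes} using the given mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = {"a.txt": b"hello " * 1000}
plain = tar_bytes(files, "w")        # uncompressed
gzipped = tar_bytes(files, "w:gz")   # gzip-compressed

# "r:*" detects the compression when reading the archive back
with tarfile.open(fileobj=io.BytesIO(gzipped), mode="r:*") as tar:
    restored = tar.extractfile("a.txt").read()
```

With repetitive data like this, the gzip-compressed archive comes out considerably smaller than the plain one.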

tar Unpacking

When unpacking, tar can also decompress according to the compression format used.


#!/usr/bin/env python3
# encoding: utf-8
import tarfile
import time

start = time.time()
t = tarfile.open("/path/to/your.tar", "r:")
t.extractall(path='/path/to/extractdir/')
t.close()
print(time.time() - start)

The code above extracts everything at once. You can also process the members one by one, but if the tar package contains a very large number of files, watch your memory usage.


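tar = tarfile.open(filename, 'r:gz')
for tar_info in tar:
    # extractfile() returns None for members that are not regular files
    file = tar.extractfile(tar_info)
    if file is not None:
        do_something_with(file)
tar.close()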
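As one concrete way of processing members one by one, the sketch below reads each regular file in chunks and records how many bytes it contained, so no member is ever held in memory as a whole. `member_sizes` and the 64 KiB chunk size are assumptions made for illustration.

```python
import tarfile


def member_sizes(tar_path):
    """Return {member name: bytes read} for the regular files in a tar archive."""
    sizes = {}
    with tarfile.open(tar_path, "r:*") as tar:
        for tar_info in tar:               # members are visited one at a time
            if not tar_info.isfile():      # skip directories, links, etc.
                continue
            total = 0
            with tar.extractfile(tar_info) as fh:
                # Read in 64 KiB chunks instead of loading the whole member
                for chunk in iter(lambda: fh.read(64 * 1024), b""):
                    total += len(chunk)
            sizes[tar_info.name] = total
    return sizes
```

The chunk loop is where real per-file processing, such as hashing or parsing, would go.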

PS: Extracting rar files with Python

1. Run pip3 install rarfile to install the rarfile library

(Note that rarfile does not decompress data by itself; it relies on an external tool such as unrar being installed.)


# coding=utf-8
import rarfile

path = "E:\\New\\New.rar"   # rar file to be extracted
path2 = "E:\\New"           # directory to extract into
rf = rarfile.RarFile(path)
rf.extractall(path2)
rf.close()


