Fixing garbled file names when unzipping zip files on Linux

  • 2020-06-03 09:05:57
  • OfStack

Why

The zip format has traditionally not recorded which encoding is used for file names. Zip files created on a Chinese-language Windows system therefore store names in GBK, GB2312 or a similar encoding, while Linux defaults to UTF-8. Extracting such an archive on Linux produces garbled file names.
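The mismatch can be reproduced without any archive. As a small sketch (the sample bytes are an illustration chosen here, not taken from a real zip file), the four bytes below are the GBK encoding of 中文. Decoding them as GBK recovers the name, while treating them as UTF-8 fails:

```shell
# GBK bytes for the two characters 中文 (0xD6D0 0xCEC4), written as octal
# escapes so that any POSIX printf accepts them.
gbk_name_bytes='\326\320\316\304'

# Decoding the bytes as GBK and re-encoding as UTF-8 recovers the name.
decoded=$(printf "$gbk_name_bytes" | iconv -f GBK -t UTF-8)
printf '%s\n' "$decoded"

# The same bytes are not valid UTF-8, which is why a UTF-8 system
# cannot display them as they are stored.
if printf "$gbk_name_bytes" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
```

This assumes iconv is installed with GBK support, which is usually the case on glibc-based systems.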

Solution 1

Use 7z to decompress.

Install p7zip and convmv


# Fedora
$ su -c 'yum install p7zip convmv'
# Ubuntu (the 7za command is provided by p7zip-full)
$ sudo apt-get install p7zip-full convmv

Run the following commands to extract the archive and fix the file names:


# Extract with 7za
$ LANG=C 7za x your-zip-file.zip
# Recursively convert the file names from GBK to UTF-8
$ convmv -f GBK -t utf8 --notest -r .
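Because convmv -r . renames everything under the current directory, it can be safer to extract into a directory of its own and convert only that. A minimal sketch of the two steps wrapped in a function follows. The function name extract_gbk_zip and its arguments are hypothetical, not part of either tool:

```shell
# Hypothetical helper (not from the original article): extract a zip with
# GBK file names into its own directory, then convert only that directory.
extract_gbk_zip() {
    zip_file=$1
    dest_dir=$2

    # Both tools must be installed (see the install step above).
    command -v 7za >/dev/null 2>&1 || { echo "7za not found" >&2; return 127; }
    command -v convmv >/dev/null 2>&1 || { echo "convmv not found" >&2; return 127; }

    [ -f "$zip_file" ] || { echo "no such file: $zip_file" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1

    # Same invocation as above, with -o selecting the output directory.
    LANG=C 7za x -o"$dest_dir" "$zip_file" || return 1

    # Rename from GBK to UTF-8 inside the destination directory only.
    convmv -f GBK -t utf8 --notest -r "$dest_dir"
}
```

It would be called as extract_gbk_zip your-zip-file.zip extracted. Running convmv without --notest only prints the renames it would perform, which is a useful preview before changing anything.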

Solution 2

On Windows, files are compressed using the system's default encoding, which on a Chinese system is a Chinese code page. Because the zip file does not declare this encoding, unzip on Linux generally decodes the names with its own default encoding, and Chinese file names come out garbled.

Although this was reported as a bug in 2005, the Info-ZIP website does not list automatic encoding detection among its plans, so perhaps they do not see it as a problem. Sun took the same approach to the zip encoding problem that existed in Java for many years.

There are two ways to solve the problem:

1. Specify the character set on the unzip command line

unzip -O CP936 xxx.zip (GBK or GB18030 also work in place of CP936)

Interestingly, unzip's manual page does not describe this option; unzip --help gives only a one-line description of it.

2. Set the unzip options in environment variables, so that files are always listed and extracted with the specified character set

Add these two lines to /etc/environment:


UNZIP="-O CP936"
ZIPINFO="-O CP936"

With this in place, the GNOME desktop archive manager (file-roller) can extract Chinese file names correctly through unzip, since file-roller itself has no way to set the encoding that is passed to unzip.
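Both methods depend on the -O option, which may not be present in every unzip build. The sketch below checks the --help output for it before relying on it. The helper name help_mentions_charset_option is hypothetical, and the exact layout of the help line is an assumption, so the pattern only looks for -O followed by whitespace:

```shell
# Hypothetical helper: reads unzip's help text on stdin and succeeds if it
# lists an -O option. The exact wording of the help line is an assumption.
help_mentions_charset_option() {
    grep -Eq -- '(^|[[:space:]])-O[[:space:]]'
}

if command -v unzip >/dev/null 2>&1 &&
   unzip --help 2>&1 | help_mentions_charset_option; then
    echo "unzip lists -O; try: unzip -O CP936 xxx.zip"
else
    echo "this unzip does not list -O; use the 7za and convmv route instead"
fi
```

Keeping the pattern match in a function that reads standard input makes it easy to try against saved help text from another machine.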

