Easily converting the encoding of code files with Python

  • 2020-04-02 14:42:39
  • OfStack

I changed jobs recently, so I have not had much time to write up what I learned at my previous one. Most of my time goes into getting familiar with the new company's business and its code framework, and above all there is a lot of new material to learn. I used to work mainly on PHP back-end development; since arriving here I have also taken on front-end work and started learning C++, ha ha. In any case the days are very full, and I sleep soundly when I get home from work every day (in one sentence: I eat well and sleep well). As for the timing of the job change: it has been half a year since I officially graduated at the beginning of the year. I felt my technical skills were growing quickly, while at my old company operations staff had more standing than programmers, so I wanted to move. I interviewed at three companies (two large, one small), received offers, and picked one of the large ones after weighing everything (salary, the work itself, commute, and so on). It all went very smoothly, more easily than job hunting at graduation, ha ha. The harder you work, the luckier you get; the luckier you get, the harder you work! Starting this week I will keep this blog up to date so that I don't slide into lazy habits.

When I first arrived and was still getting familiar with the environment, my boss had me work on a migration that involved modifying code. What I want to say is that this kind of work is boring: reading other people's code, changing it, renaming a variable here and a file name there. It is all tedious work with no technical content, although migrating code is, incidentally, one way to get familiar with the environment. Enough of that; on to today's topic: converting the encoding of code files. For various reasons the code has to be moved from machine room A to machine room B, and the two cannot access each other. For historical reasons the code in machine room A is UTF-8 encoded, while machine room B uses GBK. Let's see how to solve this.

The encoding problem
First, why is there an encoding problem at all? In the example above, every database in machine room B is GBK encoded, so the data read from the database is GBK. To display that data without garbled characters, and without transcoding what comes out of the database, the header must set the encoding to GBK and the output files (HTML, TPL, etc.) must also be GBK. The following flow makes this clearer:

DB (GBK) => PHP, etc. (any encoding is allowed, but if the code file contains Chinese characters, the file must be GBK encoded or the characters must be converted to GBK when they are output) => header (GBK) => HTML, TPL (GBK)
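In Python 3 terms, this first flow amounts to keeping the output bytes in GBK and declaring the same charset in the header. The following is only a minimal sketch of that idea, not the code used in the migration; `build_response` and the sample page are hypothetical names chosen for illustration.

```python
def build_response(html_text):
    """Encode a page as GBK and declare the same charset in the header."""
    body = html_text.encode("gbk")
    headers = {
        # The declared charset must match the bytes actually sent.
        "Content-Type": "text/html; charset=GBK",
        "Content-Length": str(len(body)),
    }
    return headers, body


# Hypothetical sample page containing two Chinese characters.
headers, body = build_response("<p>中文</p>")
```

If the header said UTF-8 while the body stayed GBK, the browser would decode the bytes with the wrong table, which is exactly the garbling described above.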

Alternatively, convert the data from GBK to UTF-8 in code only when it comes out of the database. UTF-8 is generally more popular and causes fewer problems.

DB (GBK) => PHP, etc. (UTF-8, with the data retrieved from the database converted to UTF-8) => header (UTF-8) => HTML, TPL (UTF-8)
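The conversion step in this second flow can be sketched in Python 3 as a decode followed by an encode. This is an illustration under assumptions: `db_bytes_to_utf8` is a hypothetical helper, and `gbk_row` merely stands in for a value fetched from the GBK database.

```python
def db_bytes_to_utf8(raw):
    """Convert GBK bytes (as fetched from the database) to UTF-8 bytes."""
    return raw.decode("gbk").encode("utf-8")


# Stand-in for a value read from the GBK database.
gbk_row = "中文".encode("gbk")
utf8_row = db_bytes_to_utf8(gbk_row)

# GBK uses 2 bytes per Chinese character here, UTF-8 uses 3.
print(len(gbk_row), len(utf8_row))
```

Everything after this point (code files, header, templates) can then stay in UTF-8.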

As long as the encodings follow one of the two schemes above, nothing gets garbled. I have tested the first scheme and it works, so I expect the second does too. Now for a small script that converts the encoding of files:


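#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Filename: changeEncode.py
import os
import sys


def ChangeEncode(file, fromEncode, toEncode):
    """Re-encode one file in place; return 0 on success, -1 on failure."""
    try:
        # Read raw bytes so that no default text encoding is applied.
        with open(file, "rb") as f:
            s = f.read()
        u = s.decode(fromEncode)
        s = u.encode(toEncode)
        # The file is rewritten only after both conversions succeeded.
        with open(file, "wb") as f:
            f.write(s)
        return 0
    except (UnicodeError, OSError):
        return -1


def Do(dirname, fromEncode, toEncode):
    """Walk dirname recursively and convert every file found."""
    for root, dirs, files in os.walk(dirname):
        for _file in files:
            _file = os.path.join(root, _file)
            if ChangeEncode(_file, fromEncode, toEncode) != 0:
                print("[ Conversion failed: ]" + _file)
            else:
                print("[ Success: ]" + _file)


def CheckParam(dirname, fromEncode, toEncode):
    """Return 0 if the arguments are valid, otherwise an error code (1-3)."""
    encode = ["UTF-8", "GBK", "gbk", "utf-8"]
    if fromEncode not in encode or toEncode not in encode:
        return 2
    # Compare case-insensitively so that "UTF-8" and "utf-8" count as equal.
    if fromEncode.lower() == toEncode.lower():
        return 3
    if not os.path.isdir(dirname):
        return 1
    return 0


if __name__ == "__main__":
    error = {1: "The first parameter is not a valid folder",
             2: "The requested encoding is not supported; use UTF-8 or GBK",
             3: "The source encoding is the same as the target encoding"}
    if len(sys.argv) != 4:
        print("Usage: ./changeEncode.py target_dir fromEncode toEncode")
        sys.exit(1)
    dirname = sys.argv[1]
    fromEncode = sys.argv[2]
    toEncode = sys.argv[3]
    ret = CheckParam(dirname, fromEncode, toEncode)
    if ret != 0:
        print(error[ret])
    else:
        Do(dirname, fromEncode, toEncode)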

The script is simple to use:


  ./changeEncode.py target_dir fromEncode toEncode

Note the following relationships between common encodings:

US-ASCII is a subset of UTF-8. According to an answer on Stack Overflow, ASCII is a subset of UTF-8, so every ASCII file is already valid UTF-8.

I tried it to make sure. A file was reported as us-ascii when it contained no Chinese characters, and as utf-8 once it did.
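The subset relationship can be checked directly in Python 3. This is a small sketch; `is_ascii` and the two sample strings are hypothetical names used only for this illustration.

```python
ascii_text = "def main(): return 0\n"
chinese_text = "# 注释\n"

# Pure ASCII text yields identical bytes in ASCII, UTF-8 and GBK,
# which is why such files need no conversion at all.
same = (ascii_text.encode("ascii")
        == ascii_text.encode("utf-8")
        == ascii_text.encode("gbk"))


def is_ascii(data):
    """Return True if the bytes contain only ASCII characters."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


print(same)
print(is_ascii(ascii_text.encode("utf-8")))
print(is_ascii(chinese_text.encode("utf-8")))
```

Only the file containing Chinese characters stops being plain ASCII, which matches what I observed above.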

There is also the ANSI encoding, which stands for the local (system default) encoding. On a Simplified Chinese operating system, for example, ANSI means GBK, so keep this in mind.
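To see which encoding the local system prefers, Python 3's standard `locale` module can be asked directly. This is a sketch, and the output depends on the machine: on a Simplified Chinese Windows installation it is typically cp936, the Windows code page name that Python maps to its gbk codec.

```python
import codecs
import locale

# Encoding preferred by the current locale (the "ANSI" encoding on Windows).
preferred = locale.getpreferredencoding(False)
print("Locale encoding:", preferred)

# Python resolves the code page name cp936 to its gbk codec.
print(codecs.lookup("cp936").name)
```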

One more point: on Linux, the following command shows the encoding of files:


file -i *

You can see the encoding of the file.
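Where the `file` command is not available, a rough Python 3 equivalent is to try a few candidate encodings in order. This is only a heuristic sketch, not what `file` does internally; `guess_encoding` is a hypothetical helper. The order matters, because ASCII must be tried before UTF-8, and UTF-8 before GBK, since GBK will accept many byte sequences that are really UTF-8.

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "gbk")):
    """Return the first candidate that decodes the bytes, or None."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


print(guess_encoding(b"plain text"))
print(guess_encoding("中文".encode("utf-8")))
print(guess_encoding("中文".encode("gbk")))
```

A successful decode does not prove the guess is right, so treat the result as a hint rather than a guarantee.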

Of course, some files may contain special characters that the script fails to convert, but ordinary program files are fine.
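When a file does fail, it helps to know where. The following Python 3 sketch is an addition, not part of the script above: the hypothetical helper `find_bad_byte` reports the offset of the first undecodable byte, and the last line shows the standard `errors="replace"` option of `bytes.decode` as a lossy fallback.

```python
def find_bad_byte(data, encoding):
    """Return the offset of the first undecodable byte, or None."""
    try:
        data.decode(encoding)
        return None
    except UnicodeDecodeError as exc:
        return exc.start


# Hypothetical sample: valid text with one stray byte in the middle.
sample = b"ok " + b"\xff" + b" tail"
offset = find_bad_byte(sample, "utf-8")

# Lossy fallback: each undecodable byte becomes U+FFFD instead of raising.
lossy = sample.decode("utf-8", errors="replace")
print(offset, repr(lossy))
```

The lossy fallback changes the file's content, so it is better suited to inspection than to the actual migration.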

That's all for this article; I hope it helps you learn Python.
