A Python version pitfall: md5 in Python 2 differs from md5 in Python 3

  • 2020-06-07 04:41:30
  • OfStack

Introduction

For some characters, Python 2 and Python 3 produce different md5 hashes from what looks like the same string.


# python2.7
import hashlib
pwd = "xxx" + chr(163) + "fj"
checkcode = hashlib.md5(pwd).hexdigest()
print checkcode # ea25a328180680aab82b2ef8c456b4ce

# python3.6
import hashlib
pwd = "xxx" + chr(163) + "fj"
checkcode = hashlib.md5(pwd.encode("utf-8")).hexdigest()
print(checkcode) # b517e074034d1913b706829a1b9d1b67

The difference in the code is that in Python 3 you must call encode on the string before hashing it. If you do not, an error is raised:


 checkcode = hashlib.md5(pwd).hexdigest()
TypeError: Unicode-objects must be encoded before hashing

This is because hashing operates on bytes, so the string must first be converted to the bytes type. The default encoding in Python 3 is utf-8, so utf-8 is used to encode here.
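As a minimal Python 3 sketch of this behavior (the variable names raised, default_bytes and utf8_bytes are only illustrative), the following checks that md5 rejects a str and that encode() with no argument gives the same bytes as encode("utf-8"):

```python
import hashlib

pwd = "xxx" + chr(163) + "fj"

# Passing a str to md5 fails in Python 3 because hashing works on bytes.
try:
    hashlib.md5(pwd)
    raised = False
except TypeError:
    raised = True

# str.encode() with no argument defaults to UTF-8.
default_bytes = pwd.encode()
utf8_bytes = pwd.encode("utf-8")

print(raised)                       # True
print(default_bytes == utf8_bytes)  # True
```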

Analysis

If the string does not contain chr(163), the two versions give the same result. In other words, the problem lies in chr(163):


# python2.7
>>> chr(163)
'\xa3'

# python3.6
>>> chr(163)
'\xa3'

Judging by chr alone, the results look the same, so let's convert them to the bytes type and compare:


# python2.7
>>> bytes(chr(163))
'\xa3'

# python3.6
>>> chr(163).encode()
b'\xc2\xa3'

In Python 3, when num < 128, chr(num).encode('utf-8') yields a single byte, the same as the ASCII value. When num >= 128, it yields more than one byte: two bytes for values up to 0x7FF, which includes 163.
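The byte counts can be checked directly in Python 3. This is a small sketch; the helper name utf8_len is made up for illustration and is not part of any library:

```python
def utf8_len(num):
    """Return how many bytes UTF-8 uses for the code point num."""
    return len(chr(num).encode("utf-8"))

# Code points below 128 take one byte; 128 through 0x7FF take two.
for num in (65, 127, 128, 163, 0x7FF, 0x800):
    print(num, utf8_len(num), chr(num).encode("utf-8"))
```

Running it shows the jump from one byte to two at 128, and from two bytes to three at 0x800.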

Solution
Switch to latin1 for encoding:


# python3.6
import hashlib
pwd = "xxx" + chr(163) + "fj"
checkcode = hashlib.md5(pwd.encode("latin1")).hexdigest()
print(checkcode)  # ea25a328180680aab82b2ef8c456b4ce
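To see why the encoding choice changes the hash, it helps to look at the bytes that are actually fed to md5. This is a rough Python 3 sketch; the variable names are illustrative:

```python
import hashlib

pwd = "xxx" + chr(163) + "fj"

latin1_bytes = pwd.encode("latin1")
utf8_bytes = pwd.encode("utf-8")

# latin1 keeps chr(163) as the single byte 0xa3, like a Python 2 str.
print(latin1_bytes)  # b'xxx\xa3fj'
# UTF-8 expands it to two bytes, so md5 sees different input.
print(utf8_bytes)    # b'xxx\xc2\xa3fj'

same_hash = (hashlib.md5(latin1_bytes).hexdigest()
             == hashlib.md5(utf8_bytes).hexdigest())
print(same_hash)     # False
```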

Additional notes
Why the latin1 encoding? The answer is interesting.

Let's start with the chr function. Running help(chr) shows:


chr(...)
  chr(i) -> Unicode character
  Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.

It returns the single character at the given code point in Unicode. Calling encode on that character then converts it to the bytes type.

In ASCII, each character is encoded as one byte, but only code points 0-127 are defined. The values 128-255 belong to Extended ASCII, which the ascii codec in Python 3 does not cover. So if you run chr(163).encode("ascii"), you get an error: UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)

What is needed, then, is an encoding that contains the characters in 128-255 and uses a fixed size of one byte per character, such as ISO-8859-1, also known as latin1. There are other encodings, such as cp1252, that also contain these characters.
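The three codecs can be compared with a short Python 3 sketch. The variable names are illustrative, and the codec names are the standard aliases accepted by str.encode:

```python
char = chr(163)

# ascii only covers 0-127, so encoding chr(163) raises UnicodeEncodeError.
try:
    char.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# latin1 maps every code point 0-255 to the single byte with the same value.
latin1_identity = all(chr(i).encode("latin1") == bytes([i]) for i in range(256))

# cp1252 also encodes chr(163) as one byte with the value 0xa3.
cp1252_bytes = char.encode("cp1252")

print(ascii_ok)         # False
print(latin1_identity)  # True
print(cp1252_bytes)     # b'\xa3'
```

Because latin1 maps all 256 values one to one, it reproduces the raw bytes of a Python 2 str for any character created with chr(0) through chr(255).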

