Decoding a string encoded by JavaScript's escape function in Python

2020-04-02 13:57:44
OfStack

I ran into a problem that required using Python to decode Chinese text that had been encoded with JavaScript's escape function. After searching for most of a day without finding an answer, I had to work out a solution myself.
Let's first look at how escape encodes a string in JS:



a = escape('这是一串文字');

alert(a);

Output:



%u8FD9%u662F%u4E00%u4E32%u6587%u5B57

At first glance this looks a bit like JSON, so let's see how standard JSON encodes the same text, "这是一串文字" ("this is a string of text"):



# encoding=utf-8

import json

a = '这是一串文字'

print json.dumps(a)

Output:

"\u8fd9\u662f\u4e00\u4e32\u6587\u5b57"

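The comparison between the two outputs can also be checked mechanically. The following is a minimal sketch, assuming Python 3 (unlike the Python 2 snippets in this post); the variable names are only illustrative.

```python
import json

text = '这是一串文字'
escaped = '%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'  # the escape() output shown earlier

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True)
as_json = json.dumps(text)
print(as_json)

# Swapping the "%u" prefix for "\u" and lowercasing the hex digits
# should reproduce the JSON form exactly.
converted = '"' + escaped.replace('%u', '\\u').lower() + '"'
print(converted == as_json)
```

If the last line prints True, the two encodings differ only in their prefix and in the case of the hex digits.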
Comparing the two, JS escape writes each of these characters as "%u" followed by four hexadecimal digits, while JSON writes "\u" followed by four hexadecimal digits. So we can use a string replacement to turn the escaped string into JSON form and then decode it with the json module's loads function:



# encoding=utf-8

import json

 

# String encoded by JS escape()

c = '%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'

 

# Rebuild it as a JSON string: replace each "%" with "\" and wrap in double quotes

jsonObj = '"' + "".join([(i and "\\" + i) for i in c.split('%')]) + '"'

 

print json.loads(jsonObj)

Remember to wrap the string in double quotes after replacing each "%" with "\", so that it becomes a valid JSON string, and only then call json.loads.
Later, I found a simpler approach on another site. The code is as follows:
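For readers on Python 3, the same JSON round trip might look like the sketch below. It is not the code from this post: the function name unescape_via_json is made up for illustration, and it assumes the input consists only of %uXXXX sequences and characters that are legal inside a JSON string.

```python
import json

def unescape_via_json(escaped):
    """Decode a string of %uXXXX sequences by routing it through json.loads.

    Hypothetical helper; it does not handle the two-digit %XX form.
    """
    # Turn each "%u" prefix into JSON's "\u" prefix, then wrap the result
    # in double quotes so it is a valid JSON string literal.
    json_literal = '"' + escaped.replace('%u', '\\u') + '"'
    return json.loads(json_literal)

print(unescape_via_json('%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'))
```

Replacing '%u' as a unit, rather than every '%', leaves any other percent signs in the input untouched.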



# encoding=utf-8

c = '%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'

print "".join([(len(i)>0 and unichr(int(i,16)) or "") for i in c.split('%u')])

The idea is basically the same: split on '%u', so that each remaining piece is a fixed-length code of four hexadecimal digits, and convert each piece back to a Chinese character with unichr (chr in Python 3).
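Splitting on '%u' only works when the whole string consists of %uXXXX sequences. JS escape also produces two-digit %XX sequences (a space becomes %20, for example) and leaves letters and digits as they are. A regular expression can cover both forms. The following is a minimal sketch, assuming Python 3; the name js_unescape is hypothetical, and characters outside the Basic Multilingual Plane, which escape writes as two %uXXXX code units, are not recombined here.

```python
import re

# %uXXXX (four hex digits) or %XX (two hex digits), as produced by escape()
_ESCAPE_RE = re.compile(r'%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})')

def js_unescape(escaped):
    """Decode %uXXXX and %XX sequences; leave all other text untouched."""
    def replace(match):
        # Exactly one of the two groups matched; both hold hex digits.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE_RE.sub(replace, escaped)

print(js_unescape('%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'))
print(js_unescape('a%20b%u4E2D'))
```

Any '%' that is not followed by a valid sequence is left in place rather than raising an error.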