Decoding strings encoded by JavaScript's escape() function in Python

  • 2020-04-02 13:57:44
  • OfStack

I ran into a problem that required decoding, in Python, Chinese text that had been encoded with JavaScript's escape(). After searching for most of a day without finding an answer, I had to work out a solution myself.
Let's first look at what escape() produces in JavaScript:


a = escape('这是一串文字'); // "This is a string of text"
alert(a);

Output:

%u8FD9%u662F%u4E00%u4E32%u6587%u5B57

At first glance this looks a bit like JSON, so let's compare it with the standard JSON encoding of the same text ("this is a string of text"):

# encoding=utf-8
import json
a = '这是一串文字'  # "This is a string of text"
print json.dumps(a)

Output:
"\u8fd9\u662f\u4e00\u4e32\u6587\u5b57"

Comparing the two, JavaScript's escape() writes each character as "%u" followed by four hex digits, while JSON writes it as "\u" followed by the same four hex digits (only the letter case differs, and JSON accepts both). So we can turn the escape() output into a JSON string literal with a simple string replacement and then decode it with json.loads.
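To see that correspondence character by character, here is a minimal Python 3 sketch. `js_escape` is a hypothetical helper written only for illustration, not a standard library function; it assumes the escape() rules from the ECMAScript specification (Annex B): letters, digits and `@*_+-./` are kept as-is, UTF-16 code units below 256 become `%XX`, and everything else becomes `%uXXXX`. Both encodings are built from the same code points.

```python
import json

# Characters that JavaScript's escape() leaves untouched (ECMAScript Annex B).
_UNESCAPED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./"
)


def js_escape(text):
    """Hypothetical helper: mimic JavaScript's escape() in Python 3."""
    data = text.encode("utf-16-be")  # escape() works on UTF-16 code units
    out = []
    for i in range(0, len(data), 2):
        code = int.from_bytes(data[i:i + 2], "big")  # one 16-bit code unit
        ch = chr(code)
        if ch in _UNESCAPED:
            out.append(ch)
        elif code < 0x100:
            out.append("%{:02X}".format(code))   # e.g. space -> %20
        else:
            out.append("%u{:04X}".format(code))  # e.g. 这 -> %u8FD9
    return "".join(out)


text = "这是一串文字"
print(js_escape(text))   # %u8FD9%u662F%u4E00%u4E32%u6587%u5B57
print(json.dumps(text))  # "\u8fd9\u662f\u4e00\u4e32\u6587\u5b57"
```

The hex digits line up one-for-one; only the prefix (`%u` versus `\u`) and the letter case differ, which is what makes the replacement trick work.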

# encoding=utf-8
import json
 
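# String encoded by JavaScript's escape()
c = '%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'
 
# Rebuild it as a JSON string literal: turn each "%" into "\"
# (the empty piece before the first "%" stays empty) and wrap in double quotes
jsonObj = '"' + "".join([(i and "\\" + i) for i in c.split('%')]) + '"'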
 
print json.loads(jsonObj)

Remember: after replacing each "%" with "\", wrap the result in double quotes so that it becomes a valid JSON string literal; only then can json.loads parse it.
Later I came across a simpler approach on another site. The code is as follows:
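For readers on Python 3, a minimal port of the same JSON route might look like the sketch below. `unescape_via_json` is a hypothetical name, and it swaps the split-and-join for a plain `str.replace("%u", "\\u")`, which gives the same result for this input. Like the original, it assumes the string consists only of `%uXXXX` sequences; any `%XX` sequences would be left undecoded.

```python
import json


def unescape_via_json(escaped):
    """Hypothetical helper: decode a %uXXXX-only escape() string via JSON.

    Assumes the input contains nothing but %uXXXX sequences.
    """
    # "%u8FD9" -> "\u8FD9", then wrap in quotes to form a JSON string literal
    json_literal = '"' + escaped.replace("%u", "\\u") + '"'
    return json.loads(json_literal)


c = "%u8FD9%u662F%u4E00%u4E32%u6587%u5B57"
print(unescape_via_json(c))  # 这是一串文字
```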

# encoding=utf-8
c = '%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'
# Split on "%u", skip the empty leading piece, and turn each 4-digit hex code into a character
print u"".join([unichr(int(i, 16)) for i in c.split('%u') if i])

The idea is essentially the same: split on "%u"; each remaining piece is a fixed-length code of four hex digits (16 bits), which unichr() turns back into the corresponding Chinese character (chr() does the same job in Python 3).
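Both approaches above assume the string contains only `%uXXXX` sequences. Real escape() output can also contain `%XX` sequences (for example `%20` for a space) and unescaped ASCII letters, and characters outside the Basic Multilingual Plane arrive as two `%uXXXX` surrogates. The Python 3 sketch below handles those cases with a regular expression; `js_unescape` is a hypothetical helper, not a standard function. Note that `urllib.parse.unquote` is not a drop-in substitute: it does not understand `%u`, and by default it decodes `%XX` as UTF-8 bytes, whereas escape() uses `%XX` for single characters below U+0100.

```python
import re

# Matches %uXXXX (a UTF-16 code unit) or %XX (a code below 256),
# the two forms that escape() emits.
_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(escaped):
    """Hypothetical helper: decode text produced by JavaScript's escape()."""

    def to_char(match):
        # Exactly one of the two groups matched; it holds the hex code.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    decoded = _ESCAPE_RE.sub(to_char, escaped)
    # Non-BMP characters arrive as two surrogate code units; a UTF-16
    # round trip merges each valid pair into a single code point.
    return decoded.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass"
    )


print(js_unescape("%u8FD9%u662F%u4E00%u4E32%u6587%u5B57"))  # 这是一串文字
print(js_unescape("a%20b%u4E2D"))  # a b中
```

Sequences that match neither pattern (such as a lone `%`) are passed through unchanged, which is also how JavaScript's unescape() treats them.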
