A Python method for decoding strings encoded by the JavaScript escape function
- 2020-04-02 13:57:44
- OfStack
I ran into a problem that required decoding, in Python, Chinese text that had been encoded with JavaScript's escape function. After most of a day of searching I still had not found an answer, so I had to work out a solution myself.
Let's first look at how escape encodes a string in JS:
a = escape('这是一串文字');  // "This is a string of text"
alert(a);
Output:
%u8FD9%u662F%u4E00%u4E32%u6587%u5B57
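For reference, the encoding rule can be sketched in Python 3. This is a minimal sketch based on how escape is described in the ECMAScript specification (Annex B), not code from the original article, and the name js_escape is hypothetical. Letters, digits and the characters @*_+-./ are left unchanged, other characters below code 256 become %XX, and everything else becomes %uXXXX. It assumes the input contains only characters in the Basic Multilingual Plane.

```python
import string

# Characters that escape leaves untouched
UNESCAPED = set(string.ascii_letters + string.digits + "@*_+-./")

def js_escape(text):
    """Hypothetical helper that mimics JavaScript's escape() for BMP text."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in UNESCAPED:
            out.append(ch)                 # kept as-is
        elif code < 256:
            out.append("%%%02X" % code)    # two hex digits, e.g. %20
        else:
            out.append("%%u%04X" % code)   # four hex digits, e.g. %u8FD9
    return "".join(out)

print(js_escape("这是一串文字"))
```

Running it on the same string should reproduce the escaped output shown above.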
At first glance the escaped output looks a bit like JSON, so let's see how standard JSON encodes the same string, 这是一串文字 ("this is a string of text"):
# encoding=utf-8
import json
a = '这是一串文字'  # "This is a string of text"
print json.dumps(a)
Output:
"\u8fd9\u662f\u4e00\u4e32\u6587\u5b57"
Comparing the two, JS escape encodes each of these characters as "%u" followed by four hexadecimal digits, while JSON uses "\u" followed by four hexadecimal digits. So we can use string replacement to turn the escaped string into JSON format, and then decode it with the json module's loads function:
# encoding=utf-8
import json
# String encoded by JS escape
c = '%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'
# Rebuild it as a JSON string literal: prefix each non-empty piece with a backslash
jsonObj = '"' + "".join([(i and "\\" + i) for i in c.split('%')]) + '"'
print json.loads(jsonObj)
Remember that after replacing each "%" with "\" you must wrap the string in double quotes so that it becomes a valid JSON string, and only then pass it to json.loads.
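The code above targets Python 2. A minimal Python 3 sketch of the same replacement idea might look like the following; the function name unescape_via_json is hypothetical. It assumes the input holds only %uXXXX sequences plus characters that escape leaves unchanged, so two-digit sequences such as %20 would pass through undecoded.

```python
import json

def unescape_via_json(escaped):
    """Hypothetical helper: decode %uXXXX sequences by going through JSON."""
    # Turn each "%u" prefix into JSON's "\u" prefix
    body = escaped.replace("%u", "\\u")
    # Wrap in double quotes to form a JSON string literal, then parse it
    return json.loads('"' + body + '"')

print(unescape_via_json("%u8FD9%u662F%u4E00%u4E32%u6587%u5B57"))
```

The wrapping is safe here because escape itself encodes double quotes and backslashes, so neither can appear raw in its output.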
Later, I came across a simpler approach on another site. The code is as follows:
# encoding=utf-8
c = '%u8FD9%u662F%u4E00%u4E32%u6587%u5B57'
print "".join([(len(i)>0 and unichr(int(i,16)) or "") for i in c.split('%u')])
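The idea is basically the same: split on '%u', treat each remaining piece as a fixed-length code of four hexadecimal digits, and convert it back to a Chinese character with unichr. This works only when the input consists entirely of %u sequences, since any other text between them would be misread or rejected by int(i, 16).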
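For input that mixes plain ASCII, %XX and %uXXXX sequences, a regular expression is one way to decode only the escaped parts. The following is a Python 3 sketch under that assumption, not code from the original article; the name js_unescape is hypothetical, and chr replaces Python 2's unichr.

```python
import re

# Matches either %uXXXX (four hex digits) or %XX (two hex digits)
_ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")

def js_unescape(escaped):
    """Hypothetical helper: decode %uXXXX and %XX, leave other text alone."""
    def decode(match):
        # Exactly one of the two groups is set for any match
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE_RE.sub(decode, escaped)

print(js_unescape("%u8FD9%u662F%u4E00%u4E32%u6587%u5B57"))
print(js_unescape("abc%20%u6587%u5B57"))
```

Characters outside the Basic Multilingual Plane, which escape emits as two %u sequences, are not recombined by this sketch.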