Encoding and Decoding of JavaScript Character Set

2021-07-15 06:56:55
OfStack
 
1. Character set 
 
1) Characters and bytes (Character) 
 
A character is the general term for letters and symbols of all kinds, including ones that display as garbled text. One character corresponds to 1 to n bytes, one byte is 8 bits, and each bit is either 0 or 1.
 
2) Character set
 
A character set is a collection of characters, and different character sets contain different numbers of characters. Common character sets include ASCII, GB2312 and Unicode.
 
3) Character encoding
 
Encoding converts symbols into binary data that a computer can read, while decoding converts that binary data back into symbols that people can read.
 
Most character sets have a single encoding (for example, the GBK character set uses GBK encoding), but Unicode has several, including UTF-8, UTF-16, UTF-32 and UTF-7.
 
At present, UTF-8 is the most widely used encoding for web pages. UTF-8 encodes each character in 1 to 4 bytes and is a superset of ASCII, so existing ASCII text does not need to be converted.
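
The 1 to 4 byte range can be checked directly. The following is a minimal sketch that assumes an environment with the standard TextEncoder API (modern browsers and Node.js); the helper name utf8ByteLength is hypothetical and not part of the original article.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes a string occupies in UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

console.log(utf8ByteLength("A"));   // 1 byte: plain ASCII
console.log(utf8ByteLength("é"));   // 2 bytes: U+00E9
console.log(utf8ByteLength("代"));  // 3 bytes: U+4EE3
console.log(utf8ByteLength("😀"));  // 4 bytes: U+1F600
```

Because ASCII characters keep their single-byte values, an ASCII-only file is already valid UTF-8.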
 
2. Number bases in the browser
 
1) Decimal and hexadecimal in HTML attributes
 
In HTML, a decimal character reference is written as "&#56;" and a hexadecimal one as "&#x5a;". The hexadecimal form has one more x than the decimal form, and hexadecimal adds the six letters a to f to the decimal digits to represent 10 to 15.
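
As an illustration of the two forms, the sketch below builds a decimal and a hexadecimal reference for a single character. The function names toDecimalEntity and toHexEntity are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: decimal character reference, e.g. "8" -> "&#56;".
function toDecimalEntity(ch) {
  return "&#" + ch.codePointAt(0) + ";";
}

// Hypothetical helper: hexadecimal character reference, e.g. "Z" -> "&#x5a;".
function toHexEntity(ch) {
  return "&#x" + ch.codePointAt(0).toString(16) + ";";
}

console.log(toDecimalEntity("8")); // &#56;
console.log(toHexEntity("Z"));     // &#x5a;
```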
 
2) Decimal and hexadecimal in CSS attributes
 
CSS is compatible with the numeric forms used in HTML. In addition, CSS can express hexadecimal with a backslash escape of the form "\6c".
 
3) Encoding in JavaScript
 
Strings written with octal or hexadecimal escapes can be executed directly through eval. Octal is written as "\56" and hexadecimal as "\x5c".
 
If the code contains Chinese characters that need to be escaped, only the hexadecimal Unicode form can be used, written as "\u4ee3\u7801".
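
The three escape forms can be compared in a short sketch. One caveat that the article does not mention: octal escapes are rejected in strict mode, so this example evaluates the octal string through the Function constructor, whose body is non-strict by default. That workaround is an assumption of this example, not the article's method.

```javascript
// Hexadecimal escape: \x5c is the backslash character.
const hexEscape = "\x5c";

// Unicode escape: \u4ee3\u7801 is the two-character string "代码".
const unicodeEscape = "\u4ee3\u7801";

// Octal escape: \56 is "." (octal 56 = decimal 46).
// Evaluated in non-strict code because strict mode forbids octal escapes.
const octalEscape = new Function('return "\\56";')();

console.log(hexEscape);     // \
console.log(unicodeEscape); // 代码
console.log(octalEscape);   // .
```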
 
The book "Web front-end hacker technology disclosure" wraps encoding and decoding in two functions, which rely mainly on the two methods below. The specific code can be seen here.
 
The core calls are "str.charCodeAt(index).toString(radix)" and "String.fromCharCode(parseInt(code, radix))".
 
The charCodeAt() method returns an integer between 0 and 65535 representing the UTF-16 code unit at the given index.
 
The static String.fromCharCode() method returns a string created from the specified sequence of UTF-16 code units.
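
The two core calls can be combined into a pair of helpers. This is only a sketch of the idea, not the book's actual code; the names encodeToBase and decodeFromBase are hypothetical.

```javascript
// Hypothetical helper: convert each UTF-16 code unit of str
// to its numeric value written in the given radix (2 to 36).
function encodeToBase(str, radix) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i).toString(radix));
  }
  return codes;
}

// Hypothetical helper: the inverse of encodeToBase.
function decodeFromBase(codes, radix) {
  return codes
    .map((code) => String.fromCharCode(parseInt(code, radix)))
    .join("");
}

console.log(encodeToBase("代码", 16));             // [ '4ee3', '7801' ]
console.log(encodeToBase("Hi", 8));                // [ '110', '151' ]
console.log(decodeFromBase(["4ee3", "7801"], 16)); // 代码
```

Since both helpers work on UTF-16 code units, a character outside the Basic Multilingual Plane is encoded as two values (a surrogate pair) and is restored correctly when both are decoded together.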
 
You can also encode and decode online using the "MonyerJS" web page.
 
4) HTML automatic decoding mechanism 
 
For example, the hexadecimal input "&#x0048;&#x0065;&#x006c;&#x006c;&#x006f;" is automatically decoded to "Hello".
 
The well-known space entity "&nbsp;" works through the same mechanism.
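
The browser performs this decoding inside its HTML parser. To make the mapping concrete, here is a simplified sketch that decodes only numeric references with a regular expression. It is not the parser's real algorithm, it ignores named entities such as "&nbsp;", and the function name decodeNumericEntities is hypothetical.

```javascript
// Hypothetical helper: decode "&#NN;" and "&#xHH;" references in text.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const codePoint = isHex
      ? parseInt(body.slice(1), 16)
      : parseInt(body, 10);
    // Leave out-of-range values untouched instead of throwing.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities("&#x0048;&#x0065;&#x006c;&#x006c;&#x006f;")); // Hello
console.log(decodeNumericEntities("&#56; and &#x5a;")); // 8 and Z
```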
 
3. Browser encoding 
 
There are three pairs of functions in JavaScript that can encode and decode strings, namely: 
 
escape/unescape, encodeURI/decodeURI, encodeURIComponent/decodeURIComponent. 
 
They differ mainly in which characters they leave unencoded.
 
1) escape has 69 unencoded characters 
 
* + - . / @ _ 0-9 a-z A-Z 
When escape encodes Unicode values outside the range 0 to 255, it outputs the %u**** format.
 
2) encodeURI has 82 unencoded characters 
 
! # $ & ' ( ) * + , - . / : ; = ? @ _ ~ 0-9 a-z A-Z 
 
3) encodeURIComponent does not encode 71 characters 
 
! ' ( ) * - . _ ~ 0-9 a-z A-Z
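
The three counts above can be reproduced by passing every ASCII character through each function and counting the ones that come back unchanged. The sketch below assumes an environment where the legacy escape function is still available as a global (browsers and Node.js); the helper name countUnencoded is hypothetical.

```javascript
// Hypothetical helper: count the ASCII characters (0 to 127)
// that the given encoding function returns unchanged.
function countUnencoded(encode) {
  let count = 0;
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (encode(ch) === ch) count++;
  }
  return count;
}

const counts = {
  escape: countUnencoded(escape),
  encodeURI: countUnencoded(encodeURI),
  encodeURIComponent: countUnencoded(encodeURIComponent),
};
console.log(counts); // { escape: 69, encodeURI: 82, encodeURIComponent: 71 }

// The same input through all three functions.
console.log(escape("a b&代码"));             // a%20b%26%u4EE3%u7801
console.log(encodeURI("a b&代码"));          // a%20b&%E4%BB%A3%E7%A0%81
console.log(encodeURIComponent("a b&代码")); // a%20b%26%E4%BB%A3%E7%A0%81
```

In the sample output, escape emits the %u**** form for the Chinese characters, while encodeURI and encodeURIComponent emit the percent-encoded UTF-8 bytes. Only encodeURI leaves "&" untouched, because it is meant for whole URIs rather than individual components.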