Fixing garbled Chinese characters in URL parameters

  • 2021-07-18 07:07:58
  • OfStack


Introduction: When designing RESTful services, Chinese text often has to be passed as a parameter in the URL. Unless the Chinese characters are encoded and decoded correctly, they arrive garbled. This article explains how to solve the problem.

1. The problem

In a RESTful service, a query URL is typically designed like get/basic/service?keyword=history, where the keyword value may be Chinese text. In real development, however, such values do arrive garbled: the backend reads the keyword parameter and cannot interpret it correctly.

2. Where does the garbling come from?

Passing parameters in a URL depends on the browser environment. The URL and every key=value parameter pair it contains are processed according to the browser's address-bar rules and then sent to the backend, which decodes them.

If we do no processing ourselves, then whenever JavaScript requests a URL whose parameters contain Chinese (for example, Chinese typed into an input box), the Chinese parameters are encoded by whatever mechanism the browser applies. That is where the garbling comes from.

3. First attempt: encoding once with encodeURI() in JavaScript

When the Chinese URL parameter is encoded once in JavaScript with encodeURI(), the word 测试 ("test") becomes "%E6%B5%8B%E8%AF%95". But the problem remains. In the encoded string, the browser mechanism treats "%" as an escape character: the escape sequences introduced by each "%" in the address-bar URL are processed before the parameter is passed to the backend. What the backend ends up with therefore no longer matches the URL that encodeURI() produced, because "%" was interpreted as an escape character rather than as an ordinary character.
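The single-encoding step can be checked in any JavaScript console. This is a minimal sketch; the variable names `word` and `encodedOnce` are illustrative and not part of the original article.

```javascript
// Encode the Chinese word for "test" once with encodeURI().
const word = "测试";
const encodedOnce = encodeURI(word);

console.log(encodedOnce); // %E6%B5%8B%E8%AF%95

// Each of the two characters becomes three UTF-8 bytes, each written as %XX.
const escapeCount = encodedOnce.split("%").length - 1;
console.log(escapeCount); // 6
```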

4. Encoding twice with encodeURI()

Usage:


encodeURI(encodeURI("/order?name=" + name));

The resulting URL no longer contains the string "%E6%B5%8B%E8%AF%95" produced in the previous step. It contains "%25E6%25B5%258B%25E8%25AF%2595" instead, because the second pass encodes every "%" as "%25".

At this point the front-end JavaScript has finished encoding the URL that contains Chinese and sends it to the backend as a URL parameter. The Action receives the value intact, with no garbling, because "%25E6%25B5%258B%25E8%25AF%2595" consists only of ASCII characters. It still represents the word 测试 ("test") that was entered.
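The effect of the second pass can be sketched as follows. The variable names (`userName`, `once`, `twice`) are hypothetical; the `/order?name=` path comes from the snippet above.

```javascript
const userName = "测试"; // the Chinese word for "test"

const once = encodeURI("/order?name=" + userName);
const twice = encodeURI(once);

console.log(once);  // /order?name=%E6%B5%8B%E8%AF%95
console.log(twice); // /order?name=%25E6%25B5%258B%25E8%25AF%2595

// Decoding a single time only undoes the outer layer.
const decodedOneLayer = decodeURI(twice);
console.log(decodedOneLayer === once); // true
```

Only "%" changes in the second pass: the output of the first pass is pure ASCII, and "%" is the one character in it that encodeURI() still escapes.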

5. How to parse the Chinese characters correctly on the backend

On the backend, a value that has been passed through encodeURI() twice cannot be read directly. It needs the following additional step:


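// encodedName: the parameter value as read on the backend (hypothetical variable name)
String name = java.net.URLDecoder.decode(encodedName, "UTF-8");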

The decode(String s, String enc) method of URLDecoder takes two parameters: the first is the string to be decoded, and the second is the name of the character encoding to use when decoding.
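The backend in this article is Java, but the two decoding passes can be illustrated in JavaScript as well. In the sketch below, `decodeURIComponent` stands in both for the decoding that the server is assumed to perform automatically when it parses the query string and for the explicit URLDecoder.decode call. All variable names are hypothetical.

```javascript
// Value as sent by the front end after two encodeURI() passes.
const sentValue = encodeURI(encodeURI("测试"));

// First pass: assumed to happen automatically when the server parses the request.
const asReadByBackend = decodeURIComponent(sentValue);

// Second pass: the explicit decode step (URLDecoder.decode(..., "UTF-8") in Java).
const decodedValue = decodeURIComponent(asReadByBackend);

console.log(asReadByBackend); // %E6%B5%8B%E8%AF%95
console.log(decodedValue);    // 测试
```

The intermediate value is plain ASCII, so it survives whatever default character set the first pass uses. Only the explicit second pass has to know that the bytes are UTF-8.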

6. encodeURI, encodeURIComponent, escape

6.1 The escape() function

The escape() function encodes a string so that it can be read on all computers.

Return value: a copy of the encoded string, in which some characters are replaced with hexadecimal escape sequences.

Note: This method does not encode ASCII letters and digits, nor the following ASCII punctuation marks: @ * _ + - . /. All other characters, including spaces, other punctuation marks, special characters, and non-ASCII characters, are replaced by escape sequences: %xx for character codes below 256 (xx is the character's two-digit hexadecimal code) and %uxxxx for higher ones. For example, the space character becomes %20.
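A short sketch of how escape() behaves, for comparison with encodeURI(). escape() is a legacy function, but it is still available as a global in browsers and in Node.js. The variable names are illustrative.

```javascript
const escapedSpace = escape("a b");
const escapedPunctuation = escape("@*_+-./");
const escapedChinese = escape("测试");

console.log(escapedSpace);       // a%20b
console.log(escapedPunctuation); // @*_+-./  (left untouched)
console.log(escapedChinese);     // %u6D4B%u8BD5  (code units, not UTF-8 bytes)

// encodeURI() produces UTF-8 percent-encoding for the same word.
console.log(encodeURI("测试"));   // %E6%B5%8B%E8%AF%95
```

The %uXXXX form is not standard percent-encoding, so a receiver that expects UTF-8 escapes will not decode it.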

6.2 The encodeURI() method

The URI string is converted to an escaped string using UTF-8. Besides ASCII letters and digits, this method leaves the following characters unencoded: ; , / ? : @ & = + $ # - _ . ! ~ * ' ( )

6.3 The encodeURIComponent() method

The URI string is converted to an escaped string using UTF-8. Compared with encodeURI(), this method encodes more characters, such as "/". A string that contains several parts of a URI should therefore not be passed to it whole, because the "/" characters would be encoded and the URL would no longer work.

Besides ASCII letters and digits, this method leaves only these characters unencoded: - _ . ! ~ * ' ( )
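The difference between the two methods shows up as soon as a parameter value contains a URI delimiter. The sketch below uses a hypothetical value with "&" in it. The path is taken from the example in section 1, and the variable names are illustrative.

```javascript
const base = "/basic/service?keyword=";
const keyword = "测试&more"; // hypothetical value containing "&"

// encodeURI() keeps URI delimiters, so "&more" would be read as a new parameter.
const withEncodeURI = encodeURI(base + keyword);
console.log(withEncodeURI);
// /basic/service?keyword=%E6%B5%8B%E8%AF%95&more

// encodeURIComponent() on the value alone escapes "&" as %26.
const withComponent = base + encodeURIComponent(keyword);
console.log(withComponent);
// /basic/service?keyword=%E6%B5%8B%E8%AF%95%26more

// Applied to the whole URL, it also escapes "/", "?" and "=".
const wholeUrlEncoded = encodeURIComponent(base + keyword);
console.log(wholeUrlEncoded);
// %2Fbasic%2Fservice%3Fkeyword%3D%E6%B5%8B%E8%AF%95%26more
```

A common pattern is therefore to apply encodeURIComponent() to each parameter value and leave the rest of the URL alone.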

Therefore, for Chinese strings, if you do not want to convert the string to UTF-8 (for example, when the source page and the target page use the same charset), escape is enough. If your page uses GB2312 or another encoding and the page that receives the parameter uses UTF-8, use encodeURI or encodeURIComponent.

7. Another way to handle garbled Chinese in URLs

On the requesting side, encode the Chinese characters once with encodeURI, for example:


   var url="/ajax?name="+encodeURI(name);

Server-side code:


  name=new String(name.getBytes("iso8859-1"),"UTF-8");

Note: name is the string obtained from the request, and iso8859-1 is the project's default character encoding. If the default is a Chinese encoding such as gbk or gb2312, this step is not needed.

Analysis: Testing shows that this approach works, which indicates that the default encoding in effect is iso8859-1. Even when encodeURI applies UTF-8 encoding, the bulk of the string (ASCII and other visible characters) is still represented the same way as in iso8859-1, because those characters have identical encodings in UTF-8. What encodeURI and the other escape functions mainly do is escape special characters such as "%" and "/".
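The server-side repair can be mimicked in Node.js to see why it works. This sketch assumes Node's Buffer API, where "latin1" is the name for ISO-8859-1. It is an analogy for the Java line above, not the server code itself, and the variable names are illustrative.

```javascript
const original = "测试";

// UTF-8 bytes misread as ISO-8859-1: one garbled character per byte.
const garbled = Buffer.from(original, "utf8").toString("latin1");
console.log(garbled.length); // 6

// Counterpart of new String(name.getBytes("iso8859-1"), "UTF-8"):
// turn the garbled characters back into bytes, then decode them as UTF-8.
const repaired = Buffer.from(garbled, "latin1").toString("utf8");
console.log(repaired === original); // true
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so no information is lost while the string is in its garbled form.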


