How to configure the Tomcat server character set to UTF-8

  • 2020-06-23 02:35:46
  • OfStack

What is a character set

Before introducing character sets, let's understand why they exist. What we see on a computer screen is rendered text, whereas what sits on the computer's storage media is actually a binary bitstream. The conversion between the two needs a unified standard. Otherwise, the documents on our USB drive turn into garbage when we plug it into the boss's computer, and a file a friend sent over QQ shows up as garbled text when we open it locally. Various character set standards appeared in order to provide such conversion rules. Simply put, a character set specifies which binary number a character maps to (encoding) and which character a given sequence of binary digits represents (decoding).

So why are there so many character set standards? This question is easy to answer. Ask yourself why our plugs cannot be used in the UK, or why monitors have DVI, VGA, HDMI, and DP interfaces. Many standards were created by people who did not realize they would one day be used around the world, or by organizations that had an interest in differentiating themselves from existing standards. As a result, there are many standards that serve the same purpose but are not compatible with each other.

With that said, let's look at a practical example. The following table shows the hexadecimal and binary encodings of a single Chinese character (the byte values correspond to U+5C4C) in several character sets.

Character set   Hexadecimal encoding   Binary data
UTF-8           0xE5B18C               1110 0101 1011 0001 1000 1100
UTF-16          0x5C4C                 0101 1100 0100 1100
GBK             0x8CC5                 1000 1100 1100 0101
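
As a minimal sketch, the table can be checked against the JDK's own charset support. The names EncodingTable and encodeHex are made up for illustration, UTF-16 is taken to mean the big-endian form without a byte-order mark, and GBK is only attempted if the running JDK provides that charset.

```java
import java.nio.charset.Charset;

class EncodingTable {
    // Encodes the text in the named charset and returns the bytes as uppercase hex.
    public static String encodeHex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes do not print as FFFFFFxx.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String ch = "\u5C4C"; // the character whose encodings the table lists
        // UTF-16BE is used instead of UTF-16 so that no byte-order mark is emitted.
        for (String name : new String[] {"UTF-8", "UTF-16BE", "GBK"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + " 0x" + encodeHex(ch, name));
            } else {
                System.out.println(name + " is not available in this JDK");
            }
        }
    }
}
```

Each printed line can be compared with the hexadecimal column of the table.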

Introduction

When programming, we regularly run into Chinese encoding problems. The text has to be filtered and escaped at many points in a program, and even then Chinese characters can still come out garbled. The following method was passed on to me by a colleague. It does not work in every case, and it applies specifically to the Tomcat server.

In addition, this method does not conflict with previous methods.

Open the server.xml file in the server's /conf directory.

Change the relevant Connector element to:


<Connector port="8008" protocol="HTTP/1.1"
    connectionTimeout="20000"
    redirectPort="8443" URIEncoding="UTF-8"/>

The original element did not include the URIEncoding="UTF-8" attribute; that attribute is the only addition.
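
To see why this attribute matters, here is a rough sketch of what happens to a percent-encoded query parameter when it is decoded with different charsets. It uses java.net.URLDecoder from the JDK rather than Tomcat's own decoder, so it only approximates the connector's behavior. The class name QueryDecodingDemo and the sample value are made up for illustration. ISO-8859-1 is included because Tomcat 7 and earlier use it for URIs when URIEncoding is not set.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingDemo {
    // Decodes a percent-encoded parameter value using the given charset.
    public static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587) as a browser would send them
        // in a UTF-8 encoded URL, e.g. /search?q=%E4%B8%AD%E6%96%87
        String raw = "%E4%B8%AD%E6%96%87";

        // Decoding with the charset the browser used recovers the two characters.
        System.out.println(decode(raw, "UTF-8"));

        // Decoding with ISO-8859-1 turns each of the six bytes into its own
        // character, which is the garbled text seen in GET parameters.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```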

A friend asked: this method seems to work only for GET requests. How do I handle POST requests? Is there any way other than req.setCharacterEncoding("UTF-8");?

In fact, solving garbled text requires planning across the whole system: the database design, the character filtering on the back end, and the data transfer from the front end. Simply calling req.setCharacterEncoding does not always work.

So, if POST submissions come out garbled, first check whether the database encoding is utf8, and second, check whether the form being submitted is set to utf8.
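
The second check can be illustrated with a small round-trip sketch: a form value only survives when the charset used to encode the POST body matches the one the server uses to decode it. This simulates both sides with java.net.URLEncoder and java.net.URLDecoder instead of a real browser and servlet container, and FormRoundTrip and its methods are hypothetical names. ISO-8859-1 stands in for a server that never called req.setCharacterEncoding, since that is the traditional servlet default for request bodies.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class FormRoundTrip {
    // Encodes the value as the browser would (form charset), then decodes it
    // as the server would (request charset).
    public static String roundTrip(String value, String formCharset, String requestCharset) {
        try {
            String body = URLEncoder.encode(value, formCharset);
            return URLDecoder.decode(body, requestCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    // True when the server ends up with exactly the text the user typed.
    public static boolean survives(String value, String formCharset, String requestCharset) {
        return value.equals(roundTrip(value, formCharset, requestCharset));
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // two Chinese characters typed into the form

        // Form submitted as UTF-8 and request decoded as UTF-8.
        System.out.println("UTF-8 form, UTF-8 request: "
                + survives(text, "UTF-8", "UTF-8"));

        // Form submitted as UTF-8 but request decoded with the default charset.
        System.out.println("UTF-8 form, ISO-8859-1 request: "
                + survives(text, "UTF-8", "ISO-8859-1"));
    }
}
```

Only the pairing where both sides agree keeps the text intact, which is why the form's charset has to be checked together with the server-side setting.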
