MySQL character sets GBK, GB2312, and UTF8: solving garbled Chinese text in MySQL

  • 2020-05-13 03:38:33
  • OfStack

Several character sets are involved in MySQL

character-set-server/default-character-set: server character set, used by default.
character-set-database: database character set.
character-set-table: database table character set.
Priority increases from server to database to table, so a more specific setting overrides a more general one. In general, it is enough to set character-set-server and not specify a character set when creating databases and tables; they then adopt the character-set-server character set.
character-set-client: client character set, the client's default. When a client sends a request to the server, the request is encoded in this character set.
character-set-results: result character set. When the server returns a result or a message to the client, it is encoded in this character set.
On the client side, if character-set-results is not defined, the character-set-client character set is used as the default, so you only need to set character-set-client.

To process Chinese, character-set-server and character-set-client can be set to GB2312, or UTF8 if you want to process multiple languages simultaneously.
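
The server, database, and table precedence described above can be sketched in a few lines of Python. This is only a model of the rule; the function name effective_charset is made up for illustration, and it does not talk to MySQL.

```python
def effective_charset(server, database=None, table=None):
    """Return the character set a table ends up with.

    The most specific setting that is present wins: table first,
    then database, then server. Hypothetical helper, not a MySQL API.
    """
    for charset in (table, database, server):
        if charset:
            return charset
    raise ValueError("no character set configured")


print(effective_charset("utf8"))                   # utf8
print(effective_charset("utf8", database="gbk"))   # gbk
print(effective_charset("utf8", "gbk", "gb2312"))  # gb2312
```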

Chinese text problems in MySQL

The solution is to set the following three MySQL system variables to the same character set as the server character set (character-set-server) before executing any SQL statement.
character_set_client: client character set.
character_set_results: result character set.
character_set_connection: connection character set.
All three can be set at once by sending this statement to MySQL: set names gb2312

About GBK, GB2312, UTF8

UTF-8 (8-bit Unicode Transformation Format) permits a BOM but usually omits it. It is a variable-length multi-byte encoding designed for international text: it uses 8 bits (1 byte) for each English character and 24 bits (3 bytes) for each common Chinese character. UTF-8 can represent the characters needed by virtually every language, so it is broadly portable. UTF-8 encoded text displays in any browser that supports UTF-8, whatever the country. For example, a UTF-8 page in Chinese also displays in a foreign user's English-language IE, without downloading IE's Chinese language support pack.
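
The byte widths are easy to check with Python's built-in codecs. This is a small sketch; the helper name utf8_len is made up for illustration.

```python
def utf8_len(text):
    """Number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


print(utf8_len("A"))    # 1 byte for an English letter
print(utf8_len("中"))   # 3 bytes for a common Chinese character

# The optional BOM is the three bytes EF BB BF. Python's "utf-8-sig"
# codec writes it; the plain "utf-8" codec does not.
print("A".encode("utf-8-sig"))  # b'\xef\xbb\xbfA'
print("A".encode("utf-8"))      # b'A'
```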

GBK is an extension of the national standard GB2312 and remains compatible with it. GBK encodes each Chinese character in two bytes, while ASCII characters such as English letters stay single-byte. To distinguish Chinese characters from ASCII, the highest bit of the first byte of each two-byte character is set to 1. GBK covers the Chinese characters in common use, both simplified and traditional, and is a national encoding. It is less universal than UTF8, but Chinese text stored as UTF8 takes more database space than the same text in GBK (3 bytes per character rather than 2).
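
A short Python check of the GBK layout, using the standard gbk codec. The helper name gbk_info is made up for illustration. Note that an ASCII letter encodes to a single byte in GBK, and only the Chinese character takes two bytes with the high bit of the first byte set.

```python
def gbk_info(ch):
    """Return (byte length, whether the first byte has its high bit set)."""
    data = ch.encode("gbk")
    return len(data), data[0] >= 0x80


print(gbk_info("A"))    # (1, False)
print(gbk_info("中"))   # (2, True)

# Storage comparison for the same five Chinese characters.
text = "中文数据库"
print(len(text.encode("gbk")))    # 10
print(len(text.encode("utf-8")))  # 15
```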

GBK, GB2312 and UTF8 cannot be converted to each other directly; the conversion goes through Unicode:
GBK, GB2312 -> Unicode -> UTF8
UTF8 -> Unicode -> GBK, GB2312
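
In Python the same two-step path is explicit: bytes are decoded to a Unicode str, then encoded to the target character set. A minimal sketch; the function name transcode is made up for illustration.

```python
def transcode(data, source, target):
    """Convert bytes from one character set to another via Unicode."""
    text = data.decode(source)   # source bytes -> Unicode
    return text.encode(target)   # Unicode -> target bytes


gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(gbk_bytes)    # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)   # b'\xe4\xb8\xad\xe6\x96\x87'
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)  # True
```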

For a website or forum whose content is mostly English, UTF-8 is recommended, since English characters take only 1 byte each; Chinese-heavy content takes less space in GBK. However, many forum plugins currently support only GBK.

GB2312 is a subset of GBK, and GBK is a subset of GB18030.
GBK is a large character set that covers Chinese characters, including those shared with Japanese and Korean writing.
For a Chinese website, GB2312 and GBK still cause problems at times.
To avoid garbled text altogether, use UTF-8; it also makes future internationalization much easier.
UTF-8 can be thought of as one large character set that covers the characters of most other text encodings.
One benefit of UTF-8 is that users in other regions (such as Hong Kong and Taiwan) can view your text without installing simplified Chinese support.

gb2312 is simplified Chinese
gbk supports simplified Chinese and traditional Chinese
big5 supports traditional Chinese
utf-8 supports almost all characters
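
The coverage differences can be probed with Python's standard codecs. This is a sketch under the assumption that Python's gb2312, gbk and big5 codecs match the character sets MySQL uses closely enough for illustration; the helper name can_encode is made up.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# "书" is the simplified form and "書" the traditional form of "book".
for codec in ("gb2312", "gbk", "big5", "utf-8"):
    print(codec, can_encode("书", codec), can_encode("書", codec))
```

With the standard codecs, gb2312 rejects the traditional form, while gbk and utf-8 accept both forms.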

First, analyze how garbled text arises:
1. Data is garbled when it is written to the database
2. Query results come back garbled
Which of the two is happening?
Let's start from the mysql command line.
Run show variables like '%char%'; to check MySQL's character set settings:

mysql> show variables like '%char%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | gbk                                    |
| character_set_connection | gbk                                    |
| character_set_database   | gbk                                    |
| character_set_filesystem | binary                                 |
| character_set_results    | gbk                                    |
| character_set_server     | gbk                                    |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+

The query results show the character sets of the client, database connection, database, file system, query results, server, and system in the mysql database system.
Of these, the file system character set is fixed, and the system and server character sets are determined at installation; none of them has anything to do with the garbling problem.
The garbling problem is related to the character sets of the client, the database connection, the database, and the query results.
* Note: the client is whatever accesses the mysql database. With command line access, the command line window is the client; with access through a JDBC connection, the program is the client.
When we write Chinese data to mysql, it is transcoded at the client, at the database connection, and on being written to the database.
When a query is executed, the returned data is transcoded at the query result, at the database connection, and at the client.
It should be clear by now that garbling occurs in one or more of these links: the database, the client, the query results, and the database connection.
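
As a rough aid, the four variables that matter can be grouped by value to see at a glance whether every link uses the same character set. This is a hedged sketch: the dictionary is typed in by hand from the output shown earlier, the helper name charset_summary is made up, and a mixed result only points at where to look, since MySQL converts between some combinations correctly.

```python
RELEVANT = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_database",
)


def charset_summary(variables):
    """Group the four relevant variables by their character set value."""
    groups = {}
    for name in RELEVANT:
        groups.setdefault(variables[name], []).append(name)
    return groups


# Values copied by hand from the show variables output.
variables = {
    "character_set_client": "gbk",
    "character_set_connection": "gbk",
    "character_set_results": "gbk",
    "character_set_database": "gbk",
}

summary = charset_summary(variables)
print(summary)
print("consistent" if len(summary) == 1 else "mixed")  # consistent
```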
Now let's solve this problem.
When logging in to the database, connect with mysql --default-character-set=charset -u root -p (charset stands for the character set name).
Then run show variables like '%char%'; to see the character set settings: the character sets for the client, the database connection, and the query results are now set to the character set selected at login.
If you are already logged in, the set names charset; command achieves the same effect. It is equivalent to the following commands:
set character_set_client = charset;
set character_set_connection = charset;
set character_set_results = charset;
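
A client program that wants to send the three statements itself could generate them as below. This only builds the SQL strings; the helper name expand_set_names is made up for illustration, and sending the statements is left to whichever MySQL driver is in use.

```python
def expand_set_names(charset):
    """Return the three SET statements that the text lists as
    equivalent to: set names <charset>"""
    parts = ("client", "connection", "results")
    return [f"SET character_set_{part} = {charset};" for part in parts]


for statement in expand_set_names("gbk"):
    print(statement)
# SET character_set_client = gbk;
# SET character_set_connection = gbk;
# SET character_set_results = gbk;
```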
If you connect to the database via JDBC, write the URL as:
URL=jdbc:mysql://localhost:3306/abs?useUnicode=true&characterEncoding=charset
A front end such as a JSP page should also be set to the corresponding character set.
The database's character set can be specified by modifying mysql's startup configuration, or by adding default character set charset to the create database statement to force the database character set.
With these settings, the whole write and read path uses one character set, and no garbling occurs.
Why can Chinese be written directly from the command line, with nothing set, and still not appear garbled?
On the command line, the character set settings for the client, the database connection, and the query results stay the same throughout.
The Chinese input goes through a series of transcodings and then back to the original character set, so of course what we see is not garbled.
However, this does not mean that the Chinese text is stored correctly as Chinese characters in the database.
For example, take a database encoded in utf8, a client using GBK, and a connection using the default ISO8859-1 (latin1 in mysql). We send a Chinese string such as "中文" from the client. The client sends the GBK bytes to the connection layer, the connection layer passes those bytes to the database as ISO8859-1, and the database converts them to utf8 and stores them. If we read the field as utf8, it is bound to be garbled; that is, the Chinese data was already garbled when it was written to the database.
A query from the same client performs the reverse of the write path: the wrong utf8 bytes are converted back into the correct GBK bytes and displayed correctly.
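
The round trip in this example can be reproduced in Python. This is a sketch under one assumption: Python's latin-1 codec stands in for MySQL's latin1, which differs for a few byte values but behaves the same for the bytes used here.

```python
original = "中文"

# Write path: GBK client, latin1 connection, utf8 database.
client_bytes = original.encode("gbk")      # bytes the GBK client sends
misread = client_bytes.decode("latin-1")   # connection treats them as latin1
stored = misread.encode("utf-8")           # database stores them as utf8

print(stored.decode("utf-8"))              # ÖÐÎÄ (what a utf8 reader sees)
print(stored == original.encode("utf-8"))  # False: the stored bytes are wrong

# Read path from the same client: the reverse conversions restore the text.
recovered = stored.decode("utf-8").encode("latin-1").decode("gbk")
print(recovered)                           # 中文
```

The stored value is 8 bytes long, while correct utf8 for the same two characters is 6 bytes, which is one way to spot data that was written through a mismatched connection.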
