MySQL Character Set Garbled Code and Solution Sharing

  • 2021-11-30 01:42:56
  • OfStack

Preface

Character set is a set of symbols and coding rules, Whether in the oracle database or the mysql database, There is a problem in the selection of character sets, Moreover, if the character set is not selected correctly in the database creation stage, it may be necessary to replace the character set in the later stage, which is a costly operation and also has a certain risk. Therefore, we recommend to select the appropriate character set correctly according to the requirements at the beginning of application to avoid unnecessary adjustment in the later stage.

Actual combat

1. Install the MySQL database

2. Garbled demonstration


mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name  | Value    |
+--------------------------+----------------------------------+
| character_set_client | utf8    |
| character_set_connection | utf8    |
| character_set_database | utf8    |
| character_set_filesystem | binary    |
| character_set_results | utf8    |
| character_set_server | utf8    |
| character_set_system | utf8    |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)
 
mysql> insert into test.table10 values (1,' Crane in the Cloud ');
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from test.table10;
+------+-----------+
| id | names |
+------+-----------+
| 1 |  Crane in the Cloud  |
+------+-----------+
1 row in set (0.00 sec)
 
mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)
 
mysql> select * from test.table10;
+------+-------+
| id | names |
+------+-------+
| 1 | ??? |
+------+-------+
1 row in set (0.00 sec)
 
mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name  | Value    |
+--------------------------+----------------------------------+
| character_set_client | latin1 # Character set used by client source data    
| character_set_connection | latin1 #  Connection layer character set 
| character_set_database | utf8 # Default character set for the currently selected database    
| character_set_filesystem | binary   
| character_set_results | latin1 # Query result character set  |
| character_set_server | utf8 # Default internal operation character set    |
| character_set_system | utf8 # System metadata ( Field name, etc ) Character set      
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)

Solution to garbled code

As can be seen from the above, set names latin1; Three parameters have been changed. As long as the client, MySQL character-set-client, table charset three character sets are completely 1, it can be guaranteed that there will be no garbled code in 1.

Mode:

1. Execute set names xxx in mysql command line mode;


 mysql> set names utf8;
 Query OK, 0 rows affected (0.01 sec)

2. Specify the default character set when logging in to mysql


 [root@node1 ~]# mysql -S /tmp/mysql.sock4 --defaults-character-set=utf8 #-S  Specify multiple instances mysql The socket file of, 
  Use --defaults-character-set  Specifies the default character set. 

3. Modify the/etc/sysconfig/i18n file without specifying the default character set for my. cnf.


 vim /etc/sysconfig/i18n
 LANG='zh_CN.UTF-8' # If my.cnf Not specified, the default uses the system character set 

4. Modify the my. cnf file. The following two fields are OK in any one field.


  [client]
  default-character-set=latin1

  [mysql]
  default-character-set=latin1

The first two types of 1 and 2 are temporary solutions, and the latter two types of 3 and 4 are permanent solutions

View Character Set

1. Check the current character set settings of the system, and log in to mysql for execution after modification. show variables lile 'character_set%';


mysql> show variables like 'character_set%';# I revised it to read utf8 So the client's 3 All parameters are utf8. You can set your own character set. 
+--------------------------+----------------------------------+
| Variable_name   | Value       |
+--------------------------+----------------------------------+
| character_set_client  | utf8        |
| character_set_connection | utf8        |
| character_set_database | utf8        |
| character_set_filesystem | binary       |
| character_set_results | utf8        |
| character_set_server  | utf8        |
| character_set_system  | utf8        |
| character_sets_dir  | /usr/local/mysql/share/charsets/ |

2. mysql common character set. Execute the command show character set; View the character set supported by the system.


mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset | Description     | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5  | Big5 Traditional Chinese | big5_chinese_ci  |  2 |
| dec8  | DEC West European   | dec8_swedish_ci  |  1 |
| cp850 | DOS West European   | cp850_general_ci |  1 |
| hp8  | HP West European   | hp8_english_ci  |  1 |
| koi8r | KOI8-R Relcom Russian  | koi8r_general_ci |  1 |
| latin1 | cp1252 West European  | latin1_swedish_ci |  1 | # Commonly used 
| latin2 | ISO 8859-2 Central European | latin2_general_ci |  1 |
| swe7  | 7bit Swedish    | swe7_swedish_ci  |  1 |
| ascii | US ASCII     | ascii_general_ci |  1 |
| ujis  | EUC-JP Japanese    | ujis_japanese_ci |  3 |
| sjis  | Shift-JIS Japanese   | sjis_japanese_ci |  2 |
| hebrew | ISO 8859-8 Hebrew   | hebrew_general_ci |  1 |
| tis620 | TIS620 Thai     | tis620_thai_ci  |  1 |
| euckr | EUC-KR Korean    | euckr_korean_ci  |  2 |
| koi8u | KOI8-U Ukrainian   | koi8u_general_ci |  1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci |  2 |
| greek | ISO 8859-7 Greek   | greek_general_ci |  1 |
| cp1250 | Windows Central European | cp1250_general_ci |  1 |
| gbk  | GBK Simplified Chinese  | gbk_chinese_ci  |  2 | # Commonly used 
| latin5 | ISO 8859-9 Turkish   | latin5_turkish_ci |  1 |
| armscii8 | ARMSCII-8 Armenian   | armscii8_general_ci |  1 |
| utf8  | UTF-8 Unicode    | utf8_general_ci  |  3 |# Commonly used 
| ucs2  | UCS-2 Unicode    | ucs2_general_ci  |  2 |
| cp866 | DOS Russian     | cp866_general_ci |  1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci |  1 |
| macce | Mac Central European  | macce_general_ci |  1 |
| macroman | Mac West European   | macroman_general_ci |  1 |
| cp852 | DOS Central European  | cp852_general_ci |  1 |
| latin7 | ISO 8859-13 Baltic   | latin7_general_ci |  1 |
| utf8mb4 | UTF-8 Unicode    | utf8mb4_general_ci |  4 | # Commonly used 
| cp1251 | Windows Cyrillic   | cp1251_general_ci |  1 |
| utf16 | UTF-16 Unicode    | utf16_general_ci |  4 |
| cp1256 | Windows Arabic    | cp1256_general_ci |  1 |
| cp1257 | Windows Baltic    | cp1257_general_ci |  1 |
| utf32 | UTF-32 Unicode    | utf32_general_ci |  4 |
| binary | Binary pseudo charset  | binary    |  1 |
| geostd8 | GEOSTD8 Georgian   | geostd8_general_ci |  1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci |  2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci |  3 |
+----------+-----------------------------+---------------------+--------+
39 rows in set (0.00 sec)

Character set selection

1. If you deal with various characters and publish them to different language countries and regions, select Unicode. For mysql, check number utf-8.

2, only Chinese, large amount of data, high performance requirements, choose gbk.

3. To handle the mobile Internet of Things business, choose utf8mb4

It is suggested to use a small character set as much as possible on the premise of fully satisfying the application. Because a smaller character set means that it can save space and reduce the number of bytes transmitted on the network, at the same time, it indirectly improves the performance of the system due to the smaller storage space.

Server character set settings


[mysqld]
...
character-set-server=utf8 # Add this statement to set the server-side character set. 

After the mysql server is restarted, these two parameters change to their set values.


| character_set_server  | utf8 
| character_set_database | utf8

Switch character set

Converts a database of one encoding to data of another encoding.


 mysql> set names utf8;
 Query OK, 0 rows affected (0.01 sec)
0

Common Schemes for Converting Database Character Sets


 mysql> set names utf8;
 Query OK, 0 rows affected (0.01 sec)
1

Finally

This is a note from learning MySQL for 15 years, and I will share it today

Summarize


Related articles: