Analysis of problems caused by mysql's collation rules

2020-05-07 20:32:35
OfStack
                    Here's the question:

1 test table, latin1 for character set.

select to_id from test where to_id='cn elephant _ king ';

+---------------+ 

| to_id | 

+---------------+ 

| cn tao _ tao |

| cn elephant _ king |

+---------------+ 

2 rows in set (0.00 sec) 

Take the data of cn elephant _ king, unexpectedly took the data of cn pottery _ pottery also.

This is clearly not allowed.

See their codes:

(root@im_offlog1a:)[test]
>
select hex('cn tao _ tao ');

+----------------+ 

| hex('cn tao ') |

+----------------+ 

| 636ECCD55FCCD5 | 

+----------------+ 

1 row in set (0.00 sec) 

(root@im_offlog1a:)[test]
>
select hex('cn elephant _ king ');

+----------------+ 

| hex('cn elephant _ king ') |

+----------------+ 

| 636ECFF35FCDF5 | 

+----------------+ 

1 row in set (0.00 sec) 

It is true that the encoding is not identical, but why would mysql think that the two records are identical?

1 from the beginning, we positioned the problem as caused by collation.

show variables view

| collation_connection | latin1_swedish_ci 

| collation_database | latin1_swedish_ci 

| collation_server | latin1_swedish_ci 

Manually change these parameters to latin1_bin and the result is 1. This feels really strange.

Here is the naming rule of mysql collation:

They begin with their associated character set name, usually including a language name, and end with _ci (case-insensitive), _cs (case-sensitive), or _bin (2-yuan)

For example, the latin1 character set has the following correction rules:

Collating rule meaning

latin1_german1_ci DIN - 1 in Germany

latin1_swedish_ci Sweden/Finland

latin1_danish_ci Denmark/Norway

latin1_german2_ci DIN - 2 in Germany

latin1_bin conforms to the base 2 of latin1 encoding

latin1_general_ci multilingual (Western Europe)

latin1_general_cs multilingual (western European ISO), case sensitive

latin1_spanish_ci modern Spain

Finally, we rebuild the table and manually specify that collation for the table level is latin1_bin.

The problem was solved.

So again, why didn't it work when I tested latin1_bin manually earlier?

The original MySQL selected the table character set and collation rules as follows:

If CHARACTER SET X and COLLATE Y are specified, CHARACTER SET X and COLLATE Y are used.

If CHARACTER SET X is specified and COLLATE Y is not specified, then the default collation rules for CHARACTER SET X and CHARACTER SET X are used.

Otherwise, the server character set and server collation rules are used.

We specified character set when we created the table, so it always follows the corresponding default collation rule.

Of course, we don't really need to rebuild the table, just alter table db_allot CONVERT TO CHARACTER latin1 COLLATE latin1_bin.

In addition, it is recommended that collation should adopt bin type proofreading rules corresponding to the character set as far as possible, so as not to be prone to errors