MYSQL's binary approach to the case sensitive problem of mysql data

  • 2020-06-01 11:09:20
  • OfStack


mysql> select binary 'ABCD'='abcd' COM1, 'ABCD'='abcd' COM2;
+--------+-----------+
| COM1 | COM2 |
+--------+-----------+
|      0     |      1      |
+---------+-----------+
1 row in set (0.00 sec)

(just some! 4. * before)
Because some MySQL, especially 4.*, may not be accurate in Chinese retrieval, binary can be added in the retrieval.
Build tables:


create TABLE usertest (
id int(9) unsigned NOT NULL auto_increment,
username varchar(30) NOT NULL default '',
primary key (id)
)

Insert data:

insert into usertest (username) VALUES(' The notes ');
insert into usertest (username) VALUES(' The project ');
insert into usertest (username) VALUES(' Li wen ');
insert into usertest (username) VALUES(' Ken ');
insert into usertest (username) VALUES(' Dreams drift ');
insert into usertest (username) VALUES(' Long Wu ');
insert into usertest (username) VALUES(' In the summer ');

For example: select * from usertest where username like '% summer %', results are out 7 records, more depressed.
If you use = instead of like, select * from usertest where username = 'summer', only one result will appear. Since the LIKE operation of mysql follows the ASCII operation, there may be problems with LIKE. The question continues: if plus:

insert into usertest (username) VALUES(' wen ');
insert into usertest (username) VALUES(' The tang dynasty ');

We still use select * from usertest where username = 'summer', and we still get three records, which is depressing again. The solution is as follows:
1. Use binary with create, not query.

username varchar(30) BINARY NOT NULL default '',  If the table is already built, use: 
alter table usertest modify username varchar(32) binary;  Come on the properties of the table. 

2. When adding binary, select * from usertest where username like binary '% % %' to query, an accurate query can be made.

char USES a fixed length of space for storage, char(4) stores four characters, occupying different bytes according to the encoding method; gbk, whether in Chinese or English, occupies two bytes for each character; utf8, occupying three bytes for each character.
char, such as MD5, is appropriate if the length of the string you want to store is not too different from the average length of all values.
For values that change frequently, char is better than varchar because fixed-length rows are less likely to fragment.
For very short columns, char is superior to varchar because varchar requires an additional 1 or 2 bytes to store the length of the string.

varchar stores variable-length strings, using an additional 1 or 2 bytes to store the length of the string, varchar(10), in addition to storing 10 characters, requires 1 byte to store length information (10), and 2 bytes to store a length greater than 255
Exception: when ROW_FORMAT=FIXED is used in the Myisam engine, each row USES the same space, resulting in waste

If there are Spaces after char and varchar, char will automatically remove the Spaces and store them. varchar will not remove the Spaces, but will remove the Spaces when comparing strings

+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | int(11)      | NO   | PRI | NULL    | auto_increment |
| name  | varchar(4)   | YES  |     | NULL    |                |
| addr  | char(8)      | YES  |     | NULL    |                |
| bn    | varbinary(4) | YES  |     | NULL    |                |
| b     | binary(8)    | YES  |     | NULL    |                |
+-------+--------------+------+-----+---------+----------------+
+----------------------+----------------------+
| concat("$",name,"$") | concat("$",addr,"$") |
+----------------------+----------------------+
| $asdf$               | $a$                  |
| $asdf$               | $a$                  |
| $a $                 | $a$                  |
| $a$                  | $a$                  |
| $t a$                | $a$                  |
+----------------------+----------------------+
mysql> select * from zcy where name='a ';   // Due to the name is varchar When comparing, 'a ' Automatic conversion to 'a'
+----+------+------+------+----------+
| id | name | addr | bn   | b        |
+----+------+------+------+----------+
|  3 | a    | a    | ab   | ab       |
|  4 | a    | a    | ab   | a        |
+----+------+------+------+----------+
2 rows in set (0.00 sec)

mysql> select * from zcy where name='a';
+----+------+------+------+----------+
| id | name | addr | bn   | b        |
+----+------+------+------+----------+
|  3 | a    | a    | ab   | ab       |
|  4 | a    | a    | ab   | a        |
+----+------+------+------+----------+
2 rows in set (0.00 sec)

 
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | int(11)      | NO   | PRI | NULL    | auto_increment |
| name  | varchar(4)   | YES  |     | NULL    |                |
| addr  | char(8)      | YES  |     | NULL    |                |
| bn    | varbinary(4) | YES  |     | NULL    |                |
| b     | binary(8)    | YES  |     | NULL    |                |
+-------+--------------+------+-----+---------+----------------+
+--------------------+-------------------+
| concat("$",bn,"$") | concat("$",b,"$") |
+--------------------+-------------------+
| $ab a$             | NULL              |
| $ab $              | $ab      $        |
| $ab$               | $ab      $        |
| $ab  $             | $a       $        |
| NULL               | $a       $        |
| NULL               | $abcde   $        |
| NULL               | $abcd1234$        |
+--------------------+-------------------+

binary holds a 2-base string, which holds bytes instead of characters and has no character set restrictions
binary(8) can hold up to 8 characters, one byte each, for a total of 8 bytes
Comparisons are made byte-by-byte, not by character (char), and byte-by-byte comparisons are simpler and faster than characters
Case insensitive by character comparison, while binary is case sensitive and ends with \0 instead of Spaces

mysql> select * from zcy where b='a\0\0\0\0\0\0\0';
+----+------+------+------+----------+
| id | name | addr | bn   | b        |
+----+------+------+------+----------+
|  5 | t a  | a    | NULL | a        |
+----+------+------+------+----------+
mysql> select * from zcy where b='a \0\0\0\0\0\0'; 
+----+------+------+------+----------+
| id | name | addr | bn   | b        |
+----+------+------+------+----------+
|  4 | a    | a    | ab   | a        |
+----+------+------+------+----------+

varbinary Save the variable length string, no padding later \0
mysql> select * from zcy where bn='ab';
+----+------+------+------+----------+
| id | name | addr | bn   | b        |
+----+------+------+------+----------+
|  3 | a    | a    | ab   | ab       |
+----+------+------+------+----------+
1 row in set (0.01 sec)

mysql> select * from zcy where bn='ab ';
+----+------+------+------+----------+
| id | name | addr | bn   | b        |
+----+------+------+------+----------+
|  2 | asdf | a    | ab   | ab       |
+----+------+------+------+----------+
1 row in set (0.00 sec)

mysql> select * from zcy where bn='ab  ';
+----+------+------+------+----------+
| id | name | addr | bn   | b        |
+----+------+------+------+----------+
|  4 | a    | a    | ab   | a        |
+----+------+------+------+----------+
1 row in set (0.00 sec)

The difference between Blob and Text in MySql

BLOB is a large object in base 2 that can hold a variable amount of data. There are four types of BLOB: TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. They just differ in the maximum length of the values that can be held.

There are four types of TEXT: TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. These correspond to four BLOB types, with the same maximum length and storage requirements.

The BLOB column is treated as a binary string (byte string). The TEXT column is treated as a non-binary string (character string). The BLOB column has no character set and sorts and compares numeric values based on column value bytes. The TEXT column has one character set, and the values are sorted and compared according to the collation rules of the character set.

There is no case conversion during the storage or retrieval of the TEXT or BLOB columns.

When not running in strict mode, if you assign an BLOB or TEXT column to a value that exceeds the maximum length of the column type, the value is truncated to ensure fit. If the truncated character is not a space, a warning will be generated. Using strict SQL mode, an error is generated and the value is rejected rather than intercepted and warned.

In most respects, you can think of the BLOB column as a sufficiently large VARBINARY column. Again, you can think of the TEXT column as the VARCHAR column. BLOB and TEXT differ from VARBINARY and VARCHAR in the following respects:

· do not remove trailing Spaces when saving or retrieving values for columns BLOB and TEXT. This is the same as the VARBINARY and VARCHAR columns.

Note that TEXT is expanded with Spaces to fit the object of comparison, such as CHAR and VARCHAR.

· for the indexes of the BLOB and TEXT columns, the length of the index prefix must be specified. For CHAR and VARCHAR, the prefix length is optional.

· BLOB and TEXT columns cannot have default values.

LONG and LONG VARCHAR correspond to MEDIUMTEXT data types. This is to ensure compatibility. If the TEXT column type USES the BINARY attribute, the 2-yuan collation rule for the column character set is assigned to the column.

MySQL connector /ODBC defines BLOB value as LONGVARBINARY and TEXT value as LONGVARCHAR.

Because BLOB and TEXT values can be very long, you may encounter some constraints when using them:

· only the first max_sort_length bytes of the column are used when sorting. The default value for max_sort_length is 1024; This value can be changed when starting the mysqld server using the --max_sort_length option.

Increasing the value of max_sort_length at run time can make more bytes meaningful when sorting or combining. Any client can change the value of its session max_sort_length variable:


mysql> SET max_sort_length = 2000;

mysql> SELECT id, comment FROM tbl_name

    -> ORDER BY comment;

When you want to make sense of bytes larger than max_sort_length, another way to use GROUP BY or TEXT columns with long values is to convert column values to fixed-length objects. The standard method is to use the SUBSTRING function. For example, the following statement sorts the 2000 bytes of the comment column:

mysql> SELECT id, SUBSTRING(comment,1,2000) FROM tbl_name

    -> ORDER BY SUBSTRING(comment,1,2000);

· the maximum size of an BLOB or TEXT object is determined by its type, but the maximum value that can actually be passed between the client and the server is determined by the amount of available memory and the size of the communication cache. You can change the size of the message cache by changing the value of the max_allowed_packet variable, but you must modify both the server and the client program. For example, you can use mysql and mysqldump to change the client's max_allowed_packet value.

Each BLOB or TEXT value is represented by an internally assigned object. This is in contrast to other column types, which allocate a storage engine for each column when a table is opened.


Related articles: