Set the data types in MySQL to optimize the instance running speed

2020-11-03 22:37:33
OfStack

Today, I have seen an optimization case and I feel it is representative. Here, I will make a mark to commemorate the problem of 1 random field definition.

Recall that in table design, many people are used to the structure of the table as Varchar(64),Varchar(255), etc., although in most cases only 5-15 bytes are stored. So let's look at the following case.
Query statement:


 SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;

The table (client_id,channel) is a composite index.
Using explain, look at the 1 down execution plan, which looks perfect for index usage


mysql&gt; explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| id | select_type | table  | type | possible_keys  | key    | key_len | ref | rows  | Extra     |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| 1 | SIMPLE  | xxx_sources | index | idx_client_channel | idx_client_channel | 1032 | NULL | 20207319 | Using where; Using index |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)

Let's see the actual implementation of 1:


mysql&gt; SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL |  0 |
+---------+----------+
1 row in set (11.69 sec)

The actual execution was very bad. The idea is that this execution looks perfect from the index execution plan and seems to have nothing to do with MySQL. If you look at the design of table 1 below, you will see that client_id is also designed
varchar(255). You can use the following method to try 1:


mysql&gt; explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
| id | select_type | table  | type | possible_keys  | key    | key_len | ref | rows | Extra     |
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
| 1 | SIMPLE  | xxx_sources | ref | idx_client_channel | idx_client_channel | 258  | const | 457184 | Using where; Using index |
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

In terms of the execution plan, it's about the same, but it's much worse. Specifically, key_len dropped from 1032 to 258, and the execution plan changed to const based on equal look-up, with the number of rows going from 10 million to 100 thousand. You don't have to understand IO
I saved a lot.
Let's look at the actual implementation:


mysql&gt; SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL |  0 |
+---------+----------+
1 row in set (0.25 sec)

Wow, it went from 11.69 seconds to 0.25 seconds, what is the concept? How many times is it optimized? Let's do 1.

So if you look at what's going on here, keep in mind that this is a case where, well, good, you can optimize 1 in quotes in the future. So why don't you ask 1, can it be optimized? Why is that?
Let's start with the first question:
Can you optimize ?
The answer is yes, of course. In terms of the length of the index, 258 is still a very large data. For the field client_id, only the value of the data type can be saved, so why not save the unused one int unsigned?
The index's length immediately drops from 258 to 4. Isn't that a lot of savings?
Let's move on to the second question. Why is that?
For two reasons, and based on one principle, a cost-based optimizer. For client_id, the value of character type is defined in the table definition, and the value of numerical type is passed in the query, which needs to go through 1 numerical conversion, the beginning of the tragedy, and finally
Causes MySQL to select 1 completed index to scan.

What should we pay attention to from this case?
Reasonable choice of data type, the basic work is too important, this is called win at the starting line, 1 cut can not be arbitrary, do not define a table is reduced to the main construction of the rest is Varchar(255). Be careful when you index fields such as double/float for the database.