A case study of optimizing MySQL query operations

2020-10-07 18:54:53
OfStack

Problem description

One user reported that the execution time of one SQL statement was unacceptably slow. The SQL statement looks simple (table names and field names have been modified in this description):
SELECT count(*) FROM a JOIN b ON a.`S` = b.`S` WHERE a.`L` > '2014-03-30 00:55:00' AND a.`L` < '2014-03-30 01:00:00' ;

And all the fields required by the query have been indexed. The table structure is as follows:


CREATE TABLE `a` (
`L` timestamp NOT NULL DEFAULT '2000-01-01 00:00:00',
`I` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`A` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`S` varchar(64) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`F` tinyint(4) DEFAULT NULL,
`V` varchar(256) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '',
`N` varchar(64) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
KEY `IX_L` (`L`),
KEY `IX_I` (`I`),
KEY `IX_S` (`S`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `b` (
`R` timestamp NOT NULL DEFAULT '2000-01-01 00:00:00',
`V` varchar(32) DEFAULT NULL,
`U` varchar(32) DEFAULT NULL,
`C` varchar(16) DEFAULT NULL,
`S` varchar(64) DEFAULT NULL,
`I` varchar(64) DEFAULT NULL,
`E` bigint(32) DEFAULT NULL,
`ES` varchar(128) DEFAULT NULL,
KEY `IX_R` (`R`),
KEY `IX_C` (`C`),
KEY `IX_S` (`S`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

From the statement, it is natural for this query plan to use a as the driver table first, using the indexes a.L and b.S. In fact explain turns out to be:


+----+-------------+-------+-------+---------------+------+---------+----------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+----------+---------+-------------+
| 1 | SIMPLE | b | index | IX_S | IX_S | 195 | NULL | 1038165 | Using index |
| 1 | SIMPLE | a | ref | IX_L,IX_S | IX_S | 195 | test.b.S | 1 | Using where |
+----+-------------+-------+-------+---------------+------+---------+----------+---------+-------------+

Analysis of the

From the results of explain, the query USES b as the driver table.

In the previous article, we introduced that MySQL's selection of jion order is a method to select the minimum cost after analyzing the costs of various join orders.

This join only involves two tables and is of course independent of optimizer_search_depth. So the question is, why didn't the join order that we expected get selected?

MySQL Tips: MySQL provides the straight_join syntax to enforce the join order.


explain SELECT count(*) FROM a straight_join b ON a.`S` = b.`S` WHERE a.`L` > '2014-03-30 00:55:00' AND a.`L` < '2014-03-30 01:00:00' ;

+----+-------------+-------+-------+---------------+------+---------+------+---------+---------------------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+-------+-------+---------------+------+---------+------+---------+---------------------------------------------+

| 1 | SIMPLE | a | range | IX_L,IX_S | IX_L | 4 | NULL | 63 | Using where |

| 1 | SIMPLE | b | index | IX_S | IX_S | 195 | NULL | 1038165 | Using where; Using index; Using join buffer |

+----+-------------+-------+-------+---------------+------+---------+------+---------+---------------------------------------------+

In MySQL Tips: explain results, the query cost of join can be estimated by sequential serials of rows.

The & # 63; join order correct, simple analysis query cost: ordinary join is 1038165*1, straight_join is 63*1038165. It looks like MySQL was right. But 1 must where wrong!

Found abnormal

Back to our original vision. We expected table a to be the driver table because we thought table b could use the IX_S index. In fact, table staight_join did use the index, but this result is not what we expected.

As we know, the filter quality of the index is an important factor determining whether an index will be selected in the query. Is the filter quality of ES77en. S not good?

MySQL Tips: show index from tbname The value of Cardinality in the return result indicates the filterability of an index.

show index has too many results and can also be taken from the information_schema table.


mysql> select * from information_schema.STATISTICS where table_name='b' and index_name='IX_S'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: b
NON_UNIQUE: 1
INDEX_SCHEMA: test
INDEX_NAME: IX_S
SEQ_IN_INDEX: 1
COLUMN_NAME: S
COLLATION: A
CARDINALITY: 1038165
SUB_PART: NULL
PACKED: NULL
NULLABLE: YES
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:

CARDINALITY: 1038165 is large enough for this index. So what are the estimated rows in this table.


show table status like 'b'\G
*************************** 1. row ***************************
Name: b
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1038165
Avg_row_length: 114
Data_length: 119160832
Max_data_length: 0
Index_length: 109953024
Data_free: 5242880
Auto_increment: NULL
Create_time: 2014-05-23 00:24:25
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)

As can be seen from Rows: 1038165, the index IX_S is considered very well differentiated and is already close to the only index.

MySQL Tips: The Rows seen in the show table status result is used to represent the current front number of the table. This is an exact value for TABLE MyISAM, but it is an estimate for TABLE InnoDB.

Although it is an estimate, the optimizer is guided by this, that is, the data in one of the explain above is completely not as expected: rows on line 2 of the staight_join result.

Stage of the conclusion

We found that the entire logic of the error was this: the execution plan for the table driven by a, due to the index ES125en.S's rows estimated at 1038165, led the optimizer to conclude that the cost was greater than the table driven by b. (Of course, those familiar with the explan results will find that the type field and Extra field 1 in line 2 are weird.)

In other words, every row obtained by straight_join goes to the b query with a full table scan. The most common example of this occurring in MySQL is type casting. For example, a string field contains all Numbers, but is not passed in as a string when querying.

In this case, both of them are strings. So, it's character set dependent.

Going back to the two table structures, the difference in the declaration of the S field is COLLATE utf8_bin -- this is the root cause of the case: the S value from the a table is utf8_bin, which the optimizer considers to be of a different type and cannot be filtered directly using the index ES152en.IX_ES154en.

As for why indexes are still used, this is because overwriting indexes is a "misunderstanding."

MySQL Tips: If all the results of a query can be obtained completely from an index, then the traversal index is preferred to the traversal data.

As a validation,


mysql> explain SELECT * FROM a straight_JOIN b ON binary a.`S` = b.`S` WHERE a.`L` > '2014-03-30 00:55:00' AND a.`L` < '2014-03-30 01:00:00' ;

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - + -- - + - + -- - + -- -- -- -- -- -- -- -- +

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - + -- - + - + -- - + -- -- -- -- -- -- -- -- +

| 1 | SIMPLE | a | range | IX_L | IX_L | 4 | NULL | 63 | Using where |

| 1 | SIMPLE | b | ALL | IX_S | NULL | NULL | NULL | 1038165 | Range checked for each record (index map: 0x4) |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - + -- - + - + -- - + -- -- -- -- -- -- -- -- +

Since the result is select *, the overwritten index cannot be used, so line 2, key, shows NULL. (Tears: it would have been easier to look up earlier.)

To optimize the

The most straightforward idea, of course, is to change the definition of the S field for both tables to be the same. This approach avoids changing the business code, but DDL is slightly more expensive. Here are two optimizations for the SQL statement.


1 , select count(*) from b join (select s from a WHERE a.`L` > '2014-03-30 00:55:00' AND a.`L` < '2014-03-30 01:00:00') ta on b.S=ta.s;

This is a straightforward way to write it, so pay attention to the order of b.S and ta.S


2 , SELECT count(*) FROM a JOIN b ON binary a.`S` = b.`S` WHERE a.`L` > '2014-03-30 00:55:00' AND a.`L` < '2014-03-30 01:00:00' ;

This is known from the previous analysis because b.S is defined as utf8_bin.

MySQL Tips: In the character set naming rules of MySQL, the difference between XXX_bin and XXX is case sensitive.

Here we add A.s to all binary qualifiers, first to lower case, that is, the temporary result set is converted to utf8_bin, and then when using b.S matching, the index can be directly used.

In fact, the two rewriting methods are essentially the same, the difference is that method 1 is an implicit transformation. In theory, it would be faster to write 2.

summary

Do join as much as possible to design the fields to be of exactly the same type.