Some of the optimizations that MySQL can make in complex cases of associations

2020-09-28 09:12:15
OfStack

Yesterday, I dealt with the optimization of 1 complex correlation SQL. Such optimization of SQL usually considers the following four points:

1. The result set returned by the query, usually the result set returned by the query is very small, it is confident to optimize;

2. The selection of driver table is very important. By looking at the execution plan, we can see the driver table selected by the optimizer, and rows in the execution plan can roughly reflect the problem;

3. Clarify the association between the tables, and pay attention to whether there are appropriate indexes on the associated fields;

4. Use the straight_join keyword to enforce the association order between tables, which can facilitate us to verify some conjecture;

SQL:
Execution time:


mysql> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yh_dm = '006939748XX' ;

1 row in set (0.75 sec)

This SQL query actually returns only 1 row of data, but it took 750ms to execute. Look at the execution plan:


mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yh_dm = '006939748XX' ;

+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+

You can see two prominent performance bottlenecks in the execution plan:


| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |

| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |

Since d is the table of left join, the driver table will not select d table. Let's take a look at the size of a,b,c3 table:


mysql> select count(*) from c;
+ -- - -+
| count(*) |
+ -- - -+
| 53731 |
+ -- - -+

mysql> select count(*) from a;
+ -- - -+
| count(*) |
+ -- - -+
| 53335 |
+ -- - -+

mysql> select count(*) from b;
+ -- - -+
| count(*) |
+ -- - -+
| 105809 |
+ -- - -+

Since the data volume of b table is larger than that of the other two tables, and there is basically no query filtering condition on b table, the possibility of selecting B in the table driven is excluded.

The optimizer actually chose the a table as the driver table, so why not the c table as the driver table? Let's analyze 1:

Stage 1: The a table is used as the driver table
a � > b � > c � > d:
(1) : a. jg_id = b. jg_id - > b index :PRIMARY KEY (' JG_ID ', 'YH_ID')

(2) : b. yh_id = c. yh_id - > (c index :PRIMARY KEY (' YH_ID '))

(3) : c. yh_id = d. yh_id - > (d index :PRIMARY KEY (' JS_DM ', 'YH_ID')
Since there is no index to yh_id on d table, the index adds an index to d table:


alter table d add index ind_yh_id(yh_id);

Execution plan:


+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- � + -- - + -- -- � + - -+ -- -- -- -+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- � + -- - + -- -- � + - -+ -- -- -- -+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- � + -- - + -- -- � + - -+ -- -- -- -+

Execution time:


1 row in set (0.77 sec)

After adding an index to the d table, the number of rows scanned in the d table dropped to 272 rows (starting with: 54584)


| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |

Stage 2: The c table is used as the driver table

d
^
|
c � > b � > a
Since there is a highly filtered filter condition of yh_dm on c table, we create an index on yh_dm:


mysql> select count(*) from c where yh_dm = '006939748XX';
+ -- - -+
| count(*) |
+ -- - -+
| 2 |
+ -- - -+

Add index:


alter table c add index ind_yh_dm(yh_dm)

View the execution plan:


mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yh_dm = '006939748XX' ;

+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+

Execution time:


mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yh_dm = '006939748XX' ;

+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+

After adding an index to c table, the index is still not on, and the execution plan still takes a table as the driver table. Therefore, let's analyze why a table is still taken as the driver table here.

1) : c. yh_id = b. yh_id - > ( PRIMARY KEY (`JG_ID`,`YH_ID`) )

a. If c table, for the driver table, then c table with b table at the time of association, because in the b yh_id no field index, because b table the amount of data is very large, so the optimizer think if here with c table as the driver table, will create greater associated with b table (in this case you can use straight_join force c table as the driver table).
b. If the a table is used as the driver table, when the a table is associated with the b table, the optimizer believes that the cost of using a as the driver table is less than the cost of using c as the driver board because there is an index of the jg_id field on the b table.
Therefore, if we want to drive the C table, we only need to add the index of yh_id to b:


mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yh_dm = '006939748XX' ;

+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+

2) : b. jg_id = a. jg_id - > ( PRIMARY KEY (`JG_ID`) )

3) : c. yh_id = d. yh_id - > ( KEY `ind_yh_id` (`YH_ID`) )
Execution plan:


+ - -+ -- -- -- -+ - -+ - � + -- -- -- -+ -- � + -- - + -- -- � + - + -- -- -- -+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- -+ -- � + -- - + -- -- � + - + -- -- -- -+
| 1 | SIMPLE | c | ref | PRIMARY,ind_yh_dm | ind_yh_dm | 57 | const | 2 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 272 | Using index |
| 1 | SIMPLE | b | ref | PRIMARY,ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 531 | Using index |
| 1 | SIMPLE | a | eq_ref | PRIMARY,INDEX_JG | PRIMARY | 98 | test.b.JG_ID | 1 | Using where |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- -+ -- � + -- - + -- -- � + - + -- -- -- -+

Execution time:


mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yx_bj =  ' Y'
-> and c.sc_bj =  ' N'
-> and c.yh_dm = '006939748XX' ;

+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+ - -+ -- -- -- -+ - -+ - � + -- -- -- + -- - + -- - + -- -- � + - -+ -- -- -- -+

As you can see, rows in the implementation plan has been greatly reduced, and the execution time has been reduced from the original 750ms to the level of 0 ms.