The primary key function in MySQL is analyzed

  • 2020-09-16 07:49:49
  • OfStack

In 5.1.46, the optimizer made one change in the selection of primary key:

Performance: While looking for the shortest index for a covering index scan, the optimizer did not consider the full row length for a clustered primary key, as in InnoDB Secondary indexes will now be preferred, making full table less likely.

The find_shortest_key function has been added to this release, which can be thought of as selecting the minimum key length

Index to satisfy our query.

How the function works:

What find_shortest_key should do is the following. If the primary key is a covering index
and is clustered, like in MyISAM, then the behavior today should remain the same. If the
primary key is clustered, like in InnoDB, then it should not consider using the primary
key because then the storage engine will have to scan through much more data.

Call Primary_key_is_clustered(), and when the return value is true, execute find_shortest_key: Select the smallest covered index (Secondary length covering indexes) to satisfy the query.

First test in 5.1.45:


$mysql -V

mysql Ver 14.14 Distrib 5.1.45, for unknown-linux-gnu (x86_64) using EditLine wrapper

root@test 03:49:45>create table test(id int,name varchar(20),name2 varchar(20),d datetime,primary key(id)) engine=innodb;

Query OK, 0 rows affected (0.16 sec)

root@test 03:49:47>insert into test values(1,'xc','sds',now()),(2,'xcx','dd',now()),(3,'sdds','ddd',now()),(4,'sdsdf','dsd',now()),(5,'sdsdaa','sds',now());

Query OK, 5 rows affected (0.00 sec)

Records: 5 Duplicates: 0 Warnings: 0

root@test 03:49:51>

root@test 03:49:51>insert into test values(6,'xce','sdsd',now()),(7,'xcx','sdsd',now()),(8,'sdds','sds',now()),(9,'sdsdsdf','sdsdsd',now()),(10,'sdssdfdaa','sdsdsd',now());

Query OK, 5 rows affected (0.00 sec)

Records: 5 Duplicates: 0 Warnings: 0

Create index ind_1:


root@test 03:49:53>alter table test add index ind_1(name,d);

Query OK, 0 rows affected (0.09 sec)

Records: 0 Duplicates: 0 Warnings: 0

root@test 03:50:08>explain select count(*) from test;

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + -- - + -- - + - + - + -- -- -- -+

| id | select_type | table | type | possible_keys | key   | key_len | ref | rows | Extra    |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + -- - + -- - + - + - + -- -- -- -+

| 1 | SIMPLE   | test | index | NULL     | PRIMARY | 4    | NULL |  10 | Using index |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + -- - + -- - + - + - + -- -- -- -+

1 row in set (0.00 sec)

Add ind_2:


root@test 08:04:35>alter table test add index ind_2(d);

Query OK, 0 rows affected (0.07 sec)

Records: 0 Duplicates: 0 Warnings: 0

root@test 08:04:45>explain select count(*) from test;

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + -- - + -- - + - + - + -- -- -- -+

| id | select_type | table | type | possible_keys | key   | key_len | ref | rows | Extra    |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + -- - + -- - + - + - + -- -- -- -+

| 1 | SIMPLE   | test | index | NULL     | PRIMARY | 4    | NULL |  10 | Using index |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + -- - + -- - + - + - + -- -- -- -+

1 row in set (0.00 sec)

In the above version [5.1.45], you can see that the optimizer chooses to use the primary key to complete the scan, instead of using ind_1 and ind_2 to complete the query.

Next up: 5.1.48


$mysql -V

mysql Ver 14.14 Distrib 5.1.48, for unknown-linux-gnu (x86_64) using EditLine wrapper

root@test 03:13:15> create table test(id int,name varchar(20),name2 varchar(20),d datetime,primary key(id)) engine=innodb;

Query OK, 0 rows affected (0.00 sec)

root@test 03:48:04>insert into test values(1,'xc','sds',now()),(2,'xcx','dd',now()),(3,'sdds','ddd',now()),(4,'sdsdf','dsd',now()),(5,'sdsdaa','sds',now());

Query OK, 5 rows affected (0.00 sec)

Records: 5 Duplicates: 0 Warnings: 0

root@test 03:48:05>insert into test values(6,'xce','sdsd',now()),(7,'xcx','sdsd',now()),(8,'sdds','sds',now()),(9,'sdsdsdf','sdsdsd',now()),(10,'sdssdfdaa','sdsdsd',now());

Query OK, 5 rows affected (0.01 sec)

Records: 5 Duplicates: 0 Warnings: 0

Create index ind_1:


root@test 03:13:57>alter table test add index ind_1(name,d);

Query OK, 0 rows affected (0.01 sec)

Records: 0 Duplicates: 0 Warnings: 0

root@test 03:15:55>explain select count(*) from test;

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - -+ -- - + - + - + -- -- -- -+

| id | select_type | table | type | possible_keys | key  | key_len | ref | rows | Extra    |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - -+ -- - + - + - + -- -- -- -+

| 1 | SIMPLE   | test | index | NULL     | ind_1 | 52   | NULL |  10 | Using index |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - -+ -- - + - + - + -- -- -- -+

root@test 08:01:56>alter table test add index ind_2(d);

Query OK, 0 rows affected (0.03 sec)

Records: 0 Duplicates: 0 Warnings: 0

 add ind_2 : 

root@test 08:02:09>explain select count(*) from test;

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - -+ -- - + - + - + -- -- -- -+

| id | select_type | table | type | possible_keys | key  | key_len | ref | rows | Extra    |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - -+ -- - + - + - + -- -- -- -+

| 1 | SIMPLE   | test | index | NULL     | ind_2 | 9    | NULL |  10 | Using index |

+ - -+ -- -- -- -+ - -+ - -+ -- -- - + - -+ -- - + - + - + -- -- -- -+

1 row in set (0.00 sec)

Version [5.1.48] first wisely chose ind_1 to complete the scan, without considering the use of primary key (full index scan) to complete the query, then added ind_2. Since the LENGTH of ind_1's key is larger than ind_2's key, mysql chose the better ind_2 to complete the query, it can be seen that mysql is also gradually intelligent in the selection method.

Observation performance:


5.1.48

root@test 08:49:32>set profiling =1;

Query OK, 0 rows affected (0.00 sec)

root@test 08:49:41>select count(*) from test;

+ -- - -+

| count(*) |

+ -- - -+

| 5242880 |

+ -- - -+

1 row in set (1.18 sec)

root@test 08:56:30>show profile cpu,block io for query 1;

+ -- -- -- -- -- � + -- - -+ -- - -+ -- -- -- + -- -- � + -- -- - +

| Status             | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |

+ -- -- -- -- -- � + -- - -+ -- - -+ -- -- -- + -- -- � + -- -- - +

| starting            | 0.000035 | 0.000000 |  0.000000 |      0 |       0 |

| checking query cache for query | 0.000051 | 0.000000 |  0.000000 |      0 |       0 |

| Opening tables         | 0.000014 | 0.000000 |  0.000000 |      0 |       0 |

| System lock          | 0.000005 | 0.000000 |  0.000000 |      0 |       0 |

| Table lock           | 0.000010 | 0.000000 |  0.000000 |      0 |       0 |

| init              | 0.000015 | 0.000000 |  0.000000 |      0 |       0 |

| optimizing           | 0.000007 | 0.000000 |  0.000000 |      0 |       0 |

| statistics           | 0.000015 | 0.000000 |  0.000000 |      0 |       0 |

| preparing           | 0.000012 | 0.000000 |  0.000000 |      0 |       0 |

| executing           | 0.000007 | 0.000000 |  0.000000 |      0 |       0 |

| Sending data          | 1.178452 | 1.177821 |  0.000000 |      0 |       0 |

| end              | 0.000016 | 0.000000 |  0.000000 |      0 |       0 |

| query end           | 0.000005 | 0.000000 |  0.000000 |      0 |       0 |

| freeing items         | 0.000040 | 0.000000 |  0.000000 |      0 |       0 |

| logging slow query       | 0.000002 | 0.000000 |  0.000000 |      0 |       0 |

| logging slow query       | 0.000086 | 0.000000 |  0.000000 |      0 |       0 |

| cleaning up          | 0.000006 | 0.000000 |  0.000000 |      0 |       0 |

+ -- -- -- -- -- � + -- - -+ -- - -+ -- -- -- + -- -- � + -- -- - +

Comparison of performance:


5.1.45

root@test 08:57:18>set profiling =1;

Query OK, 0 rows affected (0.00 sec)

root@test 08:57:21>select count(*) from test;

+ -- - -+

| count(*) |

+ -- - -+

| 5242880 |

+ -- - -+

1 row in set (1.30 sec)

root@test 08:57:27>show profile cpu,block io for query 1;

+ -- -- -- -- -- � + -- - -+ -- - -+ -- -- -- + -- -- � + -- -- - +

| Status             | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |

+ -- -- -- -- -- � + -- - -+ -- - -+ -- -- -- + -- -- � + -- -- - +

| starting            | 0.000026 | 0.000000 |  0.000000 |      0 |       0 |

| checking query cache for query | 0.000041 | 0.000000 |  0.000000 |      0 |       0 |

| Opening tables         | 0.000014 | 0.000000 |  0.000000 |      0 |       0 |

| System lock          | 0.000005 | 0.000000 |  0.000000 |      0 |       0 |

| Table lock           | 0.000008 | 0.000000 |  0.000000 |      0 |       0 |

| init              | 0.000015 | 0.000000 |  0.000000 |      0 |       0 |

| optimizing           | 0.000006 | 0.000000 |  0.000000 |      0 |       0 |

| statistics           | 0.000014 | 0.000000 |  0.000000 |      0 |       0 |

| preparing           | 0.000012 | 0.000000 |  0.000000 |      0 |       0 |

| executing           | 0.000007 | 0.000000 |  0.000000 |      0 |       0 |

| Sending data          | 1.294178 | 1.293803 |  0.000000 |      0 |       0 |

| end              | 0.000016 | 0.000000 |  0.000000 |      0 |       0 |

| query end           | 0.000004 | 0.000000 |  0.000000 |      0 |       0 |

| freeing items         | 0.000040 | 0.000000 |  0.001000 |      0 |       0 |

| logging slow query       | 0.000002 | 0.000000 |  0.000000 |      0 |       0 |

| logging slow query       | 0.000080 | 0.000000 |  0.000000 |      0 |       0 |

| cleaning up          | 0.000006 | 0.000000 |  0.000000 |      0 |       0 |

+ -- -- -- -- -- � + -- - -+ -- - -+ -- -- -- + -- -- � + -- -- - +

It can be seen from profile above that on Sending data, the difference is still quite obvious. mysql does not need to scan the page block of the whole table, but the shortest index page block of the index key in the table to complete the query, which reduces a lot of unnecessary data.

PS:innodb is a transaction engine, so in addition to storing the row record in the leaf node, there will also be one more record about the transaction information (DB_TRX_ID,DB_ROLL_PTR, etc.), so the single-row length will cost about 20 bytes. The most intuitive method is to convert myisam to innodb, and the storage space will increase significantly. When the primary table is t(id,name,pk(id)) and the second-level index ind_name(name,id), it is easy to confuse. Even if there are only two fields, the first index is still larger than the second index (you can observe the internal structure of the table through innodb_table_monitor). When querying all id, the optimizer still chooses the second index ind_name.


Related articles: