An in depth analysis of MySQL index

2021-12-13 17:34:38
OfStack

Preface

We know that index selection is the work of the optimizer stage, but the optimizer is not omnipotent, and it may choose the wrong index to use. 1 The optimizer selects indexes based on the number of rows scanned, whether to sort, and whether to use temporary tables.

Analyzing sql with explain

explain is a good self-test command, and using explain frequently helps us write more reasonable sql statements and build more reasonable indexes:


mysql> explain select * from t where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra               |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
| 1 | SIMPLE   | t   | NULL    | range | a,b      | b  | 5    | NULL | 50223 |   1.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
1 row in set, 1 warning (0.01 sec)

Among them:

table field: Indicates which table is about;
type fields: system, const, eq_reg, ref, range, index, all. 1 Generally speaking, it is necessary to reach range level or above;

system, const: You can change the query variables into constants, such as id=1; id is the primary key or only 1 key;
eq_ref: Access the index and return the data of a single 1 row, which usually appears during the connection. The index used in the query is the primary key or the only 1 key;
ref: Access the index and return some worthwhile data (possibly multiple rows), which usually occurs when = is used;
range: Returns row information within 1 range using an index, such as using > , < , between
index: Scan the whole table in the order of indexes. Although there are indexes that do not need sorting, scan the whole table;
all: Full table scan

key Field: The index actually used;

key_len field: The length of the index used (the shorter the length, the better without loss of precision);

ref field: Shows which 1 column of the index is used;

rows field: The number of rows of data that MySQL considers necessary for retrieval;

Extra field: Additional information for queries, mainly of the following types:

using index: Indexing used
using where: where condition used
using tmporary: Use a temporary table to process the current query
using filesort: Additional sorting is used, such as no index in order field
range checked for eache record (index map: N): Index not available
using index for group-by: The table name can find all the data needed for grouping in the index without querying the actual table

1 encounter Using temporary and Using filesort to find a way to optimize 1, because there is no index.

MySQL How to Calculate the Number of Rows to Retrieve

In practice, the number of scanned lines counted by MySQL is not an accurate value, and sometimes it even varies greatly, while the number of scanned lines is calculated based on the cardinality of the index.

In MySQL, the index cardinality is obtained by sampling statistics: the system selects N data pages by default, counts the average values of different data pages, and then multiplies them by the number of index pages to obtain the cardinality, and MySQL will trigger the operation of redoing index statistics when the number of changed data rows exceeds 1/M.

In MySQL, there are two ways to store index statistics, which can be selected by setting the innodb_stats_persistent parameter:

When set to on, statistics are persisted. At this point, the default N is 20 and M is 10.

When set to off, it means that statistics are only stored in memory. At this point, the default N is 8 and M is 16.

1 Generally speaking, There is not much difference between the cardinality statistics and the real number of rows, But when it comes to data tables with frequent deletion and addition of data, There may be a situation where the data table has 100,000 pieces of data but the base statistics have 200,000 pieces of data. This may be MVCC at work, Because of the transactional support of InnoDB for MySQL, If multiple data versions need to be maintained, it is possible that some transactions are not over yet, and the deleted data space cannot be released due to the use of deleted data for a long time, and the newly added data opens up new space, which may lead to errors in the number of data pages in cardinality statistics and large errors.

A good fix is to execute the analyze table table name, which is used to recount index information.

The index is wrong. What are we going to do

When we build the necessary indexes correctly, in most cases, the optimizer will not choose the wrong index. What should we do when we encounter the wrong index?

1. Force an index using force index.

2. Change the thinking, and the sql statement under Optimization 1 may use the index used.

3. Create more suitable indexes or delete unreasonable indexes that are misused. Sometimes, it may be that this index is redundant and not optimal, and the optimizer just uses it.

Summarize