In depth analysis of explain for MySQL query optimization

  • 2020-05-17 06:48:20
  • OfStack

It is also useful to consider the EXPLAIN keyword when analyzing query performance. The EXPLAIN keyword 1 is generally placed before the SELECT query statement to describe how MySQL performs the query operation and the number of rows that MySQL needs to execute to successfully return the result set. explain can help us analyze select statements, let us know the reasons for the inefficiency of the query, so as to improve our query, so that the query optimizer can work better.

1. How does the MySQL query optimizer work
The MySQL query optimizer has several goals, but the main one is to use indexes as much as possible and to eliminate as many rows of data as possible with the strictest indexes. The ultimate goal is to submit the SELECT statement to find the rows, not to exclude them. The reason the optimizer tries to exclude rows is that the faster it can exclude rows, the faster it can find rows that match the condition. Queries can be executed faster if the most rigorous testing can be done first.
Each output row of EXPLAIN provides information about one table, and each row includes the following columns:

item instructions id The sequence number of the query in the execution plan selected by MySQL Query Optimizer. Represents the order in which the select clause or operation table is executed in the query. The higher the id value, the higher the priority, and the earlier it is executed. id is the same, executed from top to bottom.


select_type query type instructions SIMPLE Simple select queries, union and subqueries are not used PRIMARY The outermost select query UNION The second or subsequent select query in UNION does not depend on the result set of the external query DEPENDENT UNION The second or subsequent select query in UNION depends on the result set of the external query SUBQUERY The first select query in the subquery does not depend on the result set of the outer query DEPENDENT SUBQUERY The first select query in the subquery depends on the result set of the external query DERIVED Used when there are subqueries in the from clause. MySQL executes these subqueries recursively, placing the results in the temporary table. UNCACHEABLE SUBQUERY Subqueries for which the result set cannot be cached must be re-evaluated for each row of the outer query. UNCACHEABLE UNION The second or subsequent select query in UNION is a subquery that is not cacheable


item instructions table The table referenced by the output row


type important item that shows the type used for the connection, sorted from best to worst instructions system Table has only 1 row (= system table). This is a special case of the const connection type. const const is used when PRIMARY KEY is compared with constant values. System is used when the table being queried has only 1 row. eq_ref const is used when PRIMARY KEY is compared with constant values. System is used when the table of the query has only 1 row. ref Instead of selecting a single row based on the keyword, the join may find multiple qualified rows. It's called ref because the index is compared to some reference value. The reference value is either a constant or the result of a multi-table query from one table. ref_or_null Just like ref, but MySQL must find the null entry in the result of the initial lookup, and then do the lookup twice. index_merge Indicates that index merge optimization is used. unique_subquery Use this type in some IN queries instead of the regular ref:value IN (SELECT primary_key single_table some_expr) index_subquery Use this class type in some IN queries, similar to unique_subquery, but for a non-exclusive index: value IN (SELECT key_column FROM single_table WHERE some_expr) range Retrieve only rows from a given range, using 1 index to select rows. The key column shows which index is used. When using =, < > , > , > =, < , < =, IS NULL, < = > , BETWEEN or IN operators, range is used when comparing keyword columns with constants. index Full table scan, except that the table is scanned in index order instead of rows. The main advantage is that you avoid sorting, but it's still very expensive. all In the worst case, scan the entire table from start to finish.



item instructions possible_keys Indicate which indexes MySQL can use in the table to aid the query. If it is empty, no indexes are available.


item instructions key MySQL actually selects the index to use from possible_key. If it is NULL, no index is used. Very rarely,MYSQL will choose an index that is not optimized enough. In this case, you can use USE INDEX (indexname) in an SELECT statement to force 1 index or IGNORE INDEX(indexname) to force MYSQL to ignore the index


item instructions key_len The length of the index used. The shorter the length, the better, without losing accuracy.


item instructions ref Shows which column of the index is used


item instructions rows The number of rows that MYSQL considers necessary to check to return the requested data


item instructions rows The number of rows that MYSQL considers necessary to check to return the requested data


The presence of the following two items in extra means that MYSQL cannot use the index at all, and efficiency will be significantly affected. This should be optimized as much as possible.

extra item instructions Using filesort Means that MySQL will use an external index to sort the results, rather than reading them in index order from the table. It may be sorted in memory or on disk. The sort operation in MySQL that cannot be done with an index is called "file sort" Using temporary Means that MySQL USES temporary tables when sorting query results. Common in sorting order by and grouping queries group by.

Here is an example to illustrate the use of explain.
Let's start with a table:


CREATE TABLE IF NOT EXISTS `article` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`author_id` int(10) unsigned NOT NULL,
`category_id` int(10) unsigned NOT NULL,
`views` int(10) unsigned NOT NULL,
`comments` int(10) unsigned NOT NULL,
`title` varbinary(255) NOT NULL,
`content` text NOT NULL,
PRIMARY KEY (`id`)
);

A few more data:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');

Requirements:
If category_id is 1 and comments is greater than 1, article_id has the most views.
Check it out first:

EXPLAIN
SELECT author_id
FROM `article`
WHERE category_id = 1 AND comments > 1
ORDER BY views DESC
LIMIT 1\G

Look at some of the output:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

Obviously,type is ALL, which is the worst case. Using filesort also appears in Extra, which is also the worst case scenario. Optimization is a must.

Well, then the easiest solution is to index. Ok, let's try 1. In the condition of the query, category_id,comments and views are used after where. So it's easiest to have a joint index.


ALTER TABLE `article` ADD INDEX x ( `category_id` , `comments`, `views` );

The result was a definite improvement, but it was still bad:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
         type: range
possible_keys: x
          key: x
      key_len: 8
          ref: NULL
         rows: 1
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

type becomes range, which is tolerable. But using Using filesort in extra is still unacceptable. But we already have an index, so why not? This is because, according to the way the BTree index works, category_id is ordered first, comments is ordered again if the same category_id is encountered, views is ordered again if the same comments is encountered. When the comments field is in the middle of the union index, comments is in the middle > The first condition is that there is a range value (so-called range), and MySQL cannot use the index to retrieve the views part after, that is, the index after the range type query field is invalid.
So we need to discard comments and delete the old index:

 DROP INDEX x ON article;

Then create a new index:

ALTER TABLE `article` ADD INDEX y ( `category_id` , `views` ) ;

Then run the query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
         type: ref
possible_keys: y
          key: y
      key_len: 4
          ref: const
         rows: 1
        Extra: Using where
1 row in set (0.00 sec)

As you can see,type becomes ref, Using filesort in Extra also disappears, and the result is very good.
Consider another example of a multi-table query.
First, define three tables, class and room.

CREATE TABLE IF NOT EXISTS `class` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`card` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `book` (
`bookid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`card` int(10) unsigned NOT NULL,
PRIMARY KEY (`bookid`)
);
CREATE TABLE IF NOT EXISTS `phone` (
`phoneid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`card` int(10) unsigned NOT NULL,
PRIMARY KEY (`phoneid`)
) engine = innodb;

They then insert large amounts of data separately. php script to insert data:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
0
Then look at a left join query:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
1
The analysis results are as follows:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: class
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: book
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
2 rows in set (0.00 sec)

Obviously, we need to optimize the second ALL.
Try creating an index:

ALTER TABLE `book` ADD INDEX y ( `card`);


INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
4
You can see that type in line 2 has been changed to ref, and rows has also been changed to 1741*18, which is an obvious optimization. This is determined by the left join feature. The LEFT JOIN condition is used to determine how to search for rows from the right table. There must be 1 on the left, so the right is our key point. 1 must be indexed.
Delete old index:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
5
Create a new index.

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
6
The results of

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: class
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: book
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
2 rows in set (0.00 sec)

Basically no change.
Then look at a right join query:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
8
The analysis results are as follows:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
9
The optimization is obvious. This is because the RIGHT JOIN condition is used to determine how to search for rows from the left table. There must be 1 on the right, so the left is our key point. 1 must be indexed.
Delete old index:

EXPLAIN
SELECT author_id
FROM `article`
WHERE category_id = 1 AND comments > 1
ORDER BY views DESC
LIMIT 1\G
0
Create a new index.

ALTER TABLE `book` ADD INDEX y ( `card`);

The results of

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: class
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: book
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
2 rows in set (0.00 sec)

Basically no change.

Finally, let's look at inner join:


explain select * from class inner join book on class.card = book.card;

Results:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
9
Delete old index:

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
5
The results of

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: class
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: book
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
2 rows in set (0.00 sec)

Create a new index.

INSERT INTO `article`
(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
6
The results of

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: class
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: book
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20000
        Extra: 
2 rows in set (0.00 sec)

To sum up,inner join and left join are similar, and both need to optimize the right table. While right join needs to optimize the left table.

Let's look at the 3 table query example again

Add 1 new index:


ALTER TABLE `phone` ADD INDEX z ( `card`);
ALTER TABLE `book` ADD INDEX y ( `card`);


*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where; Using filesort
1 row in set (0.00 sec)
0

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where; Using filesort
1 row in set (0.00 sec)
1
The type of the last two lines is ref and the total rows is well optimized, which has a good effect.

The explain syntax in MySql helps us to rewrite queries, optimize table structure and index Settings, and maximize query efficiency. Of course, indexes are expensive to build and maintain on a large scale, often taking a long time and a large amount of space, which can be more expensive if indexes are built on different column combinations. Therefore, the index is best set in fields that require frequent queries.


Related articles: