Optimization Analysis of Limit Query in Mysql Optimization Skills

  • 2021-12-11 09:25:38
  • OfStack

Preface

Paging is a common business requirement in real business. Then we will use limit query. When we use Limit query, the efficiency is very high when the data is relatively small or only the first part of the data is queried. However, when the amount of data is large, or when the number of offset queries is relatively large, such as limit 100000, 20 efficiency is often unsatisfactory. A common way is to cooperate Limit with order by. If order by has an index to users, the efficiency is usually relatively good.

In this case, the simplest query is to use the override index to query some required columns. This effect is very good

Such as the following one


mysql> SELECT * FROM student LIMIT 1000000,1;
+---------+------------+------------+------------+-------+---------------------+
| id  | first_name | last_name | created_at | score | updated_at   |
+---------+------------+------------+------------+-------+---------------------+
| 1000001 | kF9DxBgnUi | yLXnPSHJpH | 2019-07-11 | 97 | 2019-07-11 14:29:59 | |
+---------+------------+------------+------------+-------+---------------------+
1 rows in set (0.31 sec)

You can see the time


mysql> EXPLAIN SELECT score,first_name FROM student ORDER BY created_at LIMIT 1000000,20 \G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: student
 partitions: NULL
   type: index
possible_keys: NULL
   key: time_sorce_name
  key_len: 69
   ref: NULL
   rows: 1000001
  filtered: 100.00
  Extra: Using index
1 row in set, 1 warning (0.00 sec)

mysql>

In this case, the columns of the query use the overlay index, and the number of scanned rows will be reduced a lot, but the effect is not very satisfactory, but if there are other queries, such queries will become very slow.

For example, we add the column last_name.

As follows


mysql> SELECT score,first_name,last_name FROM student ORDER BY created_at LIMIT 1000000,1;
+-------+------------+------------+
| score | first_name | last_name |
+-------+------------+------------+
| 86 | knKsV2g2fY | WB5qJeLZuk |
+-------+------------+------------+
1 row in set (4.81 sec)

mysql>

This query takes more than 4 seconds to execute. Through analysis, we can see that there is no way to use indexes for this query


mysql> explain SELECT score,first_name,last_name FROM student ORDER BY created_at LIMIT 1000000,1\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: student
 partitions: NULL
   type: ALL
possible_keys: NULL
   key: NULL
  key_len: NULL
   ref: NULL
   rows: 6489221
  filtered: 100.00
  Extra: Using filesort
1 row in set, 1 warning (0.00 sec)

mysql>

So we now modify the query as follows


mysql> SELECT student.score,student.first_name FROM student INNER JOIN (SELECT id FROM student ORDER BY created_at LIMIT 1000000,1 ) AS temp USING(id);
+-------+------------+
| score | first_name |
+-------+------------+
| 15 | 2QWZ  |
+-------+------------+
1 row in set (0.18 sec)

mysql> EXPLAIN SELECT student.score,student.first_name,last_name FROM student INNER JOIN (SELECT id FROM student ORDER BY created_at LIMIT 1000000,1 ) AS temp USING(id);
+----+-------------+------------+------------+--------+---------------+-----------------+---------+---------+---------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key    | key_len | ref  | rows | filtered | Extra  |
+----+-------------+------------+------------+--------+---------------+-----------------+---------+---------+---------+----------+-------------+
| 1 | PRIMARY  | <derived2> | NULL  | ALL | NULL   | NULL   | NULL | NULL | 1000001 | 100.00 | NULL  |
| 1 | PRIMARY  | student | NULL  | eq_ref | PRIMARY  | PRIMARY   | 4  | temp.id |  1 | 100.00 | NULL  |
| 2 | DERIVED  | student | NULL  | index | NULL   | time_sorce_name | 69  | NULL | 1000001 | 100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+-----------------+---------+---------+---------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

From the analysis results, we can see that only 1000011 data records were queried at this time. Why is there such a change? This is called delay association, which first returns the required primary key by using overlay index query, and then obtains the required data according to the primary key association with the original table, thus reducing the number of rows to be scanned as much as possible.

In some specific occasions, there is actually another optimization scheme. For example, to get the latest insert records. Then we can record the primary key ID (last_id) of the last record in the last query.
Then the query can be changed to


SELECT score,first_name,last_name,id FROM student WHERE id>=last_id ORDER BY id ASC LIMIT 1

For example, last_id=1000000, then the query starts at 1000000. In this scenario, offset performance will be very good no matter how big the data is.

Summarize


Related articles: