Optimization Analysis of LIMIT Queries in MySQL
- 2021-12-11 09:25:38
- OfStack
Preface
Paging is a common requirement in real-world business, and it is usually implemented with LIMIT. When the data set is small, or only the first few rows are needed, a LIMIT query is very efficient. However, when the data volume is large, or the offset is large (for example, LIMIT 100000, 20), performance is often unsatisfactory. A common approach is to combine LIMIT with ORDER BY; if the ORDER BY column is indexed, efficiency is usually acceptable.
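The examples in this article run against a student table whose exact definition is not given. A minimal sketch can be reconstructed from the output shown later; the column types and the column list of the composite index time_sorce_name are assumptions, inferred from the EXPLAIN output (key_len: 69 and Extra: Using index) and the result sets below:

```sql
-- Assumed schema, reconstructed from the examples in this article.
-- The composite index is assumed to cover (created_at, score, first_name),
-- which is consistent with the later EXPLAIN output showing
-- key: time_sorce_name, key_len: 69, Extra: Using index.
CREATE TABLE student (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    first_name VARCHAR(20)  NOT NULL,
    last_name  VARCHAR(20)  NOT NULL,
    created_at DATETIME     NOT NULL,
    score      INT          NOT NULL,
    updated_at DATETIME     NOT NULL,
    PRIMARY KEY (id),
    KEY time_sorce_name (created_at, score, first_name)
);
```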
In this case, the simplest optimization is a covering index query that returns only the required columns, which works very well.
Consider the following example:
mysql> SELECT * FROM student LIMIT 1000000,1;
+---------+------------+------------+------------+-------+---------------------+
| id      | first_name | last_name  | created_at | score | updated_at          |
+---------+------------+------------+------------+-------+---------------------+
| 1000001 | kF9DxBgnUi | yLXnPSHJpH | 2019-07-11 | 97    | 2019-07-11 14:29:59 |
+---------+------------+------------+------------+-------+---------------------+
1 row in set (0.31 sec)
Note the execution time. Now look at the EXPLAIN output for a query whose columns are all contained in the index:
mysql> EXPLAIN SELECT score,first_name FROM student ORDER BY created_at LIMIT 1000000,20 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: student
partitions: NULL
type: index
possible_keys: NULL
key: time_sorce_name
key_len: 69
ref: NULL
rows: 1000001
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)
Here all of the queried columns are covered by the index, so far fewer rows need to be read. However, as soon as the query requests a column that is not in the index, it becomes very slow.
For example, let us add the column last_name to the select list, as follows:
mysql> SELECT score,first_name,last_name FROM student ORDER BY created_at LIMIT 1000000,1;
+-------+------------+------------+
| score | first_name | last_name  |
+-------+------------+------------+
|    86 | knKsV2g2fY | WB5qJeLZuk |
+-------+------------+------------+
1 row in set (4.81 sec)
This query takes more than 4 seconds to execute. The EXPLAIN output shows that it has no way to use the index:
mysql> explain SELECT score,first_name,last_name FROM student ORDER BY created_at LIMIT 1000000,1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: student
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6489221
filtered: 100.00
Extra: Using filesort
1 row in set, 1 warning (0.00 sec)
So we rewrite the query as follows:
mysql> SELECT student.score,student.first_name FROM student INNER JOIN (SELECT id FROM student ORDER BY created_at LIMIT 1000000,1 ) AS temp USING(id);
+-------+------------+
| score | first_name |
+-------+------------+
|    15 | 2QWZ       |
+-------+------------+
1 row in set (0.18 sec)
mysql> EXPLAIN SELECT student.score,student.first_name,last_name FROM student INNER JOIN (SELECT id FROM student ORDER BY created_at LIMIT 1000000,1 ) AS temp USING(id);
+----+-------------+------------+------------+--------+---------------+-----------------+---------+---------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+-----------------+---------+---------+---------+----------+-------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1000001 | 100.00 | NULL |
| 1 | PRIMARY | student | NULL | eq_ref | PRIMARY | PRIMARY | 4 | temp.id | 1 | 100.00 | NULL |
| 2 | DERIVED | student | NULL | index | NULL | time_sorce_name | 69 | NULL | 1000001 | 100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+-----------------+---------+---------+---------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
From the EXPLAIN output we can see that the derived table scans 1,000,001 index entries, and the outer query then performs only a single primary-key lookup per returned id. Why is there such a change? This technique is called a deferred join: a covering-index query first returns only the primary keys that are needed, and then the original table is joined on those keys to fetch the remaining columns, which keeps the number of full rows that must be read as small as possible.
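Generalizing the query above to a full page of results, a deferred join for a page of 20 rows would look like this (a sketch based on the example query; the offset and page size are illustrative):

```sql
-- Deferred join for one page of 20 rows: the inner covering-index
-- scan returns only primary keys, and the outer join fetches the
-- full rows for just those 20 ids.
SELECT s.score, s.first_name, s.last_name
FROM student AS s
INNER JOIN (
    SELECT id
    FROM student
    ORDER BY created_at
    LIMIT 1000000, 20          -- offset, page size
) AS temp USING (id);
```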
In some specific scenarios there is yet another optimization. For example, when paging through the most recently inserted records, we can record the primary key (last_id) of the last row returned by the previous query.
Then the query can be changed to:
SELECT score,first_name,last_name,id FROM student WHERE id>=last_id ORDER BY id ASC LIMIT 1
For example, if last_id=1000000, the query seeks directly to id 1000000 through the primary key instead of scanning an offset. In this scenario, performance stays excellent no matter how large the table grows.
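The same idea generalizes to full pages; this is commonly known as keyset (or "seek") pagination. A sketch, assuming the application remembers the last id it has seen (note that for page-by-page fetching, a strict `>` avoids returning the boundary row twice, whereas the single-row example above uses `>=`):

```sql
-- Keyset pagination: seek directly to the last seen primary key,
-- then read the next 20 rows. No offset scan is performed.
SELECT id, score, first_name, last_name
FROM student
WHERE id > 1000000             -- last_id from the previous page
ORDER BY id ASC
LIMIT 20;
```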
Summary