Five methods for optimizing MySQL LIMIT queries

  • 2020-06-19 11:55:04
  • OfStack

Pagination in MySQL is relatively simple: a LIMIT offset, length clause is all that is needed to fetch a page. But when the offset and length grow large, MySQL's performance degrades significantly.

1. Subquery optimization method

First locate the boundary row, then fetch the rows whose id is greater than or equal to it.
Disadvantages: the ids must be continuous. In practice this means no WHERE condition can be used, because a WHERE condition filters rows and breaks the continuity of the ids.


mysql> set profiling=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from Member;
+----------+
| count(*) |
+----------+
|   169566 |
+----------+
1 row in set (0.00 sec)

mysql> pager grep !~-
PAGER set to 'grep !~-'

mysql> select * from Member limit 10, 100;
100 rows in set (0.00 sec)

mysql> select * from Member where MemberID >= (select MemberID from Member limit 10,1) limit 100;
100 rows in set (0.00 sec)

mysql> select * from Member limit 1000, 100;
100 rows in set (0.01 sec)

mysql> select * from Member where MemberID >= (select MemberID from Member limit 1000,1) limit 100;
100 rows in set (0.00 sec)

mysql> select * from Member limit 100000, 100;
100 rows in set (0.10 sec)

mysql> select * from Member where MemberID >= (select MemberID from Member limit 100000,1) limit 100;
100 rows in set (0.02 sec)

mysql> nopager
PAGER set to stdout

mysql> show profiles\G
*************************** 1. row ***************************
Query_ID: 1
Duration: 0.00003300
   Query: select count(*) from Member
*************************** 2. row ***************************
Query_ID: 2
Duration: 0.00167000
   Query: select * from Member limit 10, 100
*************************** 3. row ***************************
Query_ID: 3
Duration: 0.00112400
   Query: select * from Member where MemberID >= (select MemberID from Member limit 10,1) limit 100
*************************** 4. row ***************************
Query_ID: 4
Duration: 0.00263200
   Query: select * from Member limit 1000, 100
*************************** 5. row ***************************
Query_ID: 5
Duration: 0.00134000
   Query: select * from Member where MemberID >= (select MemberID from Member limit 1000,1) limit 100
*************************** 6. row ***************************
Query_ID: 6
Duration: 0.09956700
   Query: select * from Member limit 100000, 100
*************************** 7. row ***************************
Query_ID: 7
Duration: 0.02447700
   Query: select * from Member where MemberID >= (select MemberID from Member limit 100000,1) limit 100

The profiles show that the subquery method effectively improves performance once the offset exceeds about 1,000 (0.0026 s vs. 0.0013 s at an offset of 1,000; 0.0996 s vs. 0.0245 s at 100,000).
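The boundary-seek idea above can be sketched outside of MySQL as well. The following is a minimal illustration using Python's sqlite3 (as a portable stand-in for the article's Member table; the table contents and the ORDER BY, added for deterministic results, are assumptions) showing that the plain-OFFSET query and the subquery variant return the same page:

```python
import sqlite3

# In-memory stand-in for the article's Member table; MemberID plays the
# role of the continuous primary key the technique relies on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Member (name) VALUES (?)",
                 [(f"user{i}",) for i in range(1, 1001)])

# Plain OFFSET pagination: the engine walks past 500 rows before returning any.
plain = conn.execute(
    "SELECT * FROM Member ORDER BY MemberID LIMIT 500, 100").fetchall()

# Subquery variant: seek to the boundary id via the primary key,
# then range-scan forward from it.
subq = conn.execute(
    "SELECT * FROM Member "
    "WHERE MemberID >= (SELECT MemberID FROM Member "
    "                   ORDER BY MemberID LIMIT 500, 1) "
    "ORDER BY MemberID LIMIT 100").fetchall()

assert plain == subq  # same page; the seek form is cheaper at large offsets
```

On a large table the range scan from the boundary id avoids the row-by-row skip that a big OFFSET forces.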

2. Inverted-list optimization method

An inverted list works like an index: page boundaries are maintained in a separate table, and the data is then retrieved through an efficient join against it.

Disadvantages: only suitable for a fixed amount of data; rows cannot be deleted; the page table is difficult to maintain.

Inverted lists (together with inverted indexes, these are the algorithmic building blocks of search engines):

An inverted list is an in-memory inverted index to which records can be appended. Inverted lists are mini inverted indexes.

A temporary inverted file is an inverted index stored on disk as a file, to which records cannot be appended. Temporary inverted files are medium-sized inverted indexes.

A final inverted file is an inverted index obtained by merging the temporary inverted files stored on disk. The final inverted file is the largest inverted index.

An inverted index is an abstract concept; the inverted list, the temporary inverted file, and the final inverted file are three concrete forms of it.
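The page-table idea can be sketched concretely. The following is a minimal illustration in Python/sqlite3 (the `page_index` table name, the page size, and the data are all assumptions, not from the article): one row per page stores the first id on that page, so fetching any page becomes an indexed seek plus a short scan instead of a large OFFSET:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Member (name) VALUES (?)",
                 [(f"user{i}",) for i in range(1, 1001)])

PAGE = 40
# Page table: one row per page, holding the first MemberID on that page.
# This is the part that is hard to maintain when rows are deleted.
conn.execute(
    "CREATE TABLE page_index (page INTEGER PRIMARY KEY, first_id INTEGER)")
ids = conn.execute("SELECT MemberID FROM Member ORDER BY MemberID").fetchall()
conn.executemany("INSERT INTO page_index VALUES (?, ?)",
                 [(p + 1, ids[p * PAGE][0]) for p in range(len(ids) // PAGE)])

# Fetch page 13 via the page table: seek to its first id, scan one page.
page13 = conn.execute(
    "SELECT m.* FROM Member m JOIN page_index p ON m.MemberID >= p.first_id "
    "WHERE p.page = 13 ORDER BY m.MemberID LIMIT ?", (PAGE,)).fetchall()

assert page13[0][0] == 12 * PAGE + 1  # first id on page 13
```

The join replaces the OFFSET entirely, which is why this only works while the page boundaries stay valid.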

3. Reverse-lookup optimization method

When the offset exceeds half the total record count, sort in the opposite direction and page from the end, which shrinks the offset.

Disadvantages: ORDER BY optimization is troublesome; it requires an index, and indexes slow down data modification; the total record count must be known; and it only helps when the offset is greater than half of the data.

Offset calculation:
Forward lookup: (current page - 1) × page length
Reverse lookup: total records - current page × page length

Let's run an experiment and see how it works.

Total records: 1,628,775
Records per page: 40
Total pages: 1,628,775 / 40 ≈ 40,720
Middle page: 40,720 / 2 = 20,360

On page 21,000
Forward lookup SQL:

SELECT * FROM `abc` WHERE `BatchID` = 123 LIMIT 839960, 40  

Time: 1.8696 seconds

Reverse lookup SQL:

SELECT * FROM `abc` WHERE `BatchID` = 123 ORDER BY InputDate DESC LIMIT 788775, 40 

Time: 1.8336 seconds

On page 30000
Forward lookup SQL:

SELECT * FROM `abc` WHERE `BatchID` = 123 LIMIT 1199960, 40  

Time: 2.6493 seconds

Reverse lookup SQL:

SELECT * FROM `abc` WHERE `BatchID` = 123 ORDER BY InputDate DESC LIMIT 428775, 40  

Time: 1.0035 seconds

Note that the reverse lookup returns results in descending order (ORDER BY ... DESC), and that InputDate is the insertion time of the record. You could also use the primary key with a composite index, but it is less convenient.
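The two offset formulas and the "pick whichever side is closer" decision can be captured in a small helper. This is a sketch, not from the article; the function name and return convention are my own, and the caller is assumed to re-reverse the rows when the 'DESC' direction is chosen:

```python
def limit_offset(total, page, page_len):
    """Pick forward or reverse paging, whichever needs the smaller offset.

    Returns (offset, direction): 'ASC' means a normal forward scan,
    'DESC' means scan the reversed order and re-reverse the fetched rows.
    """
    forward = (page - 1) * page_len       # forward lookup offset
    reverse = total - page * page_len     # reverse lookup offset
    if forward <= reverse:
        return forward, "ASC"
    return reverse, "DESC"

# The article's numbers: 1,628,775 rows, 40 per page.
print(limit_offset(1_628_775, 21000, 40))   # (788775, 'DESC')
print(limit_offset(1_628_775, 30000, 40))   # (428775, 'DESC')
print(limit_offset(1_628_775, 10, 40))      # (360, 'ASC')
```

These reproduce the offsets used in the experiment above: 788,775 for page 21,000 and 428,775 for page 30,000.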

4. LIMIT cap optimization method

Cap the LIMIT offset at some maximum; any request beyond it is treated as if there were no data. Reportedly, Alibaba's DBAs do exactly this.
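A minimal sketch of such a cap, assuming a hypothetical `MAX_OFFSET` threshold and query-builder function (neither is from the article; tune the cap to your workload):

```python
MAX_OFFSET = 100_000  # assumed cap on how deep pagination may go

def paged_query(page, page_len):
    """Return a (sql, params) pair, or None when the page is beyond the cap."""
    offset = (page - 1) * page_len
    if offset > MAX_OFFSET:
        return None  # treat very deep pages as empty instead of scanning them
    return ("SELECT * FROM Member ORDER BY MemberID LIMIT ?, ?",
            (offset, page_len))

# Page 2 is served normally; page 1,000,000 is refused outright.
assert paged_query(2, 40) is not None
assert paged_query(1_000_000, 40) is None
```

The application layer then renders the refused case as an empty result (or redirects to a search form) rather than letting the database perform a huge skip.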

5. Covering-index optimization method

MySQL executes LIMIT n, m by reading the first n + m rows, discarding the first n, and returning the remaining m. So the larger n is, the worse the performance.
SQL before optimization:

SELECT * FROM member ORDER BY last_active LIMIT 50,5

The optimized SQL (MySQL requires an alias on the derived table):
SELECT * FROM member INNER JOIN (SELECT member_id FROM member ORDER BY last_active LIMIT 50, 5) AS t USING (member_id)

The difference is that the unoptimized SQL wastes I/O: MySQL reads the index, then fetches the full rows, and only then discards the unneeded ones. In the optimized SQL, the subquery is answered from the index alone (a covering index), and the full rows are then fetched by member_id for just the five rows actually needed.
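The deferred-join pattern above can be verified end to end. The following is a minimal sketch in Python/sqlite3 (standing in for MySQL; the table contents and the `member_id` tiebreaker in ORDER BY, added to make the row order deterministic, are assumptions) showing that the two forms return the same page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_id INTEGER PRIMARY KEY, "
             "name TEXT, last_active INTEGER)")
conn.executemany("INSERT INTO member (name, last_active) VALUES (?, ?)",
                 [(f"user{i}", i * 7 % 1000) for i in range(1, 501)])
# Covering index: the inner query below can be answered from this alone.
conn.execute("CREATE INDEX idx_last_active ON member (last_active, member_id)")

# Unoptimized form: full rows are fetched, then the first 50 are discarded.
before = conn.execute(
    "SELECT * FROM member "
    "ORDER BY last_active, member_id LIMIT 50, 5").fetchall()

# Deferred join: page over the narrow index first, then fetch full rows
# only for the five member_ids that survive the LIMIT.
after = conn.execute(
    "SELECT member.* FROM member "
    "JOIN (SELECT member_id FROM member "
    "      ORDER BY last_active, member_id LIMIT 50, 5) AS t "
    "USING (member_id) ORDER BY last_active, member_id").fetchall()

assert before == after  # identical page, far less row I/O at large offsets
```

On MySQL/InnoDB the win comes from scanning the small (last_active, member_id) index entries instead of full rows while skipping the offset.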

Conclusion: LIMIT optimization is subject to many restrictions, so each case has to be analyzed on its own. Besides, hardly anyone reads pages that deep anyway...
