Talking about the performance of MySQL paging Limit

  • 2021-12-11 19:19:05
  • OfStack

Paging queries for MySQL are typically implemented through limit. limit accepts 1 or 2 integer type parameters. If it is 2 parameters, the first parameter specifies the offset of the first return record row, and the second parameter is the maximum number of return record rows. The offset of the initial record line is 0. For compatibility with PostgreSQL, limit also supports limit # offset #.

Question:

For small offsets, there is no problem using limit directly to query, but as the amount of data increases, the later paging, the larger the offset of limit statement will be, and the speed will obviously slow down.

Optimization idea: avoid scanning too many records when the amount of data is large

Solution: Paging method of subquery or JOIN paging method. The efficiency of JOIN paging and subquery paging is basically 1 level, and the time consumed is basically 1.

Here's an example. 1 The primary key of MySQL is a self-increasing numeric type, in which case it can be optimized in the following way.

Taking a table of 60,000 pieces of data in a real production environment as an example, compare the query time before and after optimization under 1:


--  Tradition limit File scanning 
[SQL]
SELECT * FROM tableName ORDER BY id LIMIT 50000,2;
 Affected rows : 0
 Time : 0.171s

--  Subquery mode, index scan 
[SQL]
SELECT * FROM tableName
WHERE id >= (SELECT id FROM tableName ORDER BY id LIMIT 50000 , 1)
LIMIT 2;
 Affected rows : 0
 Time : 0.035s

-- JOIN Paging mode 
[SQL]
SELECT * FROM tableName AS t1
JOIN (SELECT id FROM tableName ORDER BY id LIMIT 50000, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id LIMIT 2;
 Affected rows : 0
 Time : 0.036s

You can see that the optimized performance is improved many times.

Optimization principle:

Subqueries are completed on indexes, while ordinary queries are completed on data files. Generally, index files are much smaller than data files, so they are more efficient to operate. Because to take out all the field contents, the first one needs to cross a large number of data blocks and take out the corresponding contents, while the second one basically takes out the corresponding contents after directly positioning according to the index fields, which naturally greatly improves the efficiency. Therefore, the optimization of limit is not to use limit directly, but to obtain id of offset first, and then use limit size directly to obtain data.

In actual projects, paging can be handled in a way similar to policy mode. For example, if there are 100 pieces of data per page, it is judged that if it is within 100 pages, the most basic paging method will be used; If it is greater than 100, the paging method of subquery is used.


Related articles: