Optimizing MySQL paging queries over millions of rows

  • 2021-09-20 21:48:19
  • OfStack

When a table holds millions of records, fetching all results in a single query becomes slow, and it only gets worse as the data volume grows, so paging queries are needed. There are many techniques and optimization points for database paging; here is a brief overview of the ones I know.

Preparatory work

To test some of the optimizations listed below, here is a description of the table used.

Table name: orders_history
Description: order history table for a business
Main fields: id (unsigned int, primary key), type (tinyint(4))
Fields: the table has 37 fields in total; none is a large type such as text, and the largest is varchar(500). The id field is indexed and auto-incremented.
Data volume: 5,709,294 rows
MySQL version: 5.7.16
A ready-made million-row test table is not easy to find offline. If you need to test yourself, you can write a shell script or stored procedure to insert test data.
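For instance, here is a minimal sketch of generating test data with a stored procedure; the procedure name and column list are illustrative, not from the original setup:

```sql
-- Illustrative sketch: bulk-insert test rows with a stored procedure.
-- fill_orders_history is a made-up name; adapt the column list to the real table.
DELIMITER //
CREATE PROCEDURE fill_orders_history(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- id is auto-increment; type gets a random value from 0 to 9
    INSERT INTO orders_history (type) VALUES (FLOOR(RAND() * 10));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL fill_orders_history(1000000);
```

Row-by-row inserts like this are slow; wrapping batches in transactions, or repeatedly doubling the table with INSERT ... SELECT, is much faster in practice.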
All of the SQL below was executed in the same, unchanged environment. Here is the basic test result:

select count(*) from orders_history;

Return result: 5709294

The three query times are:


8903 ms
8323 ms
8401 ms

General paging query

General paging can be implemented with a simple LIMIT clause, declared as follows:

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset

The LIMIT clause can be used to specify the number of records returned by the SELECT statement. Attention should be paid to the following points:

The first parameter specifies the offset of the first returned record row
The second parameter specifies the maximum number of record rows returned

If only one parameter is given, it specifies the maximum number of rows to return
A second parameter of -1 means retrieve every row from the offset to the end of the record set
The offset of the first row is 0 (not 1)

The following is an application example:

select * from orders_history where type=8 limit 1000,10;

This statement queries the 10 rows of orders_history after the first 1000, that is, rows 1001 through 1010.

Records in the table are sorted by the primary key (generally id) by default, so the query above is equivalent to:

select * from orders_history where type=8 order by id limit 1000,10;

The three query times are:


3040 ms
3063 ms
3018 ms

For this query method, the following tests the impact of the number of returned rows on query time:


select * from orders_history where type=8 limit 10000,1;
select * from orders_history where type=8 limit 10000,10;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 10000,1000;
select * from orders_history where type=8 limit 10000,10000;


The three query times for each statement are as follows:


 Query 1 record: 3072ms 3092ms 3002ms
 Query 10 records: 3081ms 3077ms 3032ms
 Query 100 records: 3118ms 3200ms 3128ms
 Query 1000 records: 3412ms 3468ms 3394ms
 Query 10000 records: 3749ms 3802ms 3696ms

I also ran further queries beyond these. From the timings, it can be concluded that when fewer than 100 rows are returned, the query times are essentially the same; as the number of returned rows grows, the query time increases steadily.

Testing for query offsets:


select * from orders_history where type=8 limit 100,100;
select * from orders_history where type=8 limit 1000,100;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 100000,100;
select * from orders_history where type=8 limit 1000000,100;


The three query times for each statement are as follows:


 Offset 100: 25ms 24ms 24ms
 Offset 1000: 78ms 76ms 77ms
 Offset 10000: 3092ms 3212ms 3128ms
 Offset 100000: 3878ms 3812ms 3798ms
 Offset 1000000: 14608ms 14062ms 14700ms

As the offset grows, especially once it exceeds 100,000, the query time increases dramatically.

This kind of paging query scans from the first record of the table, so the larger the offset, the slower the query; and the more rows returned, the slower it gets overall.

Optimize with subqueries

This method first locates the id at the given offset, then queries from there. It is suitable when id is monotonically increasing.


select * from orders_history where type=8 limit 100000,1;

select id from orders_history where type=8 limit 100000,1;

select * from orders_history where type=8 and 
id>=(select id from orders_history where type=8 limit 100000,1) 
limit 100;

select * from orders_history where type=8 limit 100000,100;

The query time of the four statements is as follows:


 Statement 1: 3674ms
 Statement 2: 1315ms
 Statement 3: 1327ms
 Statement 4: 3710ms

Some comparisons worth noting:

Statement 1 vs. statement 2: using select id instead of select * makes the query about 3 times faster
Statement 2 vs. statement 3: the difference is only about 10 milliseconds
Statement 3 vs. statement 4: thanks to the fast select id subquery, statement 3 is about 3 times faster
Compared with the general paging query, this method is several times faster.

Optimizing with an id range

This method assumes that the table's id is continuously increasing, so we can calculate the id range of a page from the page number and page size, then query with id between ... and ...:


select * from orders_history where type=2 and id between 1000000 and 1000100 limit 100;

Query time: 15ms 12ms 9ms

This query method can greatly improve query speed; it basically completes within a few tens of milliseconds. The limitation is that it only works when the id range is clearly known; however, since an auto-increment id field is usually added when a table is created, this approach brings a lot of convenience to paging queries.

There is another way to write it:

select * from orders_history where id >= 1000001 limit 100;
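The arithmetic behind the range is simple. Assuming id starts at 1 with no gaps, for a 1-based page number p and page size s the range runs from (p-1)*s+1 through p*s. For example:

```sql
-- Page 10001 with 100 rows per page (assuming id starts at 1, no gaps):
--   start = (10001 - 1) * 100 + 1 = 1000001
--   end   = 10001 * 100          = 1000100
select * from orders_history where id between 1000001 and 1000100;
```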

Of course, you can also query with in. This is often used when multiple tables are involved, using the id set from another table's query:


select * from orders_history where id in (select order_id from trade_2 where goods = 'pen') limit 100;

Note on the in query: some versions of MySQL do not support the use of limit inside an in clause.

Tuning with temporary tables

Strictly speaking, this method is no longer query optimization, but it is worth mentioning here.

The id-range optimization above requires id to be continuously increasing. In some scenarios, however, such as history tables or tables with missing data, you can consider using a temporary table to store the ids of each page, then use those ids for an in query. This can greatly improve traditional paging speed, especially when the data volume reaches tens of millions of rows.
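A minimal sketch of the idea, assuming a hypothetical staging table temp_page_ids (all names here are illustrative):

```sql
-- Stage the ids for one page in a temporary table, then fetch rows by id.
CREATE TEMPORARY TABLE temp_page_ids (id INT UNSIGNED PRIMARY KEY);

INSERT INTO temp_page_ids (id)
SELECT id FROM orders_history WHERE type = 8 LIMIT 100000, 100;

-- The id lookup hits the primary key index directly
SELECT * FROM orders_history
WHERE id IN (SELECT id FROM temp_page_ids);

DROP TEMPORARY TABLE temp_page_ids;
```

In practice the staged ids can also be kept around and reused across page requests, which is where the real savings come from.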

A note on table id fields

In general, when creating tables in a database, an auto-increment id field is added to every table to make queries convenient.

If the data volume is very large, as with order data, the data is generally sharded across databases and tables. In that case it is not recommended to use the database id as the unique identifier; instead, use a distributed, high-concurrency unique id generator, and store that identifier in a separate field of the table.

Using a range query to locate the id (via the index) first, and then using the index to locate the data, can improve query speed several times over. That is, select id first, then select *.
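This "select id first, then select *" pattern can also be written as a single statement with a deferred join (a common variant, not shown in the text above):

```sql
-- Deferred join: the inner query pages over the narrow index (id only),
-- the outer join then fetches the full rows for just those 100 ids.
select o.* from orders_history o
join (
  select id from orders_history
  where type = 8
  order by id
  limit 100000, 100
) t on o.id = t.id;
```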

