MySQL million-row paging optimization (fast pagination at the million-record level)

  • 2020-05-13 03:42:59
  • OfStack

Here is a lesson I learned.

When I first learned SQL, I would write it like this
 
SELECT * FROM table ORDER BY id LIMIT 1000, 10; 

But once the table holds millions of rows, the same pattern becomes slow:
 
SELECT * FROM table ORDER BY id LIMIT 1000000, 10; 

It can take several seconds.

A common optimization found online looks like this:
 
SELECT * FROM table WHERE id >= (SELECT id FROM table LIMIT 1000000, 1) LIMIT 10; 

Yes, the time drops to 0.x seconds, which looks fine.
But it is still not perfect!

The following statement is better:
 
SELECT * FROM table WHERE id BETWEEN 1000000 AND 1000010; 

It is five to ten times faster than the previous one (provided the ids in that range are contiguous).

In addition, if the ids you need are not one contiguous range, the best approach is to find the ids first and then query them with IN:
 
SELECT * FROM table WHERE id IN(10000, 100000, 1000000...); 


One more tip to share:
when you need to query on a long string column, such as one that stores a url, design the table with an extra column for that field.
Do not query the long string directly, which is inefficient; instead store and look up its crc32 or md5 value.
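As a rough sketch of this idea (the table and column names below are made up purely for illustration), the hash column carries the index, and the full string is still compared to guard against hash collisions:

-- Hypothetical table: url_crc stores CRC32(url) and is the indexed column
CREATE TABLE links (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL,
    url_crc INT UNSIGNED NOT NULL,
    KEY idx_url_crc (url_crc)
);

-- Look up by the short hash first, then confirm the full string
SELECT id, url
FROM links
WHERE url_crc = CRC32('http://example.com/some/long/path')
  AND url = 'http://example.com/some/long/path';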

How to optimize MySQL for fast pagination at the ten-million-record level

LIMIT paging does have performance problems with large data, and the various WHERE id >= XX forms let the query lean on the primary-key index, so paging by indexed id numbers can be much faster. By: jack
A solution to MySQL's slow limit paging (optimizing MySQL limit for fast paging over millions to tens of millions of records)

How far can MySQL's performance go? Having used php for more than half a year, I only really started thinking about this question the day before yesterday. There has been pain and despair, and now I am full of confidence! MySQL is a database that absolutely rewards a DBA-level ace: a small news system with ten thousand articles can be written any which way, and any xx framework will give you rapid development. But when the data volume reaches 100,000, a million, ten million, can its performance still stay that high? One small mistake can force a rewrite of the entire system, or even stop it from running at all. Enough talk; let the facts and examples speak:
The data table collect (id, title, info, vtype) has four fields: title is fixed length, info is text, id is auto-increment, and vtype is a tinyint with an index on it. This is a simple model of a basic news system. Now fill it with data: 100,000 news records.
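As a rough sketch only (the exact column types and storage engine are not stated in the text, so the definitions below are assumptions), the table might be created like this:

CREATE TABLE collect (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title CHAR(100) NOT NULL,          -- fixed length, as described
    info TEXT,                         -- the large news body
    vtype TINYINT UNSIGNED NOT NULL,
    KEY idx_vtype (vtype)              -- vtype is indexed
);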
In the end collect holds 100,000 records and the table takes up 1.6 GB of disk. OK, look at the following sql statements:

select id,title from collect limit 1000,10; Fast; basically done in 0.01 seconds. Now look at the following:
select id,title from collect limit 90000,10; Paging from row 90,000. The result?
It takes 8-9 seconds to finish. My god, what is going on? To optimize this, I searched the Internet for answers. Take a look at the following statement:
select id from collect order by id limit 90000,10; Very fast, done in 0.04 seconds. Why? Because it pages along the id primary key index, which is of course fast. The version found online changes it to:
select id,title from collect where id >= (select id from collect order by id limit 90000,1) limit 10;
This is the result of indexing on id. But make the problem just a little more complicated and it falls apart. Look at the following statement:
select id from collect where vtype=1 order by id limit 90000,10; Very slow, 8-9 seconds!
At this point I believe many people will, like me, feel close to a breakdown! Isn't vtype indexed? How can it be this slow? vtype is indeed indexed, and running select id from collect where vtype=1 limit 1000,10; directly is very fast, basically 0.05 seconds. But start at 90,000, which is 90 times farther in, and scaling that speed gives 0.05*90 = 4.5 seconds, the same order of magnitude as the measured 8-9 seconds. From here some people proposed splitting the table, the same approach the discuz forum uses. The idea is as follows:
Create an index table t(id, title, vtype), make it fixed length, and page through it; take the resulting page of ids and go back to collect for the info. Is it feasible? The experiment will tell.
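A minimal sketch of that sub-table idea, assuming t simply mirrors the id, title and vtype columns of collect:

-- Narrow, fixed-length index table
CREATE TABLE t (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    title CHAR(100) NOT NULL,
    vtype TINYINT UNSIGNED NOT NULL,
    KEY idx_vtype (vtype)
);

-- Step 1: page through the small table
SELECT id, title FROM t WHERE vtype = 1 ORDER BY id LIMIT 90000, 10;

-- Step 2: fetch the heavy info column from collect only for that page of ids
SELECT id, info FROM collect WHERE id IN (/* the 10 ids from step 1 */);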
100,000 records were loaded into t(id, title, vtype), and the table came to about 20 MB. Then running
select id from t where vtype=1 order by id limit 90000,10; Very fast, basically done in 0.1-0.2 seconds. Why? I guess it's because the collect table has so much data that paging through it is a long journey; limit is entirely tied to the size of the table. In fact this is still a full table scan; it is only fast because the data volume is small, just 100,000 rows. OK, time for a crazy experiment: add a million rows and test performance.
With ten times the data, the t table immediately passes 200 MB, still at fixed length. With the same query statement as before, it completes in 0.1-0.2 seconds! So the sub-table performance is fine? Wrong! Our limit offset is still only 90,000, which is why it is fast. Try a big one, starting at 900,000:
select id from t where vtype=1 order by id limit 900000,10; Look at the result: 1-2 seconds!
Why? Even after splitting the table the time is still this long, very depressing! Some people said that fixed-length rows improve limit performance. At first I thought so too: since each record's length is fixed, mysql should be able to compute the position of row 900,000, right? But we overestimate mysql's intelligence; it is not a commercial database, and it turns out fixed length versus variable length has little influence on limit. No wonder some people say discuz gets slow after 1 million records. I believe that is true, and it has to do with database design!
Can't MySQL break the 1-million barrier? Does paging really hit a limit at a million rows?
The answer is: NO! The reason we can't get past a million rows is that we don't know how to design for mysql. What follows is the no-table-splitting method, put to a crazy test: a single table handling a million records and a 10 GB database, with fast paging!
OK, our test goes back to the collect table. The conclusion up front: with 300,000 rows of data, the table-splitting approach is still usable; beyond 300,000 rows it becomes unbearably slow. Of course, combining table splitting with my method would be absolutely perfect; but with my method alone the problem can be solved perfectly without splitting tables at all!
The answer: a composite index! Once, when designing a mysql index, I happened to notice that the index name can be chosen freely and that several fields can be included in it. What is that good for? select id from collect order by id limit 90000,10; is fast because it runs on the index, but add a where clause and it no longer uses that index. With a let's-try-it attitude I added an index like search(vtype,id). Then I tested:
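In concrete terms, the composite index could be added along these lines (the name search is simply what the author chose):

ALTER TABLE collect ADD INDEX search (vtype, id);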
select id from collect where vtype=1 limit 90000,10; Very fast! Done in 0.04 seconds!
Retest: select id,title from collect where vtype=1 limit 90000,10; Very sorry, 8-9 seconds; it did not use the search index!
Test again with search(id,vtype), still selecting only id: unfortunately, 0.5 seconds.
To sum up: when there is a where condition and you also want to use limit, you must design an index that puts the where column first and the primary key used by limit second, and you can select only the primary key!
That solves the paging problem perfectly. Being able to return the ids quickly is what gives limit a chance to be optimized. By this logic, a limit in the millions should finish in 0.0x seconds. It seems that optimizing mysql statements and indexes really matters!
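Putting that rule into practice, the paging query can be split into two steps, roughly like this (assuming the search(vtype,id) index from above is in place):

-- Step 1: runs entirely on the search(vtype,id) index, so a deep offset stays cheap
SELECT id FROM collect WHERE vtype = 1 LIMIT 900000, 10;

-- Step 2: fetch the full rows only for the ten ids returned by step 1
SELECT id, title, info, vtype FROM collect WHERE id IN (/* ids from step 1 */);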
OK, back to the original question: how do you apply this quickly and effectively in development? If I use a compound query, my lightweight framework is useless; I would have to build the pagination strings myself, and what a hassle that is. Here is another example, and the idea goes like this:
select * from collect where id in (9000,12,50,7000); Zero seconds!
My god, mysql's indexes work for in queries too! It seems the claim online that in cannot use an index is wrong!
With this in mind, it's easy to apply to lightweight frameworks:

The code is as follows:
 
$db=dblink(); 
$db->pagesize=20; 
$sql="select id from collect where vtype=$vtype"; 
$db->execute($sql); 
$strpage=$db->strpage(); // save the paging string in a temporary variable for easy output later 
$strid=''; 
while($rs=$db->fetch_array()){ 
    $strid.=$rs['id'].','; 
} 
$strid=substr($strid,0,strlen($strid)-1); // build the comma-separated id string 
$db->pagesize=0; // critical: clear pagesize without destroying the object, so only one database connection is used and nothing has to be reopened 
$db->execute("select id,title,url,sTime,gTime,vtype,tag from collect where id in ($strid)"); 
?> 
<table> 
<?php while($rs=$db->fetch_array()): ?> 
<tr> 
<td> <?php echo $rs['id'];?></td> 
<td> <?php echo $rs['url'];?></td> 
<td> <?php echo $rs['sTime'];?></td> 
<td> <?php echo $rs['gTime'];?></td> 
<td> <?php echo $rs['vtype'];?></td> 
<td> <a href="?act=show&id=<?php echo $rs['id'];?>" target="_blank"><?php echo $rs['title'];?></a></td> 
<td> <?php echo $rs['tag'];?></td> 
</tr> 
<?php endwhile; ?> 
</table> 
<?php 
echo $strpage; 

The idea is simple: 1) use the optimized index to find the ids and join them into a string like "123,90000,12000". 2) fetch the full rows with a second query.
A small index plus a few small changes make mysql capable of paging millions or even tens of millions of records efficiently!
Through this example I came away with one reflection: for large systems, PHP should never use a framework, especially one where you can't even see the sql statements! My lightweight framework nearly collapsed at first; it is only suitable for rapid development of small applications. For ERP, OA, and large websites, no framework can take care of the data layer and logic layer for you. If programmers lose control of the sql statements, the project's risk grows exponentially! Especially with mysql, you absolutely need a professional dba to get its best performance. A single index can make a performance difference of thousands of times!
PS: after actual testing, once the data reached 1 million and then 1.6 million rows, with a 15 GB table and a 190 MB index, limit took 0.49 seconds even when the index was used. So it is best not to let paging dig that deep into the data.
