mysql's method of randomly querying several pieces of data

  • 2020-05-30 21:13:02
  • OfStack

Query the five non-repeating data in mysql, using the following:


SELECT * FROM `table` ORDER BY RAND() LIMIT 5

That's it. But the real test 1 found that this was very inefficient. A library of more than 150,000 pieces of data, query 5 pieces of data, it takes more than 8 seconds
Search for Google, and almost all of them search for max(id) * rand() to get data at random.


SELECT * 
FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM `table`)) AS id) AS t2 
WHERE t1.id >= t2.id 
ORDER BY t1.id ASC LIMIT 5;

But this will result in five consecutive records. The solution is to query one item at a time, five times. Even that is worth it, because with 150,000 tables, the query takes less than 0.01 seconds.
The above statement is JOIN, which is used in the forums of mysql


SELECT * 
FROM `table` 
WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table` ) 
ORDER BY id LIMIT 1;

I tested 1 time, need 0.5 seconds, the speed is also good, but with the statement is still a big gap. I always feel something is wrong.
So I just rewrote 1.


SELECT * FROM `table` 
WHERE id >= (SELECT floor(RAND() * (SELECT MAX(id) FROM `table`)))  
ORDER BY id LIMIT 1;

This time, the efficiency is improved again, the query time is only 0.01 seconds
Finally, complete the sentence 1, and add MIN(id). At the beginning of the test, because I did not add MIN(id), the result was that the first few rows in the table were always queried for a half of the time.
The full query statement is:


SELECT * FROM `table` 
WHERE id >= (SELECT floor( RAND() * ((SELECT MAX(id) FROM `table`)-(SELECT MIN(id) FROM `table`)) + (SELECT MIN(id) FROM `table`)))  
ORDER BY id LIMIT 1;
SELECT * 
FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `table`)-(SELECT MIN(id) FROM `table`))+(SELECT MIN(id) FROM `table`)) AS id) AS t2 
WHERE t1.id >= t2.id 
ORDER BY t1.id LIMIT 1;

Finally, query these two statements 10 times respectively,
The former takes 0.147433 seconds
The latter takes 0.015130 seconds
It seems that the syntax of JOIN is much more efficient than using functions directly in WHERE.


Related articles: