MYSQL random selection method and efficiency analysis

  • 2020-05-07 20:34:16
  • OfStack


 How to read randomly from the database 15 Records?  
order by rand() limit 0,15 
 How do I read all records randomly from the database?  
order by rand() 

However, when I checked the official MYSQL manual, the hint for RAND() was that you should not use the RAND() function in the ORDER BY clause because it would cause the data column to be scanned multiple times. In MYSQL 3.23, however, randomization is still possible with ORDER BY RAND().

But the real test 1 found that this is very inefficient. 1 more than 150,000 pieces of library, query 5 pieces of data, it took more than 8 seconds. Looking at the official manual, it also says that rand() is executed multiple times in ORDER BY clause, which is naturally inefficient and inefficient.
You cannot use a column with RAND() values in an ORDER BY clause, because ORDER BY would evaluate the column multiple times.
Search Google, the Internet is basically looking up max(id) * rand() to obtain data at random.
SELECT *
FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM `table`)) AS id) AS t2
WHERE t1.id > = t2.id
ORDER BY t1.id ASC LIMIT 5;

But this produces five consecutive records. The solution can only be one query per time, query 5 times. Even that is worth it, because with 150,000 tables, the query takes less than 0.01 seconds.

The following statement is in JOIN, which is used in mysql's forum
SELECT *
FROM `table`
WHERE id > = (SELECT FLOOR( MAX(id) * RAND()) FROM `table` )
ORDER BY id LIMIT 1;

I tested 1, need 0.5 seconds, the speed is also good, but with the statement is still a big gap. I always feel something is wrong.

So I'm going to rewrite this statement by 1.
SELECT * FROM `table`
WHERE id > = (SELECT floor(RAND() * (SELECT MAX(id) FROM `table`)))
ORDER BY id LIMIT 1;

In this case, the efficiency was improved, and the query time was only 0.01 seconds

Finally, add MIN(id). At the beginning of the test, because I did not add MIN(id) judgment, the result is 1 and a half of the time always query the first few rows in the table.
The full query statement is:
SELECT * FROM `table`
WHERE id > = (SELECT floor( RAND() * ((SELECT MAX(id) FROM `table`)-(SELECT MIN(id) FROM `table`)) + (SELECT MIN(id) FROM `table`)))
ORDER BY id LIMIT 1;

SELECT *
FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `table`)-(SELECT MIN(id) FROM `table`))+(SELECT MIN(id) FROM `table`)) AS id) AS t2
WHERE t1.id > = t2.id
ORDER BY t1.id LIMIT 1;

Finally, the two statements were queried 10 times respectively in php.
The former took 0.147433 seconds
The latter takes 0.015130 seconds
It seems that using JOIN's syntax is much more efficient than using functions directly in WHERE.

Related articles: