MySQL Order By Rand of efficiency analysis

  • 2020-05-13 03:39:12
  • OfStack

Recently, the random selection implementation method of MYSQL was roughly studied due to the need. For example, to randomly extract a record from the tablename table, you can write it like this: SELECT * FROM tablename ORDER BY RAND() LIMIT 1.

However, later I checked the official MYSQL manual, and the hint for RAND() in the manual basically meant that the RAND() function could not be used in ORDER BY clause, because it would cause the data column to be scanned multiple times. However, in version 3.23 of MYSQL, randomization is still possible through ORDER BY RAND().

But the real test 1 found that this was very inefficient. A library of more than 150,000 pieces of data, query 5 pieces of data, it takes more than 8 seconds. Looking at the official manual, it also says that rand() is executed multiple times in ORDER BY clause, which is naturally inefficient and inefficient.
You cannot use a column with RAND() values in an ORDER BY clause, because ORDER BY would evaluate the column multiple times.
Search for Google, the Internet is basically to query max(id) * rand() to obtain data at random.
 
SELECT * 
FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM `table`)) AS id) AS t2 
WHERE t1.id >= t2.id 
ORDER BY t1.id ASC LIMIT 5; 

But this will result in five consecutive records. The solution is to query one item at a time, five times. Even that is worth it, because with 150,000 tables, the query takes less than 0.01 seconds.

The following statement is used on JOIN, which is used on mysql's forums
 
SELECT * 
FROM `table` 
WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table` ) 
ORDER BY id LIMIT 1; 

I tested 1 time, need 0.5 seconds, the speed is also good, but with the statement is still a big gap. I always feel something is wrong.

So I just rewrote 1.
SELECT * FROM `table`
WHERE id > = (SELECT floor(RAND() * (SELECT MAX(id) FROM `table`)))
ORDER BY id LIMIT 1;

This time, the efficiency is improved again, the query time is only 0.01 seconds

Finally, complete the sentence 1, and add MIN(id) judgment. At the beginning of the test, because I did not add MIN(id), the result was that the first few rows in the table were always queried for a half of the time.
The full query statement is:
 
SELECT * FROM `table` 
WHERE id >= (SELECT floor( RAND() * ((SELECT MAX(id) FROM `table`)-(SELECT MIN(id) FROM `table`)) + (SELECT MIN(id) FROM `table`))) 
ORDER BY id LIMIT 1; 

 
SELECT * 
FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `table`)-(SELECT MIN(id) FROM `table`))+(SELECT MIN(id) FROM `table`)) AS id) AS t2 
WHERE t1.id >= t2.id 
ORDER BY t1.id LIMIT 1; 

Finally, the two statements were queried 10 times respectively in php.
The former takes 0.147433 seconds
The latter takes 0.015130 seconds
It seems that using JOIN's syntax is much more efficient than using functions directly in WHERE.

Related articles: