Optimizing slow random-row selection in MySQL

  • 2020-06-03 08:34:40
  • OfStack

The day before yesterday I needed, for work, to randomly select a few records from a database table of about 50,000 records. My first attempt used MySQL's RAND() function directly. With a few thousand records that is fine, but with tens of thousands of records the query took several seconds, which is far too slow. Below is how I went about optimizing random-row selection in MySQL.
Applications often need to fetch random rows from MySQL. For example, the obvious way to pick one random record from the table tablename is:

SELECT * FROM tablename ORDER BY RAND() LIMIT 1;

Later, however, I checked the official MySQL manual. Its note on RAND() says, roughly, that you should not use RAND() in an ORDER BY clause, because that causes the column to be evaluated multiple times. Even so, as of MySQL 3.23 you can still get rows in random order with ORDER BY RAND().
When I actually tested it, though, it turned out to be very inefficient: on a table with more than 150,000 rows, fetching 5 random rows took more than 8 seconds. MySQL has to scan every row, generate a RAND() value for each one, and sort on those values just to return a handful of rows.
The manual puts it this way: "You cannot use a column with RAND() values in an ORDER BY clause, because ORDER BY would evaluate the column multiple times."
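To see why this gets slow, here is a small Python model of what ORDER BY RAND() LIMIT k has to do. It is a sketch, not MySQL's actual executor, and the function name order_by_rand_limit is hypothetical. Every row receives its own random key, the whole set is sorted on those keys, and only the first k rows survive. The work therefore grows with the size of the table rather than with k.

```python
import random


def order_by_rand_limit(rows, k, rng=random):
    """Model of ORDER BY RAND() LIMIT k (hypothetical helper).

    Every row gets its own random key, all n rows are sorted on that key,
    and only the first k survive, so the cost depends on n, not on k.
    """
    keyed = [(rng.random(), row) for row in rows]  # one RAND() value per row
    keyed.sort(key=lambda pair: pair[0])  # sort the entire table on the keys
    return [row for _, row in keyed[:k]]  # keep only the LIMIT k rows


# Usage: 5 random ids out of a 150,000-row "table".
print(order_by_rand_limit(list(range(1, 150_001)), 5))
```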
Searching Google, I found that most solutions online use MAX(id) * RAND() to pick a random id and read the data starting from there:

SELECT *
FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM `table`)) AS id) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id ASC LIMIT 5;

However, this returns five consecutive records. The only workaround is to fetch one row per query and run the query five times. Even so, it is worth it: on the 150,000-row table each query takes less than 0.01 seconds.
The statement above uses a JOIN. On the MySQL forum, someone uses the following form instead:
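Here is a minimal sketch of the one-row-per-query workaround. It is written in Python, with the standard-library sqlite3 module standing in for MySQL. The sketch makes these assumptions:

- The table name items and the helper random_row_ids are hypothetical.
- The ids are contiguous integers starting at 1.
- The random threshold is computed in Python, because SQLite has no RAND() function.

The first query reproduces the LIMIT 5 problem (one consecutive run of ids). The helper instead issues one LIMIT 1 query per row.

```python
import random
import sqlite3


def random_row_ids(conn, k, rng=random):
    """Pick k rows with k separate one-row queries (hypothetical helper).

    The threshold mirrors FLOOR(RAND() * MAX(id)); it is computed in Python
    because SQLite has no RAND() function.
    """
    max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    picked = []
    for _ in range(k):
        threshold = int(rng.random() * max_id)  # 0 <= threshold < max_id
        row = conn.execute(
            "SELECT id FROM items WHERE id >= ? ORDER BY id LIMIT 1",
            (threshold,),
        ).fetchone()
        picked.append(row[0])
    return picked


# Build a hypothetical 150,000-row table with ids 1..150000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 150_001)],
)

# One query with LIMIT 5: the ids come back as one consecutive run.
start = int(random.random() * 150_000)
block = [r[0] for r in conn.execute(
    "SELECT id FROM items WHERE id >= ? ORDER BY id LIMIT 5", (start,)
)]
print(block)

# Five one-row queries: five independent positions in the table.
print(random_row_ids(conn, 5))
```

Because each pick is independent, the same row can come back twice. If you need k distinct rows, keep querying until you have collected k unique ids. Near the end of the table, the LIMIT 5 query can also return fewer than five rows.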

SELECT *
FROM `table`
WHERE id >= (SELECT FLOOR(MAX(id) * RAND()) FROM `table`)
ORDER BY id LIMIT 1;

I tested it, and it took 0.5 seconds. That is not bad, but it is still far slower than the JOIN statement above, so something seemed off.
So I rewrote the statement:

SELECT * FROM `table`
WHERE id >= (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM `table`)))
ORDER BY id LIMIT 1;

This time the speed improved again: the query took only 0.01 seconds.
Finally, I refined the statement by adding MIN(id). When I first tested it without the MIN(id) term, about half of the queries returned one of the first few rows of the table.
The complete statements, first in the WHERE form and then in the JOIN form, are:

SELECT * FROM `table` WHERE id >= (
  SELECT FLOOR(
    RAND() * ((SELECT MAX(id) FROM `table`) - (SELECT MIN(id) FROM `table`))
    + (SELECT MIN(id) FROM `table`)
  )
)
ORDER BY id LIMIT 1;

SELECT * FROM `table` AS t1 JOIN (
  SELECT ROUND(
    # MIN(id) + RAND() * (MAX(id) - MIN(id))
    RAND() * (
      (SELECT MAX(id) FROM `table`) - (SELECT MIN(id) FROM `table`)
    )
    + (SELECT MIN(id) FROM `table`)
  ) AS id
) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id LIMIT 1;
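Here is a quick plain-Python simulation of why the MIN(id) term matters. The ids 1001 to 2000 are a hypothetical table whose ids do not start at 1. bisect stands in for the index lookup that WHERE id >= x ORDER BY id LIMIT 1 performs. Without MIN(id), every threshold below 1001 lands on the first row. With these numbers that happens about half the time, which matches the behavior described above.

```python
import bisect
import math
import random


def first_id_at_or_above(ids, threshold):
    """Like WHERE id >= threshold ORDER BY id LIMIT 1 on a sorted id list."""
    return ids[bisect.bisect_left(ids, threshold)]


def pick_without_min(ids, rng):
    # FLOOR(RAND() * MAX(id))
    return first_id_at_or_above(ids, math.floor(rng.random() * ids[-1]))


def pick_with_min(ids, rng):
    # FLOOR(RAND() * (MAX(id) - MIN(id)) + MIN(id))
    lo, hi = ids[0], ids[-1]
    return first_id_at_or_above(ids, math.floor(rng.random() * (hi - lo) + lo))


ids = list(range(1001, 2001))  # hypothetical ids that start at 1001, not 1
rng = random.Random(42)
trials = 10_000
share_without = sum(pick_without_min(ids, rng) == ids[0] for _ in range(trials)) / trials
share_with = sum(pick_with_min(ids, rng) == ids[0] for _ in range(trials)) / trials
print(share_without)  # about half of the picks land on the first row
print(share_with)     # close to 1/1000
```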
 

Finally, I ran each of the two statements 10 times from PHP. The first (WHERE) form took 0.147433 seconds and the second (JOIN) form took 0.015130 seconds. So the JOIN syntax appears to be much more efficient than calling the functions directly in the WHERE clause.
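One plausible explanation for this gap, offered as an assumption rather than a measured fact, involves how MySQL caches subqueries. MySQL's EXPLAIN documentation describes an UNCACHEABLE SUBQUERY as one that is re-evaluated for each row of the outer query, and a subquery that calls RAND() is treated as uncacheable. The one-row derived table in the JOIN form, by contrast, is evaluated once.

The Python model below simulates both behaviors on ids 1 to 100,000. The functions where_form and join_form are hypothetical. If the per-row re-evaluation does happen, the WHERE form does two bad things: it examines hundreds of rows per query, and it strongly favors ids near the start of the table.

```python
import bisect
import math
import random


def join_form(ids, rng):
    """Threshold drawn once (like the derived table), then one index lookup."""
    lo, hi = ids[0], ids[-1]
    threshold = math.floor(rng.random() * (hi - lo) + lo)
    return ids[bisect.bisect_left(ids, threshold)], 1


def where_form(ids, rng):
    """Threshold re-drawn for every row examined while scanning in id order,
    as would happen if the RAND() subquery were re-evaluated per row."""
    lo, hi = ids[0], ids[-1]
    for examined, row_id in enumerate(ids, start=1):
        threshold = math.floor(rng.random() * (hi - lo) + lo)
        if row_id >= threshold:
            return row_id, examined
    return ids[-1], len(ids)  # not reached: the last id always passes


ids = list(range(1, 100_001))
rng = random.Random(7)
where_picks = [where_form(ids, rng) for _ in range(200)]
join_picks = [join_form(ids, rng) for _ in range(2000)]
print(max(pick for pick, _ in where_picks))  # stays near the start of the table
print(sum(n for _, n in where_picks) / len(where_picks))  # rows examined per query
print(sum(pick for pick, _ in join_picks) / len(join_picks))  # spread over the range
```

Both functions use FLOOR for the threshold, so the only difference between them is how often RAND() is drawn.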

SELECT *
FROM `table` AS t1 JOIN (
  SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `table`) - (SELECT MIN(id) FROM `table`)) + (SELECT MIN(id) FROM `table`)) AS id
) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id LIMIT 10;

This is the statement I finally chose. Fetching 10 records went from about 5 seconds before to under 0.0003 seconds now. As with the first JOIN query, the 10 rows it returns are consecutive.
