Four methods and performance comparisons for MySQL to query random data

  • 2020-06-15 10:23:17
  • OfStack

The following are the advantages and disadvantages of each of the four schemes.
Plan 1:
SELECT * FROM `table` ORDER BY RAND() LIMIT 0,1;

The problem with this approach is that it is very slow. The reason is that MySQL creates a zero-time table to hold all result sets, gives each result a random index, and then sorts and returns.
There are a few ways to make it faster.
The basic idea is to get a random number and then use that random number to get the specified row.
Since all rows have a 1-only id, we will just take the random number between the minimum and maximum id and get id to be the row. To make this method work even when id is discontinuous, we use it in the final query. > = instead of =.
To get the minimum and maximum id for the entire table, we use two aggregate functions, MAX() and MIN(). These two methods return the maximum and minimum values of the specified group. In this case, this group is all the id field values in our table.
Scheme 2:
<?php
$range_result = mysql_query( " SELECT MAX(`id`) AS max_id , MIN(`id`) AS min_id FROM `table` ");
$range_row = mysql_fetch_object( $range_result );
$random = mt_rand( $range_row->min_id , $range_row->max_id );
$result = mysql_query( " SELECT * FROM `table` WHERE `id` >= $random LIMIT 0,1 ");

As we mentioned earlier, this method limits every row of the table to a value of only 1 id. So what if that's not the case?
The following scenario USES the LIMIT clause of MySQL. LIMIT receives two parameter values. The first parameter specifies the offset of the first row of the returned result, and the second parameter specifies the maximum number of rows to return the result. The offset specifies that line 1 is 0 instead of 1.
To calculate the offset at line 1, we used the RAND() method of MySQL to generate a random number from 0 to 1. We then multiply this number by the number of table records we got inverted using the COUNT() method. Since the argument to LIMIT must be of type int and not float, we use FLOOR() to handle the result. FLOOR() evaluates to less than the maximum value of the expression. The final code looks like this:
Solution 3:
<?php
$offset_result = mysql_query( " SELECT FLOOR(RAND() * COUNT(*)) AS `offset` FROM `table` ");
$offset_row = mysql_fetch_object( $offset_result );
$offset = $offset_row->offset;
$result = mysql_query( " SELECT * FROM `table` LIMIT $offset, 1 " );

After MySQL 4.1, we can combine the above two methods with subquery:
Solution 4:
SELECT * FROM `table` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table` ) ORDER BY id LIMIT 1;

This scenario has the same weakness as scenario 2 and is only valid for tables with only 1id values.
Remember the reason we were looking for an alternative to choosing random rows in the first place, speed! So, how do these scenarios compare in execution time? I won't point out hardware and software configurations or give specific Numbers. The general result is this:
The slowest is Solution 1 (we assume it takes 100% of the time).
Plan 2 USES 79%
Option 3:13%
Option 4 is 16%
so, plan 3 wins!

Related articles: