MySQL select 10 random rows from 600K rows fast

Question

How can I best write a query that selects 10 random rows from a total of 600K rows?

User · Answer

How to select random rows from a table:

From here: Select random rows in MySQL

A quick improvement over "table scan" is to use the index to pick up random ids.

SELECT *
FROM random, (
        SELECT id AS sid
        FROM random
        ORDER BY RAND( )
        LIMIT 10
    ) tmp
WHERE random.id = tmp.sid;

User · Answer

All the best answers have already been posted (mainly those referencing the link http://jan.kneschke.de/projects/mysql/order-by-rand/).

I want to pinpoint another speed-up possibility: caching. Think about why you need to get random rows. Probably you want to display some random post or random ad on a website. If you are getting 100 req/s, is it really necessary that each visitor gets different random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same second get the same random posts, because the next second another 100 visitors will get a different set of posts.

When using this caching, you can also use one of the slower solutions for getting the random data, as it will be fetched from MySQL only once per second regardless of your req/s.
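The caching idea can be sketched in a few lines of Python. This is a minimal, single-threaded sketch: `RandomRowCache` is a hypothetical helper, and `fetch` stands in for whatever (possibly slow) random-row query you run, for example `ORDER BY RAND() LIMIT 10`. The clock is injectable only so the expiry can be tested; it is not thread-safe as written.

```python
import random
import time
from typing import Callable, List, Optional


class RandomRowCache:
    """Hand every caller the same random sample for `ttl` seconds (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], List[int]],
        ttl: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # runs the (possibly slow) random-row query
        self._ttl = ttl              # how long one sample is reused, in seconds
        self._clock = clock
        self._rows: Optional[List[int]] = None
        self._expires = 0.0

    def get(self) -> List[int]:
        now = self._clock()
        if self._rows is None or now >= self._expires:
            # Cache is empty or stale: hit the database once and remember the result.
            self._rows = self._fetch()
            self._expires = now + self._ttl
        return self._rows


if __name__ == "__main__":
    # Stand-in for the database: ten random ids out of 600K.
    cache = RandomRowCache(lambda: random.sample(range(1, 600_001), 10), ttl=1.0)
    print(cache.get() == cache.get())  # same list within one second
```

With a 1-second TTL, the expensive query runs at most about once per second, no matter how many requests call `get()`.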

User · Answer

If you want one random record (no matter if there are gaps between ids):

PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';

SET @count = (SELECT FLOOR(RAND() * COUNT(*)) FROM `table_name`);

EXECUTE stmt USING @count;

Source: https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/#comment-1266

User · Answer

I am getting fast queries (around 0.5 seconds) with a slow CPU, selecting 10 random rows from a 400K-record MySQL database (non-cached, 2 GB in size). See my code here: Fast selection of random rows in MySQL

<?php
$time = microtime_float();

$sql = 'SELECT COUNT(*) FROM pages';
$rquery = BD_Ejecutar($sql);
list($num_records) = mysql_fetch_row($rquery);
mysql_free_result($rquery);

$sql = "SELECT id FROM pages WHERE RAND()*$num_records<20
   ORDER BY RAND() LIMIT 0,10";
$rquery = BD_Ejecutar($sql);
$id_in = '';
while (list($id) = mysql_fetch_row($rquery)) {
    if ($id_in) $id_in .= ",$id";
    else $id_in = "$id";
}
mysql_free_result($rquery);

$sql = "SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery = BD_Ejecutar($sql);
while (list($id, $url) = mysql_fetch_row($rquery)) {
    logger("$id, $url", 1);
}
mysql_free_result($rquery);

$time = microtime_float() - $time;

logger("num_records=$num_records", 1);
logger("$id_in", 1);
logger("Time elapsed: <b>$time seconds</b>", 1);
?>

BD_Ejecutar(), microtime_float() and logger() are the author's own helper functions (not shown). The mysql_* functions were removed in PHP 7, so on current PHP you would use mysqli or PDO instead.
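The core of this answer is a two-stage trick: a cheap pre-filter keeps each row with probability 20 / num_records (so roughly 20 candidates survive), and only those few candidates are sorted by RAND() before LIMIT 10. Below is a rough Python simulation of that logic. It assumes the ids are already in memory, and `prefilter_then_shuffle` is a hypothetical name. Note that it can occasionally return fewer than 10 ids when fewer than 10 candidates survive the filter.

```python
import random
from typing import List, Optional, Sequence


def prefilter_then_shuffle(
    ids: Sequence[int],
    k: int = 10,
    oversample: int = 20,
    rng: Optional[random.Random] = None,
) -> List[int]:
    if rng is None:
        rng = random.Random()
    n = len(ids)
    # Stage 1: keep each row with probability oversample / n,
    # the same test as WHERE RAND() * $num_records < 20.
    candidates = [i for i in ids if rng.random() * n < oversample]
    # Stage 2: random order over the small candidate set, then take k,
    # like ORDER BY RAND() LIMIT 0,10 applied to those candidates only.
    rng.shuffle(candidates)
    return candidates[:k]
```

Raising `oversample` makes a short result rarer, at the cost of sorting more candidates.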

User · Answer

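I use this query:

select floor(RAND() * (SELECT MAX(`key`) FROM table)) from table limit 10

Query time: 0.016s

Note that this returns 10 random key values rather than rows. You still have to look the rows up by those values, and any value that falls into a gap in the key sequence matches nothing.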

User · Answer

SELECT column FROM table ORDER BY RAND() LIMIT 10

Not the most efficient solution, but it works.

User · Answer

Well, if you have no gaps in your keys and they are all numeric, you can calculate random numbers and select those rows. But this will probably not be the case.

So one solution would be the following:

SELECT t.* FROM table AS t JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM table)) AS rid) AS r ON t.id >= r.rid ORDER BY t.id LIMIT 1

This basically ensures that you get a random number in the range of your keys, and then you select the next key that is greater than or equal to it. You have to do this 10 times.

However, this is NOT really random, because your keys will most likely not be distributed evenly.
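To see why uneven key distribution matters, here is a small Python simulation of "draw a random number in the key range, then take the first id at or above it". The function names are hypothetical. With ids 1, 2, 3 and 10, the draws 4 through 10 all land on id 10. That id is therefore chosen about 7 times out of 10, while each of the other ids gets about 1 in 10.

```python
import random
from bisect import bisect_left
from collections import Counter
from typing import List


def first_id_at_or_above(sorted_ids: List[int], r: int) -> int:
    """Mimics: SELECT id FROM table WHERE id >= r ORDER BY id LIMIT 1."""
    return sorted_ids[bisect_left(sorted_ids, r)]  # r never exceeds the max id here


def pick_counts(sorted_ids: List[int], trials: int, seed: int = 0) -> Counter:
    rng = random.Random(seed)
    max_id = sorted_ids[-1]
    counts: Counter = Counter()
    for _ in range(trials):
        r = rng.randint(1, max_id)  # uniform over the key range 1..MAX(id)
        counts[first_id_at_or_above(sorted_ids, r)] += 1
    return counts


if __name__ == "__main__":
    print(pick_counts([1, 2, 3, 10], 10_000))  # id 10 dominates
```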

It's really a big problem and not easy to solve while fulfilling all the requirements. MySQL's rand() is the best you can get if you really want 10 random rows.

There is, however, another solution that is fast but also has a trade-off when it comes to randomness. It may suit you better. Read about it here: How can I optimize MySQL's ORDER BY RAND() function?

The question is how random you need it to be.

Can you explain a bit more, so I can give you a good solution?

For example, a company I worked with needed absolute randomness extremely fast. They ended up pre-populating the database with random values that were selected in descending order and then set to different random values again afterwards.

If you hardly ever update, you could also fill an incrementing id so you have no gaps and can just calculate random keys before selecting... It depends on the use case!

User · Answer

I improved the answer @Riedsio had. This is the most efficient query I can find on a large, uniformly distributed table with gaps (tested on getting 1000 random rows from a table that has > 2.6B rows).

(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * (@max := (SELECT MAX(id) FROM table))) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1)

Let me unpack what's going on.

@max := (SELECT MAX(id) FROM table)
I'm calculating and saving the max. For very large tables, there is a slight overhead for calculating MAX(id) each time you need a row.

SELECT FLOOR(RAND() * @max) + 1 as rand
Gets a random id.

SELECT id FROM table INNER JOIN (...) on id >= rand ORDER BY id LIMIT 1
This fills in the gaps. Basically, if you randomly select a number in the gaps, it will just pick the next id. Assuming the gaps are uniformly distributed, this shouldn't be a problem.

Doing the union helps you fit everything into 1 query, so you can avoid doing multiple queries. It also lets you save the overhead of calculating MAX(id). Depending on your application, this might matter a lot or very little. (UNION also removes duplicate ids, so you can occasionally get fewer rows than there are branches.)

Note that this gets only the ids, and gets them in random order. If you want to do anything more advanced, I recommend you do this:

SELECT t.id, t.name -- etc, etc
FROM table t
INNER JOIN (
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * (@max := (SELECT MAX(id) FROM table))) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id >= rand ORDER BY id LIMIT 1)
) x ON x.id = t.id
ORDER BY t.id

User · Answer

I've looked through all of the answers, and I don't think anyone mentions this possibility at all, and I'm not sure why.

If you want utmost simplicity and speed, at a minor cost, then to me it seems to make sense to store a random number against each row in the DB. Just create an extra column, random_number, and set its default to RAND(). Create an index on this column.

Then, when you want to retrieve a row, generate a random number in your code (PHP, Perl, whatever) and compare that to the column:

SELECT * FROM tbl WHERE random_number >= :random ORDER BY random_number LIMIT 1

I guess that, although it's very neat for a single row, for ten rows like the OP asked you'd have to call it ten separate times (or come up with a clever tweak that escapes me right now).
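Here is a runnable sketch of the single-row lookup against an in-memory SQLite database, using Python's `sqlite3` module. It rests on a few assumptions. SQLite has no RAND(), so the `random_number` column is filled from Python at insert time. The query orders by `random_number` so that `LIMIT 1` reliably returns the next value up. The wrap-around fallback, for a draw larger than every stored value, is an addition not found in the answer. Table and function names are illustrative.

```python
import random
import sqlite3


def make_table(conn: sqlite3.Connection, n: int, rng: random.Random) -> None:
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, random_number REAL NOT NULL)")
    conn.executemany(
        "INSERT INTO tbl (id, random_number) VALUES (?, ?)",
        ((i, rng.random()) for i in range(1, n + 1)),
    )
    # The index lets the lookup below be a short range scan instead of a full scan.
    conn.execute("CREATE INDEX idx_tbl_random_number ON tbl (random_number)")


def random_row_id(conn: sqlite3.Connection, rng: random.Random) -> int:
    r = rng.random()  # the random number generated "in your code"
    row = conn.execute(
        "SELECT id FROM tbl WHERE random_number >= ? ORDER BY random_number LIMIT 1",
        (r,),
    ).fetchone()
    if row is None:
        # r was larger than every stored value: wrap around to the smallest one.
        row = conn.execute("SELECT id FROM tbl ORDER BY random_number LIMIT 1").fetchone()
    return row[0]


if __name__ == "__main__":
    rng = random.Random(42)
    conn = sqlite3.connect(":memory:")
    make_table(conn, 1_000, rng)
    print(random_row_id(conn, rng))
```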

User · Answer

The following should be fast, unbiased and independent of the id column. However, it does not guarantee that the number of rows returned will match the number of rows requested.

SELECT *
FROM t
WHERE RAND() < (SELECT 10 / COUNT(*) FROM t)

Explanation: assuming you want 10 rows out of 100, each row has a 1/10 probability of getting SELECTed, which could be achieved by WHERE RAND() < 0.1. This approach does not guarantee 10 rows, but if the query is run enough times, the average number of rows per execution will be around 10 and each row in the table will be selected evenly.
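Here is a quick Python simulation of the expected-count argument. It assumes the table's ids are in memory, and `bernoulli_sample` is a hypothetical name. Each row is kept independently with probability p = k / N, so the number of rows returned varies from run to run but averages k.

```python
import random
from typing import List, Sequence


def bernoulli_sample(ids: Sequence[int], k: int, rng: random.Random) -> List[int]:
    p = k / len(ids)  # same ratio as (SELECT 10 / COUNT(*) FROM t)
    # Each row survives independently, like WHERE RAND() < p.
    return [i for i in ids if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [len(bernoulli_sample(range(1, 101), 10, rng)) for _ in range(5)]
    print(sizes)  # values scattered around 10, not always exactly 10
```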

User · Answer

Simple query that has excellent performance and works with gaps:

SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id

This query on a 200K table takes 0.08s, while the normal version (SELECT * FROM tbl ORDER BY RAND() LIMIT 10) takes 0.35s on my machine.

This is fast because the sort phase only uses the indexed ID column. You can see this behaviour in the EXPLAIN output of each query:

SELECT * FROM tbl ORDER BY RAND() LIMIT 10
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id

Weighted Version: https://stackoverflow.com/a/41577458/893432
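You can try the shape of this deferred join with Python's built-in `sqlite3` module. Two assumptions apply: SQLite spells the function RANDOM() rather than RAND(), and the `tbl` table with a `payload` column is hypothetical. This sketch only shows that the query returns k full rows and copes with gaps in the ids. It says nothing about MySQL's timings.

```python
import sqlite3
from typing import List, Tuple


def random_rows_deferred_join(conn: sqlite3.Connection, k: int = 10) -> List[Tuple]:
    # Inner query: shuffle and cut using only the id column.
    # Outer join: fetch the full rows for just the k chosen ids.
    sql = (
        "SELECT t1.* FROM tbl AS t1 "
        "JOIN (SELECT id FROM tbl ORDER BY RANDOM() LIMIT ?) AS t2 ON t1.id = t2.id"
    )
    return conn.execute(sql, (k,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, payload TEXT)")
    # Only even ids exist, so the table is full of gaps.
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?)", ((i, f"row {i}") for i in range(2, 2_001, 2))
    )
    for row in random_rows_deferred_join(conn):
        print(row)
```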

User · Answer

I used this http://jan.kneschke.de/projects/mysql/order-by-rand/ posted by Riedsio (I used the case of a stored procedure that returns one or more random values):

    DROP TEMPORARY TABLE IF EXISTS rands;
    CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        INSERT INTO rands
           SELECT r1.id
             FROM random AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT MAX(id)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.id >= r2.id
            ORDER BY r1.id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

In the article, he solves the problem of gaps in ids causing not-so-random results by maintaining a table (using triggers, etc.; see the article). I'm solving the problem by adding another column to the table, populated with contiguous numbers starting from 1. (Edit: this column is added to the temporary table created by the subquery at runtime; it doesn't affect your permanent table.)

    DROP TEMPORARY TABLE IF EXISTS rands;
    CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        SET @no_gaps_id := 0;

        INSERT INTO rands
           SELECT r1.id
             FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT COUNT(*)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.no_gaps_id >= r2.id
            ORDER BY r1.no_gaps_id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

In the article, I can see he went to great lengths to optimize the code. I have no idea how much my changes impact the performance, but it works very well for me.

User · Answer

I guess this is the best possible way:

SELECT id, id * RAND() AS random_no, first_name, last_name FROM user ORDER BY random_no

User · Answer

I think here is a simple and yet faster way. I tested it on the live server in comparison with a few of the answers above, and it was faster.

SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30;

//Took 0.0014 secs against a table of 130 rows

SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30

//Took 0.0042 secs against a table of 130 rows

SELECT name
FROM random AS r1 JOIN
   (SELECT CEIL(RAND() *
                 (SELECT MAX(id)
                    FROM random)) AS id)
    AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30

//Took 0.0040 secs against a table of 130 rows

User · Answer

A great post handling several cases, from simple, to gaps, to non-uniform with gaps:

http://jan.kneschke.de/projects/mysql/order-by-rand/

For the most general case, here is how you do it:

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

This supposes that the distribution of ids is equal, and that there can be gaps in the id list. See the article for more advanced examples.

User · Answer

One way that I find pretty good, if there's an autogenerated id, is to use the modulo operator '%'. For example, if you need 10,000 random records out of 70,000, you could simplify this by saying you need 1 out of every 7 rows. This can be expressed in this query:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0;

If the result of dividing the total available rows by the target rows is not an integer, you will get more rows than you asked for, so you should add a LIMIT clause to trim the result set, like this:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0
LIMIT 10000;

This does require a full scan, but it is faster than ORDER BY RAND, and in my opinion simpler to understand than the other options mentioned in this thread. Also, if the system that writes to the DB creates sets of rows in batches, you might not get as random a result as you were expecting.
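Here is a sketch of the modulo approach using Python's `sqlite3` module; the table name `tbl` is hypothetical. Two details are additions rather than part of the answer. First, the step is clamped to at least 1: FLOOR(count / target) is 0 when the table has fewer rows than requested, and a modulo by 0 yields NULL and so matches nothing. Second, an ORDER BY id makes the LIMIT deterministic. Note also that the selection is the same on every run: it is an evenly spaced sample, not a random one.

```python
import sqlite3
from typing import List


def modulo_sample(conn: sqlite3.Connection, target: int) -> List[int]:
    total = conn.execute("SELECT COUNT(1) FROM tbl").fetchone()[0]
    step = max(total // target, 1)  # FLOOR(COUNT / target), never 0
    rows = conn.execute(
        "SELECT id FROM tbl WHERE id % ? = 0 ORDER BY id LIMIT ?", (step, target)
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO tbl (id) VALUES (?)", ((i,) for i in range(1, 70_001)))
    sample = modulo_sample(conn, 10_000)
    print(len(sample), sample[:3])  # 10000 [7, 14, 21]
```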

User · Answer

This is how I do it:

select *
from table_with_600k_rows
where rand() < 10/600000
limit 10

I like it because it does not require other tables, it is simple to write, and it is very fast to execute.

User · Answer

Here is a game changer that may be helpful for many:

I have a table with 200k rows, with sequential ids. I needed to pick N random rows, so I opted to generate random values based on the biggest ID in the table. I created this script to find out which is the fastest operation:

logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();

The results are:

Count: 36.8418693542479 ms
Max: 0.241041183472 ms
Order: 0.216960906982 ms

Based on these results, ORDER BY id DESC LIMIT 1 is the fastest operation to get the max id.
Here is my answer to the question:

SELECT GROUP_CONCAT(n SEPARATOR ',') g FROM (
    SELECT FLOOR(RAND() * (
        SELECT id FROM tbl ORDER BY id DESC LIMIT 1
    )) n FROM tbl LIMIT 10) a

...
SELECT * FROM tbl WHERE id IN ($result);

FYI: To get 10 random rows from a 200k table, it took me 1.78 ms (including all the operations in the php side)
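Here is a Python/`sqlite3` sketch of this approach. Like the answer, it assumes sequential ids with no gaps; the table name `tbl` is hypothetical. It deliberately differs from the SQL above in one way: `random.sample` draws distinct ids from 1 to MAX(id). By contrast, FLOOR(RAND() * max) can repeat a value or produce 0, and either would leave you with fewer than 10 rows.

```python
import random
import sqlite3
from typing import List, Tuple


def random_rows_by_generated_ids(
    conn: sqlite3.Connection, k: int, rng: random.Random
) -> List[Tuple]:
    # Per the answer's timings, the cheapest way to read the largest id.
    max_id = conn.execute("SELECT id FROM tbl ORDER BY id DESC LIMIT 1").fetchone()[0]
    wanted = rng.sample(range(1, max_id + 1), k)  # k distinct ids in 1..max_id
    placeholders = ",".join("?" * len(wanted))   # "?,?,...": values stay bound, not inlined
    sql = f"SELECT * FROM tbl WHERE id IN ({placeholders})"
    return conn.execute(sql, wanted).fetchall()
```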

User · Answer

Another simple solution would be ranking the rows and fetching one of them randomly. With this solution, you won't need any 'Id'-based column in the table.

SELECT d.* FROM (
SELECT  t.*,  @rownum := @rownum + 1 AS row_num
FROM mytable AS t,
    (SELECT @rownum := 0) AS r,
    (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS n
) d WHERE row_num >= @cnt LIMIT 10;

You can change the limit value as needed to access as many rows as you want, but those would mostly be consecutive values.

However, if you don't want consecutive random values, you can fetch a bigger sample and select randomly from it, something like ...

SELECT * FROM (
SELECT d.* FROM (
    SELECT  c.*,  @rownum := @rownum + 1 AS row_num
    FROM buildbrain.`commits` AS c,
        (SELECT @rownum := 0) AS r,
        (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM buildbrain.`commits`))) AS rnd
) d 
WHERE row_num >= @cnt LIMIT 10000 
) t ORDER BY RAND() LIMIT 10;

User · Answer

It's a very simple, single-line query:

SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;

User · Answer

This is super fast and is 100% random, even if you have gaps.

1. Count the number x of rows that you have available: SELECT COUNT(*) AS row_count FROM TABLE
2. Pick 10 distinct random numbers a_1, a_2, ..., a_10 between 0 and x-1
3. Query your rows like this: SELECT * FROM TABLE LIMIT 1 OFFSET a_i for i = 1, ..., 10

I found this hack in the book SQL Antipatterns by Bill Karwin.
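The three steps can be sketched with Python's `sqlite3` module; the table name `tbl` is hypothetical. Two choices here are additions. `random.sample` guarantees that the offsets are distinct and fall in 0 to x-1. `ORDER BY id` pins down which row each offset refers to; without it, SQL does not promise any particular order, although each query still returns some row. Each OFFSET still has to step over that many rows, so very large offsets get slower.

```python
import random
import sqlite3
from typing import List, Tuple


def random_rows_by_offsets(conn: sqlite3.Connection, k: int, rng: random.Random) -> List[Tuple]:
    # Step 1: count the available rows.
    x = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
    # Step 2: pick k distinct offsets in 0..x-1 (fewer if the table is smaller than k).
    offsets = rng.sample(range(x), min(k, x))
    # Step 3: one LIMIT 1 OFFSET a_i query per offset.
    return [
        conn.execute("SELECT * FROM tbl ORDER BY id LIMIT 1 OFFSET ?", (a,)).fetchone()
        for a in offsets
    ]
```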

User · Answer

I needed a query to return a large number of random rows from a rather large table. This is what I came up with. First get the maximum record id:

SELECT MAX(id) FROM table_name;

Then substitute that value into:

SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;

Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record ids, although I doubt it would affect the result if there were (I haven't tried it, though). I also created this stored procedure to be more generic: pass in the table name and the number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008, 32GB, dual 3GHz E5450, and on a table with 17,361,264 rows it's fairly consistent at ~.03 sec / ~11 sec to return 1,000,000 rows. (Times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the second select statement, depending on your preference.)

DELIMITER $$

USE [schema name] $$

DROP PROCEDURE IF EXISTS `random_rows` $$

CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN

SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

SET @t = CONCAT(
    'SELECT * FROM ',
    tab_name,
    ' WHERE id>FLOOR(RAND()*@max) LIMIT ',
    num_rows);

PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$

then

CALL [schema name].random_rows([table name], n);

User · Answer

Use the simple query below to get random data from a table.

SELECT user_firstname,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 10

User · Answer

If you have just one Read-Request

Combine the answer of @Riedsio with a temp table (600K is not that much):

DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TEMPORARY TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;

And then take a version of @Riedsio's answer:

SELECT dt.*
FROM
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM tmp_randorder)) AS id)
        AS rnd
 INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
 INNER JOIN datatable AS dt on dt.id = rndo.data_id
 ORDER BY abs(rndo.id - rnd.id)
 LIMIT 1;

If the table is big, you can sieve on the first part:

INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;
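The key idea is that `tmp_randorder` gives every real row a gap-free position from 1 to N. Below is a Python/`sqlite3` sketch of that idea. It is a simplified variant, not the exact query above: because the positions have no holes, it looks up k distinct random positions directly instead of taking the nearest one within ±10. Table names follow the answer; everything else is illustrative.

```python
import random
import sqlite3
from typing import List, Tuple


def build_dense_map(conn: sqlite3.Connection) -> None:
    conn.execute("DROP TABLE IF EXISTS tmp_randorder")
    # INTEGER PRIMARY KEY on a fresh table numbers the rows 1, 2, 3, ... with no holes.
    conn.execute("CREATE TEMP TABLE tmp_randorder (id INTEGER PRIMARY KEY, data_id INTEGER)")
    conn.execute("INSERT INTO tmp_randorder (data_id) SELECT id FROM datatable")


def random_rows_via_dense_map(
    conn: sqlite3.Connection, k: int, rng: random.Random
) -> List[Tuple]:
    n = conn.execute("SELECT MAX(id) FROM tmp_randorder").fetchone()[0]
    positions = rng.sample(range(1, n + 1), min(k, n))  # distinct gap-free positions
    placeholders = ",".join("?" * len(positions))
    sql = (
        "SELECT dt.* FROM tmp_randorder AS r "
        "JOIN datatable AS dt ON dt.id = r.data_id "
        f"WHERE r.id IN ({placeholders})"
    )
    return conn.execute(sql, positions).fetchall()
```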

If you have many read-requests

Version 1: You could keep the table tmp_randorder persistent; call it datatable_idlist. Recreate that table at certain intervals (day, hour), since it will also get holes. If your table gets really big, you could also refill the holes; this query finds them:

select l.data_id as hole from datatable_idlist l left join datatable dt on dt.id = l.data_id where dt.id is null;

Version 2: Give your dataset a random_sortorder column, either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a random value in your application (I'll call it $rand).
```
select l.*
from datatable l 
order by abs(random_sortorder - $rand) asc 
limit 1;
```

This solution treats the 'edge rows' (those with the highest and the lowest random_sortorder) unfairly, so re-randomize them at intervals (once a day).

User · Answer

From the book:

Choose a Random Row Using an Offset

Still another technique that avoids problems found in the preceding alternatives is to count the rows in the data set and return a random number between 0 and the count. Then use this number as an offset when querying the data set.

<?php
$rand = "SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM Bugs))";
$offset = (int) $pdo->query($rand)->fetchColumn();
$sql = "SELECT * FROM Bugs LIMIT 1 OFFSET :offset";
$stmt = $pdo->prepare($sql);
$stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
$stmt->execute();
$rand_bug = $stmt->fetch();

Use this solution when you can't assume contiguous key values and you need to make sure each row has an even chance of being selected.

User · Answer

You can easily use a random offset with a limit:

PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

You can also apply a WHERE clause, like so:

PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

Tested on a 600,000-row (700 MB) table: query execution took ~0.016 sec on an HDD.

EDIT: The offset might take a value close to the end of the table, which will result in the select statement returning fewer rows (or maybe only 1 row). To avoid this, we can check the offset again after computing it, like so:

SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count,@_offset-@rows_count,@_offset));
SET @_offset = (SELECT IF(@_offset<0,0,@_offset));
EXECUTE stm using @rows_count,@_offset;
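The two IF lines only do arithmetic, so they can be checked outside MySQL. Below is a Python transcription that mirrors them; the function name is hypothetical. If fewer than rows_count rows remain after the offset, it steps back by rows_count, and it never goes below 0. One side effect is worth knowing: draws near the end are folded back. With 100 rows and rows_count = 10, for example, the start positions 81 to 89 become twice as likely as the others. Drawing the offset from 0 to total - rows_count in the first place would avoid that skew.

```python
def clamp_offset(offset: int, total: int, rows_count: int) -> int:
    # SET @_offset = IF(@total - @_offset < @rows_count, @_offset - @rows_count, @_offset)
    if total - offset < rows_count:
        offset -= rows_count
    # SET @_offset = IF(@_offset < 0, 0, @_offset)
    return max(offset, 0)


if __name__ == "__main__":
    print(clamp_offset(95, 100, 10))  # 85: 0-based positions 85..94, a full page of 10
```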
