For SQL Server and needing "a single random row"..
If true sampling is not needed, generate a random value in [0, max_rows) and use the ORDER BY..OFFSET..FETCH syntax available since SQL Server 2012. This is very fast if the COUNT and ORDER BY are covered by appropriate indexes, such that the data is 'already sorted' along the lines of the query. When these operations are covered, it is a quick request and does not suffer from the horrid scalability of ORDER BY NEWID() or similar. Obviously, this approach won't scale well on a non-indexed HEAP table.
declare @rows int
select @rows = count(1) from t
-- Other issues if row counts are in the bigint range..
-- This is also not 'true random', although such is likely not required.
-- rand() returns a float in [0, 1), so @skip falls in [0, @rows).
declare @skip int = convert(int, @rows * rand())
select t.*
from t
order by t.id -- Make sure this is the clustered PK or another suitable unique index!
offset (@skip) rows
fetch first 1 row only
Make sure that appropriate transaction isolation levels are used and/or account for 0 results (for example, when rows are deleted between the COUNT and the SELECT, the offset can overshoot the remaining rows).
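A minimal sketch of the zero-result guard, assuming the same table t, the id ordering, and the code above (falling back to the first row is just one option; re-running with a fresh COUNT is another):

-- Sketch only: if the offset overshot (e.g. due to concurrent deletes after the COUNT),
-- fall back to the first row so the query still returns something.
if @@rowcount = 0
begin
    select top 1 t.*
    from t
    order by t.id
end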
For SQL Server and needing a "general row sample" approach..
Note: This is an adaptation of the answer as found on a SQL Server specific question about fetching a sample of rows. It has been tailored for context.
While a general sampling approach should be used with caution here, it is still potentially useful information in the context of the other answers (and their repeated suggestions of non-scaling and/or questionable implementations). Such a sampling approach is less efficient than the first code shown and is error-prone if the goal is to find a "single random row".
Here is an updated and improved form of sampling a percentage of rows. It is based on the same concept as some other answers that use CHECKSUM / BINARY_CHECKSUM and modulus.
- It is relatively fast over huge data sets and can be efficiently used in/with derived queries. Millions of pre-filtered rows can be sampled in seconds with no tempdb usage and, if aligned with the rest of the query, the overhead is often minimal.
- Does not suffer from CHECKSUM(*) / BINARY_CHECKSUM(*) issues with runs of data. When using the CHECKSUM(*) approach, the rows can be selected in "chunks" and not "random" at all! This is because CHECKSUM prefers speed over distribution.
- Results in a stable/repeatable row selection and can be trivially changed to produce different rows on subsequent query executions (see the sketch right after this list). Approaches that use NEWID() can never be stable/repeatable.
- Does not use ORDER BY NEWID() over the entire input set, as ordering can become a significant bottleneck with large input sets. Avoiding unnecessary sorting also reduces memory and tempdb usage.
- Does not use TABLESAMPLE and thus works with a WHERE pre-filter (a placement sketch follows the hybrid example at the end).
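As a sketch of the "trivially changed" point above: concatenating a salt into the hashed value yields a different, but still repeatable, sample per seed. The @seed value and the 10% figure are illustrative only; table t and its rowguid column are as used in the examples below.

-- Sketch only: salting the hash picks a different, yet stable, sample per seed.
declare @sample_percent decimal(7, 4) = 10  -- e.g. a 10% sample
declare @seed varbinary(8) = 0x01           -- change to select a different stable sample
select t.*
from t
where abs(
        convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid) + @seed))
      ) % (1000 * 100) < (1000 * @sample_percent)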
Here is the gist. See this answer for additional details and notes.
Naïve try:
declare @sample_percent decimal(7, 4)
-- Looking at this value is a good indicator of why a
-- general sampling approach is error-prone for selecting exactly 1 row.
select @sample_percent = 100.0 / count(1) from t
-- BAD!
-- When choosing a sample percent equivalent to "approximately 1 row",
-- it is very reasonable to end up with 0 rows, which definitely fails the ask!
-- If choosing a larger sample size, the distribution of the TOP 1 row is heavily
-- skewed toward the first rows scanned, and is very much NOT 'true random'.
select top 1
t.*
from t
where 1=1
and ( -- sample
@sample_percent = 100
or abs(
convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid)))
) % (1000 * 100) < (1000 * @sample_percent)
)
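For intuition on why "approximately 1 row" frequently means zero rows: if each of N rows independently passes the filter with probability 1/N, the chance of an empty result is (1 - 1/N)^N, which tends to 1/e ≈ 0.37 as N grows. A quick check for N = 1,000,000 (an illustrative value):

-- Chance of sampling zero rows when targeting ~1 row out of 1,000,000 (~0.3679).
select power(1e0 - 1e0 / 1000000, 1000000) as p_zero_rows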
This can be largely remedied by a hybrid query that mixes sampling with an ORDER BY NEWID() selection over the much smaller sample set. This limits the sorting operation to roughly the sample size, not the size of the original table.
-- Sample "approximately 1000 rows" from the table,
-- dealing with some edge-cases.
declare @rows int
select @rows = count(1) from t
declare @sample_size int = 1000
declare @sample_percent decimal(7, 4) = case
when @rows <= @sample_size then 100 -- not enough rows; take them all
when (100.0 * @sample_size / @rows) < 0.0001 then 0.0001 -- min sample percent
else 100.0 * @sample_size / @rows -- everything else
end
-- There is a statistical "guarantee" of having sampled a limited-yet-non-zero number of rows.
-- The limited rows are then sorted randomly before the first is selected.
select top 1
t.*
from t
where 1=1
and ( -- sample
@sample_percent = 100
or abs(
convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid)))
) % (1000 * 100) < (1000 * @sample_percent)
)
-- ONLY the sampled rows are ordered, which improves scalability.
order by newid()
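As a usage sketch of the "works with a WHERE pre-filter" point: the pre-filter simply sits alongside the sampling predicate in the same WHERE clause, so only rows that survive the filter are sampled and sorted. The status column/value is hypothetical; @sample_percent is assumed to come from the setup above, ideally computed against a COUNT with the same filter applied so the sample size stays on target.

-- Sketch only: hybrid sample restricted by a hypothetical pre-filter.
select top 1
    t.*
from t
where t.status = 'active'  -- hypothetical pre-filter column/value
  and ( -- sample
        @sample_percent = 100
        or abs(
            convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid)))
        ) % (1000 * 100) < (1000 * @sample_percent)
      )
order by newid()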