Select a random sample of results from a query result

This question asks about getting a random(ish) sample of records on SQL Server and the answer was to use TABLESAMPLE. Is there an equivalent in Oracle 10?

If there isn't, is there a standard way to get a random sample of results from a query result set? For example, how can one get 1,000 random rows from a query that would normally return millions?




Oracle's SAMPLE clause returns a random sample of a table's rows. You can try it like this:

SELECT * FROM TABLE_NAME SAMPLE(50);

Here 50 is the percentage of the table's rows to return. So if you want 1,000 rows out of 100,000 (that is, 1%), you can execute a query like:

SELECT * FROM TABLE_NAME SAMPLE(1);

Hope this helps.


The SAMPLE clause gives you a random sample containing a given percentage of all rows in a table.

For example, here we obtain 25% of the rows:

SELECT * FROM emp SAMPLE(25)

The following SQL (using the analytic function ROW_NUMBER) will give you a random sample of a specific number of rows for each distinct value of a column (similar to a GROUP BY).

Here we sample 10 of each:

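SELECT * FROM (
SELECT job, sal, ROW_NUMBER()
OVER (
-- order randomly within each job; ORDER BY job alone would make every row tie,
-- so the rows kept would be arbitrary rather than random
PARTITION BY job ORDER BY dbms_random.value
) SampleCount FROM emp
)
WHERE SampleCount <= 10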
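To see the per-group pattern in action without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module (SQLite supports ROW_NUMBER() since 3.25). The emp data is made up, and rand_uniform() is a hypothetical, seedable function registered from Python as a stand-in for dbms_random.value. The key point is that the window's ORDER BY must be random; with ORDER BY job, every row in a partition ties and the pick is arbitrary rather than random.

```python
import random
import sqlite3
from collections import Counter


def make_emp_db(seed=0):
    """Build an in-memory SQLite stand-in for the emp table (hypothetical data)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # rand_uniform() is a hypothetical, seedable stand-in for dbms_random.value.
    conn.create_function("rand_uniform", 0, rng.random)
    conn.execute("CREATE TABLE emp (job TEXT, sal INTEGER)")
    rows = ([("CLERK", 1000 + i) for i in range(25)]
            + [("MANAGER", 3000 + i) for i in range(25)]
            + [("ANALYST", 2500 + i) for i in range(4)])
    conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    return conn


def sample_per_job(conn, per_group):
    """Return up to per_group randomly chosen rows for each job."""
    sql = """
        SELECT job, sal FROM (
            SELECT job, sal,
                   ROW_NUMBER() OVER (
                       PARTITION BY job ORDER BY rand_uniform()
                   ) AS sample_count
            FROM emp
        )
        WHERE sample_count <= ?
    """
    return conn.execute(sql, (per_group,)).fetchall()


conn = make_emp_db()
sample = sample_per_job(conn, 10)
counts = Counter(job for job, _ in sample)
print(dict(counts))  # ANALYST has only 4 rows, so it contributes all 4
```

Groups smaller than the limit simply return all of their rows, which is usually what you want from "up to N per group".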

I know this has already been answered, but seeing so many visits here I'd like to add one version that uses the SAMPLE clause but still allows you to filter the rows first:

with cte1 as (
    select *
    from t_your_table
    where your_column = 'ABC'
)
select * from cte1 sample (5)
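SQLite has no SAMPLE clause, so the sketch below approximates "filter first, then sample 5%" with a per-row coin flip in the WHERE clause, run through Python's sqlite3 module. This is an assumption about how to emulate SAMPLE, not Oracle's implementation; the table data is made up and rand_uniform() is a hypothetical seedable random function registered from Python.

```python
import random
import sqlite3


def make_demo_db(seed=1):
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Hypothetical seedable stand-in for Oracle's random number source.
    conn.create_function("rand_uniform", 0, rng.random)
    conn.execute("CREATE TABLE t_your_table (id INTEGER PRIMARY KEY, your_column TEXT)")
    conn.executemany(
        "INSERT INTO t_your_table VALUES (?, ?)",
        [(i, "ABC" if i % 2 == 0 else "XYZ") for i in range(10_000)],
    )
    return conn


def filter_then_sample(conn, value, percent):
    """Filter first, then keep each remaining row with probability percent/100."""
    sql = """
        WITH cte1 AS (
            SELECT * FROM t_your_table WHERE your_column = ?
        )
        SELECT * FROM cte1
        WHERE rand_uniform() < ?  -- per-row coin flip, approximating SAMPLE (percent)
    """
    return conn.execute(sql, (value, percent / 100.0)).fetchall()


conn = make_demo_db()
rows = filter_then_sample(conn, "ABC", 5)
print(len(rows))  # roughly 5% of the 5,000 'ABC' rows, i.e. about 250
```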

Note, however, that the base query must expose a ROWID, so this may not work on some views, for example.


We were given an assignment to select only two records from the list of agents, i.e. two random records for each agent over the span of a week, etc. Below is what we came up with, and it works:

with summary as (
  -- number each agent's (col2) rows in a random order
  select table1.col2,
         table1.col4,
         table1.col5,
         row_number() over (partition by table1.col2 order by dbms_random.value) as rn
    from table1, table2
   where table1.id = table2.id
)
select s.col2,
       s.col4,
       s.col5
  from summary s
 where s.rn <= 2;  -- keep 2 random rows per agent

Suppose you are trying to select exactly 1,000 random rows from a table called my_table. This is one way to do it:

select
    *
from
    (
        select
            row_number() over(order by dbms_random.value) as random_id,
            x.*
        from
            my_table x
    )
where
    random_id <= 1000
;

This is a slight variation on the answer posted by @Quassnoi. Both have the same cost and execution time; the only difference is that this version also returns random_id, the position each row received in the random ordering.
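The same "number the rows in random order, keep the first N" query can be tried locally with Python's sqlite3 module. This is a sketch under stated assumptions: my_table is filled with made-up rows, and rand_uniform() is a hypothetical seedable function registered from Python to play the role of dbms_random.value. Unlike SAMPLE, this returns exactly N rows, at the price of sorting the whole table.

```python
import random
import sqlite3


def make_my_table(n_rows, seed=2):
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Hypothetical seedable stand-in for dbms_random.value.
    conn.create_function("rand_uniform", 0, rng.random)
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                     [(i, f"row {i}") for i in range(n_rows)])
    return conn


def exact_random_rows(conn, n):
    """Number every row in a random order, then keep the first n numbers."""
    sql = """
        SELECT * FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY rand_uniform()) AS random_id, x.*
            FROM my_table x
        )
        WHERE random_id <= ?
    """
    return conn.execute(sql, (n,)).fetchall()


conn = make_my_table(20_000)
picked = exact_random_rows(conn, 1000)
print(len(picked))  # exactly 1000, unlike SAMPLE
```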


Something like this should work:

SELECT * 
FROM table_name
WHERE primary_key IN (SELECT primary_key 
                      FROM
                      (
                        SELECT primary_key, SYS.DBMS_RANDOM.RANDOM 
                        FROM table_name 
                        ORDER BY 2
                      )
                      WHERE rownum <= 10 );
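The nesting above exists because Oracle assigns ROWNUM before ORDER BY is applied in the same query block, so the random sort has to happen in an inner query. The sketch below shows the same two-phase idea (pick random keys, then fetch the full rows) in SQLite via Python's sqlite3, where LIMIT is applied after ORDER BY and no extra nesting is needed. The table contents are made up and rand_uniform() is a hypothetical seedable stand-in for DBMS_RANDOM.

```python
import random
import sqlite3


def make_table(n_rows, seed=3):
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.create_function("rand_uniform", 0, rng.random)  # hypothetical stand-in for DBMS_RANDOM
    conn.execute("CREATE TABLE table_name (primary_key INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?)",
                     [(i, f"row {i}") for i in range(n_rows)])
    return conn


def random_rows_by_key(conn, n):
    """Pick n random primary keys first, then fetch the full rows for those keys."""
    sql = """
        SELECT * FROM table_name
        WHERE primary_key IN (
            SELECT primary_key FROM table_name
            ORDER BY rand_uniform()
            LIMIT ?  -- SQLite applies LIMIT after ORDER BY, so no extra nesting is needed
        )
    """
    return conn.execute(sql, (n,)).fetchall()


conn = make_table(5_000)
rows = random_rows_by_key(conn, 10)
print(len(rows))  # 10
```

Sorting only the key column can be cheaper than sorting whole rows when the rows are wide, which is the main appeal of this variant.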

SELECT * FROM TABLE_NAME SAMPLE(1)

Will give you only approximately 1% of the rows rather than exactly 1/100 of them. The likely reason is that Oracle draws a random flag for each row deciding whether to include it in the sample, and the argument 1 (1%) acts as the probability that each individual row is selected.

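If this is true, the actual sample size follows a binomial distribution: for a table of n rows sampled with probability p, the expected size is n·p with standard deviation sqrt(n·p·(1 − p)).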
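A quick simulation makes the binomial argument concrete. This is a minimal sketch that assumes, as the answer hypothesizes, that SAMPLE keeps each row independently with probability percent/100; it does not call Oracle at all, it just repeats that per-row coin flip in plain Python.

```python
import math
import random


def bernoulli_sample_size(n_rows, percent, rng):
    """Count how many of n_rows survive if each is kept with probability percent/100."""
    p = percent / 100.0
    return sum(1 for _ in range(n_rows) if rng.random() < p)


def binomial_mean_sd(n_rows, percent):
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of the sample size."""
    p = percent / 100.0
    return n_rows * p, math.sqrt(n_rows * p * (1 - p))


rng = random.Random(42)
sizes = [bernoulli_sample_size(100_000, 1, rng) for _ in range(20)]
mean, sd = binomial_mean_sd(100_000, 1)
print(f"expected {mean:.0f} +/- {sd:.1f}; observed min={min(sizes)}, max={max(sizes)}")
```

For 100,000 rows at 1%, the expected size is 1,000 with a standard deviation of about 31.5, so individual runs of SAMPLE(1) typically land a few dozen rows above or below 1,000.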


This is not a perfect answer, but it will get much better performance.

SELECT  *
FROM    (
    SELECT  *
    FROM    mytable sample (0.01)
    ORDER BY
            dbms_random.value
    )
WHERE rownum <= 1000

SAMPLE gives you a percentage of your actual table, so if you really wanted 1,000 rows you would need to adjust that number (0.01% of 2 million rows is only about 200 rows). More often I just need an arbitrary number of rows anyway, so I don't limit my results. On my database with 2 million rows, I get 2 seconds with SAMPLE versus 60 seconds without it.
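One way to "adjust that number" is to pick a sampling percentage that overshoots the target by a safety margin, then randomly order the small sample and cut it to size. The sketch below simulates that in plain Python; sample_percent_for and the 1.5 margin are hypothetical choices for illustration, and the per-row coin flip is an assumed model of SAMPLE, not Oracle's code.

```python
import random


def sample_percent_for(target_rows, total_rows, margin=1.5):
    """Hypothetical helper: SAMPLE percentage expected to return about
    margin * target_rows rows, capped at 100%."""
    return min(100.0, 100.0 * margin * target_rows / total_rows)


def sample_then_limit(rows, target_rows, rng, margin=1.5):
    """Simulate SAMPLE (pct), then ORDER BY dbms_random.value, then ROWNUM <= target_rows."""
    p = sample_percent_for(target_rows, len(rows), margin) / 100.0
    sampled = [r for r in rows if rng.random() < p]  # per-row coin flip, like SAMPLE
    rng.shuffle(sampled)                             # random order of the small sample
    return sampled[:target_rows]                     # may be short if the sample came up small


print(sample_percent_for(1000, 2_000_000))  # 0.075
rng = random.Random(4)
picked = sample_then_limit(list(range(200_000)), 1000, rng)
print(len(picked))
```

The margin trades a slightly larger sort for a lower chance of coming up short; with a margin of 1.5 the expected sample is 1,500 rows, and falling below 1,000 would take an extreme draw.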

select * from mytable sample (0.01)