[sql] Best way to select random rows PostgreSQL

A variation of the materialized view "Possible alternative" outlined by Erwin Brandstetter is possible.

Say, for example, that you don't want duplicates in the randomized values that are returned. So you will need to set a boolean value on the primary table containing your (non-randomized) set of values.

Assuming this is the input table:

id_values  id  |   used
           ----+--------
           1   |   FALSE
           2   |   FALSE
           3   |   FALSE
           4   |   FALSE
           5   |   FALSE
           ...

Populate the ID_VALUES table as needed. Then, as described by Erwin, create a materialized view that randomizes the ID_VALUES table once:

CREATE MATERIALIZED VIEW id_values_randomized AS
  SELECT id
  FROM id_values
  ORDER BY random();

Note that the materialized view does not contain the used column, because this will quickly become out-of-date. Nor does the view need to contain other columns that may be in the id_values table.

In order to obtain (and "consume") random values, use an UPDATE-RETURNING on id_values, selecting id_values from id_values_randomized with a join, and applying the desired criteria to obtain only relevant possibilities. For example:

UPDATE id_values
SET used = TRUE
WHERE id_values.id IN 
  (SELECT i.id
    FROM id_values_randomized r INNER JOIN id_values i ON i.id = r.id
    WHERE (NOT i.used)
    LIMIT 5)
RETURNING id;

Change LIMIT as necessary -- if you only need one random value at a time, change LIMIT to 1.

With the proper indexes on id_values, I believe the UPDATE-RETURNING should execute very quickly with little load. It returns randomized values with one database round-trip. The criteria for "eligible" rows can be as complex as required. New rows can be added to the id_values table at any time, and they will become accessible to the application as soon as the materialized view is refreshed (which can likely be run at an off-peak time). Creation and refresh of the materialized view will be slow, but it only needs to be executed when new id's are added to the id_values table.

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to postgresql

Subtracting 1 day from a timestamp date pgadmin4 : postgresql application server could not be contacted. Psql could not connect to server: No such file or directory, 5432 error? How to persist data in a dockerized postgres database using volumes input file appears to be a text format dump. Please use psql Postgres: check if array field contains value? Add timestamp column with default NOW() for new rows only Can't connect to Postgresql on port 5432 How to insert current datetime in postgresql insert query Connecting to Postgresql in a docker container from outside

Examples related to random

How can I get a random number in Kotlin? scikit-learn random state in splitting dataset Random number between 0 and 1 in python In python, what is the difference between random.uniform() and random.random()? Generate random colors (RGB) Random state (Pseudo-random number) in Scikit learn How does one generate a random number in Apple's Swift language? How to generate a random string of a fixed length in Go? Generate 'n' unique random numbers within a range What does random.sample() method in python do?