[sql-server] How can I fill a column with random numbers in SQL? I get the same value in every row

UPDATE CattleProds
SET SheepTherapy=(ROUND((RAND()* 10000),0))
WHERE SheepTherapy IS NULL

If I then do a SELECT I see that my random number is identical in every row. Any ideas how to generate unique random numbers?

This question is related to sql-server

The answer is


While I do love using CHECKSUM, I feel that a better way to go is using NEWID(), just because you don't have to go through a complicated math to generate simple numbers .

ROUND( 1000 *RAND(convert(varbinary, newid())), 0)

You can replace the 1000 with whichever number you want to set as the limit, and you can always use a plus sign to create a range, let's say you want a random number between 100 and 200, you can do something like :

100 + ROUND( 100 *RAND(convert(varbinary, newid())), 0)

Putting it together in your query :

UPDATE CattleProds 
SET SheepTherapy= ROUND( 1000 *RAND(convert(varbinary, newid())), 0)
WHERE SheepTherapy IS NULL

require_once('db/connect.php');

//rand(1000000 , 9999999);

$products_query = "SELECT id FROM products";
$products_result = mysqli_query($conn, $products_query);
$products_row = mysqli_fetch_array($products_result);
$ids_array = [];

do
{
    array_push($ids_array, $products_row['id']);
}
while($products_row = mysqli_fetch_array($products_result));

/*
echo '<pre>';
print_r($ids_array);
echo '</pre>';
*/
$row_counter = count($ids_array);

for ($i=0; $i < $row_counter; $i++)
{ 
    $current_row = $ids_array[$i];
    $rand = rand(1000000 , 9999999);
    mysqli_query($conn , "UPDATE products SET code='$rand' WHERE id='$current_row'");
}

If you are on SQL Server 2008 you can also use

 CRYPT_GEN_RANDOM(2) % 10000

Which seems somewhat simpler (it is also evaluated once per row as newid is - shown below)

DECLARE @foo TABLE (col1 FLOAT)

INSERT INTO @foo SELECT 1 UNION SELECT 2

UPDATE @foo
SET col1 =  CRYPT_GEN_RANDOM(2) % 10000

SELECT *  FROM @foo

Returns (2 random probably different numbers)

col1
----------------------
9693
8573

Mulling the unexplained downvote the only legitimate reason I can think of is that because the random number generated is between 0-65535 which is not evenly divisible by 10,000 some numbers will be slightly over represented. A way around this would be to wrap it in a scalar UDF that throws away any number over 60,000 and calls itself recursively to get a replacement number.

CREATE FUNCTION dbo.RandomNumber()
RETURNS INT
AS
  BEGIN
      DECLARE @Result INT

      SET @Result = CRYPT_GEN_RANDOM(2)

      RETURN CASE
               WHEN @Result < 60000
                     OR @@NESTLEVEL = 32 THEN @Result % 10000
               ELSE dbo.RandomNumber()
             END
  END  

I tested 2 set based randomization methods against RAND() by generating 100,000,000 rows with each. To level the field the output is a float between 0-1 to mimic RAND(). Most of the code is testing infrastructure so I summarize the algorithms here:

-- Try #1 used
(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
-- Try #2 used
RAND(Checksum(NewId()))
-- and to have a baseline to compare output with I used
RAND() -- this required executing 100000000 separate insert statements

Using CRYPT_GEN_RANDOM was clearly the most random since there is only a .000000001% chance of seeing even 1 duplicate when plucking 10^8 numbers FROM a set of 10^18 numbers. IOW we should not have seen any duplicates and this had none! This set took 44 seconds to generate on my laptop.

Cnt     Pct
-----   ----
 1      100.000000  --No duplicates

SQL Server Execution Times: CPU time = 134795 ms, elapsed time = 39274 ms.

IF OBJECT_ID('tempdb..#T0') IS NOT NULL DROP TABLE #T0;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 (CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
  INTO #T0
  FROM L3;

 WITH x AS (
     SELECT Val,COUNT(*) Cnt
      FROM #T0
     GROUP BY Val
)
SELECT x.Cnt,COUNT(*)/(SELECT COUNT(*)/100 FROM #T0) Pct
  FROM X
 GROUP BY x.Cnt;

At almost 15 orders of magnitude less random this method was not quite twice as fast, taking only 23 seconds to generate 100M numbers.

Cnt  Pct
---- ----
1    95.450254    -- only 95% unique is absolutely horrible
2    02.222167    -- If this line were the only problem I'd say DON'T USE THIS!
3    00.034582
4    00.000409    -- 409 numbers appeared 4 times
5    00.000006    -- 6 numbers actually appeared 5 times 

SQL Server Execution Times: CPU time = 77156 ms, elapsed time = 24613 ms.

IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 RAND(Checksum(NewId())) AS Val
  INTO #T1
  FROM L3;

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T1
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T1) Pct
  FROM X
 GROUP BY x.Cnt;

RAND() alone is useless for set-based generation so generating the baseline for comparing randomness took over 6 hours and had to be restarted several times to finally get the right number of output rows. It also seems that the randomness leaves a lot to be desired although it's better than using checksum(newid()) to reseed each row.

Cnt  Pct
---- ----
1    99.768020
2    00.115840
3    00.000100  -- at least there were comparitively few values returned 3 times

Because of the restarts, execution time could not be captured.

IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE #T2;
GO
CREATE TABLE #T2 (Val FLOAT);
GO
SET NOCOUNT ON;
GO
INSERT INTO #T2(Val) VALUES(RAND());
GO 100000000

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T2
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T2) Pct
  FROM X
 GROUP BY x.Cnt;