I tested 2 set based randomization methods against RAND() by generating 100,000,000 rows with each. To level the field the output is a float between 0-1 to mimic RAND(). Most of the code is testing infrastructure so I summarize the algorithms here:
-- Try #1 used
(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
-- Try #2 used
RAND(Checksum(NewId()))
-- and to have a baseline to compare output with I used
RAND() -- this required executing 100000000 separate insert statements
Using CRYPT_GEN_RANDOM was clearly the most random since there is only a .000000001% chance of seeing even 1 duplicate when plucking 10^8 numbers FROM a set of 10^18 numbers. IOW we should not have seen any duplicates and this had none! This set took 44 seconds to generate on my laptop.
Cnt Pct
----- ----
1 100.000000 --No duplicates
SQL Server Execution Times: CPU time = 134795 ms, elapsed time = 39274 ms.
IF OBJECT_ID('tempdb..#T0') IS NOT NULL DROP TABLE #T0;
GO
WITH L0 AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c)) -- 2^4
,L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B) -- 2^8
,L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B) -- 2^16
,L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B) -- 2^32
SELECT TOP 100000000 (CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
INTO #T0
FROM L3;
WITH x AS (
SELECT Val,COUNT(*) Cnt
FROM #T0
GROUP BY Val
)
SELECT x.Cnt,COUNT(*)/(SELECT COUNT(*)/100 FROM #T0) Pct
FROM X
GROUP BY x.Cnt;
At almost 15 orders of magnitude less random this method was not quite twice as fast, taking only 23 seconds to generate 100M numbers.
Cnt Pct
---- ----
1 95.450254 -- only 95% unique is absolutely horrible
2 02.222167 -- If this line were the only problem I'd say DON'T USE THIS!
3 00.034582
4 00.000409 -- 409 numbers appeared 4 times
5 00.000006 -- 6 numbers actually appeared 5 times
SQL Server Execution Times: CPU time = 77156 ms, elapsed time = 24613 ms.
IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
GO
WITH L0 AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c)) -- 2^4
,L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B) -- 2^8
,L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B) -- 2^16
,L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B) -- 2^32
SELECT TOP 100000000 RAND(Checksum(NewId())) AS Val
INTO #T1
FROM L3;
WITH x AS (
SELECT Val,COUNT(*) Cnt
FROM #T1
GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T1) Pct
FROM X
GROUP BY x.Cnt;
RAND() alone is useless for set-based generation so generating the baseline for comparing randomness took over 6 hours and had to be restarted several times to finally get the right number of output rows. It also seems that the randomness leaves a lot to be desired although it's better than using checksum(newid()) to reseed each row.
Cnt Pct
---- ----
1 99.768020
2 00.115840
3 00.000100 -- at least there were comparitively few values returned 3 times
Because of the restarts, execution time could not be captured.
IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE #T2;
GO
CREATE TABLE #T2 (Val FLOAT);
GO
SET NOCOUNT ON;
GO
INSERT INTO #T2(Val) VALUES(RAND());
GO 100000000
WITH x AS (
SELECT Val,COUNT(*) Cnt
FROM #T2
GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T2) Pct
FROM X
GROUP BY x.Cnt;