[sql] Fastest way to update 120 Million records

I need to initialize a new field with the value -1 in a 120 Million record table.

Update table
       set int_field = -1;

I let it run for 5 hours before canceling it.

I tried running it with the transaction isolation level set to READ UNCOMMITTED, with the same result.

Recovery Model = Simple.
MS SQL Server 2005

Any advice on getting this done faster?

This question is related to sql sql-server sql-server-2005

The answer is


If you have the disk space, you could use SELECT INTO and create a new table. It's minimally logged, so it would go much faster.

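-- if int_field already exists in mytable, list the other columns explicitly
-- instead of t.*, because SELECT INTO needs unique column names
select t.*, int_field = CAST(-1 as int)
into mytable_new
from mytable t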

-- create your indexes and constraints

GO

exec sp_rename mytable, mytable_old
exec sp_rename mytable_new, mytable

drop table mytable_old

Sounds like an indexing problem, like Pabla Santa Cruz mentioned. Since your update is not conditional, you can DROP the column and RE-ADD it with a DEFAULT value.


In general, the recommendations are as follows:

  1. Remove or just Disable all INDEXES, TRIGGERS, CONSTRAINTS on the table;
  2. Perform COMMIT more often (e.g. after each 1000 records that were updated);
  3. Use select ... into.

In any particular case, though, you should choose the most appropriate solution or a combination of them.

Also bear in mind that an index can sometimes be useful, e.g. when you update a non-indexed column based on some condition.


If your int_field is indexed, remove the index before running the update. Then create your index again...

5 hours seems like a lot for 120 million records.
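As a rough sketch of that drop-and-recreate idea: if the index on int_field is nonclustered, it can be disabled and rebuilt rather than dropped, which keeps its definition in place. The names dbo.mytable and IX_mytable_int_field are placeholders, not taken from the question.

```sql
-- Assumed names: dbo.mytable and a NONCLUSTERED index IX_mytable_int_field.
-- Do not disable a clustered index this way; the table becomes inaccessible.
ALTER INDEX IX_mytable_int_field ON dbo.mytable DISABLE;

-- With the index disabled, the update no longer has to maintain it row by row.
UPDATE dbo.mytable SET int_field = -1;

-- REBUILD re-enables the disabled index and repopulates it in one pass.
ALTER INDEX IX_mytable_int_field ON dbo.mytable REBUILD;
```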


I break the task up into smaller units. Test with different batch size intervals for your table, until you find an interval that performs optimally. Here is a sample that I have used in the past.

declare @counter int 
declare @numOfRecords int
declare @batchsize int

set @numOfRecords = (SELECT COUNT(*) AS NumberOfRecords FROM <TABLE> with(nolock))
set @counter = 0 
set @batchsize = 2500

set rowcount @batchsize
while @counter < (@numOfRecords/@batchsize) +1
begin 
set @counter = @counter + 1 
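-- a newly added column is NULL, and NULL <> -1 is never true, so test for NULL too
Update table set int_field = -1 where int_field is null or int_field <> -1;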
end 
set rowcount 0

If the table has an index you can iterate over, I would put an update top(10000) statement in a while loop that moves over the data. That keeps the transaction log slim and has far less impact on the disk system. I would also recommend experimenting with the maxdop option (setting it closer to 1).
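A minimal sketch of that loop, assuming the table is dbo.mytable and has an integer clustered key called id (both names are hypothetical). It walks the key in fixed ranges instead of using TOP, so each batch is a cheap range seek, and the batch size is only a starting point to tune.

```sql
-- Assumed names: dbo.mytable with an integer clustered key column "id".
DECLARE @from bigint, @max bigint, @batch bigint;
SET @batch = 10000;
SELECT @from = MIN(id), @max = MAX(id) FROM dbo.mytable;

WHILE @from <= @max
BEGIN
    -- Each UPDATE commits on its own, so under the Simple recovery model
    -- the log space can be reused between batches.
    UPDATE dbo.mytable
       SET int_field = -1
     WHERE id >= @from AND id < @from + @batch
       AND (int_field IS NULL OR int_field <> -1)
    OPTION (MAXDOP 1);  -- the maxdop setting mentioned above

    SET @from = @from + @batch;
END
```

Because the WHERE clause skips rows that are already -1, the script can be stopped and restarted without redoing finished ranges.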


declare @cnt bigint
set @cnt = 1

while @cnt*100 < 10000000
begin
    UPDATE top(100) [Imp].[dbo].[tablename]
       SET [col1] = xxxx
     WHERE [col1] is null

    print '@cnt: ' + convert(varchar, @cnt)
    set @cnt = @cnt + 1
end

What I'd try first is
to drop all constraints, indexes, triggers and full-text indexes before you update.

If that wasn't fast enough, my next move would be
to create a CSV file with 120 million records and bulk import it using bcp.

Lastly, I'd create a new heap table (meaning a table with no clustered index) with no indexes on a different filegroup and populate it with -1. Then partition the old table and add the new partition using "switch".


When adding a new column ("initialize a new field") and setting a single value to each existing row, I use the following tactic:

ALTER TABLE MyTable
 add NewColumn  int  not null
  constraint MyTable_TemporaryDefault
   default -1

ALTER TABLE MyTable
 drop constraint MyTable_TemporaryDefault

If the column is nullable and you don't include a "declared" constraint, the column will be set to null for all rows.
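For the nullable case, SQL Server accepts WITH VALUES on the default so that existing rows get the value instead of NULL. A minimal sketch, reusing the table and column names from the example above (the constraint name is only illustrative):

```sql
-- WITH VALUES applies the default to existing rows even though the column is nullable.
ALTER TABLE MyTable
 add NewColumn int null
  constraint MyTable_NewColumnDefault
   default -1 with values
```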


set rowcount 1000000
Update table set int_field = -1 where int_field is null or int_field <> -1

See how long that takes, then adjust the batch size and repeat as necessary.
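One way to automate the repeat step is to loop until a batch updates no rows. A rough sketch, assuming a table called dbo.mytable (a placeholder name) and a batch size you would tune yourself; UPDATE TOP is used here in place of SET ROWCOUNT:

```sql
-- Assumed name: dbo.mytable. The batch size is a guess; adjust after timing one batch.
WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) dbo.mytable
       SET int_field = -1
     WHERE int_field IS NULL OR int_field <> -1;

    -- Stop once no remaining row needs the update.
    IF @@ROWCOUNT = 0 BREAK;
END
```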

