Fastest way to update 120 Million records

Question

I need to initialize a new field with the value -1 in a 120 million record table:

    UPDATE table
    SET int_field = -1

I let it run for 5 hours before canceling it.

I tried running it with the transaction isolation level set to READ UNCOMMITTED, with the same results.

Recovery model: Simple
MS SQL Server 2005

Any advice on getting this done faster?

User · Answer

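    SET ROWCOUNT 1000000
    -- a newly added column starts out NULL, and NULL <> -1 is not true,
    -- so the NULL rows have to be matched explicitly
    UPDATE table SET int_field = -1 WHERE int_field <> -1 OR int_field IS NULL

See how fast that takes, then adjust the row count and repeat as necessary.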
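SET ROWCOUNT still limits UPDATE on SQL Server 2005, but Microsoft has announced that it will stop affecting INSERT, UPDATE and DELETE in a future release and recommends TOP instead. Below is a minimal sketch of the same idea written as a loop that stops by itself once a batch updates nothing. It assumes a key column Id and the new nullable column int_field; the temp table #MyTable and its 10,000 rows are hypothetical and exist only to make the sketch runnable.

```sql
-- Demo setup (hypothetical): a small temp table standing in for the real table.
CREATE TABLE #MyTable (Id int IDENTITY(1, 1) PRIMARY KEY, int_field int NULL);
INSERT INTO #MyTable (int_field)
SELECT TOP (10000) NULL
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

DECLARE @batchsize int;
DECLARE @rows int;
SET @batchsize = 1000;   -- tune this for the real table
SET @rows = 1;

WHILE @rows > 0
BEGIN
    -- Each UPDATE commits on its own (autocommit), so in the SIMPLE recovery
    -- model the log space of finished batches can be reused after a checkpoint.
    UPDATE TOP (@batchsize) #MyTable
    SET int_field = -1
    WHERE int_field IS NULL OR int_field <> -1;

    -- Capture immediately after the UPDATE; 0 means nothing is left to update.
    SET @rows = @@ROWCOUNT;
END
```

Without an index on int_field, each batch has to scan past rows that are already -1 before it finds new work, so later batches tend to get slower on a large table.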

User · Answer

When adding a new column ("initialize a new field") and setting a single value for each existing row, I use the following tactic:

    ALTER TABLE MyTable
        ADD NewColumn int NOT NULL
        CONSTRAINT MyTable_TemporaryDefault DEFAULT -1

    ALTER TABLE MyTable
        DROP CONSTRAINT MyTable_TemporaryDefault

If the column is nullable, existing rows are set to NULL, even with a declared default constraint, unless you also specify WITH VALUES.
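For a nullable column, a DEFAULT constraint on its own leaves existing rows NULL; adding WITH VALUES stores the default in them as well. A minimal sketch on a hypothetical temp table #Demo (the constraint name DF_Demo_TemporaryDefault is also made up):

```sql
-- Demo setup (hypothetical): two existing rows.
CREATE TABLE #Demo (Id int NOT NULL);
INSERT INTO #Demo (Id) VALUES (1);
INSERT INTO #Demo (Id) VALUES (2);

-- Nullable column: without WITH VALUES the two existing rows would get NULL.
ALTER TABLE #Demo
    ADD NewColumn int NULL
    CONSTRAINT DF_Demo_TemporaryDefault DEFAULT -1 WITH VALUES;

-- The default was only needed to populate existing rows.
ALTER TABLE #Demo
    DROP CONSTRAINT DF_Demo_TemporaryDefault;
GO

SELECT Id, NewColumn FROM #Demo;  -- both rows show -1
```

On SQL Server 2005 this still writes every existing row, but as a single set-based statement rather than a separate UPDATE.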

User · Answer

I break the task up into smaller units. Test different batch sizes on your table until you find one that performs optimally. Here is a sample that I have used in the past:

    DECLARE @counter int
    DECLARE @numOfRecords int
    DECLARE @batchsize int

    SET @numOfRecords = (SELECT COUNT(*) AS NumberOfRecords FROM <TABLE> WITH (NOLOCK))
    SET @counter = 0
    SET @batchsize = 2500

    SET ROWCOUNT @batchsize
    WHILE @counter < (@numOfRecords / @batchsize) + 1
    BEGIN
        SET @counter = @counter + 1
        -- the IS NULL test catches rows of a newly added (NULL) column
        UPDATE <TABLE> SET int_field = -1 WHERE int_field <> -1 OR int_field IS NULL
    END
    SET ROWCOUNT 0

User · Answer

The only sane way to update a table of 120M records is with a SELECT statement that populates a second table. You have to take care when doing this. Instructions below.

Simple case

For a table without a clustered index, during a time without concurrent DML:

    SELECT *, new_col = 1
    INTO clone.BaseTable
    FROM dbo.BaseTable

- recreate indexes, constraints, etc. on the new table
- switch old and new with ALTER SCHEMA ... TRANSFER
- drop the old table

If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.

Non-simple case

First, recreate your BaseTable with the same name under a different schema, e.g. clone.BaseTable. Using a separate schema will simplify the rename process later.

- Include the clustered index, if applicable. Remember that primary keys and unique constraints may be clustered, but not necessarily so.
- Include identity columns and computed columns, if applicable.
- Include your new INT column, wherever it belongs.
- Do not include any of the following:
  - triggers
  - foreign key constraints
  - non-clustered indexes/primary keys/unique constraints
  - check constraints or default constraints. Defaults don't make much of a difference, but we're trying to keep things minimal.

Then, test your insert with 1000 rows:

    -- assuming an IDENTITY column in BaseTable; with IDENTITY_INSERT ON it must
    -- appear in the column list (Id and NewIntCol are placeholder names for the
    -- identity column and your new INT column)
    SET IDENTITY_INSERT clone.BaseTable ON
    GO
    INSERT clone.BaseTable WITH (TABLOCK) (Id, Col1, Col2, Col3, NewIntCol)
    SELECT TOP 1000 Id, Col1, Col2, Col3, -1
    FROM dbo.BaseTable
    GO
    SET IDENTITY_INSERT clone.BaseTable OFF

Examine the results. If everything appears in order:

- truncate the clone table
- make sure the database is in the bulk-logged or simple recovery model
- perform the full insert

This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure everything is correct.

Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index or trigger in a separate batch, e.g.:

    ALTER TABLE clone.BaseTable ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2)
    GO
    -- next constraint/index/trigger definition here

Finally, move dbo.BaseTable to a backup schema and clone.BaseTable to the dbo schema (or wherever your table is supposed to live).

    -- -- perform first true-up operation here, if necessary
    -- EXEC clone.BaseTable_TrueUp
    -- GO
    -- -- create a backup schema, if necessary
    -- CREATE SCHEMA backup_20100914
    -- GO
    BEGIN TRY
        BEGIN TRANSACTION
        ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
        -- -- perform second true-up operation here, if necessary
        -- EXEC clone.BaseTable_TrueUp
        ALTER SCHEMA dbo TRANSFER clone.BaseTable
        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        SELECT ERROR_MESSAGE() -- add more info here if necessary
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
    END CATCH
    GO

If you need to free up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.

Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on dbo.BaseTable to log all DML to a separate table. Enable this trigger before you start the insert. Then, in the same transaction in which you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.
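A minimal sketch of the logging trigger and a true-up procedure, assuming dbo.BaseTable has an IDENTITY key column Id that is never updated, plus Col1..Col3, and that clone.BaseTable has the same columns plus the new INT column. The names dbo.BaseTable_DmlLog, dbo.trg_BaseTable_LogDml, Id and NewIntCol are hypothetical; clone.BaseTable_TrueUp matches the procedure name used in the answer's script. The true-up deletes every logged key from the clone and re-copies that row's current state from dbo.BaseTable, which handles inserts, updates and deletes the same way and can safely be run more than once.

```sql
-- Hypothetical log table: one row per key touched by an INSERT, UPDATE or DELETE.
CREATE TABLE dbo.BaseTable_DmlLog (
    LogId     int IDENTITY(1, 1) PRIMARY KEY,
    Id        int      NOT NULL,                   -- key of the affected row
    Operation char(1)  NOT NULL,                   -- 'I', 'U' or 'D'
    LoggedAt  datetime NOT NULL DEFAULT GETDATE()
);
GO

-- Create this before starting the big insert into clone.BaseTable.
CREATE TRIGGER dbo.trg_BaseTable_LogDml
ON dbo.BaseTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Keys in "inserted" are new rows, or updated rows when also in "deleted".
    INSERT INTO dbo.BaseTable_DmlLog (Id, Operation)
    SELECT i.Id, CASE WHEN d.Id IS NULL THEN 'I' ELSE 'U' END
    FROM inserted AS i
    LEFT JOIN deleted AS d ON d.Id = i.Id;

    -- Keys only in "deleted" are removed rows.
    INSERT INTO dbo.BaseTable_DmlLog (Id, Operation)
    SELECT d.Id, 'D'
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted AS i WHERE i.Id = d.Id);
END
GO

-- True-up: replace every logged key in the clone with its current state in
-- dbo.BaseTable. Rows deleted from dbo.BaseTable are simply not re-inserted.
CREATE PROCEDURE clone.BaseTable_TrueUp
AS
BEGIN
    SET NOCOUNT ON;

    DELETE FROM clone.BaseTable
    WHERE Id IN (SELECT Id FROM dbo.BaseTable_DmlLog);

    SET IDENTITY_INSERT clone.BaseTable ON;

    INSERT INTO clone.BaseTable (Id, Col1, Col2, Col3, NewIntCol)
    SELECT b.Id, b.Col1, b.Col2, b.Col3, -1
    FROM dbo.BaseTable AS b
    WHERE b.Id IN (SELECT Id FROM dbo.BaseTable_DmlLog);

    SET IDENTITY_INSERT clone.BaseTable OFF;
END
GO
```

Because this procedure reads dbo.BaseTable, the second call has to run inside the schema-switch transaction before dbo.BaseTable is transferred to the backup schema. The sketch also does not stop a write that commits between that call and the transfer; taking an exclusive lock on dbo.BaseTable at the start of the transaction is one way to close that gap.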

User · Answer

If your int_field is indexed, remove the index before running the update, then create the index again.

5 hours seems like a lot for 120 million records.
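The reason this helps is that every updated row otherwise also has to be written into each index that contains the column. A minimal sketch, assuming a hypothetical table dbo.MyTable with a nonclustered index IX_MyTable_int_field on int_field:

```sql
-- Hypothetical names: dbo.MyTable and IX_MyTable_int_field.
DROP INDEX IX_MyTable_int_field ON dbo.MyTable;
GO

-- With the index gone, the update no longer maintains it row by row.
UPDATE dbo.MyTable SET int_field = -1;
GO

-- Build the index once, over the final values.
CREATE NONCLUSTERED INDEX IX_MyTable_int_field ON dbo.MyTable (int_field);
GO
```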

User · Answer

Sounds like an indexing problem, as Pablo Santa Cruz mentioned. Since your update is not conditional, you can DROP the column and re-ADD it with a DEFAULT value.

User · Answer

What I'd try first is to drop all constraints, indexes, triggers and full-text indexes before you update.

If the above wasn't performant enough, my next move would be to create a CSV file with 12 million records and bulk import it using bcp.

Lastly, I'd create a new heap table (meaning a table with no primary key) with no indexes on a different filegroup, and populate it with -1. Partition the old table, and add the new partition using SWITCH.
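A sketch of the first suggestion for triggers and constraints, using the disable variant instead of dropping, so nothing has to be re-scripted afterwards. dbo.MyTable and int_field are placeholder names.

```sql
-- Hypothetical table dbo.MyTable.
-- Stop triggers from firing during the update.
ALTER TABLE dbo.MyTable DISABLE TRIGGER ALL;
-- NOCHECK covers foreign key and check constraints only;
-- primary key and unique constraints stay enforced.
ALTER TABLE dbo.MyTable NOCHECK CONSTRAINT ALL;
GO

UPDATE dbo.MyTable SET int_field = -1;
GO

-- WITH CHECK re-validates existing rows so the constraints are trusted again.
ALTER TABLE dbo.MyTable WITH CHECK CHECK CONSTRAINT ALL;
ALTER TABLE dbo.MyTable ENABLE TRIGGER ALL;
GO
```

Re-enabling with plain CHECK CONSTRAINT (without WITH CHECK) would be faster, but it leaves the constraints marked as not trusted, so the optimizer can no longer rely on them.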

User · Answer

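If you have the disk space, you could use SELECT INTO and create a new table. It's minimally logged (in the simple or bulk-logged recovery model), so it would go much faster.

    -- assumes int_field does not exist in mytable yet; otherwise list the columns
    -- explicitly instead of t.*, or SELECT INTO fails on the duplicate column name
    SELECT t.*, int_field = CAST(-1 AS int)
    INTO mytable_new
    FROM mytable t

    -- create your indexes and constraints

    GO

    EXEC sp_rename 'mytable', 'mytable_old'
    EXEC sp_rename 'mytable_new', 'mytable'

    DROP TABLE mytable_old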

User · Answer

If the table has an index which you can iterate over, I would put an UPDATE TOP (10000) statement in a WHILE loop that moves over the data. That keeps the transaction log slim and doesn't have such a huge impact on the disk system. I would also recommend experimenting with the MAXDOP option, setting it closer to 1.
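A minimal sketch of walking the data along an index, written as a key-range variant rather than UPDATE TOP: each batch covers a fixed range of the clustered key, so it seeks straight to its rows instead of rescanning rows that were already updated. It assumes an integer clustered key Id; the temp table #KeyRangeDemo and its 25,000 rows are hypothetical.

```sql
-- Demo setup (hypothetical): a temp table with an integer clustered key.
CREATE TABLE #KeyRangeDemo (Id int IDENTITY(1, 1) PRIMARY KEY, int_field int NULL);
INSERT INTO #KeyRangeDemo (int_field)
SELECT TOP (25000) NULL
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

DECLARE @batchsize int, @fromId int, @maxId int;
SET @batchsize = 10000;
SELECT @fromId = MIN(Id), @maxId = MAX(Id) FROM #KeyRangeDemo;

-- An empty table leaves @fromId NULL, so the loop does not run.
WHILE @fromId <= @maxId
BEGIN
    UPDATE #KeyRangeDemo
    SET int_field = -1
    WHERE Id >= @fromId AND Id < @fromId + @batchsize
    OPTION (MAXDOP 1);   -- serial plan for each batch

    SET @fromId = @fromId + @batchsize;
END
```

Gaps in the key only mean that some ranges update fewer rows than @batchsize.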

User · Answer

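    DECLARE @cnt bigint
    SET @cnt = 1

    WHILE @cnt * 100 < 10000000
    BEGIN
        UPDATE TOP (100) [Imp].[dbo].[tablename]
        SET [col1] = 'xxxx'
        WHERE [col1] IS NULL

        -- stop early once there is nothing left to update
        IF @@ROWCOUNT = 0 BREAK

        PRINT 'cnt: ' + CONVERT(varchar(20), @cnt)
        SET @cnt = @cnt + 1
    END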

User · Answer

In general, the recommendations are:

- Remove, or just disable, all INDEXES, TRIGGERS and CONSTRAINTS on the table.
- COMMIT more often (e.g. after every 1000 updated records).
- Use SELECT ... INTO.

But in a particular case you should choose the most appropriate solution, or a combination of them.

Also bear in mind that sometimes an index can be useful, e.g. when you update a non-indexed column based on some condition.
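A sketch of the "just disable" option for an index: unlike dropping it, a disabled index keeps its definition in metadata, so REBUILD brings it back without re-scripting. dbo.MyTable and IX_MyTable_int_field are hypothetical names.

```sql
-- Disable nonclustered indexes only: a disabled clustered index makes the
-- table's data inaccessible until that index is rebuilt.
ALTER INDEX IX_MyTable_int_field ON dbo.MyTable DISABLE;
GO

-- The disabled index is not maintained during the update.
UPDATE dbo.MyTable SET int_field = -1;
GO

-- REBUILD re-enables the index from its stored definition.
ALTER INDEX IX_MyTable_int_field ON dbo.MyTable REBUILD;
GO
```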
