[sql-server] How to update large table with millions of rows in SQL Server?

I have an UPDATE statement that can update more than a million records. I want to update them in batches of 1,000 or 10,000. I tried using @@ROWCOUNT, but I am unable to get the desired result.

Just for testing purposes, I picked a table with 14 records and set a row count of 5. This query is supposed to update the records in batches of 5, 5 and 4, but it only updates the first 5 records.

Query - 1:

SET ROWCOUNT 5

UPDATE TableName 
SET Value = 'abc1' 
WHERE Parameter1 = 'abc' AND Parameter2 = 123

WHILE @@ROWCOUNT > 0
BEGIN
    SET ROWCOUNT 5

    UPDATE TableName 
    SET Value = 'abc1' 
    WHERE Parameter1 = 'abc' AND Parameter2 = 123

    PRINT (@@ROWCOUNT)
END

SET ROWCOUNT 0

Query - 2:

SET ROWCOUNT  5

WHILE (@@ROWCOUNT > 0)
BEGIN
    BEGIN TRANSACTION

    UPDATE TableName 
    SET Value = 'abc1' 
    WHERE Parameter1 = 'abc' AND Parameter2 = 123

    PRINT (@@ROWCOUNT)

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION

        BREAK
    END

    COMMIT TRANSACTION
END

SET ROWCOUNT  0

What am I missing here?

This question is related to sql-server

Answers:


First of all, thank you all for your input. I tweaked my Query - 1 and got the desired result. Gordon Linoff is right: PRINT was messing up my query, so I modified it as follows:

Modified Query - 1:

SET ROWCOUNT 5
WHILE (1 = 1)
  BEGIN
    BEGIN TRANSACTION

        UPDATE TableName 
        SET Value = 'abc1' 
        WHERE Parameter1 = 'abc' AND Parameter2 = 123

        IF @@ROWCOUNT = 0
          BEGIN
                COMMIT TRANSACTION
                BREAK
          END
    COMMIT TRANSACTION
  END
SET ROWCOUNT  0

Output:

(5 row(s) affected)

(5 row(s) affected)

(4 row(s) affected)

(0 row(s) affected)

One option is to loop until no matching rows remain, updating 1000 rows at a time with UPDATE TOP:

WHILE EXISTS (SELECT * FROM TableName WHERE Value <> 'abc1' AND Parameter1 = 'abc' AND Parameter2 = 123)
BEGIN
    UPDATE TOP (1000) TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1'
END

This is a more efficient version of the solution from @Kramb. The existence check is redundant, as the UPDATE's WHERE clause already handles it. Instead you just grab the row count and compare it to the batch size.

Also note that without the Value <> 'abc1' filter, @Kramb's solution would never exclude already-updated rows from the next iteration and would loop forever.

It also uses the modern TOP (n) batch-size syntax instead of SET ROWCOUNT (which is deprecated for INSERT/UPDATE/DELETE statements).

DECLARE @batchSize INT, @rowsUpdated INT
SET @batchSize = 1000;
SET @rowsUpdated = @batchSize; -- Initialise for the while loop entry

WHILE (@batchSize = @rowsUpdated)
BEGIN
    UPDATE TOP (@batchSize) TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 and Value <> 'abc1';

    SET @rowsUpdated = @@ROWCOUNT;
END
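
If the table is in heavy use, a variation of the same loop can wrap each batch in its own short transaction and pause briefly between batches so concurrent sessions are not blocked for long. This is only a sketch; the one-second delay is an arbitrary illustrative value:

DECLARE @batchSize INT, @rowsUpdated INT
SET @batchSize = 1000;
SET @rowsUpdated = @batchSize; -- Initialise for the while loop entry

WHILE (@batchSize = @rowsUpdated)
BEGIN
    BEGIN TRANSACTION;

    UPDATE TOP (@batchSize) TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';

    -- Capture the count before COMMIT, which resets @@ROWCOUNT
    SET @rowsUpdated = @@ROWCOUNT;

    COMMIT TRANSACTION;

    -- Optional: short pause so concurrent sessions can get in between batches
    WAITFOR DELAY '00:00:01';
END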

I want to share my experience. A few days ago I had to update 21 million records in a table with 76 million records. My colleague suggested the following approach. For example, suppose we have the following table, 'Persons':

Id | FirstName | LastName | Email                  | JobTitle
1  | John      | Doe      | john.doe@example.com   | Software Developer
2  | John1     | Doe1     | john.doe1@example.com  | Software Developer
3  | John2     | Doe2     | john.doe2@example.com  | Web Designer

Task: Update persons to the new Job Title: 'Software Developer' -> 'Web Developer'.

1. Create a temporary table 'Persons_SoftwareDeveloper_To_WebDeveloper (Id INT PRIMARY KEY)', for example:
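
A minimal sketch of that table, using exactly the name and column given above (it is a regular table used as temporary staging and dropped again in step 5):

CREATE TABLE Persons_SoftwareDeveloper_To_WebDeveloper
(
    Id INT PRIMARY KEY
);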

2. Select into the temporary table the persons you want to update with the new job title:

INSERT INTO Persons_SoftwareDeveloper_To_WebDeveloper
SELECT Id
FROM Persons WITH(NOLOCK)   -- avoid taking shared locks on Persons
WHERE JobTitle = 'Software Developer'
OPTION(MAXDOP 1)            -- use only one core

Depending on the row count, this statement will take some time to fill your temporary table, but it avoids locks on the Persons table. In my situation it took about 5 minutes (21 million rows).

3. The main idea is to generate small SQL statements that update the database page by page. So, let's print them:

DECLARE @i INT, @pagesize INT, @totalPersons INT
SET @i = 0
SET @pagesize = 2000
SELECT @totalPersons = MAX(Id) FROM Persons

WHILE @i <= @totalPersons
BEGIN
    PRINT '
    UPDATE persons
      SET persons.JobTitle = ''Web Developer''
      FROM Persons_SoftwareDeveloper_To_WebDeveloper tmp
      JOIN Persons persons ON tmp.Id = persons.Id
      WHERE persons.Id BETWEEN ' + CAST(@i AS varchar(20)) + ' AND ' + CAST(@i + @pagesize AS varchar(20)) + '
    PRINT ''Page ' + CAST(@i / @pagesize AS varchar(20)) + ' of ' + CAST(@totalPersons / @pagesize AS varchar(20)) + '''
    GO
    '
    SET @i = @i + @pagesize
END

After executing this script you will receive hundreds of small batches, which you can then execute in one tab of SQL Server Management Studio.
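
For instance, with @i = 0 and @pagesize = 2000, each printed batch looks roughly like this (the '38000' page total is only an illustrative value; the real number depends on MAX(Id) and @pagesize):

    UPDATE persons
      SET persons.JobTitle = 'Web Developer'
      FROM Persons_SoftwareDeveloper_To_WebDeveloper tmp
      JOIN Persons persons ON tmp.Id = persons.Id
      WHERE persons.Id BETWEEN 0 AND 2000
    PRINT 'Page 0 of 38000'
    GO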

4. Run the printed SQL statements and check for locks on the table. You can always stop the process and play with @pagesize to speed the update up or slow it down (don't forget to change @i accordingly after you pause the script).
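
One way to keep an eye on locks while the batches run is to query the sys.dm_tran_locks DMV (a minimal sketch; the filter simply limits the output to the current database):

SELECT request_session_id,
       resource_type,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();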

5. Drop Persons_SoftwareDeveloper_To_WebDeveloper (remove the temporary table).

Minor note: this migration can take some time, and new rows with the old, invalid value could be inserted while it runs. So first fix the places where such rows are added; in my situation I fixed the UI: 'Software Developer' -> 'Web Developer'.


Your PRINT is messing things up, because it resets @@ROWCOUNT. Whenever you use @@ROWCOUNT, my advice is to capture it immediately into a variable. So:

DECLARE @RC int;

WHILE @RC > 0 OR @RC IS NULL
BEGIN
    SET ROWCOUNT 5;

    UPDATE TableName
        SET Value = 'abc1'
        WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';

    SET @RC = @@ROWCOUNT;   -- capture before any statement resets @@ROWCOUNT
    PRINT (@RC);
END;

SET ROWCOUNT 0;

And, another nice feature is that you don't need to repeat the update code.