[sql] Efficient way to update all rows in a table

I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column in this table and I need to fill a value for every row in the column, using the corresponding row value of another column in this table.

I tried to use separate transactions for selecting every next chunk of 100 records and update the value for them, but still this takes hours to update all records in Oracle10 for example.

What is the most efficient way to do this in SQL, without using some dialect-specific features, so it works everywhere (Oracle, MSSQL, MySQL, PostGre etc.)?

ADDITIONAL INFO: There are no calculated fields. There are indexes. Used generated SQL statements which update the table row by row.

This question is related to sql performance oracle

The answer is


You could drop any indexes on the table, then do your insert, and then recreate the indexes.


Might not work you for, but a technique I've used a couple times in the past for similar circumstances.

created updated_{table_name}, then select insert into this table in batches. Once finished, and this hinges on Oracle ( which I don't know or use ) supporting the ability to rename tables in an atomic fashion. updated_{table_name} becomes {table_name} while {table_name} becomes original_{table_name}.

Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.


As Marcelo suggests:

UPDATE mytable
SET new_column = <expr containing old_column>;

If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:

UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;

Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.

EDIT:

A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE API.

Sample code (from Oracle docs):

DECLARE
  l_sql_stmt VARCHAR2(1000);
  l_try NUMBER;
  l_status NUMBER;
BEGIN

  -- Create the TASK
  DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');

  -- Chunk the table by ROWID
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);

  -- Execute the DML in parallel
  l_sql_stmt := 'update EMPLOYEES e 
      SET e.salary = e.salary + 10
      WHERE rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED) 
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  END LOOP;

  -- Done with processing; drop the task
  DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');

END;
/

Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333


update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504


What is the database version? Check out virtual columns in 11g:

Adding Columns with a Default Value http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to oracle

concat yesterdays date with a specific time ORA-28001: The password has expired how to modify the size of a column How to create a blank/empty column with SELECT query in oracle? Find the number of employees in each department - SQL Oracle Query to display all tablespaces in a database and datafiles When or Why to use a "SET DEFINE OFF" in Oracle Database How to insert date values into table error: ORA-65096: invalid common user or role name in oracle In Oracle SQL: How do you insert the current date + time into a table?