I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column in this table and I need to fill a value for every row in the column, using the corresponding row value of another column in this table.
I tried to use separate transactions for selecting every next chunk of 100 records and update the value for them, but still this takes hours to update all records in Oracle10 for example.
What is the most efficient way to do this in SQL, without using some dialect-specific features, so it works everywhere (Oracle, MSSQL, MySQL, PostGre etc.)?
ADDITIONAL INFO: There are no calculated fields. There are indexes. Used generated SQL statements which update the table row by row.
This question is related to
sql
performance
oracle
You could drop any indexes on the table, then do your insert, and then recreate the indexes.
Might not work you for, but a technique I've used a couple times in the past for similar circumstances.
created updated_{table_name}, then select insert into this table in batches. Once finished, and this hinges on Oracle ( which I don't know or use ) supporting the ability to rename tables in an atomic fashion. updated_{table_name} becomes {table_name} while {table_name} becomes original_{table_name}.
Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.
As Marcelo suggests:
UPDATE mytable
SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
EDIT:
A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE
API.
Sample code (from Oracle docs):
DECLARE
l_sql_stmt VARCHAR2(1000);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');
-- Chunk the table by ROWID
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);
-- Execute the DML in parallel
l_sql_stmt := 'update EMPLOYEES e
SET e.salary = e.salary + 10
WHERE rowid BETWEEN :start_id AND :end_id';
DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
parallel_level => 10);
-- If there is an error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
END LOOP;
-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333
update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504
What is the database version? Check out virtual columns in 11g:
Adding Columns with a Default Value http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
Source: Stackoverflow.com