[sql] How to speed up insertion performance in PostgreSQL

I am testing Postgres insertion performance. I have a table with one column whose data type is number, with an index on it. I filled the database up using this query:

insert into aNumber (id) values (564),(43536),(34560) ...

I inserted 4 million rows very quickly, 10,000 at a time, with the query above. After the database reached 6 million rows, performance drastically declined to 1 million rows every 15 minutes. Is there any trick to increase insertion performance? I need optimal insertion performance on this project.

Using Windows 7 Pro on a machine with 5 GB RAM.

This question is related to: sql, postgresql, bulkinsert, sql-insert

The answer is


In addition to Craig Ringer's excellent post and depesz's blog post, if you would like to speed up your inserts through the ODBC (psqlodbc) interface by using prepared-statement inserts inside a transaction, there are a few extra things you need to do to make it work fast:

  1. Set the level-of-rollback-on-errors to "Transaction" by specifying Protocol=-1 in the connection string. By default psqlodbc uses "Statement" level, which creates a SAVEPOINT for each statement rather than for the entire transaction, making inserts slower.
  2. Use server-side prepared statements by specifying UseServerSidePrepare=1 in the connection string. Without this option the client sends the entire insert statement along with each row being inserted.
  3. Disable auto-commit on each statement using SQLSetConnectAttr(conn, SQL_ATTR_AUTOCOMMIT, reinterpret_cast<SQLPOINTER>(SQL_AUTOCOMMIT_OFF), 0);
  4. Once all rows have been inserted, commit the transaction using SQLEndTran(SQL_HANDLE_DBC, conn, SQL_COMMIT);. There is no need to explicitly open a transaction.

Unfortunately, psqlodbc "implements" SQLBulkOperations by issuing a series of unprepared insert statements, so to achieve the fastest inserts one needs to code up the above steps manually.
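For reference, here is a rough SQL-level sketch of what those settings amount to on the server: one prepared statement, one cheap EXECUTE per row, and a single commit at the end. (The driver does this through the wire protocol rather than by sending literal PREPARE/EXECUTE statements, so this is only an approximation of the effect; the table and values come from the question.)

BEGIN;
-- Prepare once; one parameter per inserted value.
PREPARE ins (numeric) AS INSERT INTO aNumber (id) VALUES ($1);
EXECUTE ins(564);
EXECUTE ins(43536);
EXECUTE ins(34560);
-- ... one EXECUTE per remaining row ...
COMMIT;
DEALLOCATE ins;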


If you happen to be inserting columns with UUIDs (which is not exactly your case) and want to add to @Dennis' answer (I can't comment yet), be advised that using gen_random_uuid() (requires PG 9.4 and the pgcrypto module) is (a lot) faster than uuid_generate_v4()

=# explain analyze select uuid_generate_v4(),* from generate_series(1,10000);
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=4) (actual time=11.674..10304.959 rows=10000 loops=1)
 Planning time: 0.157 ms
 Execution time: 13353.098 ms
(3 rows)

vs


=# explain analyze select gen_random_uuid(),* from generate_series(1,10000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=4) (actual time=252.274..418.137 rows=10000 loops=1)
 Planning time: 0.064 ms
 Execution time: 503.818 ms
(3 rows)

Also, this is the officially suggested way to do it:

Note

If you only need randomly-generated (version 4) UUIDs, consider using the gen_random_uuid() function from the pgcrypto module instead.

This dropped the insert time from ~2 hours to ~10 minutes for 3.7M rows.
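A minimal sketch of what that looks like as a column default (the items table here is hypothetical; on PG 9.4-12 the pgcrypto extension is required, while gen_random_uuid() is built in from PostgreSQL 13 onward):

CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- not needed on PostgreSQL 13+

CREATE TABLE items (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    data text
);

-- Bulk insert; each row gets its UUID from the fast generator.
INSERT INTO items (data)
SELECT 'row ' || g
FROM generate_series(1, 10000) AS g;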


I spent around 6 hours on the same issue today. Inserts went at a 'regular' speed (less than 3 sec per 100K) up until 5M (out of 30M total) rows, and then performance sank drastically (all the way down to 1 min per 100K).

I will not list all of the things that did not work and cut straight to the meat.

I dropped the primary key on the target table (which was a GUID) and my 30M rows happily flowed to their destination at a constant speed of less than 3 sec per 100K.
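A rough sketch of that approach, with hypothetical table and constraint names (remember to re-create the key afterwards, which on a GUID column over 30M rows will itself take a while):

ALTER TABLE target DROP CONSTRAINT target_pkey;

-- ... bulk load the rows here (COPY or multi-row INSERTs) ...

ALTER TABLE target ADD CONSTRAINT target_pkey PRIMARY KEY (id);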


Use COPY table FROM ... WITH BINARY, which according to the documentation is "somewhat faster than the text and CSV formats." Only do this if you have millions of rows to insert, and if you are comfortable with binary data.

Here is an example recipe in Python, using psycopg2 with binary input.
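The command itself looks like this (the file path is a placeholder, and the file must already be in PostgreSQL's binary COPY format, which a client library can generate for you):

-- Server-side load from a file readable by the postgres server process:
COPY aNumber (id) FROM '/tmp/anumber.bin' WITH (FORMAT binary);

-- Or stream it from the client through psql:
-- \copy aNumber (id) FROM 'anumber.bin' WITH (FORMAT binary)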


For optimal insertion performance, disable the index if that's an option for you. Other than that, better hardware (disk, memory) also helps.
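A hedged sketch for the question's setup (the index name here is hypothetical; check the real one with \d aNumber):

DROP INDEX IF EXISTS anumber_id_idx;

-- ... run the bulk INSERTs / COPY here ...

CREATE INDEX anumber_id_idx ON aNumber (id);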


I encountered this insertion performance problem as well. My solution was to spawn some goroutines to do the insertion work. At the same time, SetMaxOpenConns should be given a proper value, otherwise too many "too many open connections" errors will be raised.

// Open a connection pool; the driver name and DSN are placeholders
// (e.g. "postgres" with github.com/lib/pq registered as the driver).
db, err := sql.Open("postgres", "postgres://user:pass@localhost/mydb?sslmode=disable")
if err != nil {
    fmt.Println(err)
    return
}
db.SetMaxOpenConns(maxOpenConns) // some sensible config integer

var wg sync.WaitGroup
for _, query := range queries {
    wg.Add(1)
    go func(msg string) {
        defer wg.Done()
        // Each goroutine runs one INSERT statement from the queue.
        if _, err := db.Exec(msg); err != nil {
            fmt.Println(err)
        }
    }(query)
}
wg.Wait()

The loading speed is much faster for my project. This code snippet just gives an idea of how it works; readers should be able to modify it easily.

