I am testing Postgres insertion performance. I have a table with one column whose data type is number, and there is an index on it as well. I filled the table using this query:
insert into aNumber (id) values (564),(43536),(34560) ...
I inserted 4 million rows very quickly, 10,000 at a time, with the query above. After the database reached 6 million rows, performance drastically declined to 1 million rows every 15 minutes. Is there any trick to increase insertion performance? I need optimal insertion performance on this project.
Using Windows 7 Pro on a machine with 5 GB RAM.
This question is related to: sql, postgresql, bulkinsert, sql-insert.
In addition to Craig Ringer's excellent post and depesz's blog post, if you would like to speed up your inserts through the ODBC (psqlodbc) interface by using prepared-statement inserts inside a transaction, there are a few extra things you need to do to make it work fast:
1. Set Protocol=-1 in the connection string. By default psqlodbc uses "Statement" level, which creates a SAVEPOINT for each statement rather than for an entire transaction, making inserts slower.
2. Set UseServerSidePrepare=1 in the connection string. Without this option the client sends the entire insert statement along with each row being inserted.
3. Disable auto-commit on each statement using SQLSetConnectAttr(conn, SQL_ATTR_AUTOCOMMIT, reinterpret_cast<SQLPOINTER>(SQL_AUTOCOMMIT_OFF), 0);
4. Once all rows have been inserted, commit the transaction using SQLEndTran(SQL_HANDLE_DBC, conn, SQL_COMMIT);. There is no need to explicitly open a transaction.
Unfortunately, psqlodbc "implements" SQLBulkOperations by issuing a series of unprepared insert statements, so to achieve the fastest insert one needs to code up the above steps manually.
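As a rough illustration only (the answer itself uses the raw ODBC C API; pyodbc, the DSN name "pg", and the sample rows below are my assumptions), the same recipe might look like this from Python:
# Sketch: prepared-statement inserts inside a single transaction through psqlodbc.
import pyodbc

rows = [(i,) for i in range(10000)]  # sample data; replace with real values

conn = pyodbc.connect(
    "DSN=pg;Protocol=-1;UseServerSidePrepare=1",  # the two connection-string options above
    autocommit=False,                             # same effect as SQL_AUTOCOMMIT_OFF
)
cur = conn.cursor()
# One prepared INSERT is reused for every parameter tuple inside one transaction.
cur.executemany("INSERT INTO aNumber (id) VALUES (?)", rows)
conn.commit()  # same effect as SQLEndTran(SQL_HANDLE_DBC, conn, SQL_COMMIT)
conn.close()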
If you happen to insert columns with UUIDs (which is not exactly your case) and to add to @Dennis' answer (I can't comment yet), be advised that using gen_random_uuid() (requires PG 9.4 and the pgcrypto module) is (a lot) faster than uuid_generate_v4():
=# explain analyze select uuid_generate_v4(),* from generate_series(1,10000);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=4) (actual time=11.674..10304.959 rows=10000 loops=1)
Planning time: 0.157 ms
Execution time: 13353.098 ms
(3 rows)
vs
=# explain analyze select gen_random_uuid(),* from generate_series(1,10000);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=4) (actual time=252.274..418.137 rows=10000 loops=1)
Planning time: 0.064 ms
Execution time: 503.818 ms
(3 rows)
Also, this is the officially suggested way to do it:
Note
If you only need randomly-generated (version 4) UUIDs, consider using the gen_random_uuid() function from the pgcrypto module instead.
This dropped insert time from ~2 hours to ~10 minutes for 3.7M rows.
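As a small illustration of that note (a sketch only; the table name, column names, and connection string below are made up), the server can generate the UUIDs itself so the client never has to ship them:
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection string
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS pgcrypto")  # provides gen_random_uuid() (PG 9.4+)
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        payload text
    )
""")
# Only the payload is sent; the UUIDs are generated server-side by the faster function.
cur.executemany("INSERT INTO items (payload) VALUES (%s)", [("a",), ("b",), ("c",)])
conn.commit()
conn.close()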
I spent around 6 hours on the same issue today. Inserts went at a 'regular' speed (less than 3 sec per 100K) up until 5 million (out of 30 million total) rows, and then performance sank drastically (all the way down to 1 min per 100K).
I will not list all of the things that did not work; I'll cut straight to the meat.
I dropped the primary key on the target table (which was a GUID), and my 30 million rows happily flowed to their destination at a constant speed of less than 3 sec per 100K.
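A rough sketch of that pattern with psycopg2 (the table, constraint, and column names are placeholders, not the ones from my job):
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection string
cur = conn.cursor()
# Drop the (GUID) primary key so rows are appended without index maintenance.
cur.execute("ALTER TABLE target DROP CONSTRAINT IF EXISTS target_pkey")
# ... bulk-load the rows here (COPY or multi-row INSERTs) ...
# Re-add the primary key once, after the load, so the index is built in a single pass.
cur.execute("ALTER TABLE target ADD PRIMARY KEY (id)")
conn.commit()
conn.close()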
Use COPY table FROM ... WITH BINARY
which, according to the documentation, is "somewhat faster than the text and CSV formats." Only do this if you have millions of rows to insert and if you are comfortable with binary data.
Here is an example recipe in Python, using psycopg2 with binary input.
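For instance, a minimal loading-side sketch with psycopg2 (the file path and connection string are placeholders, and it assumes the file was produced earlier with COPY aNumber TO ... WITH BINARY):
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection string
cur = conn.cursor()
# Stream a binary-format dump straight into the table.
with open("/tmp/anumber.bin", "rb") as f:
    cur.copy_expert("COPY aNumber FROM STDIN WITH (FORMAT binary)", f)
conn.commit()
conn.close()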
For optimal insertion performance, disable the index if that's an option for you. Other than that, better hardware (disk, memory) is also helpful.
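For example (a sketch only; the index and table names are placeholders): drop the index before the load and recreate it afterwards, so the index is built once rather than maintained row by row:
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection string
cur = conn.cursor()
cur.execute("DROP INDEX IF EXISTS anumber_id_idx")          # no index maintenance during the load
# ... run the bulk INSERT / COPY here ...
cur.execute("CREATE INDEX anumber_id_idx ON aNumber (id)")   # rebuild it in one pass at the end
conn.commit()
conn.close()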
I encountered this insertion performance problem as well. My solution was to spawn some goroutines to do the insertion work. In the meantime, SetMaxOpenConns
should be given a proper number, otherwise a "too many open connections" error would be raised.
// Requires database/sql, fmt, sync, and a Postgres driver such as github.com/lib/pq.
db, err := sql.Open("postgres", dataSourceName)
if err != nil {
    fmt.Println(err)
    return
}
// Cap the number of open connections so the server does not refuse new ones.
db.SetMaxOpenConns(someConfiguredLimit)

var wg sync.WaitGroup
for _, query := range queries {
    wg.Add(1)
    go func(msg string) {
        defer wg.Done()
        if _, err := db.Exec(msg); err != nil {
            fmt.Println(err)
        }
    }(query)
}
wg.Wait()
The loading speed was much faster for my project. This code snippet just gives an idea of how it works; readers should be able to modify it easily.
Source: Stackoverflow.com