Improve INSERT-per-second performance of SQLite

Question

Optimizing SQLite is tricky. Bulk-insert performance of a C application can vary from 85 inserts per second to over 96,000 inserts per second!

Background

We are using SQLite as part of a desktop application. We have large amounts of configuration data stored in XML files that are parsed and loaded into an SQLite database for further processing when the application is initialized. SQLite is ideal for this situation because it's fast, it requires no specialized configuration, and the database is stored on disk as a single file.

Rationale

Initially I was disappointed with the performance I was seeing. It turns out that the performance of SQLite can vary significantly (both for bulk inserts and selects) depending on how the database is configured and how you're using the API. It was not a trivial matter to figure out what all of the options and techniques were, so I thought it prudent to create this community wiki entry to share the results with Stack Overflow readers in order to save others the trouble of the same investigations.

The Experiment

Rather than simply talking about performance tips in the general sense (i.e. "Use a transaction!"), I thought it best to write some C code and actually measure the impact of various options. We're going to start with some simple data:

- A 28 MB TAB-delimited text file (approximately 865,000 records) of the complete transit schedule for the city of Toronto
- My test machine is a 3.60 GHz P4 running Windows XP.
- The code is compiled with Visual C++ 2005 as "Release" with "Full Optimization" (/Ox) and Favor Fast Code (/Ot).
- I'm using the SQLite "Amalgamation", compiled directly into my test application. The SQLite version I happen to have is a bit older (3.6.7), but I suspect these results will be comparable to the latest release (please leave a comment if you think otherwise).

Let's write some code!

The Code

A simple C program that reads the text file line-by-line, splits the string into values and then inserts the data into an SQLite database. In this "baseline" version of the code, the database is created, but we won't actually insert data:

    /*************************************************************
        Baseline code to experiment with SQLite performance.

        Input data is a 28 MB TAB-delimited text file of the
        complete Toronto Transit System schedule/route info
        from http://www.toronto.ca/open/datasets/ttc-routes/

    **************************************************************/
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <string.h>
    #include "sqlite3.h"

    #define INPUTDATA "C:\\TTC_schedule_scheduleitem_10-27-2009.txt"
    #define DATABASE "c:\\TTC_schedule_scheduleitem_10-27-2009.sqlite"
    #define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
    #define BUFFER_SIZE 256

    int main(int argc, char **argv) {

        sqlite3 * db;
        sqlite3_stmt * stmt;
        char * sErrMsg = 0;
        char * tail = 0;
        int nRetCode;
        int n = 0;

        clock_t cStartClock;

        FILE * pFile;
        char sInputBuf [BUFFER_SIZE] = "\0";

        char * sRT = 0;  /* Route */
        char * sBR = 0;  /* Branch */
        char * sVR = 0;  /* Version */
        char * sST = 0;  /* Stop Number */
        char * sVI = 0;  /* Vehicle */
        char * sDT = 0;  /* Date */
        char * sTM = 0;  /* Time */

        char sSQL [BUFFER_SIZE] = "\0";

        /*********************************************/
        /* Open the Database and create the Schema */
        sqlite3_open(DATABASE, &db);
        sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);

        /*********************************************/
        /* Open input file and import into Database*/
        cStartClock = clock();

        pFile = fopen (INPUTDATA,"r");
        while (!feof(pFile)) {

            fgets (sInputBuf, BUFFER_SIZE, pFile);

            sRT = strtok (sInputBuf, "\t");     /* Get Route */
            sBR = strtok (NULL, "\t");            /* Get Branch */
            sVR = strtok (NULL, "\t");            /* Get Version */
            sST = strtok (NULL, "\t");            /* Get Stop Number */
            sVI = strtok (NULL, "\t");            /* Get Vehicle */
            sDT = strtok (NULL, "\t");            /* Get Date */
            sTM = strtok (NULL, "\t");            /* Get Time */

            /* ACTUAL INSERT WILL GO HERE */

            n++;
        }
        fclose (pFile);

        printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

        sqlite3_close(db);
        return 0;
    }

The "Control"

Running the code as-is doesn't actually perform any database operations, but it will give us an idea of how fast the raw C file I/O and string processing operations are.

    Imported 864913 records in 0.94 seconds

Great! We can do 920,000 inserts per second, provided we don't actually do any inserts :-)

The "Worst-Case-Scenario"

We're going to generate the SQL string using the values read from the file and invoke that SQL operation using sqlite3_exec:

    sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, '%s', '%s', '%s', '%s', '%s', '%s', '%s')", sRT, sBR, sVR, sST, sVI, sDT, sTM);
    sqlite3_exec(db, sSQL, NULL, NULL, &sErrMsg);

This is going to be slow because the SQL will be compiled into VDBE code for every insert and every insert will happen in its own transaction. How slow?

    Imported 864913 records in 9933.61 seconds

Yikes! 2 hours and 45 minutes! That's only 85 inserts per second.

Using a Transaction

By default, SQLite will evaluate every INSERT / UPDATE statement within a unique transaction. If performing a large number of inserts, it's advisable to wrap your operation in a transaction:

    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        ...

    }
    fclose (pFile);

    sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

    Imported 864913 records in 38.03 seconds

That's better. Simply wrapping all of our inserts in a single transaction improved our performance to 23,000 inserts per second.

Using a Prepared Statement

Using a transaction was a huge improvement, but recompiling the SQL statement for every insert doesn't make sense if we're using the same SQL over and over. Let's use sqlite3_prepare_v2 to compile our SQL statement once and then bind our parameters to that statement using sqlite3_bind_text:

    /* Open input file and import into the database */
    cStartClock = clock();

    sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");
    sqlite3_prepare_v2(db,  sSQL, BUFFER_SIZE, &stmt, &tail);

    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        fgets (sInputBuf, BUFFER_SIZE, pFile);

        sRT = strtok (sInputBuf, "\t");   /* Get Route */
        sBR = strtok (NULL, "\t");        /* Get Branch */
        sVR = strtok (NULL, "\t");        /* Get Version */
        sST = strtok (NULL, "\t");        /* Get Stop Number */
        sVI = strtok (NULL, "\t");        /* Get Vehicle */
        sDT = strtok (NULL, "\t");        /* Get Date */
        sTM = strtok (NULL, "\t");        /* Get Time */

        sqlite3_bind_text(stmt, 1, sRT, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 2, sBR, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 3, sVR, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 4, sST, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 5, sVI, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 6, sDT, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 7, sTM, -1, SQLITE_TRANSIENT);

        sqlite3_step(stmt);

        sqlite3_clear_bindings(stmt);
        sqlite3_reset(stmt);

        n++;
    }
    fclose (pFile);

    sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

    printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

    sqlite3_finalize(stmt);
    sqlite3_close(db);

    return 0;

    Imported 864913 records in 16.27 seconds

Nice! There's a little bit more code (don't forget to call sqlite3_clear_bindings and sqlite3_reset), but we've more than doubled our performance to 53,000 inserts per second.

PRAGMA synchronous = OFF

By default, SQLite will pause after issuing an OS-level write command. This guarantees that the data is written to the disk. By setting synchronous = OFF, we are instructing SQLite to simply hand off the data to the OS for writing and then continue. There's a chance that the database file may become corrupted if the computer suffers a catastrophic crash (or power failure) before the data is written to the platter:

    /* Open the database and create the schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);

    Imported 864913 records in 12.41 seconds

The improvements are now smaller, but we're up to 69,600 inserts per second.

PRAGMA journal_mode = MEMORY

Consider storing the rollback journal in memory by evaluating PRAGMA journal_mode = MEMORY. Your transaction will be faster, but if you lose power or your program crashes during a transaction, your database could be left in a corrupt state with a partially-completed transaction:

    /* Open the database and create the schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

    Imported 864913 records in 13.50 seconds

A little slower than the previous optimization at 64,000 inserts per second.

PRAGMA synchronous = OFF and PRAGMA journal_mode = MEMORY

Let's combine the previous two optimizations. It's a little more risky (in case of a crash), but we're just importing data (not running a bank):

    /* Open the database and create the schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

    Imported 864913 records in 12.00 seconds

Fantastic! We're able to do 72,000 inserts per second.

Using an In-Memory Database

Just for kicks, let's build upon all of the previous optimizations and redefine the database filename so we're working entirely in RAM:

    #define DATABASE ":memory:"

    Imported 864913 records in 10.94 seconds

It's not super-practical to store our database in RAM, but it's impressive that we can perform 79,000 inserts per second.

Refactoring C Code

Although not specifically an SQLite improvement, I don't like the extra char* assignment operations in the while loop. Let's quickly refactor that code to pass the output of strtok() directly into sqlite3_bind_text(), and let the compiler try to speed things up for us:

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        fgets (sInputBuf, BUFFER_SIZE, pFile);

        sqlite3_bind_text(stmt, 1, strtok (sInputBuf, "\t"), -1, SQLITE_TRANSIENT); /* Get Route */
        sqlite3_bind_text(stmt, 2, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Branch */
        sqlite3_bind_text(stmt, 3, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Version */
        sqlite3_bind_text(stmt, 4, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Stop Number */
        sqlite3_bind_text(stmt, 5, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Vehicle */
        sqlite3_bind_text(stmt, 6, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Date */
        sqlite3_bind_text(stmt, 7, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Time */

        sqlite3_step(stmt);             /* Execute the SQL Statement */
        sqlite3_clear_bindings(stmt);   /* Clear bindings */
        sqlite3_reset(stmt);            /* Reset VDBE */

        n++;
    }
    fclose (pFile);

Note: We are back to using a real database file. In-memory databases are fast, but not necessarily practical.

    Imported 864913 records in 8.94 seconds

A slight refactoring to the string processing code used in our parameter binding has allowed us to perform 96,700 inserts per second. I think it's safe to say that this is plenty fast. As we start to tweak other variables (i.e. page size, index creation, etc.) this will be our benchmark.

Summary (so far)

I hope you're still with me! The reason we started down this road is that bulk-insert performance varies so wildly with SQLite, and it's not always obvious what changes need to be made to speed up our operation. Using the same compiler (and compiler options), the same version of SQLite and the same data, we've optimized our code and our usage of SQLite to go from a worst-case scenario of 85 inserts per second to over 96,000 inserts per second!

CREATE INDEX then INSERT vs. INSERT then CREATE INDEX

Before we start measuring SELECT performance, we know that we'll be creating indices. It's been suggested in one of the answers below that when doing bulk inserts, it is faster to create the index after the data has been inserted (as opposed to creating the index first and then inserting the data). Let's try:

Create Index then Insert Data

    sqlite3_exec(db, "CREATE  INDEX 'TTC_Stop_Index' ON 'TTC' ('Stop')", NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);
    ...

    Imported 864913 records in 18.13 seconds

Insert Data then Create Index

    ...
    sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "CREATE  INDEX 'TTC_Stop_Index' ON 'TTC' ('Stop')", NULL, NULL, &sErrMsg);

    Imported 864913 records in 13.66 seconds

As expected, bulk inserts are slower if one column is indexed, but it does make a difference if the index is created after the data is inserted. Our no-index baseline is 96,000 inserts per second. Creating the index first and then inserting data gives us 47,700 inserts per second, whereas inserting the data first and then creating the index gives us 63,300 inserts per second.

I'd gladly take suggestions for other scenarios to try, and will be compiling similar data for SELECT queries soon.

User · Answer

Bulk imports seem to perform best if you can chunk your INSERT/UPDATE statements. A value of 10,000 or so has worked well for me on a table with only a few rows, YMMV.

User · Answer

The answer to your question is that the newer SQLite 3 has improved performance; use that.

This answer, "Why is SQLAlchemy insert with sqlite 25 times slower than using sqlite3 directly?", by the SQLAlchemy ORM author, has 100k inserts in 0.5 sec, and I have seen similar results with python-sqlite and SQLAlchemy. This leads me to believe that performance has improved with SQLite 3.

User · Answer

Use a ContentProvider for inserting the bulk data into the database. The method below is used for inserting bulk data into the database. This should improve the INSERT-per-second performance of SQLite.

    private SQLiteDatabase database;  // set elsewhere: database = dbHelper.getWritableDatabase();

    public int bulkInsert(@NonNull Uri uri, @NonNull ContentValues[] values) {
        database.beginTransaction();
        try {
            for (ContentValues value : values) {
                database.insert("TABLE_NAME", null, value);
            }
            database.setTransactionSuccessful();
        } finally {
            // Commits if setTransactionSuccessful() was reached, rolls back otherwise.
            database.endTransaction();
        }
        return values.length;
    }

Call the bulkInsert method:

    App.getAppContext().getContentResolver().bulkInsert(contentUriTable,
                contentValuesArray);

Link: https://www.vogella.com/tutorials/AndroidSQLite/article.html (check the "Using ContentProvider" section for more details).

User · Answer

Avoid sqlite3_clear_bindings(stmt).

The code in the test sets the bindings every time through, which should be enough.

The C API intro from the SQLite docs says:

    Prior to calling sqlite3_step() for the first time or immediately
    after sqlite3_reset(), the application can invoke the
    sqlite3_bind() interfaces to attach values to the parameters. Each
    call to sqlite3_bind() overrides prior bindings on the same parameter.

There is nothing in the docs for sqlite3_clear_bindings saying you must call it in addition to simply setting the bindings.

More detail: Avoid_sqlite3_clear_bindings()

User · Answer

If you care only about reading, a somewhat faster (but possibly stale) approach is to read from multiple connections on multiple threads (one connection per thread).

First find the number of items in the table:

    SELECT COUNT(*) FROM table

then read in pages (LIMIT/OFFSET):

    SELECT * FROM table ORDER BY _ROWID_ LIMIT <limit> OFFSET <offset>

where <limit> and <offset> are calculated per thread, like this:

    int limit = (count + n_threads - 1)/n_threads;

and for each thread:

    int offset = thread_index * limit

For our small (200 MB) database this gave a 50-75% speed-up (3.8.0.2 64-bit on Windows 7). Our tables are heavily non-normalized (1000-1500 columns, roughly 100,000 or more rows).

Too many or too few threads won't help; you need to benchmark and profile yourself.

Also, for us SHAREDCACHE made performance slower, so I manually set PRIVATECACHE (because shared cache was enabled globally for us).
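The page arithmetic above can be sketched as two small C helpers. The names page_limit and page_offset are hypothetical; the formulas are the ones given in this answer (a ceiling division for the page size, then a multiple of that size per thread).

```c
#include <assert.h>

/* Rows per thread: ceiling of count / n_threads, so that every row is
   covered even when count is not a multiple of n_threads. */
long page_limit(long count, int n_threads)
{
    assert(count >= 0 && n_threads > 0);
    return (count + n_threads - 1) / n_threads;
}

/* First row (zero-based) that a given thread should read. */
long page_offset(int thread_index, long limit)
{
    assert(thread_index >= 0 && limit >= 0);
    return (long)thread_index * limit;
}
```

With 10 rows and 4 threads, the limit is 3 and the offsets are 0, 3, 6 and 9, so the last thread simply receives a shorter page (one row).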

User · Answer

Several tips:

1. Put inserts/updates in a transaction.
2. For older versions of SQLite: consider a less paranoid journal mode (pragma journal_mode). There is NORMAL, and then there is OFF, which can significantly increase insert speed if you're not too worried about the database possibly getting corrupted if the OS crashes. If your application crashes, the data should be fine. Note that in newer versions, the OFF/MEMORY settings are not safe for application-level crashes.
3. Playing with page sizes makes a difference as well (PRAGMA page_size). Having larger page sizes can make reads and writes go a bit faster as larger pages are held in memory. Note that more memory will be used for your database.
4. If you have indices, consider calling CREATE INDEX after doing all your inserts. This is significantly faster than creating the index and then doing your inserts.
5. You have to be quite careful if you have concurrent access to SQLite, as the whole database is locked when writes are done, and although multiple readers are possible, writes will be locked out. This has been improved somewhat with the addition of a WAL in newer SQLite versions.
6. Take advantage of saving space: smaller databases go faster. For instance, if you have key-value pairs, try making the key an INTEGER PRIMARY KEY if possible, which will replace the implied unique row number column in the table.
7. If you are using multiple threads, you can try using the shared page cache, which will allow loaded pages to be shared between threads, which can avoid expensive I/O calls.
8. Don't use !feof(file)!

I've also asked similar questions here and here.
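As an illustration of the last tip, here is a minimal sketch in C of a read loop driven by the return value of fgets() rather than by feof(). The function count_records and the LINE_BUFFER_SIZE constant are hypothetical names, and the sketch assumes each line fits in the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_BUFFER_SIZE 256  /* same size as BUFFER_SIZE in the question */

/* Counts tab-delimited records whose first field is non-empty.
   fgets() returns NULL at end of input, so the loop body never runs on a
   failed read. A while (!feof(fp)) loop typically runs one extra time
   after the last line, with the buffer still holding stale data. */
long count_records(FILE *fp)
{
    char line[LINE_BUFFER_SIZE];
    long n = 0;

    assert(fp != NULL);

    while (fgets(line, sizeof line, fp) != NULL) {
        char *first = strtok(line, "\t\r\n");
        if (first != NULL)
            n++;   /* a real importer would bind and insert here */
    }
    return n;
}
```

In the question's program the same change would mean replacing the while (!feof(pFile)) test with the fgets() call itself, so a trailing failed read is not counted as a record.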

User · Answer

Try using SQLITE_STATIC instead of SQLITE_TRANSIENT for those inserts.

SQLITE_TRANSIENT will cause SQLite to copy the string data before returning.

SQLITE_STATIC tells it that the memory address you gave it will be valid until the query has been performed (which in this loop is always the case). This will save you several allocate, copy and deallocate operations per loop. Possibly a large improvement.

User · Answer

I couldn't get any gain from transactions until I raised cache_size to a higher value, i.e. PRAGMA cache_size=10000;

User · Answer

On bulk inserts

Inspired by this post and by the Stack Overflow question that led me here ("Is it possible to insert multiple rows at a time in an SQLite database?"), I've posted my first Git repository:

https://github.com/rdpoor/CreateOrUpdate

which bulk loads an array of ActiveRecords into MySQL, SQLite or PostgreSQL databases. It includes an option to ignore existing records, overwrite them or raise an error. My rudimentary benchmarks show a 10x speed improvement compared to sequential writes (YMMV).

I'm using it in production code where I frequently need to import large datasets, and I'm pretty happy with it.

User · Answer

After reading this tutorial, I tried to implement it in my program.

I have 4-5 files that contain addresses. Each file has approximately 30 million records. I am using the same configuration that you are suggesting, but my number of INSERTs per second is way lower (~10,000 records per sec).

Here is where your suggestion fails. You use a single transaction for all the records, and a single transaction only works when no insert fails. Let's say that you are splitting each record into multiple inserts on different tables. What happens if the record is broken?

The ON CONFLICT command does not apply, because if you have 10 elements in a record and you need each element inserted into a different table, then if element 5 gets a CONSTRAINT error, all previous 4 inserts need to go too.

So here is where the rollback comes in. The only issue with the rollback is that you lose all your inserts and start from the top. How can you solve this?

My solution was to use multiple transactions. I begin and end a transaction every 10,000 records (don't ask why that number; it was the fastest one I tested). I created an array sized 10,000 and insert the successful records there. When an error occurs, I do a rollback, begin a transaction, insert the records from my array, commit and then begin a new transaction after the broken record.

This solution helped me bypass the issues I have when dealing with files containing bad/duplicate records (I had almost 4% bad records).

The algorithm I created helped me reduce my process by 2 hours. The final loading process of a file takes 1hr 30m, which is still slow but nothing compared to the 4hrs that it initially took. I managed to speed the inserts up from 10,000/s to ~14,000/s.

If anyone has any other ideas on how to speed it up, I am open to suggestions.

UPDATE:

In addition to my answer above, you should keep in mind that inserts per second depend on the hard drive you are using too. I tested it on 3 different PCs with different hard drives and got massive differences in times: PC1 (1hr 30m), PC2 (6hrs), PC3 (14hrs), so I started wondering why that would be.

After two weeks of research and checking multiple resources (hard drive, RAM, cache), I found out that some settings on your hard drive can affect the I/O rate. By clicking Properties on your desired output drive you can see two options in the General tab. Opt1: Compress this drive. Opt2: Allow files on this drive to have contents indexed.

By disabling these two options, all 3 PCs now take approximately the same time to finish (1hr and 20 to 40min). If you encounter slow inserts, check whether your hard drive is configured with these options. It will save you lots of time and headaches trying to find the solution.
