[database] What are the performance characteristics of sqlite with very large database files?

2020 update, about 11 years after the question was posted and later closed, which prevented newer answers.

Almost everything written here is obsolete. Once upon a time SQLite was limited by available memory, or to 2 GB of storage (32-bit), or other popular numbers... well, that was a long time ago.

Official limitations are listed here. In practice, SQLite is likely to work as long as there is storage available. It works well with datasets larger than memory; it was originally created when memory was scarce, and that was a very important design point from the start.

There is absolutely no issue with storing 100 GB of data. It could probably store a TB just fine, but eventually that's the point where you need to question whether SQLite is the best tool for the job, and you probably want features of a full-fledged database (remote clients, concurrent writes, read-only replicas, sharding, etc.).


Original:

I know that sqlite doesn't perform well with extremely large database files even when they are supported (there used to be a comment on the sqlite website stating that if you need file sizes above 1 GB you may want to consider using an enterprise RDBMS; I can't find it anymore, so it might relate to an older version of sqlite).

However, for my purposes I'd like to get an idea of how bad it really is before I consider other solutions.

I'm talking about sqlite data files in the multi-gigabyte range, from 2GB onwards. Anyone have any experience with this? Any tips/ideas?

Tags: database, performance, sqlite

Answers:


I've experienced problems with large sqlite files when using the vacuum command.

I haven't tried the auto_vacuum feature yet. If you expect to be updating and deleting data often, then this is worth looking at.
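
If that applies to you, here is a minimal sketch of turning it on. Note that auto_vacuum only takes effect if set before the first table is created, or if it is followed by a full VACUUM:

    PRAGMA auto_vacuum = INCREMENTAL;  -- set before creating tables, or run VACUUM after
    -- later, reclaim free pages a chunk at a time instead of a full VACUUM:
    PRAGMA incremental_vacuum(1000);   -- frees up to 1000 pages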


Besides the usual recommendations:

  1. Drop indexes for bulk inserts.
  2. Batch inserts/updates in large transactions (see the sketch after this list).
  3. Tune your buffer cache / disable the journal with PRAGMAs.
  4. Use a 64-bit machine (to be able to use lots of cache™).
  5. [added July 2014] Use common table expressions (CTEs) instead of running multiple SQL queries! Requires SQLite release 3.8.3.
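
A hedged sketch of points 2, 3 and 5; the table name and values are made up, and the journal/synchronous settings trade durability for speed, so use them only for bulk loads you could rebuild:

    PRAGMA journal_mode = OFF;     -- no rollback journal; only for disposable bulk loads
    PRAGMA synchronous = OFF;      -- skip fsyncs; same caveat
    PRAGMA cache_size = -200000;   -- negative means KiB, so roughly 200 MB of page cache

    BEGIN;
    INSERT INTO events(ts, value) VALUES (1600000000, 42);
    -- ... many thousands more inserts in the same transaction ...
    COMMIT;

    -- CTE (SQLite >= 3.8.3): one statement instead of several round trips
    WITH recent AS (
        SELECT * FROM events WHERE ts > 1600000000
    )
    SELECT COUNT(*) FROM recent;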

I have learnt the following from my experience with SQLite3:

  1. For maximum insert speed, don't use a schema with any column constraints. (Note that you can't add constraints later with ALTER TABLE; if you need them, you have to recreate the table and copy the data over.)
  2. Optimize your schema to store only what you need. Sometimes this means breaking down tables and/or even compressing/transforming your data before inserting it into the database. A great example is storing IP addresses as (long) integers.
  3. One table per db file, to minimize lock contention. (Use ATTACH DATABASE if you want to have a single connection object; see the sketch after this list.)
  4. SQLite can store different types of data in the same column (dynamic typing), so use that to your advantage.
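
A minimal sketch of points 2 and 3, with made-up table and file names:

    -- Point 2: store IPv4 addresses as integers instead of strings
    CREATE TABLE hits (ip INTEGER, ts INTEGER);
    -- 192.168.1.1 -> (192<<24)|(168<<16)|(1<<8)|1 = 3232235777
    INSERT INTO hits VALUES (3232235777, strftime('%s', 'now'));

    -- Point 3: one table per file, stitched together over one connection
    ATTACH DATABASE 'users.db' AS u;
    SELECT h.ts FROM hits AS h JOIN u.accounts AS a ON a.ip = h.ip;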

Questions/comments welcome. ;-)


Much of the reason that it took > 48 hours to do your inserts is your indexes. It is much faster to:

  1. Drop all indexes
  2. Do all inserts
  3. Create the indexes again
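
A hedged sketch of that pattern, with a hypothetical table and index:

    DROP INDEX IF EXISTS idx_events_ts;         -- 1: drop
    BEGIN;
    -- 2: all the bulk inserts go here, in one transaction
    COMMIT;
    CREATE INDEX idx_events_ts ON events(ts);   -- 3: rebuild once at the end, much cheaper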


There used to be a statement in the SQLite documentation that the practical size limit of a database file was a few dozen GB. That was mostly due to the need for SQLite to "allocate a bitmap of dirty pages" whenever you started a transaction: 256 bytes of RAM for each MB of database, i.e. 2^8 × 2^10 = 2^18 bytes = 256 KB per GB. Inserting into a 50 GB DB file would thus require about 12.5 MB of RAM for the bitmap alone, and the often-quoted figure of a hefty 256 MB corresponds to a 1 TB file.

But as of recent versions of SQLite, this is no longer needed. Read more here.


We are using DBs of 50 GB+ on our platform. No complaints, works great. Make sure you are doing everything right! Are you using prepared statements? (SQLite 3.7.3)

  1. Transactions
  2. Prepared statements
  3. Apply these settings (right after you create the DB):

    PRAGMA main.page_size = 4096;          -- takes effect only before the first table is written (or after VACUUM)
    PRAGMA main.cache_size = 10000;        -- measured in pages, not bytes
    PRAGMA main.locking_mode = EXCLUSIVE;  -- fastest, but only one connection can use the DB
    PRAGMA main.synchronous = NORMAL;      -- fewer fsyncs; a safe pairing with WAL
    PRAGMA main.journal_mode = WAL;        -- write-ahead log instead of rollback journal
    PRAGMA main.cache_size = 5000;         -- note: this overrides the 10000 set above
    

Hope this helps others; it works great here.
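
On the "prepared statements" point: the SQL side is just a parameterized statement that the host API (sqlite3_prepare_v2 / sqlite3_bind_* in the C interface) compiles once and re-executes with freshly bound values. A sketch with a hypothetical table:

    -- compiled once, then bound and stepped once per row instead of re-parsed each time
    INSERT INTO measurements(sensor_id, ts, value) VALUES (?1, ?2, ?3);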


I think the main complaints about sqlite scaling are:

  1. Single process write.
  2. No mirroring.
  3. No replication.

I have a 7 GB SQLite database. Performing a particular query with an inner join takes 2.6 s. To speed this up I tried adding indexes. Depending on which index(es) I added, sometimes the query went down to 0.1 s and sometimes it went UP to as much as 7 s. I think the problem in my case was that if a column is highly duplicated, then adding an index degrades performance :(
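
A hedged illustration of that failure mode, with made-up names. Running ANALYZE gives the query planner cardinality statistics, so it can decline a low-selectivity index when a scan would be cheaper:

    CREATE INDEX idx_logs_level ON logs(level);  -- 'level' has only a handful of distinct values
    ANALYZE;  -- collect stats; the planner may now skip the index when it would not help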


I've created SQLite databases up to 3.5 GB in size with no noticeable performance issues. If I remember correctly, I think SQLite2 might have had some lower limits, but I don't think SQLite3 has any such issues.

According to the SQLite Limits page, the maximum size of each database page is 32K, and the maximum number of pages in a database is 1024^3 (2^30). So by my math that comes out to 2^15 × 2^30 = 2^45 bytes = 32 terabytes as the maximum size. I think you'll hit your file system's limits before hitting SQLite's!

