Bulk insert with SQLAlchemy ORM

Question

Is there any way to get SQLAlchemy to do a bulk insert rather than inserting each individual object? That is, doing:

    INSERT INTO `foo` (`bar`) VALUES (1), (2), (3)

rather than:

    INSERT INTO `foo` (`bar`) VALUES (1)
    INSERT INTO `foo` (`bar`) VALUES (2)
    INSERT INTO `foo` (`bar`) VALUES (3)

I've just converted some code to use SQLAlchemy rather than raw SQL, and although it is now much nicer to work with, it seems to be slower (up to a factor of 10). I'm wondering if this is the reason.

Maybe I could improve the situation by using sessions more efficiently. At the moment I have autoCommit=False and do a session.commit() after I've added some stuff. However, this seems to cause the data to go stale if the DB is changed elsewhere: even if I do a new query, I still get old results back.

Thanks for your help!

User · Answer

All roads lead to Rome, but some of them cross mountains or require ferries. If you want to get there quickly, just take the motorway.

In this case the motorway is the execute_batch() feature of psycopg2. The documentation says it best:

> The current implementation of executemany() is (using an extremely charitable understatement) not particularly performing. These functions can be used to speed up the repeated execution of a statement against a set of parameters. By reducing the number of server roundtrips the performance can be orders of magnitude better than using executemany().

In my own test, execute_batch() is approximately twice as fast as executemany(), and it gives the option to configure the page_size for further tweaking (if you want to squeeze the last 2-3% of performance out of the driver).

The same feature can easily be enabled if you are using SQLAlchemy by setting use_batch_mode=True as a parameter when you instantiate the engine with create_engine().

User · Answer

As far as I know, there is no way to get the ORM to issue bulk inserts. I believe the underlying reason is that SQLAlchemy needs to keep track of each object's identity (i.e., new primary keys), and bulk inserts interfere with that. For example, assuming your foo table contains an id column and is mapped to a Foo class:

    x = Foo(bar=1)
    print(x.id)
    # None
    session.add(x)
    session.flush()
    # BEGIN
    # INSERT INTO foo (bar) VALUES(1)
    # COMMIT
    print(x.id)
    # 1

Since SQLAlchemy picked up the value for x.id without issuing another query, we can infer that it got the value directly from the INSERT statement.

If you don't need subsequent access to the created objects via the same instances, you can skip the ORM layer for your insert:

    Foo.__table__.insert().execute([{'bar': 1}, {'bar': 2}, {'bar': 3}])
    # INSERT INTO foo (bar) VALUES ((1,), (2,), (3,))

SQLAlchemy can't match these new rows with any existing objects, so you'll have to query them anew for any subsequent operations.

As far as stale data is concerned, it's helpful to remember that the session has no built-in way to know when the database is changed outside of the session. In order to access externally modified data through existing instances, the instances must be marked as expired. This happens by default on session.commit(), but can be done manually by calling session.expire_all() or session.expire(instance). An example (SQL omitted):

    x = Foo(bar=1)
    session.add(x)
    session.commit()
    print(x.bar)
    # 1
    foo.update().execute(bar=42)
    print(x.bar)
    # 1
    session.expire(x)
    print(x.bar)
    # 42

session.commit() expires x, so the first print statement implicitly opens a new transaction and re-queries x's attributes. If you comment out the first print statement, you'll notice that the second one now picks up the correct value, because the new query isn't emitted until after the update.

This makes sense from the point of view of transactional isolation: you should only pick up external modifications between transactions. If this is causing you trouble, I'd suggest clarifying or re-thinking your application's transaction boundaries instead of immediately reaching for session.expire_all().
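The stale-read behaviour described in this answer can be reproduced in a self-contained way. The following is only a sketch under stated assumptions: SQLAlchemy 1.4 or later, a throwaway file-backed SQLite database (the file name stale_demo.db is made up), and an illustrative Foo model with a single bar column. The "external" change is simulated with a second connection from the same engine.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Foo(Base):
    # Illustrative model; the table and column names are assumptions.
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bar = Column(Integer)


# File-backed SQLite so that a second connection can modify the same data.
db_path = os.path.join(tempfile.mkdtemp(), "stale_demo.db")
engine = create_engine("sqlite:///" + db_path)
Base.metadata.create_all(engine)

session = Session(engine)
x = Foo(bar=1)
session.add(x)
session.commit()      # commit() expires x
before = x.bar        # attribute access reloads x in a new transaction

# Change the row outside the session, on a separate connection.
with engine.begin() as conn:
    conn.execute(Foo.__table__.update().values(bar=42))

stale = x.bar         # served from the already-loaded instance, no query
session.expire(x)     # mark the instance as expired
fresh = x.bar         # attribute access now re-queries the row

print(before, stale, fresh)
session.close()
```

Here stale keeps the old value because the instance was already loaded in the current transaction, while fresh reflects the external update once the instance has been expired.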

User · Answer

Direct support was added to SQLAlchemy as of version 0.8.

As per the docs, connection.execute(table.insert().values(data)) should do the trick. (Note that this is not the same as connection.execute(table.insert(), data), which results in many individual row inserts via a call to executemany().) On anything but a local connection the difference in performance can be enormous.
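To make the difference between the two call styles concrete, here is a small sketch. It assumes an in-memory SQLite database and an illustrative foo table with a bar column; neither comes from the answer itself, and the performance difference mentioned above will not be visible on a local in-memory database.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

metadata = MetaData()
# Illustrative table; the names are assumptions.
foo = Table(
    "foo",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("bar", Integer),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

data = [{"bar": 1}, {"bar": 2}, {"bar": 3}]

# A single INSERT statement with a multi-row VALUES clause.
multi_values_stmt = foo.insert().values(data)
print(multi_values_stmt)

with engine.begin() as conn:
    # Style 1: one statement carrying all rows.
    conn.execute(multi_values_stmt)
    # Style 2: one single-row statement plus a list of parameter sets,
    # which is dispatched through the DBAPI's executemany().
    conn.execute(foo.insert(), data)
    row_count = len(conn.execute(foo.select()).fetchall())

print(row_count)
```

Both styles insert the same three rows, so the table ends up with six. Printing the first statement shows the multi-row VALUES clause that the question asks for.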

User · Answer

This is a way:

    values = [1, 2, 3]
    Foo.__table__.insert().execute([{'bar': x} for x in values])

This will insert like this:

    INSERT INTO `foo` (`bar`) VALUES (1), (2), (3)

Reference: The SQLAlchemy FAQ includes benchmarks for various commit methods.

User · Answer

Piere's answer is correct, but one issue is that bulk_save_objects by default does not return the primary keys of the objects, if that is of concern to you. Set return_defaults to True to get this behavior, as described in the documentation for bulk_save_objects.

    foos = [Foo(bar='a'), Foo(bar='b'), Foo(bar='c')]
    session.bulk_save_objects(foos, return_defaults=True)
    for foo in foos:
        assert foo.id is not None
    session.commit()

User · Answer

The SQLAlchemy docs have a writeup on the performance of various techniques that can be used for bulk inserts:

> ORMs are basically not intended for high-performance bulk inserts - this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.
>
> For the use case of fast bulk inserts, the SQL generation and execution system that the ORM builds on top of is part of the Core. Using this system directly, we can produce an INSERT that is competitive with using the raw database API directly.
>
> Alternatively, the SQLAlchemy ORM offers the Bulk Operations suite of methods, which provide hooks into subsections of the unit of work process in order to emit Core-level INSERT and UPDATE constructs with a small degree of ORM-based automation.
>
> The example below illustrates time-based tests for several different methods of inserting rows, going from the most automated to the least. With cPython 2.7, runtimes observed:

    classics-MacBook-Pro:sqlalchemy classic$ python test.py
    SQLAlchemy ORM: Total time for 100000 records 12.0471920967 secs
    SQLAlchemy ORM pk given: Total time for 100000 records 7.06283402443 secs
    SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 0.856323003769 secs
    SQLAlchemy Core: Total time for 100000 records 0.485800027847 secs
    sqlite3: Total time for 100000 records 0.487842082977 sec

Script:

    import time
    import sqlite3

    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker

    Base = declarative_base()
    DBSession = scoped_session(sessionmaker())
    engine = None


    class Customer(Base):
        __tablename__ = "customer"
        id = Column(Integer, primary_key=True)
        name = Column(String(255))


    def init_sqlalchemy(dbname='sqlite:///sqlalchemy.db'):
        global engine
        engine = create_engine(dbname, echo=False)
        DBSession.remove()
        DBSession.configure(bind=engine, autoflush=False, expire_on_commit=False)
        Base.metadata.drop_all(engine)
        Base.metadata.create_all(engine)


    def test_sqlalchemy_orm(n=100000):
        init_sqlalchemy()
        t0 = time.time()
        for i in xrange(n):
            customer = Customer()
            customer.name = 'NAME ' + str(i)
            DBSession.add(customer)
            if i % 1000 == 0:
                DBSession.flush()
        DBSession.commit()
        print(
            "SQLAlchemy ORM: Total time for " + str(n) +
            " records " + str(time.time() - t0) + " secs")


    def test_sqlalchemy_orm_pk_given(n=100000):
        init_sqlalchemy()
        t0 = time.time()
        for i in xrange(n):
            customer = Customer(id=i + 1, name="NAME " + str(i))
            DBSession.add(customer)
            if i % 1000 == 0:
                DBSession.flush()
        DBSession.commit()
        print(
            "SQLAlchemy ORM pk given: Total time for " + str(n) +
            " records " + str(time.time() - t0) + " secs")


    def test_sqlalchemy_orm_bulk_insert(n=100000):
        init_sqlalchemy()
        t0 = time.time()
        n1 = n
        while n1 > 0:
            # Size the batch before decrementing so the last batch is not empty.
            batch = min(10000, n1)
            DBSession.bulk_insert_mappings(
                Customer,
                [dict(name="NAME " + str(i)) for i in xrange(batch)]
            )
            n1 = n1 - batch
        DBSession.commit()
        print(
            "SQLAlchemy ORM bulk_save_objects(): Total time for " + str(n) +
            " records " + str(time.time() - t0) + " secs")


    def test_sqlalchemy_core(n=100000):
        init_sqlalchemy()
        t0 = time.time()
        engine.execute(
            Customer.__table__.insert(),
            [{"name": 'NAME ' + str(i)} for i in xrange(n)]
        )
        print(
            "SQLAlchemy Core: Total time for " + str(n) +
            " records " + str(time.time() - t0) + " secs")


    def init_sqlite3(dbname):
        conn = sqlite3.connect(dbname)
        c = conn.cursor()
        c.execute("DROP TABLE IF EXISTS customer")
        c.execute(
            "CREATE TABLE customer (id INTEGER NOT NULL, "
            "name VARCHAR(255), PRIMARY KEY(id))")
        conn.commit()
        return conn


    def test_sqlite3(n=100000, dbname='sqlite3.db'):
        conn = init_sqlite3(dbname)
        c = conn.cursor()
        t0 = time.time()
        for i in xrange(n):
            row = ('NAME ' + str(i),)
            c.execute("INSERT INTO customer (name) VALUES (?)", row)
        conn.commit()
        print(
            "sqlite3: Total time for " + str(n) +
            " records " + str(time.time() - t0) + " sec")


    if __name__ == '__main__':
        test_sqlalchemy_orm(100000)
        test_sqlalchemy_orm_pk_given(100000)
        test_sqlalchemy_orm_bulk_insert(100000)
        test_sqlalchemy_core(100000)
        test_sqlite3(100000)

User · Answer

SQLAlchemy introduced that in version 1.0.0:

Bulk operations - SQLAlchemy docs

With these operations, you can now do bulk inserts or updates!

For instance (if you want the lowest overhead for simple table INSERTs), you can use Session.bulk_insert_mappings():

    loadme = [(1, 'a'),
              (2, 'b'),
              (3, 'c')]
    dicts = [dict(bar=t[0], fly=t[1]) for t in loadme]

    s = Session()
    s.bulk_insert_mappings(Foo, dicts)
    s.commit()

Or, if you want, skip the loadme tuples and write the dictionaries directly into dicts (but I find it easier to leave all the wordiness out of the data and load up a list of dictionaries in a loop).
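For readers who want to try this end to end, here is a self-contained sketch of the same idea. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the Foo model with bar and fly columns is only inferred from the snippet in this answer and is not a definitive schema.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Foo(Base):
    # Assumed model: the column names mirror the dict keys used in the answer.
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bar = Column(Integer)
    fly = Column(String(10))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

loadme = [(1, "a"), (2, "b"), (3, "c")]
dicts = [dict(bar=t[0], fly=t[1]) for t in loadme]

session = Session(engine)
# Plain dictionaries go straight to INSERT; no Foo instances are created.
session.bulk_insert_mappings(Foo, dicts)
session.commit()

# Read the rows back to confirm what was stored.
rows = session.query(Foo.bar, Foo.fly).order_by(Foo.bar).all()
stored = [tuple(r) for r in rows]
print(stored)
session.close()
```

Because only dictionaries are passed in, there are no ORM objects to refresh afterwards; the rows have to be queried again if they are needed as Foo instances.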

User · Answer

I usually do it using add_all:

    from app import session
    from models import User

    objects = [User(name="u1"), User(name="u2"), User(name="u3")]
    session.add_all(objects)
    session.commit()

User · Answer

SQLAlchemy introduced that in version 1.0.0:

Bulk operations - SQLAlchemy docs

With these operations, you can now do bulk inserts or updates!

For instance, you can do:

    s = Session()
    objects = [
        User(name="u1"),
        User(name="u2"),
        User(name="u3")
    ]
    s.bulk_save_objects(objects)
    s.commit()

Here, a bulk insert will be made.

User · Answer

The best answer I found so far was in the SQLAlchemy documentation:

http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

There is a complete example of a benchmark of possible solutions.

As shown in the documentation, bulk_save_objects is not the fastest solution, but its performance is acceptable.

The second best implementation in terms of readability, I think, was with SQLAlchemy Core:

    def test_sqlalchemy_core(n=100000):
        init_sqlalchemy()
        t0 = time.time()
        engine.execute(
            Customer.__table__.insert(),
            [{"name": 'NAME ' + str(i)} for i in xrange(n)]
        )

The context of this function is given in the documentation article.
