SQLAlchemy What s the difference between flush and commit

Question

What the difference is between flush   and commit   in SQLAlchemy   I ve read the docs  but am none the wiser - they seem to assume a pre-understanding that I don t have   I m particularly interested in their impact on memory usage  I m loading some data into a database from a series of files  around 5 million rows in total  and my session is occasionally falling over - it s a large database and a machine with not much memory    I m wondering if I m using too many commit   and not enough flush   calls - but without really understanding what the difference is  it s hard to tell

User · Answer

Why flush if you can commit   As someone new to working with databases and sqlalchemy  the previous answers - that flush   sends SQL statements to the DB and commit   persists them - were not clear to me  The definitions make sense but it isn t immediately clear from the definitions why you would use a flush instead of just committing   Since a commit always flushes  https   docs sqlalchemy org en 13 orm session basics html committing  these sound really similar  I think the big issue to highlight is that a flush is not permanent and can be undone  whereas a commit is permanent  in the sense that you can t ask the database to undo the last commit  I think    snapshoe highlights that if you want to query the database and get results that include newly added objects  you need to have flushed first  or committed  which will flush for you   Perhaps this is useful for some people although I m not sure why you would want to flush rather than commit  other than the trivial answer that it can be undone    In another example I was syncing documents between a local DB and a remote server  and if the user decided to cancel  all adds updates deletes should be undone  i e  no partial sync  only a full sync   When updating a single document I ve decided to simply delete the old row and add the updated version from the remote server  It turns out that due to the way sqlalchemy is written  order of operations when committing is not guaranteed  This resulted in adding a duplicate version  before attempting to delete the old one   which resulted in the DB failing a unique constraint  To get around this I used flush   so that order was maintained  but I could still undo if later the sync process failed   See my post on this at  Is there any order for add versus delete when committing in sqlalchemy  Similarly  someone wanted to know whether add order is maintained when committing  i e  if I add object1 then add object2  does object1 get added to the database before object2 Does SQLAlchemy save order when adding objects to session   Again  here presumably the use of a flush   would ensure the desired behavior  So in summary  one use for flush is to provide order guarantees  I think   again while still allowing yourself an  undo  option that commit does not provide   Autoflush and Autocommit  Note  autoflush can be used to ensure queries act on an updated database as sqlalchemy will flush before executing the query  https   docs sqlalchemy org en 13 orm session api html sqlalchemy orm session Session params autoflush  Autocommit is something else that I don t completely understand but it sounds like its use is discouraged  https   docs sqlalchemy org en 13 orm session api html sqlalchemy orm session Session params autocommit  Memory Usage  Now the original question actually wanted to know about the impact of flush vs  commit for memory purposes  As the ability to persist or not is something the database offers  I think   simply flushing should be sufficient to offload to the database - although committing shouldn t hurt  actually probably helps - see below  if you don t care about undoing    sqlalchemy uses weak referencing for objects that have been flushed  https   docs sqlalchemy org en 13 orm session state management html session-referencing-behavior  This means if you don t have an object explicitly held onto somewhere  like in a list or dict  sqlalchemy won t keep it in memory   However  then you have the database side of things to worry about  Presumably flushing without committing comes with some memory penalty to maintain the transaction  Again  I m new to this but here s a link that seems to suggest exactly this  https   stackoverflow com a 15305650 764365  In other words  commits should reduce memory usage  although presumably there is a trade-off between memory and performance here  In other words  you probably don t want to commit every single database change  one at a time  for performance reasons   but waiting too long will increase memory usage

User · Answer

A Session object is basically an ongoing transaction of changes to a database  update  insert  delete   These operations aren t persisted to the database until they are committed  if your program aborts for some reason in mid-session transaction  any uncommitted changes within are lost    The session object registers transaction operations with session add    but doesn t yet communicate them to the database until session flush   is called    session flush   communicates a series of operations to the database  insert  update  delete   The database maintains them as pending operations in a transaction  The changes aren t persisted permanently to disk  or visible to other transactions until the database receives a COMMIT for the current transaction  which is what session commit   does    session commit   commits  persists  those changes to the database   flush   is always called as part of a call to commit    1    When you use a Session object to query the database  the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds  By default  Session objects autoflush their operations  but this can be disabled   Hopefully this example will make this clearer    --- s   Session    s add Foo  A      The Foo  A   object has been added to the session                    It has not been committed to the database yet                      but is returned as part of a query  print 1  s query Foo  all   s commit     --- s2   Session   s2 autoflush   False  s2 add Foo  B    print 2  s2 query Foo  all     The Foo  B   object is  not  returned                                  as part of this query because it hasn t                                  been flushed yet  s2 flush                       Now  Foo  B   is in the same state as                                  Foo  A   was above  print 3  s2 query Foo  all    s2 rollback                    Foo  B   has not been committed  and rolling                                  back the session s transaction removes it                                  from the session  print 4  s2 query Foo  all     --- Output  1   lt Foo  A   gt   2   lt Foo  A   gt   3   lt Foo  A   gt    lt Foo  B   gt   4   lt Foo  A   gt

User · Answer

commit    records these changes in the database  flush    is always called as part of the commit     1  call  When you use a Session object to query a database  the query returns results from both the database and the reddened parts of the unrecorded transaction it is performing

User · Answer

Use flush when you need to write  for example to get a primary key ID from an autoincrementing counter  john Person name  John Smith   parent None  session add john  session flush    son Person name  Bill Smith   parent john id   Without flushing  john would never get an ID from the DB and so couldn t represent the parent child relationship in code  Like others have said  without commit   none of this will be permanently persisted to DB

User · Answer

This does not strictly answer the original question but some people have mentioned that with session autoflush   True you don t have to use session flush      And this is not always true   If you want to use the id of a newly created object in the middle of a transaction  you must call session flush       Given a model with at least this id class AModel Base      id   Column Integer  primary key True     autoincrement by default on integer primary key  session autoflush   True  a   AModel   session add a  a id    None session flush   a id    autoincremented integer   This is because autoflush does NOT auto fill the id  although a query of the object will  which sometimes can cause confusion as in  why this works here but not there   But snapshoe already covered this part      One related aspect that seems pretty important to me and wasn t really mentioned   Why would you not commit all the time  - The answer is atomicity   A fancy word to say  an ensemble of operations have to all be executed successfully OR none of them will take effect   For example  if you want to create update delete some object  A  and then create update delete another  B   but if  B  fails you want to revert  A   This means those 2 operations are atomic   Therefore  if  B  needs a result of  A   you want to call flush after  A  and commit after  B    Also  if session autoflush is True  except for the case that I mentioned above or others in Jimbo s answer  you will not need to call flush manually

User · Answer

As  snapshoe says  flush   sends your SQL statements to the database commit   commits the transaction   When session autocommit    False  commit   will call flush   if you set autoflush    True  When session autocommit    True  You can t call commit   if you haven t started a transaction  which you probably haven t since you would probably only use this mode to avoid manually managing transactions   In this mode  you must call flush   to save your ORM changes  The flush effectively also commits your data

[python] SQLAlchemy: What's the difference between flush() and commit()?

Examples related to python

Examples related to sqlalchemy