Why flush if you can commit?
As someone new to working with databases and sqlalchemy, the previous answers - that flush()
sends SQL statements to the DB and commit()
persists them - were not clear to me. The definitions make sense but it isn't immediately clear from the definitions why you would use a flush instead of just committing.
Since a commit always flushes (https://docs.sqlalchemy.org/en/13/orm/session_basics.html#committing) these sound really similar. I think the big issue to highlight is that a flush is not permanent and can be undone, whereas a commit is permanent, in the sense that you can't ask the database to undo the last commit (I think)
@snapshoe highlights that if you want to query the database and get results that include newly added objects, you need to have flushed first (or committed, which will flush for you). Perhaps this is useful for some people although I'm not sure why you would want to flush rather than commit (other than the trivial answer that it can be undone).
In another example I was syncing documents between a local DB and a remote server, and if the user decided to cancel, all adds/updates/deletes should be undone (i.e. no partial sync, only a full sync). When updating a single document I've decided to simply delete the old row and add the updated version from the remote server. It turns out that due to the way sqlalchemy is written, order of operations when committing is not guaranteed. This resulted in adding a duplicate version (before attempting to delete the old one), which resulted in the DB failing a unique constraint. To get around this I used flush()
so that order was maintained, but I could still undo if later the sync process failed.
See my post on this at: Is there any order for add versus delete when committing in sqlalchemy
Similarly, someone wanted to know whether add order is maintained when committing, i.e. if I add object1
then add object2
, does object1
get added to the database before object2
Does SQLAlchemy save order when adding objects to session?
Again, here presumably the use of a flush() would ensure the desired behavior. So in summary, one use for flush is to provide order guarantees (I think), again while still allowing yourself an "undo" option that commit does not provide.
Autoflush and Autocommit
Note, autoflush can be used to ensure queries act on an updated database as sqlalchemy will flush before executing the query. https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autoflush
Autocommit is something else that I don't completely understand but it sounds like its use is discouraged: https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autocommit
Memory Usage
Now the original question actually wanted to know about the impact of flush vs. commit for memory purposes. As the ability to persist or not is something the database offers (I think), simply flushing should be sufficient to offload to the database - although committing shouldn't hurt (actually probably helps - see below) if you don't care about undoing.
sqlalchemy uses weak referencing for objects that have been flushed: https://docs.sqlalchemy.org/en/13/orm/session_state_management.html#session-referencing-behavior
This means if you don't have an object explicitly held onto somewhere, like in a list or dict, sqlalchemy won't keep it in memory.
However, then you have the database side of things to worry about. Presumably flushing without committing comes with some memory penalty to maintain the transaction. Again, I'm new to this but here's a link that seems to suggest exactly this: https://stackoverflow.com/a/15305650/764365
In other words, commits should reduce memory usage, although presumably there is a trade-off between memory and performance here. In other words, you probably don't want to commit every single database change, one at a time (for performance reasons), but waiting too long will increase memory usage.