Multiprocessing and Django DB don't play well together.
I ended up closing the Django DB connection first thing in the new process, so that the child holds no reference to the connection used by the parent.
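from multiprocessing import Pool

def _etl_(x):
    # Drop the connection inherited from the parent process;
    # Django opens a fresh one on the next query.
    from django.db import connection
    connection.close()
    print(x)

if __name__ == "__main__":
    multi_core_arg = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    n_cpu = 4
    pool = Pool(n_cpu)
    pool.map(_etl_, multi_core_arg)
    pool.close()
    pool.join()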
OR, if you use Process directly, make sure the function that Process.start() runs begins with the same two lines (the import and connection.close()).
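A minimal sketch of the Process variant, assuming it runs inside a configured Django project. The names `run_etl` and `start_worker` are hypothetical, and the summing is only a placeholder for real work.

```python
from multiprocessing import Process


def run_etl(chunk):
    # Hypothetical worker: first thing, drop the connection inherited
    # from the parent. Django reconnects on the next query.
    from django.db import connection
    connection.close()
    # Placeholder for real ORM work on this chunk. Note that Process
    # discards the return value; it is here only to make the sketch testable.
    return sum(chunk)


def start_worker(chunk):
    # Process.start() runs run_etl(chunk) in a child process.
    worker = Process(target=run_etl, args=(chunk,))
    worker.start()
    return worker
```

Call `start_worker([1, 2, 3]).join()` from under an `if __name__ == "__main__":` guard (or from a management command), so the child does not re-run the launch code on platforms that spawn instead of fork.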
Others suggest using a thread pool instead:
from multiprocessing.dummy import Pool as ThreadPool
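It solved my (2013, Lost connection) problem, but threads are subject to the GIL: a thread releases it while it is blocked on I/O and takes it back when the I/O finishes, so threads help with I/O-bound work but not with CPU-bound work.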
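To see the I/O behaviour for yourself, here is a small sketch that uses `time.sleep` as a stand-in for a blocking database call (an assumption: a sleeping thread releases the GIL much like one waiting on a socket). `fake_io` and `run_in_threads` are made-up names.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_io(seconds):
    # Stand-in for a blocking query; the GIL is released while sleeping.
    time.sleep(seconds)
    return seconds


def run_in_threads(delays, n_threads):
    # Map fake_io over delays in a thread pool.
    # Returns (results, elapsed wall-clock seconds).
    start = time.perf_counter()
    with ThreadPool(n_threads) as pool:
        results = pool.map(fake_io, delays)
    return results, time.perf_counter() - start
```

With four 0.25 s delays and four threads, the elapsed time should land near 0.25 s rather than 1 s, because the waits overlap.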
By comparison, a process Pool spawns a group of worker processes that have to communicate with each other across process boundaries, which may be slower.
I recommend timing both. A side tip: consider joblib, which the scikit-learn project relies on. Some performance results show it outperforming the native Pool(), although it is up to you to verify the actual run-time cost.
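One way to do the timing, as a rough sketch: the helper below pushes the same function through whichever pool class you hand it. `busy`, `time_map` and `compare` are hypothetical names, and `busy` is a toy CPU-bound task, not your real workload.

```python
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool


def busy(n):
    # Toy CPU-bound task (assumption: swap in your real job).
    return sum(i * i for i in range(n))


def time_map(pool_cls, func, items, workers=4):
    # Run func over items with the given pool class.
    # Returns (results, elapsed wall-clock seconds).
    start = time.perf_counter()
    with pool_cls(workers) as pool:
        results = pool.map(func, items)
    return results, time.perf_counter() - start


def compare(func, items, workers=4):
    # Time the same workload with threads and with processes.
    return {
        name: time_map(pool_cls, func, items, workers)[1]
        for name, pool_cls in (("threads", ThreadPool), ("processes", Pool))
    }
```

Call `compare(busy, [200_000] * 8)` from under an `if __name__ == "__main__":` guard and look at the two timings. The numbers depend heavily on the workload and machine, so measure with your own function.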