Python multiprocessing PicklingError: Can't pickle <type 'function'>

Question

I am sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in the IPython shell instead of regular Python, things work out well.

I looked up some previous notes on this problem. They were all caused by using a pool to call a function defined within a class function. But this is not the case for me.

    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/threading.py", line 552, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 505, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib64/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

I would appreciate any help.

Update: The function I pickle is defined at the top level of the module, though it calls a function that contains a nested function. That is, f() calls g(), which calls h(), which has a nested function i(), and I am calling pool.apply_async(f). f(), g() and h() are all defined at the top level. I tried a simpler example with this pattern and it works, though.

User · Answer

When this problem comes up with multiprocessing, a simple solution is to switch from Pool to ThreadPool. This can be done with no change of code other than the import:

    from multiprocessing.pool import ThreadPool as Pool

This works because ThreadPool shares memory with the main thread rather than creating a new process, which means that pickling is not required.

The downside to this method is that Python isn't the greatest language for handling threads: it uses the Global Interpreter Lock to stay thread safe, which can slow down some use cases here. However, if you're primarily interacting with other systems (running HTTP commands, talking with a database, writing to filesystems), then your code is likely not CPU-bound and won't take much of a hit. In fact, I've found when writing HTTP/HTTPS benchmarks that the threaded model used here has less overhead and fewer delays, as the overhead from creating new processes is much higher than the overhead for creating new threads.

So if you're processing a ton of stuff in Python userspace, this might not be the best method.
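As a rough illustration of the point about pickling, here is a minimal sketch (assuming Python 3; the helper name square_all_with_threads is made up for this example) that maps a lambda over a ThreadPool. The same lambda handed to a process-based Pool would have to be pickled first.

```python
from multiprocessing.pool import ThreadPool


def square_all_with_threads(values, workers=4):
    """Map a lambda over values using threads; nothing is pickled."""
    # ThreadPool workers run inside the current process, so the lambda is
    # passed by reference instead of being serialized.
    with ThreadPool(workers) as pool:
        return pool.map(lambda x: x * x, values)


if __name__ == "__main__":
    print(square_all_with_threads(range(5)))
```

Because of the GIL, this only pays off when the work is I/O-bound; squaring numbers is used here purely to keep the sketch short.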

User · Answer

As others have said, multiprocessing can only transfer Python objects to worker processes if they can be pickled. If you cannot reorganize your code as described by unutbu, you can use dill's extended pickling/unpickling capabilities for transferring data (especially code data), as I show below.

This solution requires only the installation of dill and no other libraries such as pathos:

    import os
    from multiprocessing import Pool

    import dill


    def run_dill_encoded(payload):
        # Runs in the worker: rebuild the function and its arguments with dill.
        fun, args = dill.loads(payload)
        return fun(*args)


    def apply_async(pool, fun, args):
        # Serialize with dill so the pool only has to pickle a byte string.
        payload = dill.dumps((fun, args))
        return pool.apply_async(run_dill_encoded, (payload,))


    if __name__ == "__main__":

        pool = Pool(processes=5)

        # async execution of lambda
        jobs = []
        for i in range(10):
            job = apply_async(pool, lambda a, b: (a, b, a * b), (i, i + 1))
            jobs.append(job)

        for job in jobs:
            print job.get()
        print

        # async execution of static method

        class O(object):

            @staticmethod
            def calc():
                return os.getpid()

        jobs = []
        for i in range(10):
            job = apply_async(pool, O.calc, ())
            jobs.append(job)

        for job in jobs:
            print job.get()

User · Answer

    Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

This error will also occur if the model object passed to the async job has a built-in function inside it.

So make sure that the model objects you pass don't contain built-in functions. (In our case we were using the FieldTracker() function of django-model-utils inside the model to track a certain field.) Here is the link to the relevant GitHub issue.
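One way to carry out that check is sketched below, assuming Python 3. The helper find_unpicklable_attrs and the Record class are hypothetical names invented for this example; it only inspects instance attributes and is not specific to Django or FieldTracker.

```python
import pickle


def find_unpicklable_attrs(obj):
    """Return the names of instance attributes that pickle cannot serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            # Pickling failures surface as several exception types, so this
            # diagnostic helper deliberately catches broadly.
            bad.append(name)
    return bad


class Record(object):  # hypothetical stand-in for a model object
    def __init__(self):
        self.title = "example"
        self.on_change = lambda: None  # function-valued attribute


if __name__ == "__main__":
    print(find_unpicklable_attrs(Record()))
```

Running it on the object you intend to pass to the pool can narrow down which attribute triggers the error.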

User · Answer

I'd use pathos.multiprocessing instead of multiprocessing. pathos.multiprocessing is a fork of multiprocessing that uses dill. dill can serialize almost anything in Python, so you are able to send a lot more around in parallel. The pathos fork also has the ability to work directly with multiple-argument functions, as you need for class methods.

    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> p = Pool(4)
    >>> class Test(object):
    ...   def plus(self, x, y):
    ...     return x+y
    ...
    >>> t = Test()
    >>> x, y = [0, 1, 2, 3], [4, 5, 6, 7]  # assumed inputs, consistent with the output below
    >>> p.map(t.plus, x, y)
    [4, 6, 8, 10]
    >>>
    >>> class Foo(object):
    ...   @staticmethod
    ...   def work(self, x):
    ...     return x+1
    ...
    >>> f = Foo()
    >>> p.apipe(f.work, f, 100)
    <processing.pool.ApplyResult object at 0x10504f8d0>
    >>> res = _
    >>> res.get()
    101

Get pathos (and, if you like, dill) here: https://github.com/uqfoundation

User · Answer

I have found that I can also generate exactly that error output on a perfectly working piece of code by attempting to use the profiler on it.

Note that this was on Windows (where the forking is a bit less elegant).

I was running:

    python -m profile -o output.pstats <script>

and found that removing the profiling removed the error, and putting the profiling back restored it. It was driving me batty too, because I knew the code used to work. I was checking to see if something had updated pool.py, then had a sinking feeling, eliminated the profiling, and that was it.

Posting here for the archives in case anybody else runs into it.

User · Answer

Building on @rocksportrocker's solution, it would make sense to use dill both when sending the tasks and when receiving the results.

    import dill
    import itertools

    def run_dill_encoded(payload):
        # Runs in the worker: decode the task, run it, and dill the result.
        fun, args = dill.loads(payload)
        res = fun(*args)
        res = dill.dumps(res)
        return res

    def dill_map_async(pool, fun, args_list,
                       as_tuple=True,
                       **kw):
        if as_tuple:
            # Wrap each argument in a 1-tuple so it can be unpacked with *args.
            args_list = ((x,) for x in args_list)

        it = itertools.izip(
            itertools.cycle([fun]),
            args_list)
        it = itertools.imap(dill.dumps, it)
        return pool.map_async(run_dill_encoded, it, **kw)

    if __name__ == '__main__':
        import multiprocessing as mp
        import sys, os
        p = mp.Pool(4)
        res = dill_map_async(p, lambda x: [sys.stdout.write('%s\n' % os.getpid()), x][-1],
                             [lambda x: x+1]*10,)
        res = res.get(timeout=100)
        res = map(dill.loads, res)
        print res

User · Answer

This solution requires only the installation of dill and no other libraries such as pathos.

    import dill


    def apply_packed_function_for_map(packed):
        """
        Unpack dumped function as target function and call it with arguments.

        :param packed:
            a tuple (dumped_function, item, args, kwargs) of the dumped
            function and its arguments
        :return:
            result of target function
        """
        dumped_function, item, args, kwargs = packed
        target_function = dill.loads(dumped_function)
        res = target_function(item, *args, **kwargs)
        return res


    def pack_function_for_map(target_function, items, *args, **kwargs):
        """
        Pack function and arguments to object that can be sent from one
        multiprocessing.Process to another. The main problem is:
            "multiprocessing.Pool.map*" or "apply*"
            cannot use class methods or closures.
        It solves this problem with "dill".
        It works with target function as argument, dumps it ("with dill")
        and returns dumped function with arguments of target function.
        For more performance we dump only target function itself
        and don't dump its arguments.
        How to use (pseudo-code):

            >>> import multiprocessing
            >>> images = [...]
            >>> pool = multiprocessing.Pool(100500)
            >>> features = pool.map(
            ...     *pack_function_for_map(
            ...         super(Extractor, self).extract_features,
            ...         images,
            ...         type='png',
            ...         **options
            ...     )
            ... )
            >>>

        :param target_function:
            function that you want to execute like target_function(item, *args, **kwargs)
        :param items:
            list of items for map
        :param args:
            positional arguments for target_function(item, *args, **kwargs)
        :param kwargs:
            named arguments for target_function(item, *args, **kwargs)
        :return: tuple(function_wrapper, dumped_items)
            It returns a tuple with
                * function wrapper, that unpacks and calls target function;
                * list of packed target function and its arguments.
        """
        dumped_function = dill.dumps(target_function)
        dumped_items = [(dumped_function, item, args, kwargs) for item in items]
        return apply_packed_function_for_map, dumped_items

It also works for numpy arrays.

User · Answer

Here is a list of what can be pickled. In particular, functions are only picklable if they are defined at the top level of a module.

This piece of code:

    import multiprocessing as mp

    class Foo():
        @staticmethod
        def work(self):
            pass

    if __name__ == '__main__':
        pool = mp.Pool()
        foo = Foo()
        pool.apply_async(foo.work)
        pool.close()
        pool.join()

yields an error almost identical to the one you posted:

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 505, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

The problem is that the pool methods all use a mp.SimpleQueue to pass tasks to the worker processes. Everything that goes through the mp.SimpleQueue must be picklable, and foo.work is not picklable since it is not defined at the top level of the module.

It can be fixed by defining a function at the top level, which calls foo.work():

    def work(foo):
        foo.work()

    pool.apply_async(work, args=(foo,))

Notice that foo is picklable, since Foo is defined at the top level and foo.__dict__ is picklable.
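The top-level rule can be checked directly with the standard pickle module. The following is a small sketch assuming Python 3, where the exact exception type and message differ from the Python 2.7 traceback above; is_picklable and make_nested are hypothetical helper names used only for illustration.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_nested():
    def inner():  # defined inside another function, so not at top level
        return 1
    return inner


if __name__ == "__main__":
    print(is_picklable(len))            # built-in, found by its qualified name
    print(is_picklable(lambda x: x))    # anonymous, has no importable name
    print(is_picklable(make_nested()))  # nested function
```

Functions are pickled by reference (module plus name), which is why only objects reachable by name from the top level of a module survive the trip to a worker process.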
