[python] Multiprocessing: How to use Pool.map on a function defined in a class?

When I run something like:

from multiprocessing import Pool

p = Pool(5)
def f(x):
    return x*x

p.map(f, [1,2,3])

it works fine. However, when I put it inside a class:

class calculate(object):
    def run(self):
        def f(x):
            return x*x

        p = Pool()
        return p.map(f, [1,2,3])

cl = calculate()
print cl.run()

Gives me the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/sw/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/sw/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/sw/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

I've seen a post from Alex Martelli dealing with the same kind of problem, but it wasn't explicit enough.

Tags: python, multiprocessing, pickle

Answers:


From http://www.rueckstiess.net/research/snippets/show/ca1d7d90 and http://qingkaikong.blogspot.com/2016/12/python-parallel-method-in-class.html

We can make an external function and seed it with the class self object:

from joblib import Parallel, delayed

def unwrap_self(arg, **kwarg):
    # arg is a (self, i) tuple; re-bind the method to the instance
    return square_class.square_int(*arg, **kwarg)

class square_class:
    def square_int(self, i):
        return i * i

    def run(self, num):
        results = Parallel(n_jobs=-1, backend="threading")\
            (delayed(unwrap_self)(i) for i in zip([self]*len(num), num))
        print(results)
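A quick usage sketch (hedged: the __main__ guard and sample input are my additions; the guard only becomes necessary if you swap the threading backend for a process backend):

if __name__ == '__main__':
    sq = square_class()
    sq.run(num=list(range(5)))  # prints [0, 1, 4, 9, 16]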

Or, without joblib:

from multiprocessing import Pool
import time

def unwrap_self_f(arg, **kwarg):
    return C.f(*arg, **kwarg)

class C:
    def f(self, name):
        print 'hello %s,'%name
        time.sleep(5)
        print 'nice to meet you.'

    def run(self):
        pool = Pool(processes=2)
        names = ('frank', 'justin', 'osi', 'thomas')
        pool.map(unwrap_self_f, zip([self]*len(names), names))

if __name__ == '__main__':
    c = C()
    c.run()

I know this was asked over six years ago now, but I just wanted to add my solution, as some of the suggestions above seem horribly complicated while mine was actually very simple.

All I had to do was wrap the pool.map() call in a helper function, passing the class instance along with the method's argument as a tuple. It looked a bit like this:

from multiprocessing import Pool

def run_in_parallel(args):
    # args is an (instance, argument) tuple; the instance itself gets pickled
    return args[0].method(args[1])

if __name__ == '__main__':
    myclass = MyClass()   # MyClass stands in for your own class with a .method()
    method_args = [1, 2, 3, 4, 5, 6]
    args_map = [(myclass, arg) for arg in method_args]
    pool = Pool()
    results = pool.map(run_in_parallel, args_map)

Here is a boilerplate I wrote for using a multiprocessing Pool in Python 3 (tested under Python 3.7.7). I got my fastest runs using imap_unordered. Just plug in your scenario and try it out. You can use timeit or just time.time() to figure out which works best for you. (On Windows, wrap the driver code below in an if __name__ == '__main__': guard.)

import multiprocessing
import time

NUMBER_OF_PROCESSES = multiprocessing.cpu_count()
MP_FUNCTION = 'starmap'  # 'imap_unordered' or 'starmap' or 'apply_async'

def process_chunk(a_chunk):
    print(f"processig mp chunk {a_chunk}")
    return a_chunk


map_jobs = [1, 2, 3, 4]

result_sum = 0

s = time.time()
if MP_FUNCTION == 'imap_unordered':
    # Use a context manager so the pool is cleaned up, as in the other branches
    with multiprocessing.Pool(processes=NUMBER_OF_PROCESSES) as pool:
        for i in pool.imap_unordered(process_chunk, map_jobs):
            result_sum += i
elif MP_FUNCTION == 'starmap':
    pool = multiprocessing.Pool(processes=NUMBER_OF_PROCESSES)
    try:
        map_jobs = [(i, ) for i in map_jobs]
        result_sum = pool.starmap(process_chunk, map_jobs)
        result_sum = sum(result_sum)
    finally:
        pool.close()
        pool.join()
elif MP_FUNCTION == 'apply_async':
    with multiprocessing.Pool(processes=NUMBER_OF_PROCESSES) as pool:
        result_sum = [pool.apply_async(process_chunk, [i, ]).get() for i in map_jobs]
    result_sum = sum(result_sum)
print(f"result_sum is {result_sum}, took {time.time() - s}s")

In the above scenario imap_unordered actually seems to perform the worst for me. Try out your case and benchmark it on the machine you plan to run it on. Also read up on Process Pools. Cheers!
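To make that benchmarking suggestion concrete, here is a minimal timeit sketch (the work function and input size are placeholders of mine, not from the answer above):

import multiprocessing
import timeit

def work(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        for name in ('map', 'imap_unordered'):
            method = getattr(pool, name)
            # list() forces imap_unordered's iterator to be fully consumed
            elapsed = timeit.timeit(lambda: list(method(work, range(1000))), number=10)
            print(f"{name}: {elapsed:.3f}s")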


I could not use the code posted so far, because the code using "multiprocessing.Pool" does not work with lambda expressions, and the code not using "multiprocessing.Pool" spawns as many processes as there are work items.

I adapted the code so that it spawns a predefined number of workers and only iterates through the input list when there is an idle worker. I also enabled "daemon" mode for the workers so that Ctrl-C works as expected.

import multiprocessing


def fun(f, q_in, q_out):
    # Worker loop: pull (index, item) pairs until a (None, None) sentinel arrives
    while True:
        i, x = q_in.get()
        if i is None:
            break
        q_out.put((i, f(x)))


def parmap(f, X, nprocs=multiprocessing.cpu_count()):
    q_in = multiprocessing.Queue(1)
    q_out = multiprocessing.Queue()

    proc = [multiprocessing.Process(target=fun, args=(f, q_in, q_out))
            for _ in range(nprocs)]
    for p in proc:
        p.daemon = True
        p.start()

    sent = [q_in.put((i, x)) for i, x in enumerate(X)]
    [q_in.put((None, None)) for _ in range(nprocs)]
    res = [q_out.get() for _ in range(len(sent))]

    [p.join() for p in proc]

    return [x for i, x in sorted(res)]


if __name__ == '__main__':
    print(parmap(lambda i: i * 2, [1, 2, 3, 4, 6, 7, 8]))

I'm not sure if this approach has been taken before, but a workaround I'm using is:

from multiprocessing import Pool

t = None  # placeholder; re-bound to a Test instance in __main__ (workers inherit it on fork)

def run(n):
    return t.f(n)

class Test(object):
    def __init__(self, number):
        self.number = number

    def f(self, x):
        print x * self.number

    def pool(self):
        pool = Pool(2)
        pool.map(run, range(10))

if __name__ == '__main__':
    t = Test(9)
    t.pool()
    pool = Pool(2)
    pool.map(run, range(10))

Output should be (the exact order may vary between runs):

0
9
18
27
36
45
54
63
72
81
0
9
18
27
36
45
54
63
72
81

This may not be a very good solution, but in my case I solved it like this.

from multiprocessing import Pool

def foo1(data):
    self = data.get('slf')
    lst = data.get('lst')
    return sum(lst) + self.foo2()

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def foo2(self):
        return self.a**self.b   

    def foo(self):
        p = Pool(5)
        lst = [1, 2, 3]
        result = p.map(foo1, (dict(slf=self, lst=lst),))
        return result

if __name__ == '__main__':
    print(Foo(2, 4).foo())

I had to pass self to the function because I needed to access attributes and methods of my class through it. This works for me. Corrections and suggestions are always welcome.


from functools import partial
from multiprocessing import Pool

class Calculate(object):
  # Your instance method to be executed
  def f(self, x, y):
    return x*y

if __name__ == '__main__':
  inp_list = [1, 2, 3]
  y = 2
  cal_obj = Calculate()
  pool = Pool(2)
  # A lambda cannot be pickled by the standard library Pool, but a bound
  # method can be in Python 3, so functools.partial works here
  results = pool.map(partial(cal_obj.f, y=y), inp_list)

You may instead want to apply the method to a different instance of the class for each input. Here is a solution for that as well (again avoiding the lambda, which the standard Pool cannot pickle):

from functools import partial
from multiprocessing import Pool

class Calculate(object):
  def __init__(self, x):
    self.x = x

  # Your instance method to be executed
  def f(self, y):
    return self.x*y

if __name__ == '__main__':
  inp_list = [Calculate(i) for i in range(3)]
  y = 2
  pool = Pool(2)
  # In Python 3, Calculate.f is an ordinary function that pickles by reference;
  # each Calculate instance is pickled and passed in as the self argument
  results = pool.map(partial(Calculate.f, y=y), inp_list)

The solution by mrule is correct but has a bug: if the child sends back a large amount of data, it can fill the pipe's buffer, blocking on the child's pipe.send() while the parent waits for the child to exit in join(). The solution is to read the child's data before join()ing the child. Furthermore, the child should close the parent's end of the pipe to prevent a deadlock. The code below fixes that. Also be aware that this parmap creates one process per element in X. A more advanced solution is to use multiprocessing.cpu_count() to divide X into a number of chunks and then merge the results before returning. I leave that as an exercise to the reader (an edited version that does exactly this appears in a later answer below), so as not to spoil the conciseness of the nice answer by mrule. ;)

from multiprocessing import Process, Pipe
from itertools import izip

def spawn(f):
    def fun(ppipe, cpipe, x):
        ppipe.close()      # close the parent's end of the pipe in the child
        cpipe.send(f(x))
        cpipe.close()
    return fun

def parmap(f, X):
    pipe = [Pipe() for x in X]
    proc = [Process(target=spawn(f), args=(p, c, x)) for x, (p, c) in izip(X, pipe)]
    [p.start() for p in proc]
    ret = [p.recv() for (p, c) in pipe]   # read the results *before* joining
    [p.join() for p in proc]
    return ret

if __name__ == '__main__':
    print parmap(lambda x: x**x, range(1, 5))

You can run your code without any issues if you manually exclude the Pool object from the instance's pickled state, because it is not picklable, as the error says. You can do this with the __getstate__ method (described in the pickle documentation) as follows. Pickle looks for the __getstate__ and __setstate__ methods and, if it finds them, calls them when you run map, map_async, etc.:

from multiprocessing import Pool

class calculate(object):
    def __init__(self):
        self.p = Pool()
    def __getstate__(self):
        # Drop the unpicklable Pool before the instance is serialized
        self_dict = self.__dict__.copy()
        del self_dict['p']
        return self_dict
    def __setstate__(self, state):
        self.__dict__.update(state)

    def f(self, x):
        return x*x
    def run(self):
        return self.p.map(self.f, [1,2,3])

Then do:

cl = calculate()
print(cl.run())

will give you the output:

[1, 4, 9]

I've tested the above code in Python 3.x and it works. (In Python 2 it would still fail, since bound methods like self.f cannot be pickled there.)


I know that this question was asked 8 years and 10 months ago, but I want to present my solution:

from multiprocessing import Pool

class Test:

    def __init__(self):
        self.main()

    @staticmethod
    def methodForMultiprocessing(x):
        print(x*x)

    def main(self):
        if __name__ == "__main__":
            p = Pool()
            p.map(Test.methodForMultiprocessing, list(range(1, 11)))
            p.close()

TestObject = Test()

You just need to make the class function a static method. It is also possible with a class method:

from multiprocessing import Pool

class Test:

    def __init__(self):
        self.main()

    @classmethod
    def methodForMultiprocessing(cls, x):
        print(x*x)

    def main(self):
        if __name__ == "__main__":
            p = Pool()
            p.map(Test.methodForMultiprocessing, list(range(1, 11)))
            p.close()

TestObject = Test()

Tested in Python 3.7.3


Here is my solution, which I think is a bit less hackish than most others here. It is similar to nightowl's answer.

from functools import partial
from multiprocessing import Pool

someclasses = [MyClass(), MyClass(), MyClass()]  # MyClass stands in for your own class

def method_caller(some_object, some_method='the method'):
    return getattr(some_object, some_method)()

# 'othermethod' names whatever zero-argument method you want to call on each instance
othermethod = partial(method_caller, some_method='othermethod')

with Pool(6) as pool:
    result = pool.map(othermethod, someclasses)

I've also struggled with this. I had functions as data members of a class; as a simplified example:

from multiprocessing import Pool
import itertools
pool = Pool()
class Example(object):
    def __init__(self, my_add): 
        self.f = my_add  
    def add_lists(self, list1, list2):
        # Needed to do something like this (the following line won't work)
        return pool.map(self.f,list1,list2)  

I needed to use the function self.f in a Pool.map() call from within the same class and self.f did not take a tuple as an argument. Since this function was embedded in a class, it was not clear to me how to write the type of wrapper other answers suggested.

I solved this problem by using a different wrapper that takes a tuple/list, where the first element is the function, and the remaining elements are the arguments to that function, called eval_func_tuple(f_args). Using this, the problematic line can be replaced by return pool.map(eval_func_tuple, itertools.izip(itertools.repeat(self.f), list1, list2)). Here is the full code:

File: util.py

def add(a, b): return a+b

def eval_func_tuple(f_args):
    """Takes a tuple of a function and args, evaluates and returns result"""
    return f_args[0](*f_args[1:])  

File: main.py

from multiprocessing import Pool
import itertools
import util  

pool = Pool()
class Example(object):
    def __init__(self, my_add): 
        self.f = my_add  
    def add_lists(self, list1, list2):
        # The following line will now work
        return pool.map(util.eval_func_tuple, 
            itertools.izip(itertools.repeat(self.f), list1, list2)) 

if __name__ == '__main__':
    myExample = Example(util.add)
    list1 = [1, 2, 3]
    list2 = [10, 20, 30]
    print myExample.add_lists(list1, list2)  

Running main.py will print [11, 22, 33]. Feel free to improve this; for example, eval_func_tuple could also be modified to take keyword arguments, as sketched below.
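As one possible improvement along those lines, here is a hedged sketch of that keyword-argument variant (the three-part packing format is my own choice, not from the original answer):

def eval_func_tuple_kw(f_args_kwargs):
    """Takes (function, args tuple, kwargs dict), evaluates and returns the result"""
    f, args, kwargs = f_args_kwargs
    return f(*args, **kwargs)

The map call would then pass triples such as (self.f, (a, b), {}) instead of the flat tuples used above.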

On another note, in another answer the function "parmap" can be made more efficient for the case of more processes than available CPUs. I'm copying an edited version below. This is my first post and I wasn't sure if I should directly edit the original answer. I also renamed some variables.

import multiprocessing
from multiprocessing import Process, Pipe
from itertools import izip

def spawn(f):  
    def fun(pipe,x):  
        pipe.send(f(x))  
        pipe.close()  
    return fun  

def parmap(f,X):  
    pipe=[Pipe() for x in X]  
    processes=[Process(target=spawn(f),args=(c,x)) for x,(p,c) in izip(X,pipe)]  
    numProcesses = len(processes)  
    processNum = 0  
    outputList = []  
    while processNum < numProcesses:  
        endProcessNum = min(processNum+multiprocessing.cpu_count(), numProcesses)  
        for proc in processes[processNum:endProcessNum]:  
            proc.start()  
        for proc in processes[processNum:endProcessNum]:  
            proc.join()  
        for proc,c in pipe[processNum:endProcessNum]:  
            outputList.append(proc.recv())  
        processNum = endProcessNum  
    return outputList    

if __name__ == '__main__':  
    print parmap(lambda x:x**x,range(1,5))         

There is currently no solution to your problem, as far as I know: the function that you give to map() must be accessible through an import of your module. This is why robert's code works: the function f() can be obtained by importing the following code:

from multiprocessing import Pool

def f(x):
    return x*x

class Calculate(object):
    def run(self):
        p = Pool()
        return p.map(f, [1,2,3])

if __name__ == '__main__':
    cl = Calculate()
    print cl.run()

I actually added a "main" section, because this follows the recommendations for the Windows platform ("Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects").

I also added an uppercase letter in front of Calculate, so as to follow PEP 8. :)


I modified klaus se's method because, while it was working for me with small lists, it would hang when the number of items was ~1000 or greater. Instead of pushing the jobs one at a time with the None stop condition, I load up the input queue all at once and just let the processes munch on it until it's empty.

from multiprocessing import cpu_count, Queue, Process

def apply_func(f, q_in, q_out):
    # Queue.empty() is not fully reliable in general, but the input queue is
    # filled before the workers start, so it only goes empty once work is done
    while not q_in.empty():
        i, x = q_in.get()
        q_out.put((i, f(x)))

# map a function using a pool of processes
def parmap(f, X, nprocs = cpu_count()):
    q_in, q_out   = Queue(), Queue()
    proc = [Process(target=apply_func, args=(f, q_in, q_out)) for _ in range(nprocs)]
    sent = [q_in.put((i, x)) for i, x in enumerate(X)]
    [p.start() for p in proc]
    res = [q_out.get() for _ in sent]
    [p.join() for p in proc]

    return [x for i,x in sorted(res)]

Edit: unfortunately now I am running into this error on my system: Multiprocessing Queue maxsize limit is 32767, hopefully the workarounds there will help.
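A quick usage sketch for this version (kept small, given the queue-size caveat just mentioned; the lambda only works under the fork start method, since lambdas cannot be pickled under spawn):

if __name__ == '__main__':
    print(parmap(lambda i: i * 2, [1, 2, 3, 4]))  # [2, 4, 6, 8]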


Functions defined in classes (even within functions within classes) don't really pickle. However, this works:

from multiprocessing import Pool

def f(x):
    return x*x

class calculate(object):
    def run(self):
        p = Pool()
        return p.map(f, [1,2,3])

cl = calculate()
print cl.run()

I took klaus se's and aganders3's answers and made a documented module that is more readable and holds in one file. You can just add it to your project. It even has an optional progress bar!

"""
The ``processes`` module provides some convenience functions
for using parallel processes in python.

Adapted from http://stackoverflow.com/a/16071616/287297

Example usage:

    print prll_map(lambda i: i * 2, [1, 2, 3, 4, 6, 7, 8], 32, verbose=True)

Comments:

"It spawns a predefined amount of workers and only iterates through the input list
 if there exists an idle worker. I also enabled the "daemon" mode for the workers so
 that KeyboardInterrupt works as expected."

Pitfalls: all stdout output is sent back to the parent's stdout, interleaved.

Alternatively, use this fork of multiprocessing: 
https://github.com/uqfoundation/multiprocess
"""

# Modules #
import multiprocessing
from tqdm import tqdm

################################################################################
def apply_function(func_to_apply, queue_in, queue_out):
    while not queue_in.empty():
        num, obj = queue_in.get()
        queue_out.put((num, func_to_apply(obj)))

################################################################################
def prll_map(func_to_apply, items, cpus=None, verbose=False):
    # Number of processes to use #
    if cpus is None: cpus = min(multiprocessing.cpu_count(), 32)
    # Create queues #
    q_in  = multiprocessing.Queue()
    q_out = multiprocessing.Queue()
    # Process list #
    new_proc  = lambda t,a: multiprocessing.Process(target=t, args=a)
    processes = [new_proc(apply_function, (func_to_apply, q_in, q_out)) for x in range(cpus)]
    # Put all the items (objects) in the queue #
    sent = [q_in.put((i, x)) for i, x in enumerate(items)]
    # Start them all #
    for proc in processes:
        proc.daemon = True
        proc.start()
    # Display progress bar or not #
    if verbose:
        results = [q_out.get() for x in tqdm(range(len(sent)))]
    else:
        results = [q_out.get() for x in range(len(sent))]
    # Wait for them to finish #
    for proc in processes: proc.join()
    # Return results #
    return [x for i, x in sorted(results)]

################################################################################
def test():
    def slow_square(x):
        import time
        time.sleep(2)
        return x**2
    objs    = range(20)
    squares = prll_map(slow_square, objs, 4, verbose=True)
    print "Result: %s" % squares

EDIT: Added @alexander-mcfarlane suggestion and a test function


Multiprocessing and pickling are broken and limited unless you jump outside the standard library.

If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python.

pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6]))

See discussions: What can multiprocessing and dill do together?

and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization

It even handles the code you wrote initially, without modification, and from the interpreter. Why do anything else that's more fragile and specific to a single case?

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class calculate(object):
...  def run(self):
...   def f(x):
...    return x*x
...   p = Pool()
...   return p.map(f, [1,2,3])
... 
>>> cl = calculate()
>>> print cl.run()
[1, 4, 9]

Get the code here: https://github.com/uqfoundation/pathos

And, just to show off a little more of what it can do:

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> 
>>> p = Pool(4)
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>> 
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>> 
>>> class Test(object):
...   def plus(self, x, y): 
...     return x+y
... 
>>> t = Test()
>>> 
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>> 
>>> res = p.amap(t.plus, x, y)
>>> res.get()
[4, 6, 8, 10]
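For completeness: pathos is installable from PyPI (pip install pathos), and recent releases also expose the pools under pathos.pools (hedged: the import path below follows the project's current docs; the examples above use the older pathos.multiprocessing path):

>>> from pathos.pools import ProcessPool
>>> ProcessPool(2).map(lambda x: x * x, [1, 2, 3])
[1, 4, 9]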
