How do I parallelize a simple Python loop?

Question

This is probably a trivial question, but how do I parallelize the following loop in Python?

    # setup output lists
    output1 = list()
    output2 = list()
    output3 = list()

    for j in range(0, 10):
        # calc individual parameter value
        parameter = j * offset
        # call the calculation
        out1, out2, out3 = calc_stuff(parameter=parameter)

        # put results into correct output list
        output1.append(out1)
        output2.append(out2)
        output3.append(out3)

I know how to start single threads in Python, but I don't know how to "collect" the results.

Multiple processes would be fine too - whatever is easiest for this case. I'm currently using Linux, but the code should run on Windows and Mac as well.

What's the easiest way to parallelize this code?

User · Answer

    from joblib import Parallel, delayed
    import multiprocessing

    inputs = range(10)

    def processInput(i):
        return i * i

    num_cores = multiprocessing.cpu_count()

    results = Parallel(n_jobs=num_cores)(delayed(processInput)(i) for i in inputs)
    print(results)

The above works beautifully on my machine (Ubuntu; the joblib package was pre-installed, but it can be installed via pip install joblib).

Taken from https://blog.dominodatalab.com/simple-parallelization

User · Answer

Have a look at this:

http://docs.python.org/library/queue.html

This might not be the right way to do it, but I'd do something like this.

Actual code:

    from multiprocessing import Process, JoinableQueue as Queue
    from queue import Empty  # multiprocessing queues raise queue.Empty

    class CustomWorker(Process):
        def __init__(self, workQueue, out1, out2, out3):
            Process.__init__(self)
            self.input = workQueue
            self.out1 = out1
            self.out2 = out2
            self.out3 = out3

        def run(self):
            while True:
                try:
                    # Give up after one second without new work
                    value = self.input.get(timeout=1)
                    # value modifier
                    temp1, temp2, temp3 = self.calc_stuff(value)
                    self.out1.put(temp1)
                    self.out2.put(temp2)
                    self.out3.put(temp3)
                    self.input.task_done()
                except Empty:
                    return
                    # Catch things better here

        def calc_stuff(self, param):
            out1 = param * 2
            out2 = param * 4
            out3 = param * 8
            return out1, out2, out3

    def Main():
        n_items = 10
        inputQueue = Queue()
        for i in range(n_items):
            inputQueue.put(i)
        out1 = Queue()
        out2 = Queue()
        out3 = Queue()
        processes = []
        for x in range(2):
            p = CustomWorker(inputQueue, out1, out2, out3)
            p.daemon = True
            p.start()
            processes.append(p)
        inputQueue.join()
        # Read exactly one result per input; empty() is not reliable across processes
        for _ in range(n_items):
            print(out1.get())
            print(out2.get())
            print(out3.get())

    if __name__ == '__main__':
        Main()

Hope that helps.

User · Answer

A very simple example of parallel processing is:

    from multiprocessing import Process

    output1 = list()
    output2 = list()
    output3 = list()

    def yourfunction():
        for j in range(0, 10):
            # calc individual parameter value
            parameter = j * offset
            # call the calculation
            out1, out2, out3 = calc_stuff(parameter=parameter)

            # put results into correct output list
            output1.append(out1)
            output2.append(out2)
            output3.append(out3)

    if __name__ == '__main__':
        # The lists are filled in the child process only;
        # the parent's copies stay empty after join().
        p = Process(target=yourfunction)
        p.start()
        p.join()
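Because a child process works on its own copy of the lists, the results still have to be sent back to the parent. The following is a minimal sketch of one way to do that with a multiprocessing.Queue, assuming one process per iteration. The names worker and run, the value of OFFSET, and the placeholder body of calc_stuff are all assumptions made for illustration; they are not part of the question.

```python
from multiprocessing import Process, Queue

OFFSET = 5  # assumed value; the question leaves `offset` undefined


def calc_stuff(parameter):
    # Placeholder for the real calculation (assumption).
    return parameter, parameter * 2, parameter * 3


def worker(j, offset, results):
    # Runs in the child process. The index j travels with the outputs
    # so the parent can restore the original loop order.
    results.put((j, calc_stuff(parameter=j * offset)))


def run(n=10, offset=OFFSET):
    results = Queue()
    procs = [Process(target=worker, args=(j, offset, results)) for j in range(n)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    ordered = [collected[j] for j in range(n)]
    output1, output2, output3 = (list(col) for col in zip(*ordered))
    return output1, output2, output3


if __name__ == "__main__":
    print(run())
```

Starting one process per iteration is only reasonable for a handful of long-running calls; for many short calls a process pool is usually the better fit.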

User · Answer

Why don't you use threads, and one mutex to protect one global list?

    from threading import Thread, Lock

    class thread_it(Thread):
        def __init__(self, param):
            Thread.__init__(self)
            self.param = param

        def run(self):
            # Compute outside the lock so the threads can overlap
            result = calc_stuff(self.param)
            # Hold the lock only while touching the shared list
            with mutex:
                output.append(result)

    threads = []
    output = []
    mutex = Lock()

    for j in range(0, 10):
        current = thread_it(j * offset)
        threads.append(current)
        current.start()

    for t in threads:
        t.join()

    # here you have the output list filled with data

Keep in mind that you will only be as fast as your slowest thread, and that the results arrive in completion order, not loop order.
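If the three output lists from the question have to come back in loop order, each thread can store its result under its loop index. This is a minimal sketch under stated assumptions: run_threads and task are hypothetical names, the default offset is arbitrary, and calc_stuff is a stand-in for the real calculation.

```python
from threading import Lock, Thread


def calc_stuff(parameter):
    # Placeholder for the real calculation (assumption).
    return parameter, parameter * 2, parameter * 3


def run_threads(n=10, offset=5):
    results = {}
    lock = Lock()

    def task(j):
        value = calc_stuff(j * offset)  # compute outside the lock
        with lock:                      # lock only while storing
            results[j] = value

    threads = [Thread(target=task, args=(j,)) for j in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Rebuild the three lists in loop order, whatever order threads finished in.
    ordered = [results[j] for j in range(n)]
    output1, output2, output3 = (list(col) for col in zip(*ordered))
    return output1, output2, output3


if __name__ == "__main__":
    print(run_threads())
```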

User · Answer

There are a number of advantages to using Ray:

- You can parallelize over multiple machines in addition to multiple cores (with the same code).
- Efficient handling of numerical data through shared memory (and zero-copy serialization).
- High task throughput with distributed scheduling.
- Fault tolerance.

In your case, you could start Ray and define a remote function:

    import ray

    ray.init()

    # Older Ray releases called this argument num_return_vals
    @ray.remote(num_returns=3)
    def calc_stuff(parameter=None):
        # Do something.
        return 1, 2, 3

and then invoke it in parallel:

    output1, output2, output3 = [], [], []

    # Launch the tasks.
    for j in range(10):
        id1, id2, id3 = calc_stuff.remote(parameter=j)
        output1.append(id1)
        output2.append(id2)
        output3.append(id3)

    # Block until the results have finished and get the results.
    output1 = ray.get(output1)
    output2 = ray.get(output2)
    output3 = ray.get(output3)

To run the same example on a cluster, the only line that would change would be the call to ray.init(). The relevant documentation can be found here.

Note that I'm helping to develop Ray.

User · Answer

This could be useful when implementing multiprocessing and parallel/distributed computing in Python.

YouTube tutorial on using techila package

Techila is a distributed computing middleware, which integrates directly with Python using the techila package. The peach function in the package can be useful in parallelizing loop structures. The following code snippet is from the Techila Community Forums:

    techila.peach(funcname='theheavyalgorithm',   # Function that will be called on the compute nodes (Workers)
                  files='theheavyalgorithm.py',   # Python file that will be sourced on Workers
                  jobs=jobcount                   # Number of Jobs in the Project
                  )

User · Answer

I found joblib very useful. Please see the following example:

    from joblib import Parallel, delayed

    def yourfunction(k):
        s = 3.14 * k * k
        print("Area of a circle with a radius", k, "is:", s)

    element_run = Parallel(n_jobs=-1)(delayed(yourfunction)(k) for k in range(1, 10))

n_jobs=-1: use all available cores

User · Answer

Using multiple threads on CPython won't give you better performance for pure-Python code due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:

    import multiprocessing

    pool = multiprocessing.Pool(4)
    out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))

Note that this won't work in the interactive interpreter.

To avoid the usual FUD around the GIL: there wouldn't be any advantage to using threads for this example anyway. You want to use processes here, not threads, because they avoid a whole bunch of problems.
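To make the zip(*pool.map(...)) idiom concrete, here is a small self-contained sketch. The helper name collect, the default offset, and the body of calc_stuff are assumptions for illustration. pool.map returns one (out1, out2, out3) tuple per input, in input order, and zip(*rows) transposes those rows into three sequences.

```python
import multiprocessing


def calc_stuff(parameter):
    # Placeholder for the real calculation (assumption).
    return parameter, parameter * 2, parameter * 3


def collect(pool, offset, n=10):
    # offset must be non-zero, since it is used as the range step.
    rows = pool.map(calc_stuff, range(0, n * offset, offset))
    out1, out2, out3 = zip(*rows)  # transpose rows into columns
    return list(out1), list(out2), list(out3)


if __name__ == "__main__":
    # The __main__ guard is needed on platforms that start workers by
    # importing this module (for example Windows).
    with multiprocessing.Pool(4) as pool:
        print(collect(pool, offset=5))
```

The function passed to pool.map has to be defined at module level so that worker processes can find it, which is one reason snippets like this tend to fail when pasted into an interactive session.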

User · Answer

What's the easiest way to parallelize this code?

Use a PoolExecutor from concurrent.futures. Compare the original code with this, side by side. First, the most concise way to approach this is with executor.map:

    ...
    with ProcessPoolExecutor() as executor:
        for out1, out2, out3 in executor.map(calc_stuff, parameters):
            ...

or broken down by submitting each call individually:

    ...
    with ThreadPoolExecutor() as executor:
        futures = []
        for parameter in parameters:
            futures.append(executor.submit(calc_stuff, parameter))

        for future in futures:
            out1, out2, out3 = future.result()  # this will block
            ...

Leaving the context signals the executor to free up resources.

You can use threads or processes and use the exact same interface.

A working example

Here is working example code that compares the two executors.

Put this in a file - futuretest.py:

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
    from time import time
    from http.client import HTTPSConnection

    def processor_intensive(arg):
        def fib(n):  # recursive, processor intensive calculation (avoid n > 36)
            return fib(n-1) + fib(n-2) if n > 1 else n
        start = time()
        result = fib(arg)
        return time() - start, result

    def io_bound(arg):
        start = time()
        con = HTTPSConnection(arg)
        con.request('GET', '/')
        result = con.getresponse().getcode()
        return time() - start, result

    def manager(PoolExecutor, calc_stuff):
        if calc_stuff is io_bound:
            inputs = ('python.org', 'stackoverflow.com', 'stackexchange.com',
                      'noaa.gov', 'parler.com', 'aaronhall.dev')
        else:
            inputs = range(25, 32)
        timings, results = list(), list()
        start = time()
        with PoolExecutor() as executor:
            for timing, result in executor.map(calc_stuff, inputs):
                # put results into correct output list:
                timings.append(timing), results.append(result)
        finish = time()
        print(f'{calc_stuff.__name__}, {PoolExecutor.__name__}')
        print(f'wall time to execute: {finish-start}')
        print(f'total of timings for each call: {sum(timings)}')
        print(f'time saved by parallelizing: {sum(timings) - (finish-start)}')
        print(dict(zip(inputs, results)), end='\n\n')

    def main():
        for computation in (processor_intensive, io_bound):
            for pool_executor in (ProcessPoolExecutor, ThreadPoolExecutor):
                manager(pool_executor, calc_stuff=computation)

    if __name__ == '__main__':
        main()

And here's the output for one run of python -m futuretest:

    processor_intensive, ProcessPoolExecutor
    wall time to execute: 0.7326343059539795
    total of timings for each call: 1.8033506870269775
    time saved by parallelizing: 1.070716381072998
    {25: 75025, 26: 121393, 27: 196418, 28: 317811, 29: 514229, 30: 832040, 31: 1346269}

    processor_intensive, ThreadPoolExecutor
    wall time to execute: 1.190223217010498
    total of timings for each call: 3.3561410903930664
    time saved by parallelizing: 2.1659178733825684
    {25: 75025, 26: 121393, 27: 196418, 28: 317811, 29: 514229, 30: 832040, 31: 1346269}

    io_bound, ProcessPoolExecutor
    wall time to execute: 0.533886194229126
    total of timings for each call: 1.2977914810180664
    time saved by parallelizing: 0.7639052867889404
    {'python.org': 301, 'stackoverflow.com': 200, 'stackexchange.com': 200, 'noaa.gov': 301, 'parler.com': 200, 'aaronhall.dev': 200}

    io_bound, ThreadPoolExecutor
    wall time to execute: 0.38941240310668945
    total of timings for each call: 1.6049387454986572
    time saved by parallelizing: 1.2155263423919678
    {'python.org': 301, 'stackoverflow.com': 200, 'stackexchange.com': 200, 'noaa.gov': 301, 'parler.com': 200, 'aaronhall.dev': 200}

Processor-intensive analysis

When performing processor-intensive calculations in Python, expect the ProcessPoolExecutor to be more performant than the ThreadPoolExecutor. Due to the Global Interpreter Lock (a.k.a. the GIL), threads cannot use multiple processors, so expect the time for each calculation and the wall time (elapsed real time) to be greater.

IO-bound analysis

On the other hand, when performing IO-bound operations, expect ThreadPoolExecutor to be more performant than ProcessPoolExecutor. Python's threads are real OS threads. They can be put to sleep by the operating system and reawakened when their information arrives.

Final thoughts

I suspect that multiprocessing will be slower on Windows, since Windows doesn't support forking, so each new process has to take time to launch.

You can nest multiple threads inside multiple processes, but it's recommended not to use multiple threads to spin off multiple processes.

If faced with a heavy processing problem in Python, you can trivially scale with additional processes - but not so much with threading.
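Applied to the loop from the question, the executor.map approach could look roughly like the sketch below. The function name run, the default offset, and the placeholder calc_stuff are assumptions; only the executor API comes from concurrent.futures. The executor class is passed in as an argument, so the same code can be tried with either ProcessPoolExecutor or ThreadPoolExecutor.

```python
from concurrent.futures import ProcessPoolExecutor


def calc_stuff(parameter):
    # Placeholder for the real calculation (assumption).
    return parameter, parameter * 2, parameter * 3


def run(executor_cls, n=10, offset=5):
    output1, output2, output3 = [], [], []
    parameters = [j * offset for j in range(n)]
    with executor_cls() as executor:
        # executor.map yields results in the order of the inputs.
        for out1, out2, out3 in executor.map(calc_stuff, parameters):
            output1.append(out1)
            output2.append(out2)
            output3.append(out3)
    return output1, output2, output3


if __name__ == "__main__":
    print(run(ProcessPoolExecutor))
```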

User · Answer

To parallelize a simple for loop, joblib brings a lot of value to raw use of multiprocessing. Not only the short syntax, but also things like transparent bunching of iterations when they are very fast (to remove the overhead), or capturing of the traceback of the child process, to have better error reporting.

Disclaimer: I am the original author of joblib.

User · Answer

Let's say we have an async function:

    import asyncio
    import sys

    async def work_async(self, student_name: str, code: str, loop):
        """
        Some async function
        """
        # Do some async processing
        ...

That needs to be run on a large array. Some attributes are passed to the program, and some are taken from a property of each dictionary element in the array.

    async def process_students(self, student_name: str, loop):
        market = sys.argv[2]
        subjects = [...]  # Some large array
        batchsize = 5
        # Run the coroutines in batches of `batchsize`, waiting for each batch to finish
        for i in range(0, len(subjects), batchsize):
            batch = subjects[i:i+batchsize]
            await asyncio.gather(*(self.work_async(student_name,
                                                   sub['Code'],
                                                   loop)
                                   for sub in batch))
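A runnable variant of the batching pattern above, as a sketch only: the methods are turned into plain functions, the unused loop argument is dropped, asyncio.sleep stands in for the real asynchronous work, and the student name and subject codes are made up. asyncio.gather returns results in the order the awaitables were passed, so the collected list follows the order of subjects.

```python
import asyncio


async def work_async(student_name, code):
    # Stand-in for real asynchronous work such as a network call (assumption).
    await asyncio.sleep(0.01)
    return f"{student_name}:{code}"


async def process_students(student_name, subjects, batchsize=5):
    results = []
    for i in range(0, len(subjects), batchsize):
        batch = subjects[i:i + batchsize]
        # Run one batch concurrently and wait for all of it to finish.
        batch_results = await asyncio.gather(
            *(work_async(student_name, sub["Code"]) for sub in batch)
        )
        results.extend(batch_results)
    return results


if __name__ == "__main__":
    subjects = [{"Code": f"S{k}"} for k in range(12)]
    print(asyncio.run(process_students("alice", subjects)))
```

Note that this gives concurrency for work that awaits I/O; coroutines on a single event loop do not run CPU-bound code in parallel.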

User · Answer

Dask futures; I'm surprised no one has mentioned it yet...

    from dask.distributed import Client

    client = Client(n_workers=8)  # In this example I have 8 cores and processes (can also use threads if desired)

    def my_function(i):
        output = <code to execute in the for loop here>
        return output

    futures = []

    for i in <whatever you want to loop across here>:
        future = client.submit(my_function, i)
        futures.append(future)

    results = client.gather(futures)
    client.close()

User · Answer

Thanks, @iuryxavier.

    from multiprocessing import Pool
    from multiprocessing import cpu_count


    def add_1(x):
        return x + 1

    if __name__ == "__main__":
        pool = Pool(cpu_count())
        results = pool.map(add_1, range(10, 12))
        pool.close()  # stop accepting new tasks
        pool.join()   # wait for the worker processes to exit

User · Answer

This is the easiest way to do it!

You can use asyncio. Documentation can be found here. It is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web servers, database connection libraries, distributed task queues, etc. Plus it has both high-level and low-level APIs to accommodate any kind of problem.

    import asyncio
    import functools

    def background(f):
        def wrapped(*args, **kwargs):
            # run_in_executor only forwards positional arguments,
            # so keyword arguments are bound with functools.partial
            call = functools.partial(f, *args, **kwargs)
            return asyncio.get_event_loop().run_in_executor(None, call)

        return wrapped

    @background
    def your_function(argument):
        # code
        ...

Now this function will be run in parallel whenever it is called, without putting the main program into a wait state. You can use it to parallelize a for loop as well. When it is called in a for loop, the loop itself is sequential, but every iteration runs in parallel to the main program as soon as the interpreter gets there. For instance:

    import time

    @background
    def your_function(argument):
        time.sleep(5)
        print('function finished for ' + str(argument))

    for i in range(10):
        your_function(i)

    print('loop finished')

This produces the following output:

    loop finished
    function finished for 4
    function finished for 8
    function finished for 0
    function finished for 3
    function finished for 6
    function finished for 2
    function finished for 5
    function finished for 7
    function finished for 9
    function finished for 1
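The decorator above starts the calls but does not show how to get their return values back. A minimal sketch of one way to collect them follows, assuming the work is a blocking function; slow_square and run_all are hypothetical names. Passing None to run_in_executor uses the event loop's default executor, which is a thread pool, so this helps with blocking or I/O-bound calls rather than CPU-bound pure-Python code.

```python
import asyncio
import time


def slow_square(x):
    # Stand-in for a blocking call (assumption).
    time.sleep(0.05)
    return x * x


async def run_all(values):
    loop = asyncio.get_running_loop()
    # Schedule every call on the default executor right away.
    futures = [loop.run_in_executor(None, slow_square, v) for v in values]
    # gather returns the results in the same order as `futures`.
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(run_all(range(10))))
```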
