Multiprocessing vs Threading Python

Question

I am trying to understand the advantages of multiprocessing over threading. I know that multiprocessing gets around the Global Interpreter Lock, but what other advantages are there, and can threading not do the same thing?

User · Answer

As mentioned in the question, multiprocessing in Python is the only real way to achieve true parallelism for pure-Python code. Multithreading cannot achieve this because the GIL prevents threads from executing Python bytecode in parallel.

As a consequence, threading is not always useful in Python and, depending on what you are trying to achieve, may even result in worse performance. For example, if you are performing a CPU-bound task such as decompressing gzip files or 3D rendering (anything CPU-intensive), then threading may actually hinder your performance rather than help. In such a case you would want to use multiprocessing, as only this method actually runs in parallel and will help distribute the weight of the task at hand. There is some overhead to this, since multiprocessing involves copying the memory of the script into each subprocess, which may cause issues for larger applications.

However, multithreading becomes useful when your task is IO-bound. For example, if most of your task involves waiting on API calls, you would use multithreading: why not start another request in another thread while you wait, rather than have your CPU sit idle?

TL;DR

Multithreading is concurrent and is used for IO-bound tasks.
Multiprocessing achieves true parallelism and is used for CPU-bound tasks.
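The IO-bound case can be sketched with a thread pool. This is a minimal illustration, not a benchmark: `fake_api_call` is a hypothetical stand-in that sleeps instead of calling a real API, and the delay and worker count are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(request_id, delay=0.2):
    """Hypothetical stand-in for a network request: just waits, then answers."""
    time.sleep(delay)  # sleeping releases the GIL, like real blocking IO
    return f"response-{request_id}"


def fetch_sequential(n):
    """Issue n requests one after another."""
    return [fake_api_call(i) for i in range(n)]


def fetch_threaded(n, max_workers=4):
    """Issue n requests from a pool of threads so the waits overlap."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_api_call, range(n)))


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_time = timed(fetch_sequential, 4)
    _, threaded_time = timed(fetch_threaded, 4)
    print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

Because each call only waits, the threaded version should finish in roughly the time of one request rather than four. Swapping the sleep for a pure-Python computation would be expected to remove that gain, which is the CPU-bound case where processes are the better fit.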

User · Answer

Another thing not mentioned is that, where speed is concerned, it depends on which OS you are using. On Windows, processes are costly to create, so threads are often the better choice there. On Unix, processes are cheaper to create than their Windows counterparts, so using processes on Unix is a safe choice and they are quick to spawn.
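Part of this OS difference comes from how child processes are started: Windows only offers the "spawn" start method, which launches a fresh interpreter, whereas Unix-like systems can also offer "fork". A small sketch for checking what your own platform provides (the `start_method_report` helper is a name made up for this example):

```python
import multiprocessing
import sys


def start_method_report():
    """Collect the platform name and the process start methods it supports."""
    return {
        "platform": sys.platform,
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    report = start_method_report()
    print(f"platform:  {report['platform']}")
    print(f"default:   {report['default']}")
    print(f"available: {', '.join(report['available'])}")
```

The default differs between platforms and Python versions, so it is worth printing rather than assuming.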

User · Answer

MULTIPROCESSING

- Multiprocessing adds CPUs to increase computing power.
- Multiple processes are executed concurrently.
- Creation of a process is time-consuming and resource intensive.
- Multiprocessing can be symmetric or asymmetric.

The multiprocessing library in Python uses a separate memory space and multiple CPU cores, bypasses GIL limitations in CPython, has killable child processes (e.g. function calls in a program), and is much easier to use.

Some caveats of the module are a larger memory footprint, and IPC that is a little more complicated, with more overhead.

MULTITHREADING

- Multithreading creates multiple threads of a single process to increase computing power.
- Multiple threads of a single process are executed concurrently.
- Creation of a thread is economical in both time and resources.

The threading library is lightweight, shares memory, makes responsive UIs possible, and works well for I/O-bound applications.

Threads are not killable and are subject to the GIL.

Multiple threads live in the same process, in the same memory space. Each thread does a specific task and has its own code, its own stack memory and its own instruction pointer, and shares heap memory with the others.

If a thread has a memory leak, it can damage the other threads and the parent process.

Example of multithreading and multiprocessing using Python

Python 3 has a facility for launching parallel tasks, which makes our work easier. It offers both thread pooling and process pooling. The following gives an insight.

ThreadPoolExecutor example:

    import concurrent.futures
    import urllib.request

    URLS = ['http://www.foxnews.com/',
            'http://www.cnn.com/',
            'http://europe.wsj.com/',
            'http://www.bbc.co.uk/',
            'http://some-made-up-domain.com/']

    # Retrieve a single page and report the URL and contents
    def load_url(url, timeout):
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.read()

    # We can use a with statement to ensure threads are cleaned up promptly
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Start the load operations and mark each future with its URL
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))

ProcessPoolExecutor example:

    import concurrent.futures
    import math

    PRIMES = [
        112272535095293,
        112582705942171,
        112272535095293,
        115280095190773,
        115797848077099,
        1099726899285419]

    def is_prime(n):
        # Handle the small cases the odd-divisor loop below does not cover
        if n < 2:
            return False
        if n == 2:
            return True
        if n % 2 == 0:
            return False

        # Only odd divisors up to sqrt(n) need to be checked
        sqrt_n = int(math.floor(math.sqrt(n)))
        for i in range(3, sqrt_n + 1, 2):
            if n % i == 0:
                return False
        return True

    def main():
        # Each number is checked in a separate worker process
        with concurrent.futures.ProcessPoolExecutor() as executor:
            for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
                print('%d is prime: %s' % (number, prime))

    if __name__ == '__main__':
        main()

User · Answer

Here are some pros/cons I came up with.

Multiprocessing

Pros

- Separate memory space
- Code is usually straightforward
- Takes advantage of multiple CPUs & cores
- Avoids GIL limitations for CPython
- Eliminates most needs for synchronization primitives unless you use shared memory (instead, it's more of a communication model for IPC)
- Child processes are interruptible/killable
- Python multiprocessing module includes useful abstractions with an interface much like threading.Thread
- A must with CPython for CPU-bound processing

Cons

- IPC a little more complicated with more overhead (communication model vs. shared memory/objects)
- Larger memory footprint

Threading

Pros

- Lightweight: low memory footprint
- Shared memory: makes access to state from another context easier
- Allows you to easily make responsive UIs
- CPython C extension modules that properly release the GIL will run in parallel
- Great option for I/O-bound applications

Cons

- CPython: subject to the GIL
- Not interruptible/killable
- If not following a command queue/message pump model (using the Queue module), then manual use of synchronization primitives becomes a necessity (decisions are needed for the granularity of locking)
- Code is usually harder to understand and to get right: the potential for race conditions increases dramatically
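To illustrate the "manual use of synchronization primitives" point for threads, here is a minimal sketch of a shared counter guarded by a `threading.Lock`. The function name and the thread and increment counts are assumptions chosen for the example.

```python
import threading


def count_with_lock(n_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a Lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread may update the counter at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())
```

The GIL does not make `counter += 1` atomic as a whole, which is why the explicit lock is there. Locking around every single increment is the finest possible granularity; a coarser choice would be to sum locally in each thread and take the lock once at the end.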

User · Answer

Threads share the same memory space, so special precautions must be taken to guarantee that two threads don't modify the same memory location at the same time. The CPython interpreter handles this using a mechanism called the GIL, or Global Interpreter Lock.

What is the GIL? (I just want to clarify it, since it is mentioned repeatedly above.)

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.

For the main question, we can compare the two by use case.

1. Use cases for threading: in GUI programs, threading can be used to make the application responsive. For example, in a text editing program, one thread can take care of recording the user inputs, another can be responsible for displaying the text, a third can do spell-checking, and so on. Here the program has to wait for user interaction, which is the biggest bottleneck. Another use case for threading is programs that are IO bound or network bound, such as web scrapers.

2. Use cases for multiprocessing: multiprocessing outshines threading in cases where the program is CPU intensive and doesn't have to do any IO or user interaction.
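The text-editor use case can be sketched with a background thread fed through a `queue.Queue`. Everything here is hypothetical scaffolding for illustration: the toy dictionary `KNOWN_WORDS`, the `STOP` sentinel and the function names are not from any real editor.

```python
import queue
import threading

KNOWN_WORDS = {"threads", "share", "memory"}  # toy dictionary, an assumption
STOP = object()  # sentinel telling the worker to finish


def spell_check_worker(words, misspelled):
    """Background thread: pull words off the queue and collect unknown ones."""
    while True:
        word = words.get()
        if word is STOP:
            break
        if word.lower() not in KNOWN_WORDS:
            misspelled.append(word)


def check_text(text):
    """Feed words to a background spell-checker while the caller stays free."""
    words = queue.Queue()
    misspelled = []
    worker = threading.Thread(target=spell_check_worker, args=(words, misspelled))
    worker.start()
    for word in text.split():  # stands in for the thread recording user input
        words.put(word)
    words.put(STOP)
    worker.join()
    return misspelled


if __name__ == "__main__":
    print(check_text("threads share memmory"))
```

Passing work through a queue means the two threads never touch the same word at the same time, so no explicit lock is needed in this sketch.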

User · Answer

Python documentation quotes

The canonical version of this answer is now at the duplicate question: What are the differences between the threading and multiprocessing modules?

I've highlighted the key Python documentation quotes about processes vs threads and the GIL at: What is the global interpreter lock (GIL) in CPython?

Process vs thread experiments

I did a bit of benchmarking in order to show the difference more concretely.

In the benchmark, I timed CPU and IO bound work for various numbers of threads on an 8 hyperthread CPU. The work supplied per thread is always the same, such that more threads means more total work supplied.

The results were:

[Plot not reproduced here.]

Plot data.

Conclusions:

- for CPU bound work, multiprocessing is always faster, presumably due to the GIL
- for IO bound work, both are exactly the same speed
- threads only scale up to about 4x instead of the expected 8x since I'm on an 8 hyperthread machine. Contrast that with a C POSIX CPU-bound work which reaches the expected 8x speedup: What do 'real', 'user' and 'sys' mean in the output of time(1)? TODO: I don't know the reason for this, there must be other Python inefficiencies coming into play.

Test code:

    #!/usr/bin/env python3

    import multiprocessing
    import threading
    import time
    import sys

    def cpu_func(result, niters):
        '''
        A useless CPU bound function.
        '''
        for i in range(niters):
            result = (result * result * i + 2 * result * i * i + 3) % 10000000
        return result

    class CpuThread(threading.Thread):
        def __init__(self, niters):
            super().__init__()
            self.niters = niters
            self.result = 1
        def run(self):
            self.result = cpu_func(self.result, self.niters)

    class CpuProcess(multiprocessing.Process):
        def __init__(self, niters):
            super().__init__()
            self.niters = niters
            self.result = 1
        def run(self):
            self.result = cpu_func(self.result, self.niters)

    class IoThread(threading.Thread):
        def __init__(self, sleep):
            super().__init__()
            self.sleep = sleep
            self.result = self.sleep
        def run(self):
            time.sleep(self.sleep)

    class IoProcess(multiprocessing.Process):
        def __init__(self, sleep):
            super().__init__()
            self.sleep = sleep
            self.result = self.sleep
        def run(self):
            time.sleep(self.sleep)

    if __name__ == '__main__':
        cpu_n_iters = int(sys.argv[1])
        sleep = 1
        cpu_count = multiprocessing.cpu_count()
        # (worker class, amount of work per worker)
        input_params = [
            (CpuThread, cpu_n_iters),
            (CpuProcess, cpu_n_iters),
            (IoThread, sleep),
            (IoProcess, sleep),
        ]
        header = ['nthreads']
        for thread_class, _ in input_params:
            header.append(thread_class.__name__)
        print(' '.join(header))
        for nthreads in range(1, 2 * cpu_count):
            results = [nthreads]
            for thread_class, work_size in input_params:
                # Time how long nthreads workers of this class take in total
                start_time = time.time()
                threads = []
                for i in range(nthreads):
                    thread = thread_class(work_size)
                    threads.append(thread)
                    thread.start()
                for i, thread in enumerate(threads):
                    thread.join()
                results.append(time.time() - start_time)
            print(' '.join('{:.6e}'.format(result) for result in results))

GitHub upstream + plotting code on same directory.

Tested on Ubuntu 18.10, Python 3.6.7, in a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ CPU (4 cores / 8 threads), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB), SSD: Samsung MZVLB512HAJQ-000L7 (3,000 MB/s).

Visualize which threads are running at a given time

This post, https://rohanvarma.me/GIL/, taught me that you can pass a function through the target= argument of threading.Thread, and likewise multiprocessing.Process, and have that function record when it is running.

This allows us to view exactly which thread runs at each time. When this is done, we would see something like the following (I made this particular graph up):

[Illustrative timeline, not measured data. Rows: Thread 1, Thread 2, Process 1, Process 2. Horizontal axis: Time. The two thread rows are active only one at a time, while the two process rows are active simultaneously.]

which would show that:

- threads are fully serialized by the GIL
- processes can run in parallel
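One way to collect the raw data for such a timeline is to have each worker log a timestamp as it makes progress. This is only a sketch of the idea under assumptions of my own: the `busy_worker` and `trace_threads` names, the step count and the amount of placeholder work are all invented for the example, and it is not the code from the linked post.

```python
import threading
import time


def busy_worker(name, log, steps=5, work=20_000):
    """Do some CPU-bound work and record a timestamp after each step."""
    for _ in range(steps):
        sum(i * i for i in range(work))  # placeholder CPU-bound work
        log.append((time.perf_counter(), name))  # list.append is thread-safe in CPython


def trace_threads(n_threads=2, steps=5):
    """Run several workers via target= and return their merged, time-ordered log."""
    log = []
    threads = [
        threading.Thread(target=busy_worker, args=(f"thread-{i}", log, steps))
        for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(log)


if __name__ == "__main__":
    events = trace_threads()
    start = events[0][0]
    for timestamp, name in events:
        print(f"{timestamp - start:8.4f}s  {name}")
```

The printed log shows the order in which the threads completed their steps. Doing the same with `multiprocessing.Process` would need a process-safe channel such as `multiprocessing.Queue` instead of a plain list, since the child processes do not share the parent's memory.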

User · Answer

The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.

Spawning processes is a bit slower than spawning threads.
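The memory difference can be seen with a module-level list that a thread and a child process both try to modify. This is a minimal sketch; the names `shared_items`, `add_item`, `run_in_thread` and `run_in_process` are made up for the illustration.

```python
import multiprocessing
import threading

shared_items = []  # module-level state, visible to every thread in this process


def add_item(value):
    """Append to the module-level list in whichever process runs this."""
    shared_items.append(value)


def run_in_thread(value):
    """Run add_item in a thread of the current process."""
    thread = threading.Thread(target=add_item, args=(value,))
    thread.start()
    thread.join()


def run_in_process(value):
    """Run add_item in a separate child process."""
    process = multiprocessing.Process(target=add_item, args=(value,))
    process.start()
    process.join()


if __name__ == "__main__":
    run_in_thread("from thread")
    run_in_process("from process")
    # The thread's append is visible here; the child process changed its own copy.
    print(shared_items)
```

To get data back from the child you would need an explicit channel such as `multiprocessing.Queue`, which is the extra effort the answer refers to.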

User · Answer

The key advantage is isolation. A crashing process won't bring down other processes, whereas a crashing thread will probably wreak havoc with other threads.

User · Answer

As I learned at university, most of the answers above are right. In practice, on different platforms (always using Python), spawning multiple threads ends up behaving like spawning one process. The difference with multiple processes is that several cores share the load, instead of only one core processing everything at 100%. So if you spawn, for example, 10 threads on a 4-core PC, you will end up getting only 25% of the CPU's power. If you spawn 10 processes, you will end up with the CPU working at 100% (if you don't have other limitations). I'm not an expert in all the new technologies; I'm answering from my own real-world experience.

User · Answer

A process may have multiple threads. These threads may share memory and are the units of execution within a process.

Processes run on the CPU, and threads reside within each process. Processes are individual entities which run independently. If you want to share data or state between processes, you may use a memory-storage tool such as a cache (Redis, memcache), files, or a database.
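Of the options listed, a file is the easiest to sketch with the standard library alone. The example below assumes a JSON file in a temporary directory as the shared store; the helper names and the shape of the saved state are hypothetical.

```python
import json
import multiprocessing
import os
import tempfile


def save_state(path, state):
    """Write the state as JSON so that another process can read it."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)


def load_state(path):
    """Read back the state written by save_state."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def producer(path):
    """Runs in the child process: compute something and persist it."""
    save_state(path, {"pid": os.getpid(), "squares": [n * n for n in range(5)]})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        state_path = os.path.join(workdir, "state.json")
        child = multiprocessing.Process(target=producer, args=(state_path,))
        child.start()
        child.join()  # wait for the file to be complete before reading it
        print(load_state(state_path))
```

The parent only reads after `join()`, which sidesteps concurrent access. With several writers you would need file locking or one of the other stores mentioned above.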

User · Answer

Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input. Without threading, if the database connection is busy, the application will not be able to respond to the user. By splitting off the database connection into a separate thread you can make the application more responsive. Also, because both threads are in the same process, they can access the same data structures: good performance, plus a flexible software design.

Note that due to the GIL the app isn't actually doing two things at once. What we've done is put the resource lock on the database into a separate thread, so that CPU time can be switched between it and the user interaction. CPU time gets rationed out between the threads.

Multiprocessing is for times when you really do want more than one thing to be done at any given time. Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. Putting each job in a separate thread might help a little, because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel, because the GIL means that you're only ever using the resources of one CPU. By putting each job in a multiprocessing process, each can run on its own CPU and run at full efficiency.
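The database scenario can be sketched as follows. `slow_query` is a hypothetical stand-in that sleeps instead of talking to a real database, and the "tick" counter stands in for handling user-interface events; both are assumptions made for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(delay=0.3):
    """Hypothetical stand-in for a busy database connection."""
    time.sleep(delay)
    return ["row-1", "row-2"]


def run_responsive(poll_interval=0.05):
    """Run the query in a worker thread while the main thread keeps ticking."""
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(slow_query)
        while not future.done():
            ticks += 1  # stands in for handling one user-interface event
            time.sleep(poll_interval)
        return future.result(), ticks


if __name__ == "__main__":
    rows, ticks = run_responsive()
    print(f"got {len(rows)} rows; handled {ticks} UI ticks while waiting")
```

The main thread is never blocked on the query itself, which is the responsiveness gain described above even though only one thread runs Python code at any instant.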

User · Answer

Other answers have focused more on the multithreading vs multiprocessing aspect, but in Python the Global Interpreter Lock (GIL) has to be taken into account. When a number of threads (say k) are created, they will generally not increase performance by k times, because the program still behaves much like a single-threaded application. The GIL is a global lock that allows only a single thread to execute Python code at a time, utilizing only a single core. Performance does increase in places where C extensions like numpy, or network and I/O operations, are being used, since a lot of the work is done in the background and the GIL is released.

So when threading is used, the threads are real operating-system threads, but because of the GIL they take turns running Python code, so for CPU-bound work they are essentially running like a single-threaded process. Preemption takes place between these threads. If the CPU runs at maximum capacity, you may want to switch to multiprocessing.

Now, in the case of self-contained instances of execution, you can opt for a pool. But in the case of overlapping data, where you may want the processes to communicate, you should use multiprocessing.Process.
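The two multiprocessing styles mentioned at the end can be sketched side by side. The `heavy` task, the `worker` loop and the `None` sentinel are assumptions invented for this example, not part of the answer.

```python
import multiprocessing


def heavy(n):
    """Placeholder CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def worker(inbox, outbox):
    """Process that talks to its parent: read numbers until None arrives."""
    for n in iter(inbox.get, None):
        outbox.put((n, heavy(n)))


if __name__ == "__main__":
    sizes = [10, 100, 1000]

    # Independent tasks: let a Pool distribute them.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(heavy, sizes))

    # Communicating processes: wire up Process objects with queues.
    inbox, outbox = multiprocessing.Queue(), multiprocessing.Queue()
    child = multiprocessing.Process(target=worker, args=(inbox, outbox))
    child.start()
    for n in sizes:
        inbox.put(n)
    inbox.put(None)  # sentinel: no more work
    results = [outbox.get() for _ in sizes]
    child.join()
    print(results)
```

The pool version needs no plumbing because each call is independent. The explicit `Process` version costs more code but lets parent and child exchange messages for as long as the child lives.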
