How should I log while using multiprocessing in Python?

Question

Right now I have a central module in a framework that spawns multiple processes using the Python 2.6 multiprocessing module. Because it uses multiprocessing, there is a module-level multiprocessing-aware log, LOG = multiprocessing.get_logger(). Per the docs, this logger has process-shared locks so that you don't garble things up in sys.stderr (or whatever filehandle) by having multiple processes writing to it simultaneously.

The issue I have now is that the other modules in the framework are not multiprocessing-aware. The way I see it, I need to make all dependencies on this central module use multiprocessing-aware logging. That's annoying within the framework, let alone for all clients of the framework. Are there alternatives I'm not thinking of?

User · Answer

Here's my simple hack/workaround. It's not the most comprehensive, but it is easily modifiable and, I think, simpler to read and understand than any other answers I found before writing this:

```python
import logging
import multiprocessing

class FakeLogger(object):
    def __init__(self, q):
        self.q = q
    def info(self, item):
        self.q.put('INFO - {}'.format(item))
    def debug(self, item):
        self.q.put('DEBUG - {}'.format(item))
    def critical(self, item):
        self.q.put('CRITICAL - {}'.format(item))
    def warning(self, item):
        self.q.put('WARNING - {}'.format(item))

def some_other_func_that_gets_logger_and_logs(num):
    # notice the name gets discarded
    # of course you can easily add this to your FakeLogger class
    local_logger = logging.getLogger('local')
    local_logger.info('Hey I am logging this: {} and working on it to make this {}'.format(num, num * 2))
    local_logger.debug('hmm, something may need debugging here')
    return num * 2

def func_to_parallelize(data_chunk):
    # unpack our args
    the_num, logger_q = data_chunk
    # since we're now in a new process, let's monkeypatch the logging module
    logging.getLogger = lambda name=None: FakeLogger(logger_q)
    # now do the actual work that happens to log stuff too
    new_num = some_other_func_that_gets_logger_and_logs(the_num)
    return (the_num, new_num)

if __name__ == '__main__':
    multiprocessing.freeze_support()
    m = multiprocessing.Manager()
    logger_q = m.Queue()
    # we have to pass our data to be parallel-processed
    # we also need to pass the Queue object so we can retrieve the logs
    parallelable_data = [(1, logger_q), (2, logger_q)]
    # set up a pool of processes so we can take advantage of multiple CPU cores
    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=4)
    worker_output = pool.map(func_to_parallelize, parallelable_data)
    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks
    # get the contents of our FakeLogger object
    while not logger_q.empty():
        print(logger_q.get())
    print('worker output contained: {}'.format(worker_output))
```

User · Answer

Yet another alternative might be the various non-file-based logging handlers in the logging package: SocketHandler, DatagramHandler, SysLogHandler, and others. This way, you could easily have a logging daemon somewhere that you could write to safely and that would handle the results correctly. (E.g., a simple socket server that just unpickles the message and emits it to its own rotating file handler.) The SysLogHandler would take care of this for you, too. Of course, you could use your own instance of syslog, not the system one.
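A minimal sketch of that daemon idea, assuming Python 3 and loopback-only traffic: workers attach the stdlib logging.handlers.SocketHandler, which sends each record as a 4-byte length prefix followed by a pickled dict, and a small socketserver-based receiver rebuilds it with logging.makeLogRecord and hands it to a single logger that owns the real handlers. The names LogRecordServer, start_log_server and worker are made up for illustration. Because the receiver calls pickle.loads on network input, only bind it to 127.0.0.1 or another trusted interface.

```python
import logging
import logging.handlers
import multiprocessing
import pickle
import socketserver
import struct
import threading
import time


class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Decode the length-prefixed pickles that SocketHandler sends."""

    def handle(self):
        while True:
            header = self.rfile.read(4)            # 4-byte big-endian length prefix
            if len(header) < 4:
                break                              # sender closed the connection
            (length,) = struct.unpack(">L", header)
            data = self.rfile.read(length)         # buffered read: whole chunk or EOF
            if len(data) < length:
                break
            record = logging.makeLogRecord(pickle.loads(data))
            # Only this daemon touches the real handlers (e.g. a RotatingFileHandler).
            self.server.target.handle(record)


class LogRecordServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, target):
        super().__init__(address, LogRecordStreamHandler)
        self.target = target                       # logger that owns the real handlers


def start_log_server(target, host="127.0.0.1", port=0):
    """Serve in a background thread; port=0 lets the OS pick a free port."""
    server = LogRecordServer((host, port), target)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def worker(port, n):
    # Each worker ships its records over TCP instead of writing a shared file.
    logger = logging.getLogger("worker")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SocketHandler("127.0.0.1", port)
    logger.addHandler(handler)
    logger.info("hello from worker %d", n)
    handler.close()


if __name__ == "__main__":
    central = logging.getLogger("central")
    central.addHandler(logging.StreamHandler())    # a RotatingFileHandler in practice
    server = start_log_server(central)
    port = server.server_address[1]
    procs = [multiprocessing.Process(target=worker, args=(port, i)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    time.sleep(0.5)                                # let receiver threads drain sockets
    server.shutdown()
    server.server_close()
```

Swapping the StreamHandler on central for a RotatingFileHandler gives the "emit it to its own rotating file handler" setup described above, with only the receiving process ever touching the file.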

User · Answer

As of 2020, it seems there is a simpler way of logging with multiprocessing. This function will create the logger. You can set the format here and where you want your output to go (file, stdout):

```python
def create_logger():
    import multiprocessing, logging
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter(
        '[%(asctime)s| %(levelname)s| %(processName)s] %(message)s')
    handler = logging.FileHandler('logs/your_file_name.log')
    handler.setFormatter(formatter)

    # this bit will make sure you won't have
    # duplicated messages in the output
    if not len(logger.handlers):
        logger.addHandler(handler)
    return logger
```

In the main entry point you instantiate the logger:

```python
if __name__ == '__main__':
    from multiprocessing import Pool
    logger = create_logger()
    logger.info('Starting pooling')
    p = Pool()
    # rest of the code
```

Now, you only need to add this reference in each function where you need logging:

```python
logger = create_logger()
```

And output messages:

```python
logger.info(f'My message from {something}')
```

Hope this helps.

User · Answer

I just now wrote a log handler of my own that just feeds everything to the parent process via a pipe. I've only been testing it for ten minutes, but it seems to work pretty well.

(Note: This is hardcoded to RotatingFileHandler, which is my own use case.)

Update: javier now maintains this approach as a package available on PyPI; see multiprocessing-logging on PyPI, GitHub at https://github.com/jruere/multiprocessing-logging

Update: Implementation!

This now uses a queue for correct handling of concurrency, and also recovers from errors correctly. I've now been using this in production for several months, and the current version below works without issue.

```python
from logging.handlers import RotatingFileHandler
import multiprocessing, threading, logging, sys, traceback

class MultiProcessingLog(logging.Handler):
    def __init__(self, name, mode, maxsize, rotate):
        logging.Handler.__init__(self)

        self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)

        t = threading.Thread(target=self.receive)
        t.daemon = True
        t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        while True:
            try:
                record = self.queue.get()
                self._handler.emit(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

    def send(self, s):
        self.queue.put_nowait(s)

    def _format_record(self, record):
        # ensure that exc_info and args
        # have been stringified.  Removes any chance of
        # unpickleable things inside and possibly reduces
        # message size sent over the pipe
        if record.args:
            record.msg = record.msg % record.args
            record.args = None
        if record.exc_info:
            dummy = self.format(record)
            record.exc_info = None

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        self._handler.close()
        logging.Handler.close(self)
```

User · Answer

How about delegating all the logging to another process that reads all log entries from a Queue?

```python
import logging
import multiprocessing

LOG_QUEUE = multiprocessing.JoinableQueue()

class CentralLogger(multiprocessing.Process):
    def __init__(self, queue):
        multiprocessing.Process.__init__(self)
        self.queue = queue
        self.log = logging.getLogger('some_config')
        self.log.info("Started Central Logging process")

    def run(self):
        while True:
            log_level, message = self.queue.get()
            if log_level is None:
                self.log.info("Shutting down Central Logging process")
                break
            else:
                self.log.log(log_level, message)

if __name__ == '__main__':
    central_logger_process = CentralLogger(LOG_QUEUE)
    central_logger_process.start()
    # producers:  LOG_QUEUE.put((logging.INFO, "some message"))
    # shut down:  LOG_QUEUE.put((None, None))
```

Simply share LOG_QUEUE via any of the multiprocessing mechanisms, or even inheritance, and it all works out fine!

User · Answer

One of the alternatives is to write the multiprocessing logging to a known file and register an atexit handler that joins on those processes and reads the file back onto stderr; however, you won't get a real-time flow of the output messages on stderr that way.
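A rough sketch of that file-plus-atexit approach, assuming one log file per worker PID in a temporary directory that stands in for the "known file" (setup_worker_logging and dump_logs are hypothetical names). Each worker only ever writes its own file, and an atexit hook in the parent joins the workers before copying their files to stderr, which is why nothing appears until the program ends.

```python
import atexit
import glob
import logging
import multiprocessing
import os
import sys
import tempfile


def setup_worker_logging(log_dir):
    # One file per process (named after its PID) in a known directory,
    # so concurrent writers never share a file handle.
    path = os.path.join(log_dir, "worker-%d.log" % os.getpid())
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(processName)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    return path


def worker(log_dir, n):
    setup_worker_logging(log_dir)
    logging.info("worker %d finished", n)


def dump_logs(log_dir, procs, out=sys.stderr):
    """Intended for atexit: join the workers, then replay their files."""
    for p in procs:
        p.join()
    for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
        with open(path) as f:
            out.write(f.read())


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp(prefix="mp-logs-")   # stand-in for a "known" location
    procs = [multiprocessing.Process(target=worker, args=(log_dir, i)) for i in range(3)]
    for p in procs:
        p.start()
    atexit.register(dump_logs, log_dir, procs)
    # ... the main program continues; worker output reaches stderr only at exit
```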

User · Answer

Since we can represent multiprocess logging as many publishers and one subscriber (listener), using ZeroMQ to implement PUB-SUB messaging is indeed an option.

Moreover, the PyZMQ module (the Python bindings for ZMQ) implements PUBHandler, which is an object for publishing logging messages over a zmq.PUB socket.

There's a solution on the web for centralized logging from a distributed application using PyZMQ and PUBHandler, which can easily be adapted for working locally with multiple publishing processes.

```python
import logging
import os
import socket
from logging import handlers
from multiprocessing import Event, Process

import zmq
from zmq.log.handlers import PUBHandler

PUBSUB_LOGGER_PORT = 5558       # assumption: stands in for config.PUBSUB_LOGGER_PORT
MAX_WORKERS_PER_CLIENT = 4      # assumption: number of publisher processes

formatters = {
    logging.DEBUG: logging.Formatter("[%(name)s] %(message)s"),
    logging.INFO: logging.Formatter("[%(name)s] %(message)s"),
    logging.WARN: logging.Formatter("[%(name)s] %(message)s"),
    logging.ERROR: logging.Formatter("[%(name)s] %(message)s"),
    logging.CRITICAL: logging.Formatter("[%(name)s] %(message)s")
}

# This one will be used by publishing processes
class PUBLogger:
    def __init__(self, host, port=PUBSUB_LOGGER_PORT):
        self._logger = logging.getLogger(__name__)
        self._logger.setLevel(logging.DEBUG)
        self.ctx = zmq.Context()
        self.pub = self.ctx.socket(zmq.PUB)
        self.pub.connect('tcp://{0}:{1}'.format(socket.gethostbyname(host), port))
        self._handler = PUBHandler(self.pub)
        self._handler.formatters = formatters
        self._logger.addHandler(self._handler)

    @property
    def logger(self):
        return self._logger

# This one will be used by the listener process
class SUBLogger:
    def __init__(self, ip, output_dir="", port=PUBSUB_LOGGER_PORT):
        self.output_dir = output_dir
        self._logger = logging.getLogger()
        self._logger.setLevel(logging.DEBUG)

        self.ctx = zmq.Context()
        self._sub = self.ctx.socket(zmq.SUB)
        self._sub.bind('tcp://*:{1}'.format(ip, port))
        self._sub.setsockopt(zmq.SUBSCRIBE, b"")   # subscribe to every topic

        handler = handlers.RotatingFileHandler(os.path.join(output_dir, "client_debug.log"), "w", 100 * 1024 * 1024, 10)
        handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter("%(asctime)s;%(levelname)s - %(message)s")
        handler.setFormatter(formatter)
        self._logger.addHandler(handler)

    @property
    def sub(self):
        return self._sub

    @property
    def logger(self):
        return self._logger

# And that's the way we actually run things:

# The listener process will forever listen on the SUB socket for incoming messages
def run_sub_logger(ip, event):
    sub_logger = SUBLogger(ip)
    while not event.is_set():
        try:
            topic, message = sub_logger.sub.recv_multipart(flags=zmq.NOBLOCK)
            # the topic is the level name as bytes, e.g. b"INFO"
            log_msg = getattr(logging, topic.decode().lower())
            log_msg(message.decode())
        except zmq.ZMQError as zmq_error:
            if zmq_error.errno == zmq.EAGAIN:
                pass

# Publisher processes' loggers should be initialized as follows:
class Publisher:
    def __init__(self, stop_event, proc_id):
        self.stop_event = stop_event
        self.proc_id = proc_id
        self._logger = PUBLogger('127.0.0.1').logger

    def run(self):
        self._logger.info("{0} - Sending message".format(self.proc_id))

def run_worker(event, proc_id):
    worker = Publisher(event, proc_id)
    worker.run()

if __name__ == '__main__':
    stop_event = Event()
    processes = []
    # Starting the subscriber process first so we won't lose publishers' messages
    sub_logger_process = Process(target=run_sub_logger,
                                 args=('127.0.0.1', stop_event))
    sub_logger_process.start()

    # Starting publisher processes
    for i in range(MAX_WORKERS_PER_CLIENT):
        processes.append(Process(target=run_worker,
                                 args=(stop_event, i)))
    for p in processes:
        p.start()
```

User · Answer

For whoever might need this, I wrote a decorator for the multiprocessing_logging package that adds the current process name to logs, so it becomes clear who logs what.

It also runs install_mp_handler(), so it becomes unnecessary to run it before creating a pool.

This allows me to see which worker creates which log messages.

Here's the blueprint with an example:

```python
import sys
import logging
from functools import wraps
import multiprocessing
import multiprocessing_logging

# Setup basic console logger as 'logger'
logger = logging.getLogger()
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setFormatter(logging.Formatter(u'%(asctime)s :: %(levelname)s :: %(message)s'))
logger.setLevel(logging.DEBUG)
logger.addHandler(console_handler)


# Create a decorator for functions that are called via multiprocessing pools
def logs_mp_process_names(fn):
    class MultiProcessLogFilter(logging.Filter):
        def filter(self, record):
            try:
                process_name = multiprocessing.current_process().name
            except BaseException:
                process_name = __name__
            record.msg = f'{process_name} :: {record.msg}'
            return True

    multiprocessing_logging.install_mp_handler()
    f = MultiProcessLogFilter()

    # Wraps is needed here so apply / apply_async know the function name
    @wraps(fn)
    def wrapper(*args, **kwargs):
        logger.removeFilter(f)
        logger.addFilter(f)
        return fn(*args, **kwargs)

    return wrapper


# Create a test function and decorate it
@logs_mp_process_names
def test(argument):
    logger.info(f'test function called via: {argument}')


# You can also redecorate undecorated functions
def undecorated_function():
    logger.info('I am not decorated')


@logs_mp_process_names
def redecorated(*args, **kwargs):
    return undecorated_function(*args, **kwargs)


# Enjoy
if __name__ == '__main__':
    with multiprocessing.Pool() as mp_pool:
        # Also works with apply_async
        mp_pool.apply(test, ('mp pool',))
        mp_pool.apply(redecorated)
        logger.info('some main logs')
        test('main program')
```

User · Answer

I liked zzzeek's answer. I would just replace the Pipe with a Queue, since if multiple threads/processes use the same pipe end to generate log messages, the messages will get garbled.

User · Answer

I have a solution that's similar to ironhacker's, except that I use logging.exception in some of my code and found that I needed to format the exception before passing it back over the Queue, since tracebacks aren't pickleable:

```python
import io
import logging
import traceback

class QueueHandler(logging.Handler):
    def __init__(self, queue):
        logging.Handler.__init__(self)
        self.queue = queue

    def emit(self, record):
        if record.exc_info:
            # can't pass exc_info across processes so just format now
            record.exc_text = self.formatException(record.exc_info)
            record.exc_info = None
        self.queue.put(record)

    def formatException(self, ei):
        sio = io.StringIO()
        traceback.print_exception(ei[0], ei[1], ei[2], None, sio)
        s = sio.getvalue()
        sio.close()
        if s[-1] == "\n":
            s = s[:-1]
        return s
```

User · Answer

There is this great package:

Package: https://pypi.python.org/pypi/multiprocessing-logging

Code: https://github.com/jruere/multiprocessing-logging

Install:

```shell
pip install multiprocessing-logging
```

Then add:

```python
import multiprocessing_logging

# This enables logs inside process
multiprocessing_logging.install_mp_handler()
```

User · Answer

I also like zzzeek's answer, but Andre is correct that a queue is required to prevent garbling. I had some luck with the pipe, but did see garbling, which is somewhat expected. Implementing it turned out to be harder than I thought, particularly due to running on Windows, where there are some additional restrictions about global variables and stuff (see: How's Python Multiprocessing Implemented on Windows?).

But I finally got it working. This example probably isn't perfect, so comments and suggestions are welcome. It also does not support setting the formatter or anything other than the root logger. Basically, you have to reinit the logger in each of the pool processes with the queue and set up the other attributes on the logger.

Again, any suggestions on how to make the code better are welcome. I certainly don't know all the Python tricks yet :-)

```python
import multiprocessing, logging, sys, re, os, io, threading, time
from queue import Empty

class MultiProcessingLogHandler(logging.Handler):
    def __init__(self, handler, queue, child=False):
        logging.Handler.__init__(self)

        self._handler = handler
        self.queue = queue

        # we only want one of the loggers to be pulling from the queue.
        # If there is a way to do this without needing to be passed this
        # information, that would be great!
        if child == False:
            self.shutdown = False
            self.polltime = 1
            t = threading.Thread(target=self.receive)
            t.daemon = True
            t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        # print("receive on")
        while (self.shutdown == False) or (self.queue.empty() == False):
            # so we block for a short period of time so that we can
            # check for the shutdown cases.
            try:
                record = self.queue.get(True, self.polltime)
                self._handler.emit(record)
            except Empty:
                pass

    def send(self, s):
        # send just puts it in the queue for the server to retrieve
        self.queue.put(s)

    def _format_record(self, record):
        ei = record.exc_info
        if ei:
            dummy = self.format(record)  # just to get traceback text into record.exc_text
            record.exc_info = None  # to avoid Unpickleable error

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        time.sleep(self.polltime + 1)  # give some time for messages to enter the queue.
        self.shutdown = True
        time.sleep(self.polltime + 1)  # give some time for the server to time out and see the shutdown

    def __del__(self):
        self.close()  # hopefully this aids in orderly shutdown when things are going poorly.

def f(x):
    # just a logging command...
    logging.critical('function number: ' + str(x))
    # to make some calls take longer than others, so the output is "jumbled" as real MP programs are.
    time.sleep(x % 3)

def initPool(queue, level):
    """
    This causes the logging module to be initialized with the necessary info
    in pool threads to work correctly.
    """
    root = logging.getLogger('')
    # on fork-based platforms the parent's queue-reading handler is inherited;
    # drop it so each record is put on the queue only once
    root.handlers = []
    root.addHandler(MultiProcessingLogHandler(logging.StreamHandler(), queue, child=True))
    root.setLevel(level)

if __name__ == '__main__':
    stream = io.StringIO()
    logQueue = multiprocessing.Queue(100)
    handler = MultiProcessingLogHandler(logging.StreamHandler(stream), logQueue)
    logging.getLogger('').addHandler(handler)
    logging.getLogger('').setLevel(logging.DEBUG)

    logging.debug('starting main')

    # when building the pool on a Windows machine we also have to init the logger in all the instances with the queue and the level of logging.
    pool = multiprocessing.Pool(processes=10, initializer=initPool, initargs=[logQueue, logging.getLogger('').getEffectiveLevel()])  # start worker processes
    pool.map(f, range(0, 50))
    pool.close()

    logging.debug('done')
    logging.shutdown()
    print("stream output is:")
    print(stream.getvalue())
```

User · Answer

Below is another solution with a focus on simplicity, for anyone else (like me) who gets here from Google. Logging should be easy! Only for Python 3.2 or higher.

```python
import multiprocessing
import logging
from logging.handlers import QueueHandler, QueueListener
import time
import random


def f(i):
    time.sleep(random.uniform(.01, .05))
    logging.info('function called with {} in worker thread.'.format(i))
    time.sleep(random.uniform(.01, .05))
    return i


def worker_init(q):
    # all records from worker processes go to qh and then into q
    qh = QueueHandler(q)
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    # replace (rather than add to) any handler inherited from the parent on
    # fork-based platforms, so records are not also written out directly
    logger.handlers = [qh]


def logger_init():
    q = multiprocessing.Queue()
    # this is the handler for all log records
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s: %(asctime)s - %(process)s - %(message)s"))

    # ql gets records from the queue and sends them to the handler
    ql = QueueListener(q, handler)
    ql.start()

    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    # add the handler to the logger so records from this process are handled
    logger.addHandler(handler)

    return ql, q


def main():
    q_listener, q = logger_init()

    logging.info('hello from main thread')
    pool = multiprocessing.Pool(4, worker_init, [q])
    for result in pool.map(f, range(10)):
        pass
    pool.close()
    pool.join()
    q_listener.stop()

if __name__ == '__main__':
    main()
```

User · Answer

If you have deadlocks occurring in a combination of locks, threads and forks in the logging module, that is reported in bug report 6721 (see also the related SO question).

There is a small fixup solution posted here.

However, that will only fix potential deadlocks in logging. It will not fix output that may be garbled. See the other answers presented here.
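The fixup linked above is not reproduced here. As a different way to avoid the fork-plus-lock problem altogether, a sketch like the following uses the spawn start method, so each worker is a fresh interpreter that cannot inherit a logging lock left held by some other thread of the parent at fork time; init_worker and work are illustrative names. It does nothing about garbled output, so each worker still needs one of the handler setups from the other answers.

```python
import logging
import multiprocessing


def init_worker():
    # Runs once in each spawned interpreter; nothing (including logging's
    # internal locks) was copied from the parent, so nothing can be stuck "held".
    logging.basicConfig(level=logging.INFO,
                        format="%(processName)s %(levelname)s %(message)s")


def work(n):
    logging.info("working on %d", n)
    return n * 2


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")   # instead of fork
    with ctx.Pool(2, initializer=init_worker) as pool:
        print(pool.map(work, range(4)))
```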

User · Answer

The only way to deal with this non-intrusively is to:

1. Spawn each worker process such that its log goes to a different file descriptor (to disk or to a pipe). Ideally, all log entries should be timestamped.
2. Your controller process can then do one of the following:
   - If using disk files: coalesce the log files at the end of the run, sorted by timestamp.
   - If using pipes (recommended): coalesce log entries on the fly from all pipes into a central log file. (E.g., periodically select from the pipes' file descriptors, perform a merge-sort on the available log entries, and flush to the centralized log. Repeat.)
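For the disk-file variant, a minimal sketch, assuming every record is a single line that starts with a sortable timestamp (tracebacks and other multi-line messages would need extra handling). Each worker writes its own file, and because each file is already in time order, heapq.merge performs the merge-sort step. configure_worker_log, merge_logs and the file names are made up for the example; the pipe variant would feed the same kind of merge from pipes instead of files.

```python
import heapq
import logging
import multiprocessing
import os
import tempfile

TS_LEN = len("2024-01-01 00:00:00.000")   # width of the timestamp prefix below


def configure_worker_log(path):
    """Send this process's records to its own file, each line led by a timestamp."""
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s.%(msecs)03d %(process)d %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"))
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(logging.INFO)


def merge_logs(paths, out_path):
    """Coalesce per-process logs into one file ordered by timestamp.

    Each input is already in time order (one writer per file), so a k-way
    heapq.merge does the merge-sort; assumes one line per record.
    """
    files = [open(p) for p in paths]
    try:
        with open(out_path, "w") as out:
            for line in heapq.merge(*files, key=lambda entry: entry[:TS_LEN]):
                out.write(line)
    finally:
        for f in files:
            f.close()


def worker(log_dir, n):
    configure_worker_log(os.path.join(log_dir, "worker-%d.log" % n))
    for step in range(3):
        logging.info("worker %d step %d", n, step)


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    procs = [multiprocessing.Process(target=worker, args=(log_dir, i)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    paths = [os.path.join(log_dir, "worker-%d.log" % i) for i in range(3)]
    merged = os.path.join(log_dir, "combined.log")
    merge_logs(paths, merged)
    with open(merged) as f:
        print(f.read(), end="")
```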

User · Answer

A variant of the others that keeps the logging and queue thread separate.

```python
"""sample code for logging in subprocesses using multiprocessing

* Little handler magic - The main process uses loggers and handlers as normal.
* Only a simple handler is needed in the subprocess that feeds the queue.
* Original logger name from subprocess is preserved when logged in main
  process.
* As in the other implementations, a thread reads the queue and calls the
  handlers. Except in this implementation, the thread is defined outside of a
  handler, which makes the logger definitions simpler.
* Works with multiple handlers.  If the logger in the main process defines
  multiple handlers, they will all be fed records generated by the
  subprocesses' loggers.

tested with Python 2.5 and 2.6 on Linux and Windows

"""

import os
import sys
import time
import traceback
import multiprocessing, threading, logging, sys

DEFAULT_LEVEL = logging.DEBUG

formatter = logging.Formatter("%(levelname)s: %(asctime)s - %(name)s - %(process)s - %(message)s")

class SubProcessLogHandler(logging.Handler):
    """handler used by subprocesses

    It simply puts items on a Queue for the main process to log.

    """

    def __init__(self, queue):
        logging.Handler.__init__(self)
        self.queue = queue

    def emit(self, record):
        self.queue.put(record)

class LogQueueReader(threading.Thread):
    """thread to write subprocesses' log records to main process log

    This thread reads the records written by subprocesses and writes them to
    the handlers defined in the main process's handlers.

    """

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.daemon = True

    def run(self):
        """read from the queue and write to the log handlers

        The logging documentation says logging is thread safe, so there
        shouldn't be contention between normal logging (from the main
        process) and this thread.

        Note that we're using the name of the original logger.

        """
        # Thanks Mike for the error checking code.
        while True:
            try:
                record = self.queue.get()
                # get the logger for this record
                logger = logging.getLogger(record.name)
                logger.callHandlers(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

class LoggingProcess(multiprocessing.Process):

    def __init__(self, queue):
        multiprocessing.Process.__init__(self)
        self.queue = queue

    def _setupLogger(self):
        # create the logger to use.
        logger = logging.getLogger('test.subprocess')
        # The only handler desired is the SubProcessLogHandler.  If any others
        # exist, remove them. In this case, on Unix and Linux the StreamHandler
        # will be inherited.

        for handler in logger.handlers[:]:  # iterate over a copy while removing
            # just a check for my sanity
            assert not isinstance(handler, SubProcessLogHandler)
            logger.removeHandler(handler)
        # add the handler
        handler = SubProcessLogHandler(self.queue)
        handler.setFormatter(formatter)
        logger.addHandler(handler)

        # On Windows, the level will not be inherited.  Also, we could just
        # set the level to log everything here and filter it in the main
        # process handlers.  For now, just set it from the global default.
        logger.setLevel(DEFAULT_LEVEL)
        # don't also pass records up to the 'test' logger, whose StreamHandler
        # is inherited on fork and would print each record a second time
        logger.propagate = False
        self.logger = logger

    def run(self):
        self._setupLogger()
        logger = self.logger
        # and here goes the logging
        p = multiprocessing.current_process()
        logger.info('hello from process %s with pid %s' % (p.name, p.pid))


if __name__ == '__main__':
    # queue used by the subprocess loggers
    queue = multiprocessing.Queue()
    # Just a normal logger
    logger = logging.getLogger('test')
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(DEFAULT_LEVEL)
    logger.info('hello from the main process')
    # This thread will read from the subprocesses and write to the main log's
    # handlers.
    log_queue_reader = LogQueueReader(queue)
    log_queue_reader.start()
    # create the processes.
    for i in range(10):
        p = LoggingProcess(queue)
        p.start()
    # The way I read the multiprocessing warning about Queue, joining a
    # process before it has finished feeding the Queue can cause a deadlock.
    # Also, Queue.empty() is not reliable, so just make sure all processes
    # are finished.
    # active_children joins subprocesses when they're finished.
    while multiprocessing.active_children():
        time.sleep(.1)
```

User · Answer

All current solutions are too coupled to the logging configuration by using a handler. My solution has the following architecture and features:

- You can use any logging configuration you want.
- Logging is done in a daemon thread.
- Safe shutdown of the daemon by using a context manager.
- Communication to the logging thread is done by multiprocessing.Queue.
- In subprocesses, logging.Logger (and already defined instances) are patched to send all records to the queue.
- New: format traceback and message before sending to the queue to prevent pickling errors.

Code with a usage example and output can be found at the following Gist: https://gist.github.com/schlamar/7003737
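The Gist itself is not reproduced here. As a rough, hypothetical sketch of the same shape (a daemon reader thread, a context manager that stops it safely, and a multiprocessing.Queue), the following re-dispatches every record to whatever configuration the main process already has. Unlike the Gist, it configures workers with the stdlib QueueHandler instead of patching logging.Logger; QueueHandler.prepare() already turns the message and any traceback into plain text before the record is pickled. logging_thread, init_worker and work are illustrative names.

```python
import contextlib
import logging
import logging.handlers
import multiprocessing
import threading


def _drain(queue):
    # Re-dispatch each record on the logger it was created on, so the main
    # process's existing configuration (handlers, levels, filters) applies.
    while True:
        record = queue.get()
        if record is None:                 # sentinel put by logging_thread()
            break
        logging.getLogger(record.name).handle(record)


@contextlib.contextmanager
def logging_thread(queue):
    """Run the queue reader in a daemon thread and stop it cleanly on exit."""
    thread = threading.Thread(target=_drain, args=(queue,), daemon=True)
    thread.start()
    try:
        yield queue
    finally:
        queue.put(None)
        thread.join()


def init_worker(queue):
    # Child side: drop inherited handlers and send everything to the queue.
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.DEBUG)


def work(n):
    logging.getLogger("app.worker").info("processing %d", n)
    return n * n


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(processName)s %(name)s: %(message)s")
    with logging_thread(multiprocessing.Queue()) as q:
        pool = multiprocessing.Pool(2, initializer=init_worker, initargs=(q,))
        print(pool.map(work, range(4)))
        pool.close()
        pool.join()                        # workers flush their queue feeders on exit
```

Joining the pool before leaving the with block matters: the sentinel is only enqueued after every worker has exited, so no worker record can arrive after the reader thread has stopped.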

User · Answer

Simplest idea, as mentioned:

- Grab the filename and the process id of the current process.
- Set up a WatchedFileHandler [1]. The reasons for this handler are discussed in detail here, but in short, there are certain worse race conditions with the other logging handlers. This one has the shortest window for the race condition.
- Choose a path to save the logs to, such as /var/log.
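A small sketch of those three steps, assuming a Pool initializer is an acceptable place to configure each worker; configure_process_logging, the myapp name and the temporary directory (standing in for something under /var/log) are placeholders. The logging docs describe WatchedFileHandler as reopening its file when it detects that the file was moved or removed, and note that it is not appropriate for use under Windows.

```python
import logging
import logging.handlers
import multiprocessing
import os
import tempfile


def configure_process_logging(log_dir, app_name="myapp"):
    """Point the calling process's root logger at <log_dir>/<app_name>.<pid>.log."""
    path = os.path.join(log_dir, "%s.%d.log" % (app_name, os.getpid()))
    # Reopens the file if something (e.g. logrotate) moves or deletes it.
    handler = logging.handlers.WatchedFileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(process)d %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.handlers[:] = [handler]           # drop any handler inherited via fork
    root.setLevel(logging.INFO)
    return path


def work(n):
    logging.info("squaring %d", n)
    return n * n


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()           # e.g. a directory under /var/log in production
    with multiprocessing.Pool(2, initializer=configure_process_logging,
                              initargs=(log_dir,)) as pool:
        print(pool.map(work, range(4)))
    print(sorted(os.listdir(log_dir)))     # one file per worker PID
```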

User · Answer

Just publish your instance of the logger somewhere. That way, the other modules and clients can use your API to get the logger without having to import multiprocessing.
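A minimal sketch of that idea, shown as a single file for brevity: the hypothetical central part is the only code that imports multiprocessing, and everything else goes through get_logger(). It uses multiprocessing.log_to_stderr(), which returns the same logger as multiprocessing.get_logger() with a stderr handler attached.

```python
import logging
import multiprocessing

# --- central module (hypothetical; the only part that imports multiprocessing) ---
_log = multiprocessing.log_to_stderr()   # multiprocessing's logger + a stderr handler
_log.setLevel(logging.INFO)


def get_logger():
    """Public accessor: framework modules and clients call this instead."""
    return _log


# --- any other framework module or client code --------------------------------
def do_work(n):
    log = get_logger()                     # no multiprocessing import needed here
    log.info("working on %d", n)
    return n + 1


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(pool.map(do_work, range(3)))
```

Setting that logger to INFO also surfaces multiprocessing's own internal INFO messages (process start-up and shutdown), so a higher level may be preferable in practice.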

User · Answer

Below is a class that can be used in a Windows environment; it requires ActivePython (for the win32api/win32event modules, which are also available from pywin32). You can also inherit from other logging handlers (StreamHandler, etc.).

```python
import logging
import win32event

class SyncronizedFileHandler(logging.FileHandler):
    MUTEX_NAME = 'logging_mutex'

    def __init__(self, *args, **kwargs):
        # a named mutex is shared by every process that opens the same name
        self.mutex = win32event.CreateMutex(None, False, self.MUTEX_NAME)
        super(SyncronizedFileHandler, self).__init__(*args, **kwargs)

    def emit(self, *args, **kwargs):
        # acquire before the try, so the finally only releases a mutex we hold
        win32event.WaitForSingleObject(self.mutex, win32event.INFINITE)
        try:
            return super(SyncronizedFileHandler, self).emit(*args, **kwargs)
        finally:
            win32event.ReleaseMutex(self.mutex)
```

And here is an example that demonstrates usage:

```python
import logging
import random, time, os, sys, datetime
from string import ascii_letters
import win32api, win32event
from multiprocessing import Pool

def f(i):
    time.sleep(random.randint(0, 10) * 0.1)
    ch = random.choice(ascii_letters)
    logging.info(ch * 30)


def init_logging():
    '''
    initialize the loggers
    '''
    formatter = logging.Formatter("%(levelname)s - %(process)d - %(asctime)s - %(filename)s - %(lineno)d - %(message)s")
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    file_handler = SyncronizedFileHandler(sys.argv[1])
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

# must be called in the parent and in every worker process
init_logging()

if __name__ == '__main__':
    # multiprocessing stuff
    pool = Pool(processes=10)
    imap_result = pool.imap(f, range(30))
    for i, _ in enumerate(imap_result):
        pass
```

User · Answer

QueueHandler is native in Python 3.2+ and does exactly this. It is easily replicated in previous versions.

The Python docs have two complete examples: Logging to a single file from multiple processes.

For those using Python < 3.2, just copy QueueHandler into your own code from https://gist.github.com/vsajip/591589, or alternatively import logutils.

Each process (including the parent process) puts its logging on the Queue, and then a listener thread or process (one example is provided for each) picks those up and writes them all to a file, so there is no risk of corruption or garbling.
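The listener-process variant can be sketched roughly as follows, assuming Python 3.2+ for logging.handlers.QueueHandler. This is not a copy of the docs' example, and listener_process and configure_queue_logging are illustrative names. One dedicated process owns the file; the parent and every worker put records on the same multiprocessing.Queue, and a None sentinel tells the listener to stop once all workers have been joined.

```python
import logging
import logging.handlers
import multiprocessing
import os
import tempfile


def listener_process(queue, log_path):
    """The only writer of log_path: pull records off the queue until a None sentinel."""
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(processName)s %(levelname)s %(message)s"))
    while True:
        record = queue.get()
        if record is None:
            break
        handler.handle(record)
    handler.close()


def configure_queue_logging(queue):
    # Every process, the parent included, logs only to the shared queue.
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.INFO)


def worker(queue, n):
    configure_queue_logging(queue)
    logging.info("message from worker %d", n)


if __name__ == "__main__":
    queue = multiprocessing.Queue(-1)
    log_path = os.path.join(tempfile.mkdtemp(), "mp.log")
    listener = multiprocessing.Process(target=listener_process, args=(queue, log_path))
    listener.start()
    configure_queue_logging(queue)
    logging.info("parent starting workers")
    workers = [multiprocessing.Process(target=worker, args=(queue, i)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    queue.put(None)                        # tell the listener to finish
    listener.join()
    with open(log_path) as f:
        print(f.read(), end="")
```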
