When to use the different log levels

Question

There are different ways to log messages  in order of fatality    FATAL ERROR WARN INFO DEBUG TRACE   How do I decide when to use which   What s a good heuristic to use

User · Answer

I d recommend adopting Syslog severity levels  DEBUG  INFO  NOTICE  WARNING  ERROR  CRITICAL  ALERT  EMERGENCY  See http   en wikipedia org wiki Syslog Severity levels  They should provide enough fine-grained severity levels for most use-cases and are recognized by existing log-parsers  While you have of course the freedom to only implement a subset  e g  DEBUG  ERROR  EMERGENCY depending on your app s requirements   Let s standardize on something that s been around for ages instead of coming up with our own standard for every different app we make  Once you start aggregating logs and are trying to detect patterns across different ones it really helps

User · Answer

I totally agree with the others  and think that GrayWizardx said it best   All that I can add is that these levels generally correspond to their dictionary definitions  so it can t be that hard  If in doubt  treat it like a puzzle  For your particular project  think of everything that you might want to log   Now  can you figure out what might be fatal  You know what fatal means  don t you  So  which items on your list are fatal   Ok  that s fatal dealt with  now let s look at errors     rinse and repeat   Below Fatal  or maybe Error  I would suggest that more information is always better than less  so err  upwards   Not sure if it s Info or Warning  Then make it a warning   I do think that Fatal and error ought to be clear to all of us  The others might be fuzzier  but it is arguably less vital to get them right      Here are some examples    Fatal - can t allocate memory  database  etc - can t continue    Error - no reply to message  transaction aborted  can t save file  etc   Warning - resource allocation reaches X   say 80   - that is a sign that you might want to re-dimension your   Info - user logged in out  new transaction  file crated  new d b field  or field deleted   Debug - dump of internal data structure  Anything Trace level with file name  amp  line number  Trace - action succeeded failed  d b updated

User · Answer

G day   As a corollary to this question  communicate your interpretations of the log levels and make sure that all people on a project are aligned in their interpretation of the levels   It s painful to see a vast variety of log messages where the severities and the selected log levels are inconsistent   Provide examples if possible of the different logging levels  And be consistent in the info to be logged in a message   HTH

User · Answer

As others have said  errors are problems  warnings are potential problems   In development  I frequently use warnings where I might put the equivalent of an assertion failure but the application can continue working  this enables me to find out if that case ever actually happens  or if it s my imagination   But yes  it gets down to the recoverabilty and actuality aspects   If you can recover  it s probably a warning  if it causes something to actually fail  it s an error

User · Answer

An error is something that is wrong  plain wrong  no way around it  it needs to be fixed   A warning is a sign of a pattern that might be wrong  but then also might not be   Having said that  I cannot come up with a good example of a warning that isn t also an error  What I mean by that is that if you go to the trouble of logging a warning  you might as well fix the underlying issue   However  things like  sql execution takes too long  might be a warning  while  sql execution deadlocks  is an error  so perhaps there s some cases after all

User · Answer

Warnings you can recover from  Errors you can t  That s my heuristic  others may have other ideas   For example  let s say you enter import the name  Angela M  ller  into your application  note the umlaut over the u   Your code database may be English only  though it probably shouldn t be in this day and age  and could therefore warn that all  unusual  characters had been converted to regular English characters   Contrast that with trying to write that information to the database and getting back a network down message for 60 seconds straight  That s more of an error than a warning

User · Answer

I suggest using only three levels   Fatal - Which would break the application  Info - Info Debug - Less important info

User · Answer

I ve built systems before that use the following    ERROR - means something is seriously wrong and that particular thread process sequence can t carry on  Some user admin intervention is required WARNING - something is not right  but the process can carry on as before  e g  one job in a set of 100 has failed  but the remainder can be processed    In the systems I ve built admins were under instruction to react to ERRORs  On the other hand we would watch for WARNINGS and determine for each case whether any system changes  reconfigurations etc  were required

User · Answer

Would you want the message to get a system administrator out of bed in the middle of the night    yes -  error no -  warn

User · Answer

My two cents about FATAL and TRACE error log levels   ERROR is when some FAULT  exception  occur   FATAL is actually DOUBLE FAULT  when exception occur while handling exception    It s easy to understand for web service    Request come  Event is logged as INFO  System detects low disk space  Event is logged as WARN Some function is called to handle the request   While processing division by zero occur  Event is logged as ERROR Web service s exception handler is called to handle division by zero   Web service framework is going to send email  but it can not because mailing service is offline now  This second exception can not be handled normally  because Web service s exception handler can not process exception  Different exception handler called  Event is logged as FATAL   TRACE is when we can trace function entry exit  This is not about logging  because this message can be generated by some debugger and your code has not call to log at all  So messages that are not from your application are marked like TRACE level  For example your run your application by with strace  So generally in your program you do DEBUG  INFO and WARN logging  And only if you are writing some web service framework you will use FATAL  And when you are debugging application you will get TRACE logging from this type of software

User · Answer

From RFC 5424  the Syslog Protocol  IETF  - Page 10      Each message Priority also has a decimal Severity level indicator     These are described in the following table along with their numerical    values   Severity values MUST be in the range of 0 to 7 inclusive          Numerical         Severity          Code            0       Emergency  system is unusable           1       Alert  action must be taken immediately           2       Critical  critical conditions           3       Error  error conditions           4       Warning  warning conditions           5       Notice  normal but significant condition           6       Informational  informational messages           7       Debug  debug-level messages            Table 2  Syslog Message Severities

User · Answer

It s an old topic  but still relevant  This week  I wrote a small article about it  for my colleagues  For that purpose  I also created this cheat sheet  because I couldn t find any online

User · Answer

I find it more helpful to think about severities from the perspective of viewing the log file   Fatal Critical  Overall application or system failure that should be investigated immediately  Yes  wake up the SysAdmin  Since we prefer our SysAdmins alert and well-rested  this severity should be used very infrequently   If it s happening daily and that s not a BFD  it s lost it s meaning   Typically  a Fatal error only occurs once in the process lifetime  so if the log file is tied to the process  this is typically the last message in the log   Error  Definitely a problem that should be investigated  SysAdmin should be notified automatically  but doesn t need to be dragged out of bed   By filtering a log to look at errors and above you get an overview of error frequency and can quickly identify the initiating failure that might have resulted in a cascade of additional errors   Tracking error rates as versus application usage can yield useful quality metrics such as MTBF which can be used to assess overall quality   For example  this metric might help inform decisions about whether or not another beta testing cycle is needed before a release   Warning  This MIGHT be problem  or might not   For example  expected transient environmental conditions such as short loss of network or database connectivity should be logged as Warnings  not Errors   Viewing a log filtered to show only warnings and errors may give quick insight into early hints at the root cause of a subsequent error   Warnings should be used sparingly so that they don t become meaningless   For example  loss of network access should be a warning or even an error in a server application  but might be just an Info in a desktop app designed for occassionally disconnected laptop users   Info  This is important information that should be logged under normal conditions such as successful initialization  services starting and stopping or successful completion of significant transactions   Viewing a log showing Info and above should give a quick overview of major state changes in the process providing top-level context for understanding any warnings or errors that also occur   Don t have too many Info messages   We typically have  lt  5  Info messages relative to Trace   Trace  Trace is by far the most commonly used severity and should provide context to understand the steps leading up to errors and warnings   Having the right density of Trace messages makes software much more maintainable but requires some diligence because the value of individual Trace statements may change over time as programs evolve   The best way to achieve this is by getting the dev team in the habit of regularly reviewing logs as a standard part of troubleshooting customer reported issues   Encourage the team to prune out Trace messages that no longer provide useful context and to add messages where needed to understand the context of subsequent messages   For example  it is often helpful to log user input such as changing displays or tabs   Debug  We consider Debug  lt  Trace   The distinction being that Debug messages are compiled out of Release builds   That said  we discourage use of Debug messages  Allowing Debug messages tends to lead to more and more Debug messages being added and none ever removed   In time  this makes log files almost useless because it s too hard to filter signal from noise   That causes devs to not use the logs which continues the death spiral   In contrast  constantly pruning Trace messages encourages devs to use them which results in a virtuous spiral   Also  this eliminates the possibility of bugs introduced because of needed side-effects in debug code that isn t included in the release build   Yeah  I know that shouldn t happen in good code  but better safe then sorry

User · Answer

I generally subscribe to the following convention    Trace - Only when I would be  tracing  the code and trying to find one part of a function specifically  Debug - Information that is diagnostically helpful to people more than just developers  IT  sysadmins  etc    Info - Generally useful information to log  service start stop  configuration assumptions  etc   Info I want to always have available but usually don t care about under normal circumstances  This is my out-of-the-box config level  Warn - Anything that can potentially cause application oddities  but for which I am automatically recovering   Such as switching from a primary to backup server  retrying an operation  missing secondary data  etc   Error - Any error which is fatal to the operation  but not the service or application  can t open a required file  missing data  etc    These errors will force user  administrator  or direct user  intervention  These are usually reserved  in my apps  for incorrect connection strings  missing services  etc   Fatal -  Any error that is forcing a shutdown of the service or application to prevent data loss  or further data loss   I reserve these only for the most heinous errors and situations where there is guaranteed to have been data corruption or loss

User · Answer

Taco Jan Osinga s answer is very good  and very practical  to boot  I am in partial agreement with him  though with some variations  On Python  there are only 5  quot named quot  logging levels  so this is how I use them   DEBUG -- information important for troubleshooting  and usually suppressed in normal day-to-day operation INFO -- day-to-day operation as  quot proof quot  that program is performing its function as  designed WARN -- out-of-nominal but recoverable situation   or  coming upon something that may result in future problems ERROR -- something happened that necessitates the program to do recovery  but recovery is successful  Program is likely not in the originally expected state  though  so user of the program will need to adapt CRITICAL -- something happened that cannot be recovered from  and program likely need to terminate lest everyone will be living in a state of sin

User · Answer

If you can recover from the problem then it s a warning  If it prevents continuing execution then it s an error

User · Answer

I ve always considered warning the first log level that for sure means there is a problem  for example  perhaps a config file isn t where it should be and we re going to have to run with default settings   An error implies  to me  something that means the main goal of the software is now impossible and we re going to try to shut down cleanly

User · Answer

I think that SYSLOG levels NOTICE and ALERT EMERGENCY are largely superfluous for application-level logging - while CRITICAL ALERT EMERGENCY may be useful alert levels for an operator that may trigger different actions and notifications  to an application admin it s all the same as FATAL  And I just cannot sufficiently distinguish between being given a notice or some information  If the information is not noteworthy  it s not really information     I like Jay Cincotta s interpretation best - tracing your code s execution is something very useful in tech support  and putting trace statements into the code liberally should be encouraged - especially in combination with a dynamic filtering mechanism for logging the trace messages from specific application components  However DEBUG level to me indicates that we re still in the process of figuring out what s going on - I see DEBUG level output as a development-only option  not as something that should ever show up in a production log   There is however a logging level that I like to see in my error logs when wearing the hat of a sysadmin as much as that of tech support  or even developer  OPER  for OPERATIONAL messages  This I use for logging a timestamp  the type of operation invoked  the arguments supplied  possibly a  unique  task identifier  and task completion  It s used when e g  a standalone task is fired off  something that is a true invocation from within the larger long-running app  It s the sort of thing I want always logged  no matter whether anything goes wrong or not  so I consider the level of OPER to be higher than FATAL  so you can only turn it off by going to totally silent mode  And it s much more than mere INFO log data - a log level often abused for spamming logs with minor operational messages of no historical value whatsoever   As the case dictates this information may be directed to a separate invocation log  or may be obtained by filtering it out of a large log recording more information  But it s always needed  as historical info  to know what was being done - without descending to the level of AUDIT  another totally separate log level that has nothing to do with malfunctions or system operation  doesn t really fit within the above levels  as it needs its own control switch  not a severity classification  and which definitely needs its own separate log file

User · Answer

Here s a list of what  the loggers  have     Apache log4j   sect 1   sect 2   FATAL       v1 2      very severe error events that will presumably lead the application to abort        v2 0      severe error that will prevent the application from continuing   ERROR       v1 2      error events that might still allow the application to continue running        v2 0      error in the application  possibly recoverable   WARN       v1 2      potentially harmful situations        v2 0      event that might possible  sic  lead to an error   INFO       v1 2      informational messages that highlight the progress of the application at coarse-grained level        v2 0      event for informational purposes   DEBUG       v1 2      fine-grained informational events that are most useful to debug an application        v2 0      general debugging event   TRACE       v1 2      finer-grained informational events than the DEBUG        v2 0      fine-grained debug message  typically capturing the flow through the application       Apache Httpd  as usual  likes to go for the overkill   sect    emerg      Emergencies  ndash  system is unusable   alert      Action must be taken immediately  but system is still usable     crit       Critical Conditions  but action need not be taken immediately            socket  Failed to get a socket  exiting child      error      Error conditions  but not critical            Premature end of script headers      warn      Warning conditions   close to error  but not error   notice      Normal but significant  notable  condition            httpd  caught SIGBUS  attempting to dump core in          info      Informational  and unnotable             Server has been running for x hours        debug      Debug-level messages    i e  messages logged for the sake of de-bugging             Opening config file          trace1  rarr  trace6      Trace messages    i e  messages logged for the sake of tracing            proxy  FTP  control connection complete     proxy  CONNECT  sending the CONNECT request to the remote proxy     openssl  Handshake  start     read from buffered SSL brigade  mode 0  17 bytes     map lookup FAILED  map rewritemap key keyname     cache lookup FAILED  forcing new map lookup      trace7  rarr  trace8      Trace messages  dumping large amounts of data            0000  02 23 44 30 13 40 ac 34 df 3d bf 9a 19 49 39 15         0000  02 23 44 30 13 40 ac 34 df 3d bf 9a 19 49 39 15            Apache commons-logging   sect    fatal      Severe errors that cause premature termination  Expect these to be immediately visible on a status console    error      Other runtime errors or unexpected conditions  Expect these to be immediately visible on a status console    warn      Use of deprecated APIs  poor use of API   almost  errors  other runtime situations that are undesirable or unexpected  but not necessarily  wrong   Expect these to be immediately visible on a status console    info      Interesting runtime events  startup shutdown   Expect these to be immediately visible on a console  so be conservative and keep to a minimum    debug      detailed information on the flow through the system  Expect these to be written to logs only   trace      more detailed information  Expect these to be written to logs only     Apache commons-logging  best practices  for enterprise usage makes a distinction between debug and info based on what kind of boundaries they cross    Boundaries include    External Boundaries - Expected Exceptions  External Boundaries - Unexpected Exceptions  Internal Boundaries   Significant Internal Boundaries      See commons-logging guide for more info on this

User · Answer

Btw  I am a great fan of capturing everything and filtering the information later   What would happen if you were capturing at Warning level and want some Debug info related to the warning  but were unable to recreate the warning    Capture everything and filter later   This holds true even for embedded software unless you find that your processor can t keep up  in which case you might want to re-design your tracing to make it more efficient  or the tracing is interfering with timing  you might consider debugging on a more powerful processor  but that opens up a whole nother can of worms    Capture everything and filter later     btw  capture everything is also good because it lets you develop tools to do more than just show debug trace  I draw Message Sequence Charts from mine  and histograms of memory usage  It also gives you a basis for comparison if something goes wrong in future  keep all logs  whether pass or fail  and be sure to include build number in the log file

[logging] When to use the different log levels

Examples related to logging