How to turn off INFO logging in Spark

Question

I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark script to get to the spark prompt, and I can also do the Quick Start guide successfully.

However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command. I have tried nearly every possible scenario in the below code (commenting out, setting to OFF) within my log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing is doing anything. I still get the INFO logging statements printing after executing each statement.

I am very confused with how this is supposed to work.

    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Here is my full classpath when I use SPARK_PRINT_LAUNCH_COMMAND:

    Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp ::/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

Contents of spark-env.sh:

    #!/usr/bin/env bash

    # This file is sourced when running various Spark programs.
    # Copy it as spark-env.sh and edit that to configure Spark for your site.

    # Options read when launching programs locally with
    # ./bin/run-example or ./bin/spark-submit
    # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
    # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    # - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    # - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/

    # Options read by executors and drivers running inside the cluster
    # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    # - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    # - SPARK_CLASSPATH, default classpath entries to append
    # - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
    # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos

    # Options read in YARN client mode
    # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
    # - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
    # - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
    # - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    # - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    # - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    # - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
    # - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    # - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.

    # Options for the daemons used in the standalone deploy mode:
    # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
    # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
    # - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
    # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
    # - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
    # - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
    # - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
    # - SPARK_WORKER_DIR, to set the working directory of worker processes
    # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
    # - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
    # - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
    # - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

    export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"

User · Answer

The way I do it is: in the location where I run the spark-submit script, do

    $ cp /etc/spark/conf/log4j.properties .
    $ nano log4j.properties

Change INFO to whatever level of logging you want, and then run your spark-submit.
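If you would rather script the edit than open an editor, the change can be sketched in Python. This is only an illustration: the helper name `set_root_level` is made up here, and it assumes the file uses the `log4j.rootCategory=LEVEL, appender` form shown in the question.

```python
import re
from pathlib import Path

# Matches an uncommented root logger line, e.g. "log4j.rootCategory=INFO, console".
ROOT_LINE = re.compile(r"^(\s*log4j\.rootCategory\s*=\s*)(\w+)(.*)$")


def set_root_level(properties_path, level):
    """Rewrite the log4j.rootCategory level in place.

    Returns True if a root logger line was found and rewritten.
    (Hypothetical helper, not part of Spark.)
    """
    path = Path(properties_path)
    found = False
    new_lines = []
    for line in path.read_text().splitlines():
        match = ROOT_LINE.match(line)
        if match:
            # Keep the key and the appender list, swap only the level.
            line = f"{match.group(1)}{level.upper()}{match.group(3)}"
            found = True
        new_lines.append(line)
    path.write_text("\n".join(new_lines) + "\n")
    return found
```

Calling `set_root_level("log4j.properties", "WARN")` in the directory you launch from would then replace the nano step.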

User · Answer

Programmatic way:

    spark.sparkContext.setLogLevel("WARN")

Available options: ERROR, WARN, INFO

User · Answer

For PySpark, you can also set the log level in your scripts with sc.setLogLevel("FATAL"). From the docs:

    Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
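Since the level is passed as a plain string, a small wrapper can catch typos before they reach Spark. The sketch below is an assumption-laden illustration: `set_spark_log_level` is a hypothetical helper, and `sc` is any object exposing `setLogLevel`, such as a SparkContext. The level names are the ones quoted from the docs above.

```python
# Level names as listed in the documentation quoted above.
VALID_LEVELS = ("ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN")


def set_spark_log_level(sc, level):
    """Normalise and validate a level name, then hand it to sc.setLogLevel.

    Hypothetical helper; `sc` only needs a setLogLevel(str) method.
    """
    name = level.strip().upper()
    if name not in VALID_LEVELS:
        raise ValueError(
            f"unknown log level {level!r}; expected one of {VALID_LEVELS}"
        )
    sc.setLogLevel(name)
    return name
```

In a PySpark script this would be called as `set_spark_log_level(sc, "warn")` right after the context is created.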

User · Answer

Inspired by the pyspark/tests.py I did

    def quiet_logs(sc):
        logger = sc._jvm.org.apache.log4j
        logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
        logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)

Calling this just after creating SparkContext reduced stderr lines logged for my test from 2647 to 163. However, creating the SparkContext itself logs 163, up to

    15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

and it's not clear to me how to adjust those programmatically.

User · Answer

In Spark 2.0 you can also configure it dynamically for your application using setLogLevel:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local")
             .appName("foo")
             .getOrCreate())
    spark.sparkContext.setLogLevel("WARN")

In the pyspark console, a default spark session will already be available.

User · Answer

The code snippets below are for Scala users.

Option 1:

You can add the snippet below at the file level:

    import org.apache.log4j.{Level, Logger}
    Logger.getLogger("org").setLevel(Level.WARN)

Option 2:

Note: this applies to every application that uses the Spark session.

    import org.apache.spark.sql.SparkSession

    private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

Option 3:

Note: this configuration should be added to your log4j.properties (for example /etc/spark/conf/log4j.properties where Spark is installed, or a project-level log4j.properties), since you are changing it at the module level. This applies to every application.

    log4j.rootCategory=ERROR, console

IMHO, Option 1 is the wise way, since it can be switched off at the file level.

User · Answer

This may be due to how Spark computes its classpath. My hunch is that Hadoop's log4j.properties file is appearing ahead of Spark's on the classpath, preventing your changes from taking effect.

If you run

    SPARK_PRINT_LAUNCH_COMMAND=1 bin/spark-shell

then Spark will print the full classpath used to launch the shell; in my case, I see

    Spark Command: /usr/lib/jvm/java/bin/java -cp ::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path=:/root/ephemeral-hdfs/lib/native/ -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

where /root/ephemeral-hdfs/conf is at the head of the classpath.

I've opened an issue (SPARK-2913) to fix this in the next release (I should have a patch out soon).

In the meantime, here's a couple of workarounds:

- Add export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf" to spark-env.sh.
- Delete (or rename) /root/ephemeral-hdfs/conf/log4j.properties.
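To check which log4j.properties comes first on a classpath like the one printed above, a short Python sketch can walk the entries in order. The function name `log4j_candidates` is made up for this example, and it only inspects directory entries; it does not look inside jar files, which may also bundle a log4j.properties.

```python
import os


def log4j_candidates(classpath, filename="log4j.properties"):
    """Return the classpath directories that contain `filename`, in classpath order.

    Hypothetical helper: with several hits, the first entry is the one
    that would shadow the others.
    """
    hits = []
    for entry in classpath.split(os.pathsep):
        # Skip empty entries such as those produced by a leading "::".
        if entry and os.path.isfile(os.path.join(entry, filename)):
            hits.append(entry)
    return hits
```

Paste the value after `-cp` from the launch command into `log4j_candidates(...)` on the node itself; if the first result is not Spark's conf directory, that matches the hunch above.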

User · Answer

Edit your conf/log4j.properties file and change the following line:

    log4j.rootCategory=INFO, console

to

    log4j.rootCategory=ERROR, console

Another approach would be to fire up spark-shell and type in the following:

    import org.apache.log4j.Logger
    import org.apache.log4j.Level

    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)

You won't see any logs after that.

User · Answer

    >>> log4j = sc._jvm.org.apache.log4j
    >>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

User · Answer

If you want to keep using the logging module (the logging facility for Python), you can try splitting configurations for your application and for Spark:

    LoggerManager()
    logger = logging.getLogger(__name__)
    loggerSpark = logging.getLogger('py4j')
    loggerSpark.setLevel('WARNING')
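A self-contained version of that idea, using only the standard library, might look like the sketch below. The logger name `my_app` is a placeholder, and note that this only affects messages emitted on the Python side through the `py4j` logger; it does not change what the JVM's log4j prints.

```python
import logging

# One shared handler/format for everything logged from Python.
logging.basicConfig(level=logging.INFO,
                    format="%(levelname)s %(name)s: %(message)s")

# Application logger ("my_app" is a hypothetical name) stays at INFO.
app_logger = logging.getLogger("my_app")
app_logger.setLevel(logging.INFO)

# The py4j bridge gets its own, quieter threshold.
py4j_logger = logging.getLogger("py4j")
py4j_logger.setLevel(logging.WARNING)

app_logger.info("application message is emitted")
py4j_logger.info("py4j INFO message is dropped")
```

Because child loggers such as `py4j.java_gateway` inherit the level of `py4j`, setting it once on the parent is enough.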

User · Answer

Just execute this command in the spark directory:

    cp conf/log4j.properties.template conf/log4j.properties

Edit log4j.properties:

    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Replace the first line:

    log4j.rootCategory=INFO, console

with:

    log4j.rootCategory=WARN, console

Save and restart your shell. It works for me for Spark 1.1.0 and Spark 1.5.1 on OS X.

User · Answer

Spark 1.6.2:

    log4j = sc._jvm.org.apache.log4j
    log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

Spark 2.x:

    spark.sparkContext.setLogLevel('WARN')

(spark being the SparkSession)

Alternatively, the old methods:

1. Rename conf/log4j.properties.template to conf/log4j.properties in the Spark directory.
2. In log4j.properties, change log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console

Different log levels available:

- OFF (most specific, no logging)
- FATAL (most specific, little data)
- ERROR - Log only in case of Errors
- WARN - Log only in case of Warnings or Errors
- INFO (Default)
- DEBUG - Log details steps (and all logs stated above)
- TRACE (least specific, a lot of data)
- ALL (least specific, all data)
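The ordering in that list can be made concrete with a tiny Python model of the threshold check. This is only an illustration of the ordering, not Spark or log4j code, and the function name `is_logged` is invented for the example.

```python
# Least severe to most severe; ALL and OFF are thresholds only, not message levels.
LEVELS = ["ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"]


def is_logged(message_level, threshold):
    """Return True if a message at `message_level` passes a logger set to `threshold`."""
    if message_level in ("ALL", "OFF"):
        raise ValueError("ALL and OFF are thresholds, not message levels")
    return LEVELS.index(message_level) >= LEVELS.index(threshold)
```

For instance, `is_logged("INFO", "WARN")` is false, which is why switching the root level from INFO to WARN silences the per-command INFO lines while keeping warnings and errors.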

User · Answer

I used this with Amazon EC2 with 1 master and 2 slaves and Spark 1.2.1.

Step 1: Change the config file on the master node

    nano /root/ephemeral-hdfs/conf/log4j.properties

    # Before
    hadoop.root.logger=INFO,console
    # After
    hadoop.root.logger=WARN,console

Step 2: Replicate this change to the slaves

    ~/spark-ec2/copy-dir /root/ephemeral-hdfs/conf/

User · Answer

Simply add the parameter below to your spark-submit command:

    --conf "spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console"

This overrides the system value temporarily, only for that job. Check the exact property name (log4jspark.root.logger here) in your log4j.properties file.

Hope this helps, cheers!
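If the job is launched from a Python wrapper, the same flag can be assembled as an argument list, which sidesteps shell quoting. This is a sketch under assumptions: `spark_submit_args` and the script name `my_job.py` are hypothetical, and the property name is the one from this answer, so it only has an effect if your log4j.properties actually references it.

```python
def spark_submit_args(app, level="WARN", appender="console"):
    """Build a spark-submit argument list that overrides the driver's root logger.

    Hypothetical helper; the -D property name must match what your
    log4j.properties refers to (log4jspark.root.logger in this answer).
    """
    java_opts = f"-Dlog4jspark.root.logger={level},{appender}"
    return [
        "spark-submit",
        "--conf", f"spark.driver.extraJavaOptions={java_opts}",
        app,
    ]


# Example: print the command that would be run for a placeholder script.
print(" ".join(spark_submit_args("my_job.py")))
```

The resulting list can be passed directly to `subprocess.run` without wrapping the `--conf` value in quotes.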

User · Answer

You can use setLogLevel:

    val spark = SparkSession
      .builder()
      .config("spark.master", "local[1]")
      .appName("TestLog")
      .getOrCreate()

    spark.sparkContext.setLogLevel("WARN")
