[scala] Spark - Error "A master URL must be set in your configuration" when submitting an app

I have an Spark app which runs with no problem in local mode,but have some problems when submitting to the Spark cluster.

The error msg are as follows:

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, cluster-node-02): java.lang.ExceptionInInitializerError
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
    at GroupEvolutionES$.<init>(GroupEvolutionES.scala:37)
    at GroupEvolutionES$.<clinit>(GroupEvolutionES.scala)
    ... 14 more

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, cluster-node-02): java.lang.NoClassDefFoundError: Could not initialize class GroupEvolutionES$
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

In the above code, GroupEvolutionES is the main class. The error msg says "A master URL must be set in your configuration", but I have provided the "--master" parameter to spark-submit.

Anyone who knows how to fix this problem?

Spark version: 1.6.1

This question is related to scala apache-spark

The answer is


Worked for me after replacing

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");

with

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");

Found this solution on some other thread on stackoverflow.


The TLDR:

.config("spark.master", "local")

a list of the options for spark.master in spark 2.2.1

I ended up on this page after trying to run a simple Spark SQL java program in local mode. To do this, I found that I could set spark.master using:

SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic example")
.config("spark.master", "local")
.getOrCreate();

An update to my answer:

To be clear, this is not what you should do in a production environment. In a production environment, spark.master should be specified in one of a couple other places: either in $SPARK_HOME/conf/spark-defaults.conf (this is where cloudera manager will put it), or on the command line when you submit the app. (ex spark-submit --master yarn).

If you specify spark.master to be 'local' in this way, spark will try to run in a single jvm, as indicated by the comments below. If you then try to specify --deploy-mode cluster, you will get an error 'Cluster deploy mode is not compatible with master "local"'. This is because setting spark.master=local means that you are NOT running in cluster mode.

Instead, for a production app, within your main function (or in functions called by your main function), you should simply use:

SparkSession
.builder()
.appName("Java Spark SQL basic example")
.getOrCreate();

This will use the configurations specified on the command line/in config files.

Also, to be clear on this too: --master and "spark.master" are the exact same parameter, just specified in different ways. Setting spark.master in code, like in my answer above, will override attempts to set --master, and will override values in spark-defaults.conf, so don't do it in production. Its great for tests though.

also, see this answer. which links to a list of the options for spark.master and what each one actually does.

a list of the options for spark.master in spark 2.2.1


just add .setMaster("local") to your code as shown below:

val conf = new SparkConf().setAppName("Second").setMaster("local") 

It worked for me ! Happy coding !


I had the same problem, Here is my code before modification :

package com.asagaama

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

/**
  * Created by asagaama on 16/02/2017.
  */
object Word {

  def countWords(sc: SparkContext) = {
    // Load our input data
    val input = sc.textFile("/Users/Documents/spark/testscase/test/test.txt")
    // Split it up into words
    val words = input.flatMap(line => line.split(" "))
    // Transform into pairs and count
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile("/Users/Documents/spark/testscase/test/result.txt")
  }

  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    countWords(sc)
  }

}

And after replacing :

val conf = new SparkConf().setAppName("wordCount")

With :

val conf = new SparkConf().setAppName("wordCount").setMaster("local[*]")

It worked fine !


Replacing :

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");
WITH
SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");

Did the magic.


If you are running a standalone application then you have to use SparkContext instead of SparkSession

val conf = new SparkConf().setAppName("Samples").setMaster("local")
val sc = new SparkContext(conf)
val textData = sc.textFile("sample.txt").cache()

How does spark context in your application pick the value for spark master?

  • You either provide it explcitly withing SparkConf while creating SC.
  • Or it picks from the System.getProperties (where SparkSubmit earlier put it after reading your --master argument).

Now, SparkSubmit runs on the driver -- which in your case is the machine from where you're executing the spark-submit script. And this is probably working as expected for you too.

However, from the information you've posted it looks like you are creating a spark context in the code that is sent to the executor -- and given that there is no spark.master system property available there, it fails. (And you shouldn't really be doing so, if this is the case.)

Can you please post the GroupEvolutionES code (specifically where you're creating SparkContext(s)).


Tried this option in learning Spark processing with setting up Spark context in local machine. Requisite 1)Keep Spark sessionr running in local 2)Add Spark maven dependency 3)Keep the input file at root\input folder 4)output will be placed at \output folder. Getting max share value for year. down load any CSV from yahoo finance https://in.finance.yahoo.com/quote/CAPPL.BO/history/ Maven dependency and Scala code below -

<dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>   

object MaxEquityPriceForYear {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("ShareMaxPrice").setMaster("local[2]").set("spark.executor.memory", "1g");
    val sc = new SparkContext(sparkConf);
    val input = "./input/CAPPL.BO.csv"
    val output = "./output"
    sc.textFile(input)
      .map(_.split(","))
      .map(rec => ((rec(0).split("-"))(0).toInt, rec(1).toFloat))
      .reduceByKey((a, b) => Math.max(a, b))
      .saveAsTextFile(output)
  }

I used this SparkContext constructor instead, and errors were gone:

val sc = new SparkContext("local[*]", "MyApp")

try this

make trait

import org.apache.spark.sql.SparkSession
trait SparkSessionWrapper {
   lazy val spark:SparkSession = {
      SparkSession
        .builder()
        .getOrCreate()
    }
}

extends it

object Preprocess extends SparkSessionWrapper {

We are missing the setMaster("local[*]") to set. Once we added then problem get resolved.

Problem:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

solution:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .master("local[*]")
      .getOrCreate()

If you don't provide Spark configuration in JavaSparkContext then you get this error. That is: JavaSparkContext sc = new JavaSparkContext();

Solution: Provide JavaSparkContext sc = new JavaSparkContext(conf);


If you are using following code

 val sc = new SparkContext(master, "WordCount", System.getenv("SPARK_HOME"))

Then replace with following lines

  val jobName = "WordCount";
  val conf = new SparkConf().setAppName(jobName);
  val sc = new SparkContext(conf)

In Spark 2.0 you can use following code

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .master("local[*]")// need to add
  .getOrCreate()

You need to add .master("local[*]") if runing local here * means all node , you can say insted of 8 1,2 etc

You need to set Master URL if on cluster


The default value of "spark.master" is spark://HOST:PORT, and the following code tries to get a session from the standalone cluster that is running at HOST:PORT, and expects the HOST:PORT value to be in the spark config file.

SparkSession spark = SparkSession
    .builder()
    .appName("SomeAppName")
    .getOrCreate();

"org.apache.spark.SparkException: A master URL must be set in your configuration" states that HOST:PORT is not set in the spark configuration file.

To not bother about value of "HOST:PORT", set spark.master as local

SparkSession spark = SparkSession
    .builder()
    .appName("SomeAppName")
    .config("spark.master", "local")
    .getOrCreate();

Here is the link for list of formats in which master URL can be passed to spark.master

Reference : Spark Tutorial - Setup Spark Ecosystem


var appName:String ="test"
val conf = new SparkConf().setAppName(appName).setMaster("local[*]").set("spark.executor.memory","1g");
val sc =  SparkContext.getOrCreate(conf)
sc.setLogLevel("WARN")