I'm not able to run a simple Spark job in Scala IDE (Maven Spark project) installed on Windows 7. The Spark core dependency has been added.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()
Error:
16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)
This question is related to: eclipse, scala, apache-spark
Follow these steps:
1) Create a bin folder in any directory (to be used in step 3).
2) Download winutils.exe and place it in the bin folder.
3) Add System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR"); in your code, where PATH/TO/THE/DIR is the parent of the bin folder. A sketch follows below.
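For example, a minimal sketch of where the property has to be set (C:\winutils is an assumed location; it must be the parent of the bin folder that holds winutils.exe):

import org.apache.spark.{SparkConf, SparkContext}

object FrameDemo {
  def main(args: Array[String]): Unit = {
    // Must run before the SparkContext is created, so that Hadoop's
    // Shell class can resolve %hadoop.home.dir%\bin\winutils.exe.
    System.setProperty("hadoop.home.dir", "C:\\winutils")

    val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile("File.txt")
    println(logData.count())
    sc.stop()
  }
}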
That's a tricky one... your drive letter must be capital, for example "C:\...". Place winutils.exe inside C:\winutils\bin and set HADOOP_HOME to C:\winutils.
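For example, from a Command Prompt (setx writes the variable to the user environment; open a new prompt afterwards for it to take effect):

setx HADOOP_HOME "C:\winutils"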
1) Download winutils.exe from https://github.com/steveloughran/winutils.
2) Create a directory in Windows: C:\winutils\bin.
3) Copy winutils.exe into the above bin folder.
4) Set the property in your code: System.setProperty("hadoop.home.dir", "C:\\winutils"); (hadoop.home.dir takes a plain path, not a file:/// URI).
5) Create a folder C:\temp and give it 777 permissions.
6) Add the config property to the SparkSession: .config("spark.sql.warehouse.dir", "file:///C:/temp") — see the sketch after this list.
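Putting steps 4 and 6 together, a sketch (paths as assumed above; needs the spark-sql dependency on the classpath):

import org.apache.spark.sql.SparkSession

object Demo {
  def main(args: Array[String]): Unit = {
    // Step 4: point Hadoop at the directory whose bin folder holds
    // winutils.exe (plain path, not a file:/// URI).
    System.setProperty("hadoop.home.dir", "C:\\winutils")

    // Step 6: set the warehouse directory on the SparkSession builder.
    val spark = SparkSession.builder()
      .appName("DemoDF")
      .master("local")
      .config("spark.sql.warehouse.dir", "file:///C:/temp")
      .getOrCreate()

    println(spark.read.textFile("File.txt").count())
    spark.stop()
  }
}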
If you see the issue below:
ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
then follow these steps:
On Windows 10 you should add two different entries:
(1) Add a new variable HADOOP_HOME with the path to your Hadoop folder (e.g. C:\Hadoop) under System Variables.
(2) Add/append a new entry "C:\Hadoop\bin" to the "Path" variable.
The above worked for me.
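You can verify both settings from a freshly opened command prompt, for example:

echo %HADOOP_HOME%
where winutils.exe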
On top of setting the HADOOP_HOME environment variable to C:\winutils, you also need to make sure you are an administrator of the machine. If you are not, and adding environment variables prompts you for admin credentials (even under User variables), then these variables will only take effect once you start your command prompt as administrator.
You can alternatively download winutils.exe from GitHub:
https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
Replace hadoop-2.7.1 with the version you want and place the file in D:\hadoop\bin.
If you do not have access rights to the environment variable settings on your machine, simply add the following line to your code:
System.setProperty("hadoop.home.dir", "D:\\hadoop");
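A quick sanity check you can add before creating the SparkContext (a sketch; the D:\hadoop path follows the answer above):

import java.io.File

System.setProperty("hadoop.home.dir", "D:\\hadoop")
// Fail fast with a clear message if winutils.exe is not where Hadoop will look.
val winutils = new File("D:\\hadoop\\bin\\winutils.exe")
require(winutils.exists(), s"winutils.exe not found at $winutils")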
I got the same problem while running unit tests. The following workaround allowed me to get rid of this message:

import java.io.File;

// Point hadoop.home.dir at the working directory and create an empty
// bin/winutils.exe placeholder so Hadoop's Shell class finds the file.
File workaround = new File(".");
System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
new File("./bin").mkdirs();
new File("./bin/winutils.exe").createNewFile();  // declares IOException
Setting the HADOOP_HOME environment variable in system properties didn't work for me. But this did:
I have also faced a similar problem with the following setup: Java 1.8.0_121, Spark spark-1.6.1-bin-hadoop2.6, Windows 10 and Eclipse Oxygen. When I ran my WordCount.java in Eclipse using HADOOP_HOME as a system variable, as mentioned in the previous post, it did not work. What worked for me is:
System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR");
where PATH/TO/THE/DIR/bin contains winutils.exe. This works whether you run within Eclipse as a Java application or by spark-submit from cmd, using:
spark-submit --class groupid.artifactid.classname --master local[2] /path/to/the/maven-built/jar /path/to/a/demo/test/file /path/to/output/directory
Example: go to the bin location of Spark, e.g. D:\BigData\spark-2.3.0-bin-hadoop2.7\bin, and execute spark-submit as mentioned:
D:\BigData\spark-2.3.0-bin-hadoop2.7\bin>spark-submit --class com.bigdata.abdus.sparkdemo.WordCount --master local[1] D:\BigData\spark-quickstart\target\spark-quickstart-0.0.1-SNAPSHOT.jar D:\BigData\spark-quickstart\wordcount.txt