hadoop: No FileSystem for scheme: file

Question

I am trying to run a simple NaiveBayesClassifier using Hadoop, and I'm getting this error:

```
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)
```

Code:

```java
Configuration configuration = new Configuration();
NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration); // error in this line
```

`modelPath` is pointing to the NaiveBayes.bin file, and the configuration object prints `Configuration: core-default.xml, core-site.xml`.

I think it's because of the JARs. Any ideas?

User · Answer

It took me some time to figure out the fix from the given answers, because I'm new to this. This is what I came up with, in case anyone else needs help from the very beginning:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {
  def main(args: Array[String]): Unit = {

    val mySparkConf = new SparkConf()
      .setAppName("SparkApp")
      .setMaster("local[*]")
      .set("spark.executor.memory", "5g")
    val sc = new SparkContext(mySparkConf)

    // The Hadoop Configuration that Spark uses for file system access
    val conf = sc.hadoopConfiguration

    // Name the implementations explicitly, so they no longer depend on the
    // META-INF/services file inside the assembled JAR
    conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)

    // ... rest of the job ...
  }
}
```

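I am using Spark 2.1.

And I have this part in my build.sbt:

```scala
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
```

Note that discarding everything under META-INF also discards the META-INF/services files, which is likely why the explicit `fs.*.impl` settings above are needed.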

User · Answer

For those using the shade plugin, following on david_p's advice, you can merge the services in the shaded JAR by adding the `ServicesResourceTransformer` to the plugin config:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This will merge all the `org.apache.hadoop.fs.FileSystem` service files into one file.

User · Answer

Another possible cause (though the OP's question doesn't itself suffer from this) is creating a configuration instance that does not load the defaults:

```java
Configuration config = new Configuration(false);
```

If you don't load the defaults, you won't get the default settings for things like the FileSystem implementations, which leads to identical errors like this when trying to access HDFS. Switching to the parameterless constructor, or passing in `true` to load defaults, may resolve this.

Additionally, if you are adding custom configuration locations (e.g. on the file system) to the Configuration object, be careful which overload of `addResource()` you use. For example, if you use `addResource(String)`, Hadoop assumes that the string is a class path resource. If you need to specify a local file, try the following:

```java
File configFile = new File("example/config.xml");
config.addResource(new Path("file://" + configFile.getAbsolutePath()));
```

User · Answer

I also came across a similar issue. I added core-site.xml and hdfs-site.xml as resources of the conf object:

```java
Configuration conf = new Configuration(true);
conf.addResource(new Path("<path to>/core-site.xml"));
conf.addResource(new Path("<path to>/hdfs-site.xml"));
```

I also fixed version conflicts in pom.xml. For example, if the configured version of Hadoop is 2.8.1 but the dependencies in pom.xml use version 2.7.1, change them to 2.8.1 and run `mvn install` again.

This solved the error for me.

User · Answer

Thanks david_p. Scala:

```scala
conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
```

or

```xml
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
```
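Why do these settings help? The sketch below is a simplified, stand-alone model (no Hadoop on the classpath) of the lookup order in Hadoop 2.x's `FileSystem.getFileSystemClass`, as I understand it: an explicit `fs.<scheme>.impl` entry in the configuration is checked first, and only if it is absent does Hadoop fall back to the classes discovered through `META-INF/services`. `SchemeResolutionModel` and its two maps are hypothetical stand-ins for `Configuration` and the service registry, not Hadoop APIs.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of Hadoop 2.x-style scheme resolution; it does not use Hadoop.
public class SchemeResolutionModel {

    // Stand-in for the Hadoop Configuration (key -> value).
    private final Map<String, String> conf = new HashMap<>();

    // Stand-in for the scheme -> class map built from META-INF/services entries.
    private final Map<String, String> serviceLoaded = new HashMap<>();

    public void set(String key, String value) {
        conf.put(key, value);
    }

    public void registerService(String scheme, String className) {
        serviceLoaded.put(scheme, className);
    }

    // An explicit "fs.<scheme>.impl" wins; otherwise fall back to the service registry.
    public String resolve(String scheme) throws IOException {
        String explicit = conf.get("fs." + scheme + ".impl");
        if (explicit != null) {
            return explicit;
        }
        String discovered = serviceLoaded.get(scheme);
        if (discovered == null) {
            // Same message shape as the exception in the question.
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return discovered;
    }

    public static void main(String[] args) throws IOException {
        SchemeResolutionModel model = new SchemeResolutionModel();
        // A fat JAR whose surviving services file only lists the HDFS class.
        model.registerService("hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");

        try {
            model.resolve("file");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }

        // The workaround from the answer: name the class explicitly.
        model.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
        System.out.println(model.resolve("file"));
    }
}
```

Running `main` first prints `No FileSystem for scheme: file`, then `org.apache.hadoop.fs.LocalFileSystem` once `fs.file.impl` is set.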

User · Answer

```java
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://nameNode:9000");
FileSystem fs = FileSystem.get(conf);
```

Setting `fs.defaultFS` works for me (Hadoop 2.8.1).

User · Answer

For Maven, just add the Maven dependency for hadoop-hdfs (see the link below); that will solve the issue.

http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1

User · Answer

This is a typical case of the maven-assembly plugin breaking things.

Why this happened to us:

Different JARs (hadoop-common for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the filesystem implementations they want to declare. This is called a Service Provider Interface, implemented via java.util.ServiceLoader (see org.apache.hadoop.fs.FileSystem#loadFileSystems).

When we use maven-assembly-plugin, it merges all our JARs into one, and all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-common overwrites the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.

How we fixed it:

After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:

```java
hadoopConfig.set("fs.hdfs.impl",
    org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
);
hadoopConfig.set("fs.file.impl",
    org.apache.hadoop.fs.LocalFileSystem.class.getName()
);
```

Update: the correct fix

It has been brought to my attention by krookedking that there is a configuration-based way to make the build use a merged version of all the FileSystem services declarations; check out his answer (the one using ServicesResourceTransformer).
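The overwrite described above can be reproduced without any build tool. This is a toy model, assuming each "JAR" is just a map from entry path to file content and that each service file lists a single class (the real files list more): `mergeLastWins` mimics a merge where the last JAR's copy wins, while `mergeServiceLines` keeps the distinct lines of every copy, which is roughly what sbt-assembly's `MergeStrategy.filterDistinctLines` does and similar in spirit to the shade plugin's `ServicesResourceTransformer`. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Toy model of merging META-INF/services files from several JARs into one (Java 9+).
public class ServiceFileMerge {

    static final String SERVICE_ENTRY = "META-INF/services/org.apache.hadoop.fs.FileSystem";

    // Naive merge: a later JAR's entry silently replaces an earlier one.
    static Map<String, String> mergeLastWins(List<Map<String, String>> jars) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map<String, String> jar : jars) {
            out.putAll(jar);
        }
        return out;
    }

    // Service-aware merge: keep every distinct, non-blank line from all copies, in order.
    static Map<String, String> mergeServiceLines(List<Map<String, String>> jars) {
        Map<String, String> out = mergeLastWins(jars);
        LinkedHashSet<String> lines = new LinkedHashSet<>();
        for (Map<String, String> jar : jars) {
            String content = jar.get(SERVICE_ENTRY);
            if (content == null) {
                continue;
            }
            for (String line : content.split("\n")) {
                String name = line.trim();
                if (!name.isEmpty()) {
                    lines.add(name);
                }
            }
        }
        if (!lines.isEmpty()) {
            out.put(SERVICE_ENTRY, String.join("\n", lines));
        }
        return out;
    }

    public static void main(String[] args) {
        // Trimmed-down, hypothetical service files; the real ones list more classes.
        Map<String, String> hadoopHdfs = Map.of(SERVICE_ENTRY, "org.apache.hadoop.hdfs.DistributedFileSystem");
        Map<String, String> hadoopCommon = Map.of(SERVICE_ENTRY, "org.apache.hadoop.fs.LocalFileSystem");
        // hadoop-common is added last, as in the answer above.
        List<Map<String, String>> jars = List.of(hadoopHdfs, hadoopCommon);

        System.out.println("last wins:\n" + mergeLastWins(jars).get(SERVICE_ENTRY));
        System.out.println("merged lines:\n" + mergeServiceLines(jars).get(SERVICE_ENTRY));
    }
}
```

With hadoop-common added last, `mergeLastWins` keeps only the `LocalFileSystem` line, so the `hdfs` scheme is no longer declared; `mergeServiceLines` keeps both lines.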

User · Answer

I faced the same problem. I found two solutions:

1. Editing the JAR file manually: open the JAR file with WinRAR (or a similar tool), go to META-INF/services, and edit org.apache.hadoop.fs.FileSystem by appending `org.apache.hadoop.fs.LocalFileSystem`.

2. Changing the order of my dependencies as follows:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>3.2.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.2.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>3.2.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.2.1</version>
  </dependency>
</dependencies>
```
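The first solution (editing the JAR by hand) can also be done programmatically. A minimal sketch, assuming a plain, unsigned JAR: it copies every entry into a new JAR and appends the given class to `META-INF/services/org.apache.hadoop.fs.FileSystem` if it is not already listed, creating that file if it is missing. `AppendFileSystemService`, `patch` and `withLine` are hypothetical names, not part of any Hadoop or Maven tooling.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Copies a JAR, adding one class name to the Hadoop FileSystem services file (Java 9+).
public class AppendFileSystemService {

    static final String SERVICE_ENTRY = "META-INF/services/org.apache.hadoop.fs.FileSystem";

    static void patch(String inJar, String outJar, String className) throws IOException {
        boolean sawServiceFile = false;
        try (JarFile jar = new JarFile(inJar);
             JarOutputStream out = new JarOutputStream(new FileOutputStream(outJar))) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // A fresh entry, so size and CRC are recomputed for the new content.
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream in = jar.getInputStream(entry)) {
                    if (entry.getName().equals(SERVICE_ENTRY)) {
                        sawServiceFile = true;
                        String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                        out.write(withLine(content, className).getBytes(StandardCharsets.UTF_8));
                    } else {
                        in.transferTo(out); // copy everything else unchanged
                    }
                }
                out.closeEntry();
            }
            if (!sawServiceFile) {
                out.putNextEntry(new JarEntry(SERVICE_ENTRY));
                out.write((className + "\n").getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
    }

    // Appends className on its own line unless it is already listed.
    static String withLine(String content, String className) {
        for (String line : content.split("\n")) {
            if (line.trim().equals(className)) {
                return content;
            }
        }
        String sep = content.isEmpty() || content.endsWith("\n") ? "" : "\n";
        return content + sep + className + "\n";
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input JAR, args[1] = output JAR
        patch(args[0], args[1], "org.apache.hadoop.fs.LocalFileSystem");
    }
}
```

A patched JAR is overwritten by the next build, so fixing the merge in the build itself is the more durable option.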

User · Answer

If you are using sbt:

```scala
// hadoop
lazy val HADOOP_VERSION = "2.8.0"

lazy val dependenceList = Seq(

  // hadoop
  // The order is important: "hadoop-hdfs" and then "hadoop-common"
  "org.apache.hadoop" % "hadoop-hdfs" % HADOOP_VERSION,

  "org.apache.hadoop" % "hadoop-common" % HADOOP_VERSION
)
```

User · Answer

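For SBT, use the mergeStrategy below in build.sbt:

```scala
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines
    case s => old(s)
  }
}
```

This uses the older `<<=` syntax; in newer sbt and sbt-assembly versions `<<=` is no longer available and the key is named `assemblyMergeStrategy`.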

User · Answer

For the record, this is still happening in Hadoop 2.4.0. So frustrating...

I was able to follow the instructions in this link: http://grokbase.com/t/cloudera/scm-users/1288xszz7r/no-filesystem-for-scheme-hdfs

I added the following to my core-site.xml and it worked:

```xml
<property>
   <name>fs.file.impl</name>
   <value>org.apache.hadoop.fs.LocalFileSystem</value>
   <description>The FileSystem for file: uris.</description>
</property>

<property>
   <name>fs.hdfs.impl</name>
   <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
   <description>The FileSystem for hdfs: uris.</description>
</property>
```

User · Answer

This is not related to Flink, but I've found this issue in Flink as well. For people using Flink, you need to download Pre-bundled Hadoop and put it inside /opt/flink/lib.

User · Answer

I assume you build the sample using Maven.

Please check the content of the JAR you're trying to run, especially the META-INF/services directory, file org.apache.hadoop.fs.FileSystem. It should contain a list of filesystem implementation classes. Check that the line org.apache.hadoop.hdfs.DistributedFileSystem is present in the list for HDFS, and org.apache.hadoop.fs.LocalFileSystem for the local file scheme.

If one of them is missing, you have to override the referred resource during the build.

The other possibility is that you simply don't have hadoop-hdfs.jar in your classpath, but this is unlikely. Usually, if you have the correct hadoop-client dependency, it is not an issue.
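To do the check described above without unpacking the archive, a small program using only `java.util.jar` can print the declared classes. This is a sketch: the class name `ListFileSystemServices` is hypothetical, and it only reads the top-level service file, not files inside nested JARs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Lists the FileSystem implementations declared in a JAR's services file.
public class ListFileSystemServices {

    static final String SERVICE_ENTRY = "META-INF/services/org.apache.hadoop.fs.FileSystem";

    static List<String> declaredFileSystems(String jarPath) throws IOException {
        List<String> classes = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(SERVICE_ENTRY);
            if (entry == null) {
                return classes; // no services file at all
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Services files allow '#' comments and blank lines.
                    int hash = line.indexOf('#');
                    String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                    if (!name.isEmpty()) {
                        classes.add(name);
                    }
                }
            }
        }
        return classes;
    }

    public static void main(String[] args) throws IOException {
        List<String> classes = declaredFileSystems(args[0]);
        classes.forEach(System.out::println);
        for (String required : new String[] {
                "org.apache.hadoop.fs.LocalFileSystem",
                "org.apache.hadoop.hdfs.DistributedFileSystem"}) {
            System.out.println((classes.contains(required) ? "present: " : "MISSING: ") + required);
        }
    }
}
```

For example, `java ListFileSystemServices.java target/app-jar-with-dependencies.jar` (single-file launch needs Java 11+; the JAR path is a placeholder).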

User · Answer

Use this plugin:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>1.5</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>

            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>allinone</shadedClassifierName>
                <artifactSet>
                    <includes>
                        <include>*:*</include>
                    </includes>
                </artifactSet>
                <transformers>
                    <transformer
                        implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer
                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    </transformer>
                    <transformer
                        implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer">
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```

User · Answer

Took me ages to figure it out with Spark 2.0.2, but here's my bit:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession

val sparkBuilder = SparkSession.builder
  .appName("app_name")
  .master("local")
  // Various Params
  .getOrCreate()

val hadoopConfig: Configuration = sparkBuilder.sparkContext.hadoopConfiguration
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConfig.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
```

And the relevant parts of my build.sbt:

```scala
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
```

I hope this can help.

User · Answer

Assuming that you are using Maven and the Cloudera distribution of Hadoop: I'm using CDH 4.6, and adding these dependencies worked for me. I think you should check the versions of your Hadoop and Maven dependencies.

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.0.0-mr1-cdh4.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.6.0</version>
</dependency>
```

Don't forget to add the Cloudera Maven repository:

```xml
<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
```

User · Answer

I use sbt assembly to package my project, and I also ran into this problem. My solution is here:

Step 1: add a META-INF merge strategy in your build.sbt:

```scala
case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case PathList("META-INF", ps @ _*) => MergeStrategy.first
```

Step 2: add the hadoop-hdfs lib to build.sbt:

```scala
"org.apache.hadoop" % "hadoop-hdfs" % "2.4.0"
```

Step 3: `sbt clean; sbt assembly`

Hope the above information can help you.
