[hadoop] Datanode process not running in Hadoop

I set up and configured a multi-node Hadoop cluster using this tutorial.

When I type in the start-all.sh command, it shows all the processes initializing properly as follows:

starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-jawwadtest1.out
jawwadtest1: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.out
jawwadtest2: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest2.out
jawwadtest1: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-jawwadtest1.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-jawwadtest1.out
jawwadtest1: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest1.out
jawwadtest2: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest2.out

However, when I type the jps command, I get the following output:

31057 NameNode
4001 RunJar
6182 RunJar
31328 SecondaryNameNode
31411 JobTracker
32119 Jps
31560 TaskTracker

As you can see, there's no datanode process running. I tried configuring a single-node cluster but got the same problem. Would anyone have any idea what could be going wrong here? Are there any configuration files that are not mentioned in the tutorial or I may have looked over? I am new to Hadoop and am kinda lost and any help would be greatly appreciated.

EDIT: hadoop-root-datanode-jawwadtest1.log:

STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/$
************************************************************/
2012-08-09 23:07:30,717 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loa$
2012-08-09 23:07:30,734 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:30,735 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:30,736 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:31,018 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:31,024 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:32,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to $
2012-08-09 23:07:37,949 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: $
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(Data$
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransition$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNo$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNod$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode($
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataN$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1$

2012-08-09 23:07:37,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: S$
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at jawwadtest1/198.101.220.90
************************************************************/

This question is related to hadoop configuration process

The answer is


Error in datanode.log file

$ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log

Shows:

java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c

Solution: Stop your cluster and issue the below command & then start your cluster again.

sudo rm -rf  /usr/local/hadoop_tmp/hdfs/datanode/*

I Have applied some mixed configuration, and its worked for me.
First >>
Stop Hadoop all Services using ${HADOOP_HOME}/sbin/stop-all.sh

Second >>
Check mapred-site.xml which is located at your ${HADOOP_HOME}/etc/hadoop/mapred-site.xml and change the localhost to master.

Third >>
Remove the temporary folder created by hadoop
rm -rf //path//to//your//hadoop//temp//folder

Fourth >>
Add the recursive permission on temp.
sudo chmod -R 777 //path//to//your//hadoop//temp//folder

Fifth >>
Now Start all the services again. And First check that all service including datanode is running. enter image description here


    mv /usr/local/hadoop_store/hdfs/datanode /usr/local/hadoop_store/hdfs/datanode.backup

    mkdir /usr/local/hadoop_store/hdfs/datanode

    hadoop datanode OR start-all.sh

    jps

Run Below Commands in Line:-

  1. stop-all.sh (Run Stop All to Stop all the hadoop process)
  2. rm -r /usr/local/hadoop/tmp/ (Your Hadoop tmp directory which you configured in hadoop/conf/core-site.xml)
  3. sudo mkdir /usr/local/hadoop/tmp (Make the same directory again)
  4. hadoop namenode -format (Format your namenode)
  5. start-all.sh (Run Start All to start all the hadoop process)
  6. JPS (It will show the running processes)

Check whether the hadoop.tmp.dir property in the core-site.xml is correctly set. If you set it, navigate to this directory, and remove or empty this directory. If you didn't set it, you navigate to its default folder /tmp/hadoop-${user.name}, likewise remove or empty this directory.


Step 1:- Stop-all.sh

Step 2:- got to this path

cd /usr/local/hadoop/bin

Step 3:- Run that command hadoop datanode

Now DataNode work


if formatting the tmp directory is not working then try this:

  1. first stop all the entities like namenode, datanode etc. (you will be having some script or command to do that)
  2. Format tmp directory
  3. Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and delete all the contents in the directory manually
  4. Now format your namenode again
  5. start all the entities then use jps command to confirm that the datanode has been started
  6. Now run whichever application you have

Hope this helps.


Once I was not able to find data node using jps in hadoop, then I deleted the current folder in the hadoop installed directory (/opt/hadoop-2.7.0/hadoop_data/dfs/data) and restarted hadoop using start-all.sh and jps.

This time I could find the data node and current folder was created again.


I have got details of the issue in the log file like below : "Invalid directory in dfs.data.dir: Incorrect permission for /home/hdfs/dnman1, expected: rwxr-xr-x, while actual: rwxrwxr-x" and from there I identified that the datanote file permission was 777 for my folder. I corrected to 755 and it started working.


  • Erase the files where data and name are in dfs.

In my case , I have hadoop on windows, over C:/, this file according to core-site.xml, etc , it was in tmp/Administrator/dfs/data... name, etc, so erase it.

Then, namenode -format. and try again,


Got the same error. Tried to start and stop dfs several times, cleared all directories that are mentioned in previous answers, but nothing helped.

The issue was resolved only after rebooting OS and configuring Hadoop from the scratch. (configuring Hadoop from the scratch without rebooting didn't work)


Stop all the services - ./stop-all.sh Format all the hdfs tmp directory from all the master and slave. Don't forget to format from slave.

Format the namenode.(hadoop namenode -format)

Now start the services on namenode. ./bin/start-all.sh

This made a difference for me to start the datanode service.


Follow these steps and your datanode will start again.

1)Stop dfs. 2)Open hdfs-site.xml 3)Remove the data.dir and name.dir properties from hdfs-site.xml and -format namenode again.

4)Then start dfs again.


You need to check :

/app/hadoop/tmp/dfs/data/current/VERSION and /app/hadoop/tmp/dfs/name/current/VERSION ---

in those two files and that to Namespace ID of name node and datanode.

If and only if data node's NamespaceID is same as name node's NamespaceID then your datanode will run.

If those are different copy the namenode NamespaceID to your Datanode's NamespaceID using vi editor or gedit and save and re run the deamons it will work perfectly.


I was having the same problem running a single-node pseudo-distributed instance. Couldn't figure out how to solve it, but a quick workaround is to manually start a DataNode with
hadoop-x.x.x/bin/hadoop datanode


In case of Mac os(Pseudo-distributed mode):

Open terminal

  1. Stop dfs. 'sbin/stop-all.sh'.
  2. cd /tmp
  3. rm -rf hadoop*
  4. Navigate to hadoop directory. Format the hdfs. bin/hdfs namenode -format
  5. sbin/start-dfs.sh

I faced similar issue while running the datanode. The following steps were useful.

  1. In [hadoop_directory]/sbin directory use ./stop-all.sh to stop all the running services.
  2. Remove the tmp dir using rm -r [hadoop_directory]/tmp (The path configured in [hadoop_directory]/etc/hadoop/core-site.xml)
  3. sudo mkdir [hadoop_directory]/tmp (Make a new tmp directory)
  4. Go to */hadoop_store/hdfs directory where you have created namenode and datanode as sub-directories. (The paths configured in [hadoop_directory]/etc/hadoop/hdfs-site.xml). Use

    rm -r namenode
    
    rm -r datanode
    
  5. In */hadoop_store/hdfs directory use

    sudo mkdir namenode
    
    sudo mkdir datanode
    

In case of permission issue, use

   chmod -R 755 namenode 

   chmod -R 755 datanode
  1. In [hadoop_directory]/bin use

     hadoop namenode -format (To format your namenode)
    
  2. In [hadoop_directory]/sbin directory use ./start-all.sh or ./start-dfs.sh to start the services.
  3. Use jps to check the services running.

Delete the datanode under your hadoop folder then rerun start-all.sh


Instead of deleting everything under the "hadoop tmp dir", you can set another one. For example, if your core-site.xml has this property:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp</value>
</property>

You can change this to:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp2</value>
</property>

and then scp core-site.xml to each node, and then "hadoop namenode -format", and then restart hadoop.


Try this

  1. stop-all.sh
  2. vi hdfs-site.xml
  3. change the value given for property dfs.data.dir
  4. format namenode
  5. start-all.sh

This is for newer version of Hadoop (I am running 2.4.0)

  • In this case stop the cluster sbin/stop-all.sh
  • Then go to /etc/hadoop for config files.

In the file: hdfs-site.xml Look out for directory paths corresponding to dfs.namenode.name.dir dfs.namenode.data.dir

  • Delete both the directories recursively (rm -r).
  • Now format the namenode via bin/hadoop namenode -format
  • And finally sbin/start-all.sh

Hope this helps.


  1. I configured hadoop.tmp.dir in conf/core-site.xml

  2. I configured dfs.data.dir in conf/hdfs-site.xml

  3. I configured dfs.name.dir in conf/hdfs-site.xml

  4. Deleted everything under "/tmp/hadoop-/" directory

  5. Changed file permissions from 777 to 755 for directory listed under dfs.data.dir

    And the data node started working.


Need to follow 3 steps.

(1) Need to go to the logs and check the most recent log (In hadoop- 2.6.0/logs/hadoop-user-datanode-ubuntu.log)

If the error is as

java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1

i.e. namenode cluster id and datanode cluster id's are not identical.

(2) Now copy the namenode clusterID which is CID-c41df580-e197-4db6-a02a-a62b71463089 in above error

(3) Replace the Datanode cluster ID with Namenode cluster ID in hadoopdata/dfs/data/current/version

clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089

Restart Hadoop. Will run DataNode


Even after removing the remaking the directories, the datanode wasn't starting. So, I started it manually using bin/hadoop datanode It did not reach any conclusion. I opened another terminal from the same username and did jps and it showed me the running datanode process. It's working, but I just have to keep the unfinished terminal open by the side.


  1. Stop the dfs and yarn first.
  2. Remove the datanode and namenode directories as specified in the core-site.xml file.
  3. Re-create the directories.
  4. Then re-start the dfs and the yarn as follows.

    start-dfs.sh

    start-yarn.sh

    mr-jobhistory-daemon.sh start historyserver

    Hope this works fine.


I ran into the same issue. I have created a hdfs folder '/home/username/hdfs' with sub-directories name, data, and tmp which were referenced in config xml files of hadoop/conf.

When I started hadoop and did jps, I couldn't find datanode so I tried to manually start datanode using bin/hadoop datanode. Then I realized from error message that it has permissions issue accessing the dfs.data.dir=/home/username/hdfs/data/ which was referenced in one of the hadoop config files. All I had to do was stop hadoop, delete the contents of /home/username/hdfs/tmp/* directory and then try this command - chmod -R 755 /home/username/hdfs/ and then start hadoop. I could find the datanode!


Follow these steps and your datanode will start again.

  1. Stop dfs.
  2. Open hdfs-site.xml
  3. Remove the data.dir and name.dir properties from hdfs-site.xml and -format namenode again.
  4. Then remove the hadoopdata directory and add the data.dir and name.dir in hdfs-site.xml and again format namenode.
  5. Then start dfs again.

Please control if the the tmp directory property is pointing to a valid directory in core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp</value>
</property>

If the directory is misconfigured, the datanode process will not start properly.


Delete the files under $hadoop_User/dfsdata and $hadoop_User/tmpdata then run:

hdfs namenode -format

finally run:

start-all.sh

Then your problem gets solved.


Examples related to hadoop

Hadoop MapReduce: Strange Result when Storing Previous Value in Memory in a Reduce Class (Java) What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check Spark Version What are the pros and cons of parquet format compared to other formats? java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient How to export data from Spark SQL to CSV How to copy data from one HDFS to another HDFS? How to calculate Date difference in Hive Select top 2 rows in Hive Spark - load CSV file as DataFrame?

Examples related to configuration

My eclipse won't open, i download the bundle pack it keeps saying error log Getting value from appsettings.json in .net core pgadmin4 : postgresql application server could not be contacted. ASP.NET Core configuration for .NET Core console application Turning off eslint rule for a specific file PHP Warning: Module already loaded in Unknown on line 0 How to read AppSettings values from a .json file in ASP.NET Core How to store Configuration file and read it using React Hadoop cluster setup - java.net.ConnectException: Connection refused Maven Jacoco Configuration - Exclude classes/packages from report not working

Examples related to process

Fork() function in C How to kill a nodejs process in Linux? Xcode process launch failed: Security Understanding PrimeFaces process/update and JSF f:ajax execute/render attributes Linux Script to check if process is running and act on the result CreateProcess error=2, The system cannot find the file specified How to make parent wait for all child processes to finish? How to use [DllImport("")] in C#? Visual Studio "Could not copy" .... during build How to terminate process from Python using pid?