The NameNode is the single point of failure (SPOF) in a Hadoop 1.x configuration. It maintains the locations of all of the data blocks in the cluster. In Hadoop 1.x we have the concept of the Secondary NameNode, which holds a copy of the NameNode metadata. If your NameNode goes down, you can take the metadata copy stored on the Secondary NameNode and use it to resume your work once your NameNode is up again.
It is important to note that the Secondary NameNode is not a backup for the NameNode; it merely performs a checkpoint process periodically. The data is therefore almost certainly stale when recovering from a Secondary NameNode checkpoint. However, recovering from a NameNode failure using an old filesystem state is better than not being able to recover at all. Since it is possible to recover from a previous checkpoint generated by the Secondary NameNode, in case of NameNode failure Hadoop admins have to manually recover the data from the Secondary NameNode.
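As an alternative to copying the checkpoint files by hand, Hadoop 1.x also supports starting the NameNode with the -importCheckpoint option, which loads the latest checkpoint from the directory configured in fs.checkpoint.dir, provided dfs.name.dir does not already contain a valid image:
$ cd $HADOOP_HOME
$ bin/hadoop namenode -importCheckpoint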
In Hadoop 2.x, with the introduction of HA (High Availability), the Standby NameNode came into the picture. The Standby NameNode removes the problem of the SPOF (Single Point of Failure) that was there in Hadoop 1.x: it takes over from the Active NameNode if the latter fails, automatically if automatic failover is configured.
Moreover, enabling HA is not mandatory. But when it is enabled, you can't use the Secondary NameNode: either the Secondary NameNode is running or the Standby NameNode is, never both. For these reasons, adding high availability (HA) to the HDFS NameNode became one of the top priorities for the HDFS community.
In other words, in Hadoop 2.x you can have more than one NameNode. In case the primary NameNode goes down, the redundant NameNode can take over (either manually or automatically) so that your cluster doesn't stop working. In this implementation there is a pair of NameNodes in an active/standby configuration. In the event of the failure of the active NameNode, the standby takes over its duties to continue servicing client requests.
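As a rough sketch, an active/standby pair is declared in hdfs-site.xml along these lines (the nameservice name, NameNode IDs, and hostnames below are illustrative assumptions, not values from this article, and a complete HA setup also needs a shared edits directory and client failover settings):
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>namenode1.example.com:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>namenode2.example.com:8020</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>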
Steps for recovering from a NameNode failure
1. Stop the Secondary NameNode:
$ cd $HADOOP_HOME
$ bin/hadoop-daemon.sh stop secondarynamenode
2. Bring up a new machine to act as the new NameNode. This machine should have Hadoop installed and configured exactly like the previous NameNode, and SSH password-less login should be set up. It should also have the same IP address and hostname as the previous NameNode.
3. Copy the contents of fs.checkpoint.dir on the Secondary NameNode to the dfs.name.dir folder on the new NameNode machine.
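For example, assuming fs.checkpoint.dir is /path/to/checkpoint and dfs.name.dir is /path/to/dfs/name (both paths and the newnamenode hostname are illustrative; check your hdfs-site.xml), you could run on the Secondary NameNode:
$ scp -r /path/to/checkpoint/* newnamenode:/path/to/dfs/name/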
4. Start the new NameNode on the new machine:
$ bin/hadoop-daemon.sh start namenode
5. Start the Secondary NameNode on the Secondary NameNode machine:
$ bin/hadoop-daemon.sh start secondarynamenode
6. Verify that the NameNode
started successfully by looking at the NameNode status page http://localhost:50070/.
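As an extra sanity check (not part of the original procedure), you can also confirm from the command line that the new NameNode is answering requests:
$ bin/hadoop dfsadmin -report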
Thus, first we logged into the Secondary NameNode and stopped the service. Next, we set up a new machine in the exact manner we set up the failed NameNode. Then we copied all of the checkpoint and edits files from the Secondary NameNode to the new NameNode; this allows us to recover the filesystem status, metadata, and edits as of the last checkpoint. Finally, we restarted the new NameNode and Secondary NameNode.
Additionally
Recovering using the old data is unacceptable for certain processing environments. Instead, another option is to set up some type of offsite storage where the NameNode can write its image and edits files. This way, if there is a hardware failure of the NameNode, you can recover the latest filesystem state without resorting to restoring old data from the Secondary NameNode snapshot.
The first step is to designate a new machine to hold the NameNode image and edits file backups. Next, mount the backup machine on the NameNode server. Finally, modify the hdfs-site.xml file on the server running the NameNode so that it writes to both the local filesystem and the backup machine mount.
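For example, the backup location could be an NFS export mounted on the NameNode server (the backuphost name and paths are illustrative):
$ sudo mount -t nfs backuphost:/export/namenode-backup /path/to/backup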
$ cd $HADOOP_HOME
Edit hdfs-site.xml:
$ sudo vi conf/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/path/to/hadoop/cache/hadoop/dfs,/path/to/backup</value>
</property>
Now the NameNode will write all of the filesystem metadata to both the /path/to/hadoop/cache/hadoop/dfs and the mounted /path/to/backup folders.
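After restarting the NameNode, you can check that both locations are being populated; in Hadoop 1.x the fsimage and edits files live in a current subdirectory under each dfs.name.dir entry (the paths are the illustrative ones above):
$ ls /path/to/hadoop/cache/hadoop/dfs/current /path/to/backup/current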
Hope you have enjoyed the article.
Author: Iqubal Mustafa Kaki, Technical Specialist.
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com