Dear Nagaraj,

Hope you are doing well.

You might find both/all of your NameNodes in "standby" state after performing any of the following tasks:

i.   upgrading Hadoop from one version to another
ii.  upgrading Cloudera Manager
iii. migrating your Cloudera Manager database from one database to another
iv.  changing the hostname or IP of any node in your cluster, especially a NameNode or ZooKeeper node

The problem you are facing right now is really not with the NameNode or DataNode; it is due to ZooKeeper.
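You can confirm the symptom from the command line. Here "nn1" and "nn2" are illustrative NameNode IDs; substitute the ones from your dfs.ha.namenodes.&lt;nameservice&gt; setting:

$ hdfs haadmin -getServiceState nn1
standby
$ hdfs haadmin -getServiceState nn2
standby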

ZooKeeper maintains a znode under which the ZK Failover Controller (ZKFC) stores its information. Note that the nameservice ID is automatically appended to this znode, so it is not normally necessary to configure it explicitly, even in a federated environment.

This znode is configured with the ha.zookeeper.parent-znode property and by default has the value "/hadoop-ha".
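You can check what your cluster is actually using with hdfs getconf (the output shown below assumes the default value):

$ hdfs getconf -confKey ha.zookeeper.parent-znode
/hadoop-ha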

You have a few ways to resolve this issue.

1. Edit ActiveStandbyElectorLock


Run the ZooKeeper shell (zkCli.sh) and fetch the lock znode, replacing "touk-cluster-dev" with your own nameservice ID:

$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] get /hadoop-ha/touk-cluster-dev/ActiveStandbyElectorLock

You will see which process is holding this lock. Look at the "ephemeralOwner" line in the output, then grep your ZooKeeper logs for that same session ID; this should tell you who has the lock. Usually a process other than the ZKFCs is holding this lock, which is why you have the issue. You can fix the lock using the ZKFC session obtained from the log, but this is a tedious process. A sample of the get output:

ha-nn-hostname.com
cZxid = 0x2150
ctime = Wed Nov 14 20:41:59 PST 2012
mZxid = 0x2150
mtime = Wed Nov 14 20:41:59 PST 2012
pZxid = 0x2150
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x13b011b76cb0078
dataLength = 47
numChildren = 0
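For example, with the ephemeralOwner value shown above, you could search the ZooKeeper log for that session like this (the log path is illustrative and varies by installation):

$ grep 0x13b011b76cb0078 /var/log/zookeeper/zookeeper.log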


2. Remove the ZNode from ZooKeeper

Whenever we re-enabled Automatic Failover, both NameNodes would go to standby mode with the error "Failed to initialize High Availability state in ZooKeeper". This happens because a ZNode for this nameservice has already been created, so you need to remove it. To delete the ZNode under "/hadoop-ha", log in to the ZooKeeper shell using zkCli:

$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha
[nameservice1]
[zk: localhost:2181(CONNECTED) 1] rmr /hadoop-ha


Then go to the CM UI: "All Services" -> "hdfs1" -> "Instances" -> "Failover Controller (xx1)" -> "Actions" -> "Initialize Automatic Failover". This will create a new ZNode with the correct lock (that of the Failover Controller). Then restart the HDFS service.
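If you are managing the cluster without Cloudera Manager, the command-line equivalent is to re-initialize the HA state in ZooKeeper from one of the NameNode hosts. This recreates the znode, so run it only after the old one has been removed:

$ hdfs zkfc -formatZK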


Please let me know if you have any further issues regarding this.

We are eagerly waiting for your reply.

Feel free to contact us in case you have any queries.

Please note: if you are not happy with the response on this ticket, please escalate it to escalations@edureka.in.
We assure you that we will get back to you within 24 hours.