Description of problem:

The NameNode HA feature for HDP, described here:
http://specs.openstack.org/openstack/sahara-specs/specs/kilo/hdp-plugin-enable-hdfs-ha.html
although it is able to turn a cluster into a NameNode HA configuration, does not update the configuration of Oozie, which still points to one of the two NameNodes. If it points to the standby node, job execution does not even start, failing with a misleading error:

2015-07-01 13:46:58.882 15549 WARNING sahara.service.edp.job_manager [-] Can't run job execution 437c1c6a-72e8-4b86-b036-6fa4b5657538 (reason: type Status report message description This request requires HTTP authentication. )

and keystone reports:

2015-07-01 13:47:46.604 31419 WARNING keystone.token.controllers [-] User 0545bfa11fc444bb8782acb14f3e871e is unauthorized for tenant bd133d1e161345a69a15778cf7a580ca
2015-07-01 13:47:46.605 31419 WARNING keystone.common.wsgi [-] Authorization failed. The request you have made requires authentication. from x.y.z.t

which would translate as "User admin is unauthorized for tenant services". These errors, whose wording could be improved, are a red herring. The real issue is that Oozie fails with an exception:

2015-07-01 09:50:07,693 INFO BaseJobServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] AuthorizationException
org.apache.oozie.service.AuthorizationException: E0501: Could not perform authorization operation, Operation category READ is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
[...]

This is the cluster configuration used, detailed by node group and number of nodes for each:

* master-ha-common (1 node)
  - AMBARI_SERVER
  - HISTORYSERVER
  - OOZIE_SERVER
  - RESOURCEMANAGER
  - SECONDARY_NAMENODE
* master-ha-nn (2 nodes)
  - NAMENODE
  - ZOOKEEPER_SERVER
  - JOURNALNODE
* master-ha-node (1 node)
  - ZOOKEEPER_SERVER
  - JOURNALNODE
* worker-ha (3 nodes)
  - DATANODE
  - HDFS_CLIENT
  - MAPREDUCE2_CLIENT
  - NODEMANAGER
  - OOZIE_CLIENT
  - PIG
  - YARN_CLIENT
  - ZOOKEEPER_CLIENT

The configuration key hdfs.nnha is set to true, as described in the documentation.

I tested using a beta version of RHEL-OSP7, so basically Kilo, but the relevant code has not changed in master:

openstack-sahara-common-2015.1.0-4.el7ost.noarch
openstack-sahara-engine-2015.1.0-4.el7ost.noarch
openstack-sahara-api-2015.1.0-4.el7ost.noarch

The discussion is mirrored upstream in the linked launchpad bug.
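To confirm which NameNode Oozie happens to be pointed at, one can ask each NameNode for its HA state. A minimal sketch, assuming the default NameNode HTTP port (50070) and the NameNodeStatus JMX bean available in Hadoop 2.x; the first address comes from the cluster info in the next comment, the second is illustrative:

    import json
    import urllib2

    # The two NAMENODE hosts from the master-ha-nn node group (second
    # address is an assumption; only one appears in the cluster info).
    for host in ('172.24.4.229', '172.24.4.228'):
        url = ('http://%s:50070/jmx'
               '?qry=Hadoop:service=NameNode,name=NameNodeStatus' % host)
        status = json.load(urllib2.urlopen(url))
        print host, status['beans'][0]['State']  # 'active' or 'standby'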
From http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.3/bk_using_Ambari_book/content/install-ha_2x.html:

    If you are using Oozie, you need to use the Nameservice URI instead of
    the NameNode URI in your workflow files. For example, where the
    Nameservice ID is mycluster:

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>hdfs://mycluster</name-node>

From http://172.24.4.230:8080/#/main/hosts/my-hdp2ha-b090a511-master-ha-common-f2ebfbc6-001.novalocal/configs (which pulls from hdfs-site.xml):

    dfs.nameservices: my-hdp2ha-b090a511

From your cluster definition (which should be valid):

    info:
      HDFS:
        NameNode: hdfs://172.24.4.229:8020
        Web UI: http://172.24.4.229:50070
      JobFlow:
        Oozie: http://172.24.4.230:11000
      MapReduce2:
        Web UI: http://172.24.4.230:19888
        History Server: 172.24.4.230:10020
      Yarn:
        Web UI: http://172.24.4.230:8088
        ResourceManager: 172.24.4.230:8050
      Ambari Console:
        Web UI: http://172.24.4.230:8080

From sahara/sahara/service/edp/oozie/engine.py:

    nn_path = self.get_name_node_uri(self.cluster)
    ...
    job_parameters = {
        "jobTracker": rm_path,
        "nameNode": nn_path,
        "user.name": hdfs_user,
        oozie_libpath_key: oozie_libpath,
        app_path: "%s%s" % (nn_path, path_to_workflow),
        "oozie.use.system.libpath": "true"}

From sahara/sahara/plugins/hdp/edp_engine.py:

    def get_name_node_uri(self, cluster):
        return cluster['info']['HDFS']['NameNode']

We've succeeded in setting up a highly available cluster, but we're hard-coding ourselves into using Oozie through only one of the NameNodes, rather than through the nameservice. I believe this to be the root cause of the issue; however, fixing it is a non-trivial change, as the nameservice designation is dynamic. To be discussed.

Notably, this bug does not wholly block the RFE: under a strictly legalistic interpretation, Sahara does now support clusters with HDP HA NameNodes. The issue is that whenever the active NameNode is not the one at which Oozie is (permanently) pointed through Sahara, Oozie will be unavailable, and with it, Sahara's EDP interface.
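For illustration, a minimal sketch of the direction a fix could take: have the HDP EDP engine return the logical nameservice URI when HA is enabled, instead of the single NameNode address stored in cluster info. This is an assumption-laden sketch, not necessarily what the upstream review (next comment) implements; the hdfs.nnha lookup and the _get_nameservice_id helper are hypothetical:

    # Hypothetical sketch only: when NameNode HA is enabled, return the
    # logical nameservice URI (hdfs://<dfs.nameservices>) so that the
    # 'nameNode' job parameter survives a failover.

    def get_name_node_uri(self, cluster):
        # How the hdfs.nnha flag and the nameservice id are looked up here
        # is an assumption for illustration; Sahara's internals differ.
        if cluster.cluster_configs.get('HDFS', {}).get('hdfs.nnha'):
            # e.g. dfs.nameservices = my-hdp2ha-b090a511 in hdfs-site.xml
            nameservice = self._get_nameservice_id(cluster)  # hypothetical
            return 'hdfs://%s' % nameservice
        return cluster['info']['HDFS']['NameNode']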
Reproduced locally; repaired; review posted upstream. https://review.openstack.org/#/c/198895/
After several positive reviews and no negatives, backporting.
The HA NameNode is correctly set up and Oozie points to the active NameNode, even when the active NameNode disappears and is replaced by the standby instance. See also rhbz#1149055.

Tested on:
openstack-sahara-api-2015.1.0-5.el7ost.noarch
openstack-sahara-engine-2015.1.0-5.el7ost.noarch
openstack-sahara-common-2015.1.0-5.el7ost.noarch
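For reference, a sketch of the kind of check behind this verification: poll Oozie's standard admin status REST endpoint while the active NameNode is taken down; with the fix, the system mode should stay NORMAL throughout the failover. The Oozie URL is taken from the cluster info above; the loop bounds are arbitrary:

    import json
    import time
    import urllib2

    OOZIE = 'http://172.24.4.230:11000/oozie'  # from the cluster info above

    # Run this while manually stopping the active NameNode; expect NORMAL
    # on every iteration once the standby has taken over.
    for _ in range(30):
        mode = json.load(
            urllib2.urlopen(OOZIE + '/v1/admin/status'))['systemMode']
        print time.strftime('%H:%M:%S'), mode
        time.sleep(10)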
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548