Bug 1238700
Summary: | NameNode HA for HDP2 does not set up Oozie correctly | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Luigi Toscano <ltoscano> |
Component: | openstack-sahara | Assignee: | Elise Gafford <egafford> |
Status: | CLOSED ERRATA | QA Contact: | Luigi Toscano <ltoscano> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.0 (Kilo) | CC: | kbasil, matt, mimccune, mlopes, pkshiras, yeylon |
Target Milestone: | ga | ||
Target Release: | 7.0 (Kilo) | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | openstack-sahara-2015.1.0-5.el7ost | Doc Type: | Bug Fix |
Doc Text: |
Prior to this update, while NameNode HA for HDP was functional and feature-complete upstream, Sahara continued to point Oozie at a single NameNode IP for all jobs.
Consequently, Oozie jobs (and with them Sahara's EDP) succeeded only while one specific, arbitrarily chosen NameNode was active (in an active/passive HA model).
This update addresses the issue by directing Oozie at the nameservice, rather than at any one NameNode.
As a result, Oozie and EDP jobs can succeed regardless of which NameNode is active.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2015-08-05 13:28:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Luigi Toscano
2015-07-02 12:45:39 UTC
From http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.3/bk_using_Ambari_book/content/install-ha_2x.html:

"If you are using Oozie, you need to use the Nameservice URI instead of the NameNode URI in your workflow files. For example, where the Nameservice ID is mycluster:"

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>hdfs://mycluster</name-node>

From http://172.24.4.230:8080/#/main/hosts/my-hdp2ha-b090a511-master-ha-common-f2ebfbc6-001.novalocal/configs (which pulls from hdfs-site.xml):

    dfs.nameservices: my-hdp2ha-b090a511

From your cluster definition (which should be valid):

    info: {u'HDFS': {u'NameNode': u'hdfs://172.24.4.229:8020',
                     u'Web UI': u'http://172.24.4.229:50070'},
           u'JobFlow': {u'Oozie': u'http://172.24.4.230:11000'},
           u'MapReduce2': {u'Web UI': u'http://172.24.4.230:19888',
                           u'History Server': u'172.24.4.230:10020'},
           u'Yarn': {u'Web UI': u'http://172.24.4.230:8088',
                     u'ResourceManager': u'172.24.4.230:8050'},
           u'Ambari Console': {u'Web UI': u'http://172.24.4.230:8080'}}

From sahara/sahara/service/edp/oozie/engine.py:

    nn_path = self.get_name_node_uri(self.cluster)
    ...
    job_parameters = {
        "jobTracker": rm_path,
        "nameNode": nn_path,
        "user.name": hdfs_user,
        oozie_libpath_key: oozie_libpath,
        app_path: "%s%s" % (nn_path, path_to_workflow),
        "oozie.use.system.libpath": "true"}

From sahara/sahara/plugins/hdp/edp_engine.py:

    def get_name_node_uri(self, cluster):
        return cluster['info']['HDFS']['NameNode']

We've succeeded in setting up a highly available cluster, but we're hard-coding ourselves into only using Oozie through one of the nodes, rather than using the nameservice. I believe this to be the root cause of the issue; however, fixing it is a non-trivial change, as nameservice designation is dynamic. To be discussed.
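The general shape of the fix follows from the analysis above: when `dfs.nameservices` is set, return the logical nameservice URI instead of the single NameNode address. The following is a minimal illustrative sketch, not the actual upstream patch; the flat `"hdfs-site"` dict used to look up the config is a hypothetical stand-in for Sahara's real plugin internals.

```python
# Hypothetical sketch of the fix direction: prefer the nameservice URI
# (e.g. hdfs://my-hdp2ha-b090a511) over a single NameNode IP, so that
# Oozie keeps working no matter which NameNode is currently active.
# The "hdfs-site" lookup below is an assumed, simplified config shape.

def get_name_node_uri(cluster):
    hdfs_site = cluster.get("hdfs-site", {})       # assumed config dict
    nameservice = hdfs_site.get("dfs.nameservices")
    if nameservice:
        # HA cluster: target the logical nameservice, per the
        # Hortonworks guidance quoted in the description.
        return "hdfs://%s" % nameservice
    # Non-HA cluster: keep the original single-NameNode behaviour.
    return cluster["info"]["HDFS"]["NameNode"]
```

With the cluster definition quoted above, an HA cluster would resolve to `hdfs://my-hdp2ha-b090a511`, while a non-HA cluster would still get `hdfs://172.24.4.229:8020`.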
It is notable that this bug does not wholly block the RFE: through a certain legalistic interpretation, Sahara does now support clusters with HDP HA NameNodes. The issue is that whenever the active NameNode is not the one at which Oozie is (permanently) pointed through Sahara, Oozie will be unavailable, and with it, Sahara's EDP interface.

Reproduced locally; repaired; review posted upstream:
https://review.openstack.org/#/c/198895/

After several positive reviews and no negatives, backporting.

The HA NameNode is correctly set up and Oozie points to the active NameNode, even if the active NameNode disappears and is replaced by the standby instance. See also rhbz#1149055.

Tested on:
openstack-sahara-api-2015.1.0-5.el7ost.noarch
openstack-sahara-engine-2015.1.0-5.el7ost.noarch
openstack-sahara-common-2015.1.0-5.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548