Bug 1305790
Summary: | Failure to launch Caldera 5.0.4 Hadoop Cluster via Sahara Wizards on RDO Liberty | | |
---|---|---|---|
Product: | [Community] RDO | Reporter: | Boris Derzhavets <bderzhavets> |
Component: | openstack-sahara | Assignee: | Telles Nobrega <tenobreg> |
Status: | CLOSED WONTFIX | QA Contact: | Luigi Toscano <ltoscano> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | Liberty | CC: | bderzhavets, chris.brown |
Target Milestone: | --- | | |
Target Release: | Liberty | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-10-23 18:13:57 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |

Can you please enable debug=True in the Sahara logging configuration, restart sahara-api and sahara-engine, retry, and attach the logs? Please attach only the logs relevant to the last run, and check them for sensitive information (IPs, passwords, etc.) before submitting them. I'm not sure any is logged in debug mode, but please check just in case.

Created attachment 1122392 [details]
sahara-engine.log (debug=True, box rebooted)
Done as requested

Created attachment 1122393 [details]
sahara-api.log (debug=True, box rebooted)
Done as requested
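The request above to check logs for sensitive information before attaching them can be partly automated. The following is a minimal sketch, not part of Sahara: the function names and the patterns (IPv4 addresses and `key=value` style secrets) are illustrative assumptions and will not catch everything a debug log may contain, so a manual review is still needed.

```python
import re

# ASSUMPTION: these patterns are illustrative only, not an exhaustive list
# of what Sahara may write to its logs in debug mode.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SECRET_RE = re.compile(r"(?i)\b(password|passwd|secret|token)(\s*[=:]\s*)(\S+)")


def redact_line(line: str) -> str:
    """Mask IPv4 addresses and key=value style secrets in one log line."""
    line = IPV4_RE.sub("<IP>", line)
    # Keep the key and separator, replace only the value.
    return SECRET_RE.sub(lambda m: m.group(1) + m.group(2) + "<REDACTED>", line)


def redact_log(text: str) -> str:
    """Apply redact_line to every line of a log dump."""
    return "\n".join(redact_line(line) for line in text.splitlines())
```

A redacted copy of a log could then be produced with `redact_log(open("sahara-engine.log").read())` and reviewed by hand before uploading.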
Old logs were removed and new ones created via:

# touch -f sahara-engine.log
# touch -f sahara-api.log
# chown sahara. *.log

Got the same error in the Starting phase for the upstream image sahara-liberty-cdh-5.4.0-centos-6.6.qcow2:

2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [req-ff8c07db-a925-47b3-b552-015e1f730648 ] [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Error during operating on cluster (reason: Failed to Provision Hadoop Cluster: Failed to format NameNode. Error ID: b7f7d944-0a01-4edf-867d-eb6674231945)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Traceback (most recent call last):
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 164, in wrapper
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     f(cluster_id, *args, **kwds)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 276, in _provision_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     plugin.start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/plugin.py", line 51, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     cluster.hadoop_version).start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/v5_4_0/versionhandler.py", line 86, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     dp.start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/v5_4_0/deploy.py", line 165, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     CU.first_run(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 139, in handler
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     add_fail_event(instance, e)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     six.reraise(self.type_, self.value, self.tb)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 136, in handler
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     value = func(*args, **kwargs)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/cloudera_utils.py", line 42, in wrapper
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     raise ex.HadoopProvisionError(c.resultMessage)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] HadoopProvisionError: Failed to Provision Hadoop Cluster: Failed to format NameNode.
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Error ID: b7f7d944-0a01-4edf-867d-eb6674231945
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]

Hi Boris,

In https://bugzilla.redhat.com/show_bug.cgi?id=1305419 you reported:

> Quotas update via https://github.com/openstack/sahara-ci-config/blob/master/config/devstack/local.sh#L63 seems to be important for both HDP 2.0.6 and CDH 5.0.4 Hadoop clusters launching

Does this mean that you were successful with your 5.0.4 cluster and this bug can be closed, or are you still having trouble?

I had success with HDP 2.0.6 and failure with CDH 5.0.4. Cloudera CDH was the root cause of the client cancelling the project. Close https://bugzilla.redhat.com/show_bug.cgi?id=1305419; the current one was the core issue. If you like, you may close both; it has not been my concern since 02/2016. I am just keeping you aware of the real issue on RDO.

Understood; thanks Boris! We'll leave it open for further investigation.

Was there any further investigation here?

Since there has been no activity on this bug for a while, I'm closing it.

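When triaging engine logs like the excerpt in this thread, it can help to pull out only the ERROR lines for one cluster and find the final exception message. The sketch below is a hypothetical helper, not part of Sahara or oslo.log; the line layout it parses is an assumption inferred from the excerpts in this report, and the names `error_messages` and `final_exception` are invented for illustration.

```python
import re

# ASSUMPTION: line layout inferred from the log excerpts in this report:
# "<date> <time> <pid> <LEVEL> <logger> [req-... ] [<context>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"(?:\[req-[^\]]*\] )?\[(?P<ctx>[^\]]*)\] ?(?P<msg>.*)$"
)
# Matches lines such as "HadoopProvisionError: ..." but not "Error ID: ...".
EXC_RE = re.compile(r"^[A-Za-z_][\w.]*(?:Error|Exception): ")


def error_messages(lines, cluster_id):
    """Return the message part of every ERROR line logged for cluster_id."""
    out = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == "ERROR" and f"cluster: {cluster_id}" in m.group("ctx"):
            out.append(m.group("msg").strip())
    return out


def final_exception(messages):
    """Return the last message that looks like 'SomeError: text', or None."""
    for msg in reversed(messages):
        if EXC_RE.match(msg):
            return msg
    return None


# Sample lines copied from the engine log quoted in this report.
CLUSTER_ID = "95c9141e-7994-4ba2-a12a-18028d6f38b8"
PREFIX = "2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops "
CTX = f"[instance: none, cluster: {CLUSTER_ID}] "
SAMPLE = [
    PREFIX + "[req-ff8c07db-a925-47b3-b552-015e1f730648 ] " + CTX
    + "Error during operating on cluster (reason: Failed to Provision Hadoop "
    + "Cluster: Failed to format NameNode. Error ID: b7f7d944-0a01-4edf-867d-eb6674231945)",
    PREFIX + CTX + "raise ex.HadoopProvisionError(c.resultMessage)",
    PREFIX + CTX + "HadoopProvisionError: Failed to Provision Hadoop Cluster: "
    + "Failed to format NameNode.",
    PREFIX + CTX + "Error ID: b7f7d944-0a01-4edf-867d-eb6674231945",
]

if __name__ == "__main__":
    print(final_exception(error_messages(SAMPLE, CLUSTER_ID)))
```

Filtering on the cluster ID keeps unrelated runs out of the result, which matches the earlier request to attach only the logs relevant to the last run.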
Created attachment 1122358 [details]
sahara-engine.log

Description of problem:

Templates for a 5-node cluster were prepared, with daemons set up per the default templates:

[root@ServerCentOS7 version_5_4_0(keystone_admin)]# pwd
/usr/lib/python2.7/site-packages/sahara/plugins/default_templates/cdh/version_5_4_0

[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat namenode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_NAMENODE",
        "YARN_RESOURCEMANAGER",
        "HIVE_SERVER2",
        "HIVE_METASTORE",
        "CLOUDERA_MANAGER"
    ],
    "name": "cdh-540-default-namenode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}

[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat datanode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_DATANODE",
        "YARN_NODEMANAGER"
    ],
    "name": "cdh-540-default-datanode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}

[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat secondary-namenode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_SECONDARYNAMENODE",
        "OOZIE_SERVER",
        "YARN_JOBHISTORY",
        "SPARK_YARN_HISTORY_SERVER"
    ],
    "name": "cdh-540-default-secondary-namenode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}

[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat cluster.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_groups": [
        {
            "name": "datanode",
            "count": 3,
            "node_group_template_id": "{cdh-540-default-datanode}"
        },
        {
            "name": "secondary-namenode",
            "count": 1,
            "node_group_template_id": "{cdh-540-default-secondary-namenode}"
        },
        {
            "name": "namenode",
            "count": 1,
            "node_group_template_id": "{cdh-540-default-namenode}"
        }
    ],
    "name": "cdh-540-default-cluster",
    "neutron_management_network": "{neutron_management_network}",
    "cluster_configs": {}
}

The upstream Sahara image used was sahara-liberty-cdh-5.4.0-ubuntu-12.04.qcow2.

Version-Release number of selected component (if applicable):

[root@ServerCentOS7 ~]# rpm -qa \*sahara\*
openstack-sahara-engine-3.0.0-5.cc218ddgit.el7.noarch
python-saharaclient-0.11.1-1.el7.noarch
openstack-sahara-api-3.0.0-5.cc218ddgit.el7.noarch
openstack-sahara-common-3.0.0-5.cc218ddgit.el7.noarch

[root@ServerCentOS7 ~]# rpm -qa \*heat\*
python-heatclient-0.8.0-1.el7.noarch
openstack-heat-common-5.0.0-1.el7.noarch
openstack-heat-api-5.0.0-1.el7.noarch
openstack-heat-engine-5.0.0-1.el7.noarch

How reproducible:

Steps to Reproduce:
1. Create NodeTemplates
2. Create ClusterTemplate = Namenode + SecondaryNamenode + 3*Datanode
3. Attempt to launch Cluster

Actual results:

152aebc96c1] File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/cloudera_utils.py", line 42, in wrapper
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1]     raise ex.HadoopProvisionError(c.resultMessage)
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] HadoopProvisionError: Failed to Provision Hadoop Cluster: Failed to format NameNode.
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] Error ID: 95a9f1a5-8dd8-487c-9f42-047f2efc2407
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1]
2016-02-09 11:12:40.909 1389 INFO sahara.utils.cluster [req-2d1a2e7c-fe8d-42f6-8263-bd2e05e70f05 ] [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] Cluster status has been changed. New status=Error

Expected results:

The cluster goes to the Active state, 5 VMs are spawned, the Cloudera 5.0.4 Hadoop cluster deployment completes, and each VM provides the expected functionality.

Additional info:

To fix the first error that popped up and proceed further, the following command was run (as admin), as advised by Luigi Toscano:

# neutron quota-update --tenant_id $ADMIN_ID --port 64 --floatingip 64 --security-group 1000 --security-group-rule 10000
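As a rough sanity check of the cluster size against the Neutron quota values raised above, the node group counts in cluster.json can be summed and compared with the quota. This is only a sketch under stated assumptions: the helper names are hypothetical, and the figure of one port and one floating IP per instance is an assumption for illustration (other Neutron resources in the tenant also consume ports).

```python
import json

# Node group counts copied from the cluster.json shown in this report.
CLUSTER_TEMPLATE = json.loads("""
{
    "node_groups": [
        {"name": "datanode", "count": 3},
        {"name": "secondary-namenode", "count": 1},
        {"name": "namenode", "count": 1}
    ]
}
""")

# Values taken from the neutron quota-update command in this report.
QUOTA = {"port": 64, "floatingip": 64}


def instance_count(template):
    """Total number of VMs the cluster template asks for."""
    return sum(group["count"] for group in template["node_groups"])


def quota_shortfalls(template, quota, ports_per_instance=1, fips_per_instance=1):
    """Return {resource: missing_amount} for resources the cluster would exceed.

    ASSUMPTION: the per-instance defaults are illustrative; real usage
    depends on the network setup and on what else the tenant already runs.
    """
    n = instance_count(template)
    needed = {
        "port": n * ports_per_instance,
        "floatingip": n * fips_per_instance,
    }
    return {
        name: amount - quota.get(name, 0)
        for name, amount in needed.items()
        if amount > quota.get(name, 0)
    }


if __name__ == "__main__":
    print(instance_count(CLUSTER_TEMPLATE))
    print(quota_shortfalls(CLUSTER_TEMPLATE, QUOTA))
```

With the counts from the template (3 + 1 + 1), the check reports no shortfall against the raised quota, which is consistent with the launch getting past the first error and failing later at NameNode formatting.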