Bug 1305790 - Failure to launch Cloudera CDH 5.4.0 Hadoop Cluster via Sahara Wizards on RDO Liberty
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: RDO
Classification: Community
Component: openstack-sahara
Version: Liberty
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: Liberty
Assignee: Telles Nobrega
QA Contact: Luigi Toscano
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-09 09:03 UTC by Boris Derzhavets
Modified: 2018-10-23 18:13 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-23 18:13:57 UTC


Attachments
sahara-engine.log (14.58 KB, application/x-gzip)
2016-02-09 09:03 UTC, Boris Derzhavets
sahara-egine.log ( debug=true,box rebooted ) (40.47 KB, application/x-gzip)
2016-02-09 11:43 UTC, Boris Derzhavets
sahara-api.log (debug=True,box rebooted) (40.77 KB, application/x-gzip)
2016-02-09 11:44 UTC, Boris Derzhavets

Description Boris Derzhavets 2016-02-09 09:03:18 UTC
Created attachment 1122358 [details]
sahara-engine.log

Description of problem:
Node group templates for a 5-node cluster were prepared, with daemons set up per:

[root@ServerCentOS7 version_5_4_0(keystone_admin)]# pwd
/usr/lib/python2.7/site-packages/sahara/plugins/default_templates/cdh/version_5_4_0
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat namenode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_NAMENODE",
        "YARN_RESOURCEMANAGER",
        "HIVE_SERVER2",
        "HIVE_METASTORE",
        "CLOUDERA_MANAGER"
    ],
    "name": "cdh-540-default-namenode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat datanode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_DATANODE",
        "YARN_NODEMANAGER"
    ],
    "name": "cdh-540-default-datanode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat secondary-namenode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_SECONDARYNAMENODE",
        "OOZIE_SERVER",
        "YARN_JOBHISTORY",
        "SPARK_YARN_HISTORY_SERVER"
    ],
    "name": "cdh-540-default-secondary-namenode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat cluster.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_groups": [
        {
            "name": "datanode",
            "count": 3,
            "node_group_template_id": "{cdh-540-default-datanode}"
        },
        {
            "name": "secondary-namenode",
            "count": 1,
            "node_group_template_id": "{cdh-540-default-secondary-namenode}"
        },
        {
            "name": "namenode",
            "count": 1,
            "node_group_template_id": "{cdh-540-default-namenode}"
        }
    ],
    "name": "cdh-540-default-cluster",
    "neutron_management_network": "{neutron_management_network}",
    "cluster_configs": {}
}
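The `{...}` placeholders in these default templates must be filled in with real values before the templates can be used. A minimal, hypothetical sketch of doing so with `sed` (the values `public` and `2`, and the `/tmp` path, are arbitrary examples, not taken from this report):

```shell
# Hypothetical sketch: fill in the {placeholder} fields of a default
# node group template before use. Values here are arbitrary examples.
cat > /tmp/datanode.json <<'EOF'
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "name": "cdh-540-default-datanode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}"
}
EOF
# Substitute the placeholders in place.
sed -i -e 's/{floating_ip_pool}/public/' \
       -e 's/{flavor_id}/2/' /tmp/datanode.json
```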

The upstream Sahara image used was:

sahara-liberty-cdh-5.4.0-ubuntu-12.04.qcow2

Version-Release number of selected component (if applicable):

[root@ServerCentOS7 ~]# rpm -qa \*sahara\*
openstack-sahara-engine-3.0.0-5.cc218ddgit.el7.noarch
python-saharaclient-0.11.1-1.el7.noarch
openstack-sahara-api-3.0.0-5.cc218ddgit.el7.noarch
openstack-sahara-common-3.0.0-5.cc218ddgit.el7.noarch

[root@ServerCentOS7 ~]# rpm -qa \*heat\*

python-heatclient-0.8.0-1.el7.noarch
openstack-heat-common-5.0.0-1.el7.noarch
openstack-heat-api-5.0.0-1.el7.noarch
openstack-heat-engine-5.0.0-1.el7.noarch

How reproducible:


Steps to Reproduce:
1. Create NodeTemplates 
2. Create ClusterTemplate = Namenode+SecondaryNamenode+3*Datanode
3. Attempt to launch Cluster 

Actual results:

152aebc96c1]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/cloudera_utils.py", line 42, in wrapper
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1]     raise ex.HadoopProvisionError(c.resultMessage)
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] HadoopProvisionError: Failed to Provision Hadoop Cluster: Failed to format NameNode.
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] Error ID: 95a9f1a5-8dd8-487c-9f42-047f2efc2407
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1]
2016-02-09 11:12:40.909 1389 INFO sahara.utils.cluster [req-2d1a2e7c-fe8d-42f6-8263-bd2e05e70f05 ] [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] Cluster status has been changed. New status=Error


Expected results:

The cluster reaches the Active state, 5 VMs are launched, the Cloudera CDH 5.4.0 Hadoop cluster deployment completes, and each VM provides the expected functionality.

Additional info:

To fix the first error that popped up and to proceed further,
the following command was run (as admin):

# neutron quota-update --tenant_id $ADMIN_ID --port 64 --floatingip 64 --security-group 1000 --security-group-rule 10000

as was advised by Luigi Toscano
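The `$ADMIN_ID` variable above is assumed to hold the admin tenant's UUID. A hypothetical sketch of extracting it from `openstack project list` output (the table below is fabricated sample output, not from this deployment):

```shell
# Hypothetical sketch: parse the admin tenant ID out of
# `openstack project list`-style output. The table is sample data.
sample='+----------------------------------+---------+
| ID                               | Name    |
+----------------------------------+---------+
| 3b6e7d9c2f1a4c8e9d0b1a2c3d4e5f60 | admin   |
| 9f8e7d6c5b4a39281706f5e4d3c2b1a0 | demo    |
+----------------------------------+---------+'
# The second whitespace-separated field of the "admin" row is the ID.
ADMIN_ID=$(echo "$sample" | awk '/ admin / {print $2}')
echo "$ADMIN_ID"
```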

Comment 1 Luigi Toscano 2016-02-09 10:29:18 UTC
Can you please enable
debug=True
in the Sahara logging configuration, restart sahara-api and sahara-engine, retry, and attach the logs?
Please attach only the logs relevant to the last run, and check for sensitive information (IPs, passwords, etc.) in the logs before submitting them (I'm not sure any appears in debug mode, but please check just in case).
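For reference, debug logging would be enabled with a setting like the following; a sketch, assuming the standard oslo.log option in the `[DEFAULT]` section of `/etc/sahara/sahara.conf`:

```ini
[DEFAULT]
debug = True
```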

Comment 2 Boris Derzhavets 2016-02-09 11:43:31 UTC
Created attachment 1122392 [details]
sahara-egine.log ( debug=true,box rebooted )

Done as requested

Comment 3 Boris Derzhavets 2016-02-09 11:44:52 UTC
Created attachment 1122393 [details]
sahara-api.log (debug=True,box rebooted)

Done as requested

Comment 4 Boris Derzhavets 2016-02-09 11:48:00 UTC
Old logs have been removed; the new ones were created via
# touch -f sahara-egine.log
# touch -f sahara-api.log
# chown sahara. *.log

Comment 5 Boris Derzhavets 2016-02-09 20:25:20 UTC
Got the same error in the Starting phase with the upstream image
sahara-liberty-cdh-5.4.0-centos-6.6.qcow2

2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [req-ff8c07db-a925-47b3-b552-015e1f730648 ] [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Error during operating on cluster (reason: Failed to Provision Hadoop Cluster: Failed to format NameNode.
Error ID: b7f7d944-0a01-4edf-867d-eb6674231945)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Traceback (most recent call last):
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 164, in wrapper
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     f(cluster_id, *args, **kwds)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 276, in _provision_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     plugin.start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/plugin.py", line 51, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     cluster.hadoop_version).start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/v5_4_0/versionhandler.py", line 86, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     dp.start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/v5_4_0/deploy.py", line 165, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     CU.first_run(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 139, in handler
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     add_fail_event(instance, e)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     six.reraise(self.type_, self.value, self.tb)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 136, in handler
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     value = func(*args, **kwargs)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/cloudera_utils.py", line 42, in wrapper
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     raise ex.HadoopProvisionError(c.resultMessage)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] HadoopProvisionError: Failed to Provision Hadoop Cluster: Failed to format NameNode.
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Error ID: b7f7d944-0a01-4edf-867d-eb6674231945
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]

Comment 6 Elise Gafford 2016-07-13 16:58:58 UTC
Hi Boris,

In https://bugzilla.redhat.com/show_bug.cgi?id=1305419 you reported:

The quota update via
https://github.com/openstack/sahara-ci-config/blob/master/config/devstack/local.sh#L63
seems to be important for launching both HDP 2.0.6 and CDH 5.4.0 Hadoop clusters

Does this mean that you were successful with your CDH 5.4.0 cluster and this bug can be closed, or are you still having trouble?

Comment 7 Boris Derzhavets 2016-07-13 17:08:46 UTC
I had success with HDP 2.0.6 and failure with CDH 5.4.0.
Cloudera CDH was the root cause of the client cancelling the project.
Close https://bugzilla.redhat.com/show_bug.cgi?id=1305419.
The current one was the core issue.
If you would like, you may close both; it has not been my concern since 02/2016.
Just keeping you aware of the real issue on RDO.

Comment 8 Elise Gafford 2016-07-13 17:37:42 UTC
Understood; thanks Boris! We'll leave it open for further investigation.

Comment 9 Christopher Brown 2017-06-18 12:04:03 UTC
Was there any further investigation here?

Comment 10 Telles Nobrega 2018-10-23 18:13:57 UTC
Since there has been no activity on this bug for a while, I'm closing it.

