Bug 1305790 - Failure to launch Cloudera CDH 5.4.0 Hadoop Cluster via Sahara Wizards on RDO Liberty [NEEDINFO]
Status: NEW
Product: RDO
Classification: Community
Component: openstack-sahara
Version: Liberty
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: Liberty
Assigned To: Elise Gafford
QA Contact: Luigi Toscano
Reported: 2016-02-09 04:03 EST by Boris Derzhavets
Modified: 2017-06-18 08:04 EDT
CC: 3 users

Doc Type: Bug Fix
Type: Bug
Flags: chris.brown: needinfo? (egafford)


Attachments
sahara-engine.log (14.58 KB, application/x-gzip), 2016-02-09 04:03 EST, Boris Derzhavets
sahara-egine.log (debug=true, box rebooted) (40.47 KB, application/x-gzip), 2016-02-09 06:43 EST, Boris Derzhavets
sahara-api.log (debug=True, box rebooted) (40.77 KB, application/x-gzip), 2016-02-09 06:44 EST, Boris Derzhavets

Description Boris Derzhavets 2016-02-09 04:03:18 EST
Created attachment 1122358 [details]
sahara-engine.log

Description of problem:
Node group templates for a 5-node cluster were prepared, with daemons set up per the defaults in:

[root@ServerCentOS7 version_5_4_0(keystone_admin)]# pwd
/usr/lib/python2.7/site-packages/sahara/plugins/default_templates/cdh/version_5_4_0
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat namenode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_NAMENODE",
        "YARN_RESOURCEMANAGER",
        "HIVE_SERVER2",
        "HIVE_METASTORE",
        "CLOUDERA_MANAGER"
    ],
    "name": "cdh-540-default-namenode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat datanode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_DATANODE",
        "YARN_NODEMANAGER"
    ],
    "name": "cdh-540-default-datanode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat secondary-namenode.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_processes": [
        "HDFS_SECONDARYNAMENODE",
        "OOZIE_SERVER",
        "YARN_JOBHISTORY",
        "SPARK_YARN_HISTORY_SERVER"
    ],
    "name": "cdh-540-default-secondary-namenode",
    "floating_ip_pool": "{floating_ip_pool}",
    "flavor_id": "{flavor_id}",
    "auto_security_group": "{auto_security_group}",
    "security_groups": "{security_groups}"
}
[root@ServerCentOS7 version_5_4_0(keystone_admin)]# cat cluster.json
{
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "node_groups": [
        {
            "name": "datanode",
            "count": 3,
            "node_group_template_id": "{cdh-540-default-datanode}"
        },
        {
            "name": "secondary-namenode",
            "count": 1,
            "node_group_template_id": "{cdh-540-default-secondary-namenode}"
        },
        {
            "name": "namenode",
            "count": 1,
            "node_group_template_id": "{cdh-540-default-namenode}"
        }
    ],
    "name": "cdh-540-default-cluster",
    "neutron_management_network": "{neutron_management_network}",
    "cluster_configs": {}
}

The upstream Sahara image used was:

sahara-liberty-cdh-5.4.0-ubuntu-12.04.qcow2
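
Note that the shipped defaults are not directly consumable: the {placeholder} fields have to be filled with real values first. As a minimal sketch, assuming the Liberty-era sahara CLI from python-saharaclient 0.11.x, hypothetical $IMAGE_ID/$FLAVOR_ID/$POOL_ID values, and the ubuntu login user of the image above (exact flags may differ between client versions):

# Register and tag the image so the cdh/5.4.0 plugin can find it
# ($IMAGE_ID is a placeholder for the Glance image UUID):
sahara image-register --id $IMAGE_ID --username ubuntu
sahara image-add-tag --id $IMAGE_ID --tag cdh
sahara image-add-tag --id $IMAGE_ID --tag 5.4.0

# Substitute the {placeholder} fields and create one node group template;
# repeat for datanode.json and secondary-namenode.json:
sed -e "s/{flavor_id}/$FLAVOR_ID/" \
    -e "s/{floating_ip_pool}/$POOL_ID/" \
    -e 's/"{auto_security_group}"/true/' \
    -e 's/"{security_groups}"/null/' \
    namenode.json > my-namenode.json
sahara node-group-template-create --json my-namenode.json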

Version-Release number of selected component (if applicable):

[root@ServerCentOS7 ~]# rpm -qa \*sahara\*
openstack-sahara-engine-3.0.0-5.cc218ddgit.el7.noarch
python-saharaclient-0.11.1-1.el7.noarch
openstack-sahara-api-3.0.0-5.cc218ddgit.el7.noarch
openstack-sahara-common-3.0.0-5.cc218ddgit.el7.noarch

[root@ServerCentOS7 ~]# rpm -qa \*heat\*

python-heatclient-0.8.0-1.el7.noarch
openstack-heat-common-5.0.0-1.el7.noarch
openstack-heat-api-5.0.0-1.el7.noarch
openstack-heat-engine-5.0.0-1.el7.noarch

How reproducible:


Steps to Reproduce:
1. Create NodeTemplates 
2. Create ClusterTemplate = Namenode+SecondaryNamenode+3*Datanode
3. Attempt to launch the cluster (a hedged CLI equivalent is sketched below)
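
For reproducing step 3 outside the Horizon wizards, a hedged CLI equivalent; every ID below is a placeholder that must match the objects created in steps 1 and 2:

# Create the cluster template from cluster.json (placeholders filled as above):
sahara cluster-template-create --json my-cluster-template.json

# Launch the cluster; the JSON body uses the standard sahara cluster-create
# fields, with hypothetical IDs:
cat > launch.json <<'EOF'
{
    "name": "cdh-540-cluster",
    "plugin_name": "cdh",
    "hadoop_version": "5.4.0",
    "cluster_template_id": "CLUSTER_TEMPLATE_ID",
    "default_image_id": "IMAGE_ID",
    "user_keypair_id": "KEYPAIR_NAME",
    "neutron_management_network": "PRIVATE_NET_ID"
}
EOF
sahara cluster-create --json launch.json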

Actual results:

2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/cloudera_utils.py", line 42, in wrapper
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1]     raise ex.HadoopProvisionError(c.resultMessage)
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] HadoopProvisionError: Failed to Provision Hadoop Cluster: Failed to format NameNode.
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] Error ID: 95a9f1a5-8dd8-487c-9f42-047f2efc2407
2016-02-09 11:12:40.370 1389 ERROR sahara.service.ops [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1]
2016-02-09 11:12:40.909 1389 INFO sahara.utils.cluster [req-2d1a2e7c-fe8d-42f6-8263-bd2e05e70f05 ] [instance: none, cluster: 2eb0d606-9d5f-4f64-99cd-7152aebc96c1] Cluster status has been changed. New status=Error


Expected results:

The cluster reaches the Active state, 5 VMs are forked, the Cloudera CDH 5.4.0 Hadoop cluster deployment completes, and each of the VMs provides the expected functionality.

Additional info:

To fix the first error that popped up and to proceed further,
the following command was run (as admin):

# neutron quota-update --tenant_id $ADMIN_ID --port 64 --floatingip 64 --security-group 1000 --security-group-rule 10000

as advised by Luigi Toscano.
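
A quick hedged check that the raised limits took effect (resolving the tenant id via the openstack client is an assumption about the local setup):

# Look up the admin tenant id, then confirm the new quota values:
ADMIN_ID=$(openstack project show admin -f value -c id)
neutron quota-show --tenant_id $ADMIN_ID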
Comment 1 Luigi Toscano 2016-02-09 05:29:18 EST
Can you please enable
debug=True
in the Sahara logging configuration, restart sahara-api and sahara-engine, retry, and attach the logs?
Please attach only the logs relevant to the last run, and check them for sensitive information (IPs, passwords, etc.) before submitting (I'm not sure any appears in debug mode, but please check just in case).
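
For reference, a minimal sketch of that change on an RDO node, assuming the packaged config path and the RDO unit names; openstack-config comes from openstack-utils, and editing /etc/sahara/sahara.conf by hand works just as well:

# Turn on debug logging and restart both Sahara services:
openstack-config --set /etc/sahara/sahara.conf DEFAULT debug True
systemctl restart openstack-sahara-api openstack-sahara-engine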
Comment 2 Boris Derzhavets 2016-02-09 06:43 EST
Created attachment 1122392 [details]
sahara-egine.log (debug=true, box rebooted)

Done as requested
Comment 3 Boris Derzhavets 2016-02-09 06:44 EST
Created attachment 1122393 [details]
sahara-api.log (debug=True,box rebooted)

Done as requested
Comment 4 Boris Derzhavets 2016-02-09 06:48:00 EST
Old logs have been removed; the new ones were created via:
# touch -f sahara-egine.log
# touch -f sahara-api.log
# chown sahara. *.log
Comment 5 Boris Derzhavets 2016-02-09 15:25:20 EST
Got the same error in the Starting phase for the upstream image
sahara-liberty-cdh-5.4.0-centos-6.6.qcow2

2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [req-ff8c07db-a925-47b3-b552-015e1f730648 ] [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Error during operating on cluster (reason: Failed to Provision Hadoop Cluster: Failed to format NameNode.
Error ID: b7f7d944-0a01-4edf-867d-eb6674231945)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Traceback (most recent call last):
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 164, in wrapper
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     f(cluster_id, *args, **kwds)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 276, in _provision_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     plugin.start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/plugin.py", line 51, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     cluster.hadoop_version).start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/v5_4_0/versionhandler.py", line 86, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     dp.start_cluster(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/v5_4_0/deploy.py", line 165, in start_cluster
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     CU.first_run(cluster)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 139, in handler
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     add_fail_event(instance, e)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     six.reraise(self.type_, self.value, self.tb)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 136, in handler
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     value = func(*args, **kwargs)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]   File "/usr/lib/python2.7/site-packages/sahara/plugins/cdh/cloudera_utils.py", line 42, in wrapper
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]     raise ex.HadoopProvisionError(c.resultMessage)
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] HadoopProvisionError: Failed to Provision Hadoop Cluster: Failed to format NameNode.
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8] Error ID: b7f7d944-0a01-4edf-867d-eb6674231945
2016-02-09 23:11:26.189 1369 ERROR sahara.service.ops [instance: none, cluster: 95c9141e-7994-4ba2-a12a-18028d6f38b8]
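
Since this message is just Cloudera Manager's first-run result re-raised by cloudera_utils.py, the underlying cause usually has to be read off the instances themselves. A hedged diagnostic sketch; the paths, login user, and CM defaults below are assumptions, not anything confirmed in this report:

# Log into the namenode instance (ubuntu for the Ubuntu image,
# the cloud user of the CentOS image otherwise):
ssh ubuntu@$NAMENODE_FLOATING_IP

# The CM agent keeps a working dir per command; the format command's
# stderr usually lands under one of these:
sudo ls /var/run/cloudera-scm-agent/process/ | grep -i format
sudo tail -n 50 /var/run/cloudera-scm-agent/process/*format*/logs/stderr.log

# A NameNode data directory that is already non-empty also makes the
# format step fail, so it is worth ruling out:
sudo ls -la /dfs/nn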
Comment 6 Elise Gafford 2016-07-13 12:58:58 EDT
Hi Boris,

In https://bugzilla.redhat.com/show_bug.cgi?id=1305419 you reported:

Quotas update via
https://github.com/openstack/sahara-ci-config/blob/master/config/devstack/local.sh#L63
seems to be important for both HDP 2.0.6 and CDH 5.0.4 Hadoop clusters launching

Does this mean that you were successful with your 5.0.4 cluster and this bug can be closed, or are you still having trouble?
Comment 7 Boris Derzhavets 2016-07-13 13:08:46 EDT
I had success with HDP 2.0.6 and failure with CDH 5.4.0.
Cloudera CDH was the root cause of the client cancelling the project.
Close https://bugzilla.redhat.com/show_bug.cgi?id=1305419.
The current one was the core issue.
If you like, you may close both; it has not been my concern since 02/2016.
Just keeping you aware of the real issue on RDO.
Comment 8 Elise Gafford 2016-07-13 13:37:42 EDT
Understood; thanks Boris! We'll leave it open for further investigation.
Comment 9 Christopher Brown 2017-06-18 08:04:03 EDT
Was there any further investigation here?
