Bug 1230966

Summary: Overcloud post deployment fails with Pacemaker enabled - nodes active, CREATE_FAILED
Product: Red Hat OpenStack Reporter: Ronelle Landy <rlandy>
Component: rhosp-director Assignee: Dan Sneddon <dsneddon>
Status: CLOSED ERRATA QA Contact: Alexander Chuzhoy <sasha>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo) CC: dmacpher, dsneddon, gfidente, jdobies, mburns, ohochman, rhel-osp-director-maint
Target Milestone: ga Keywords: Automation, Triaged
Target Release: Director   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-6.el7ost Doc Type: Known Issue
Doc Text:
Redis needs to use a separate VIP. When deploying with network isolation, the director automatically places the Redis VIP on the Internal API network by default. Operators can move Redis to another network using the ServiceNetMap parameter.
Story Points: ---
Clone Of:
: 1231184 Environment:
Last Closed: 2015-08-05 13:53:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ronelle Landy 2015-06-11 21:49:12 UTC
Description of problem:

The virt env is installed with bits from the latest poodle, where Pacemaker is used by default for the overcloud.
instack-deploy-overcloud --tuskar fails (CREATE_FAILED) with the following errors:

ERROR heat.engine.resources.openstack.heat.software_deployment [-] Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1", "warnings": []}

         | logical_resource_id    | ControllerNodesPostDeployment |
17:09:19 | physical_resource_id   | e108159e-b556-4d4c-be41-41009e41087c |
17:09:20 | required_by            | BlockStorageNodesPostDeployment |
17:09:20 |                        | CephStorageNodesPostDeployment |
17:09:20 | resource_name          | ControllerNodesPostDeployment |
17:09:20 | resource_status        | CREATE_FAILED |
17:09:20 | resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1"" |
17:09:20 | resource_type          | OS::TripleO::ControllerPostDeployment |


Error: Could not find data item redis_vip in any Hiera data file and no default supplied at /var/lib/heat-config/heat-config-puppet/7d0a69c1-4821-430c-8547-fe4ba0a928d6.pp:257
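
When triaging many failed deployments, it can help to pull the missing Hiera key and the manifest location out of Puppet errors like the one above. The following is a minimal sketch, not part of the director tooling; the helper name parse_missing_hiera_key and the MissingHieraKey type are hypothetical, and the regular expression assumes the exact message wording shown in this report.

```python
import re
from typing import NamedTuple, Optional


class MissingHieraKey(NamedTuple):
    """Details extracted from a Puppet 'Could not find data item' error."""
    key: str        # the Hiera key that could not be resolved
    manifest: str   # path of the manifest that performed the lookup
    line: int       # line number within that manifest


# Assumes the message wording seen in this bug report.
_PATTERN = re.compile(
    r"Could not find data item (?P<key>\S+) in any Hiera data file"
    r".* at (?P<manifest>\S+):(?P<line>\d+)"
)


def parse_missing_hiera_key(message: str) -> Optional[MissingHieraKey]:
    """Return the missing key and its location, or None if the line does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return MissingHieraKey(
        key=match.group("key"),
        manifest=match.group("manifest"),
        line=int(match.group("line")),
    )


if __name__ == "__main__":
    sample = (
        "Error: Could not find data item redis_vip in any Hiera data file "
        "and no default supplied at /var/lib/heat-config/heat-config-puppet/"
        "7d0a69c1-4821-430c-8547-fe4ba0a928d6.pp:257"
    )
    print(parse_missing_hiera_key(sample))
```

For the error in this report, the helper yields the key redis_vip and line 257 of the generated manifest, which points at the Hiera lookup the templates failed to populate.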


Version-Release number of selected component (if applicable):

[stack@instack ~]$ rpm -qa  | grep openstack
openstack-nova-console-2015.1.0-10.el7ost.noarch
openstack-neutron-2015.1.0-2.el7ost.noarch
openstack-ironic-conductor-2015.1.0-4.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-2.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-tuskar-ui-0.3.0-2.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-4.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-3.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-2.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-2.el7ost.noarch
openstack-nova-api-2015.1.0-10.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-1.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-2.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
openstack-tripleo-0.0.6-0.1.git812abe0.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-nova-common-2015.1.0-10.el7ost.noarch
openstack-heat-common-2015.1.0-3.el7ost.noarch
openstack-tuskar-0.4.18-2.el7ost.noarch
python-django-openstack-auth-1.2.0-2.el7ost.noarch
openstack-dashboard-theme-2015.1.0-9.el7ost.noarch
openstack-tuskar-ui-extras-0.0.3-3.el7ost.noarch
openstack-tempest-kilo-20150507.2.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-neutron-ml2-2015.1.0-2.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-10.el7ost.noarch
openstack-keystone-2015.1.0-1.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-0.git49b57eb.el7ost.noarch
openstack-neutron-common-2015.1.0-2.el7ost.noarch
openstack-heat-engine-2015.1.0-3.el7ost.noarch
openstack-ceilometer-common-2015.1.0-2.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-3.el7ost.noarch
openstack-ceilometer-api-2015.1.0-2.el7ost.noarch
openstack-ironic-api-2015.1.0-4.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-2.el7ost.noarch
openstack-ironic-common-2015.1.0-4.el7ost.noarch
openstack-selinux-0.6.31-1.el7ost.noarch
openstack-nova-compute-2015.1.0-10.el7ost.noarch
openstack-nova-conductor-2015.1.0-10.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-heat-api-2015.1.0-3.el7ost.noarch
openstack-ceilometer-central-2015.1.0-2.el7ost.noarch
openstack-puppet-modules-2015.1.4-1.el7ost.noarch
openstack-nova-scheduler-2015.1.0-10.el7ost.noarch
openstack-nova-cert-2015.1.0-10.el7ost.noarch
openstack-dashboard-2015.1.0-9.el7ost.noarch



How reproducible:
Always with the latest poodle; confirmed with two installs.

Steps to Reproduce:
1. Install and set up the virt env with bits from the latest poodle (06/11)
2. Run instack-deploy-overcloud --tuskar
3. Check the output of heat stack-show overcloud for failures, errors, and CREATE_FAILED

Actual results:
Overcloud deploy is CREATE_FAILED

Expected results:
Should be CREATE_COMPLETE

Additional info:

Comment 4 Ronelle Landy 2015-06-11 22:37:07 UTC
A three-controller deployment showed some other issues:

[heat-admin@ov-ik3glkjldcc-0-bgdxz5dw33jc-controller-b6mgf742iqfj ~]$ sudo grep -i error /var/log/messages
Jun 11 17:51:39 localhost kdumpctl: cat: write error: Broken pipe
Jun 11 18:23:58 localhost pengine[17793]: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Jun 11 18:23:58 localhost pengine[17793]: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Jun 11 18:23:58 localhost pengine[17793]: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Jun 11 18:23:58 localhost pengine[17793]: notice: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
[heat-admin@ov-ik3glkjldcc-0-bgdxz5dw33jc-controller-b6mgf742iqfj ~]$ 
[heat-admin@ov-ik3glkjldcc-0-bgdxz5dw33jc-controller-b6mgf742iqfj ~]$ crm_verify -L
Live CIB query failed: Transport endpoint is not connected


The overcloud was still in CREATE_IN_PROGRESS; assuming this will time out shortly.

Comment 5 Mike Burns 2015-06-12 11:01:32 UTC
Comment 4 appears to be a distinct issue from the Redis VIP issue, so splitting that into a separate bug.

Comment 7 Giulio Fidente 2015-06-12 15:58:21 UTC
Should be fixed by: https://review.openstack.org/#/c/191026/

Comment 8 Dan Sneddon 2015-06-16 21:10:58 UTC
This should be fixed in the most recent puddles/poodles by this change, which was merged downstream: https://review.openstack.org/#/c/191026/

Comment 10 Alexander Chuzhoy 2015-06-19 15:47:48 UTC
Verified:

Environment:
instack-undercloud-2.1.2-1.el7ost.noarch


The command to deploy overcloud is now:
openstack overcloud deploy --plan-uuid [UUID]

The deployment completes successfully.

Comment 12 errata-xmlrpc 2015-08-05 13:53:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549