Bug 1231184 - Overcloud post deployment fails with Pacemaker enabled - nodes active, CREATE_FAILED (stonith)
Summary: Overcloud post deployment fails with Pacemaker enabled - nodes active, CREATE_FAILED (stonith)
Keywords:
Status: CLOSED DUPLICATE of bug 1232269
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
: Director
Assignee: Giulio Fidente
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-06-12 11:04 UTC by Mike Burns
Modified: 2023-02-22 23:02 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1230966
Environment:
Last Closed: 2015-06-17 08:12:25 UTC
Target Upstream Version:
Embargoed:



Description Mike Burns 2015-06-12 11:04:24 UTC
+++ This bug was initially created as a clone of Bug #1230966 +++

Description of problem:

virt env is installed with bits from the latest poodle where Pacemaker is used by default for the overcloud.

<snipping redis error, see cloned bug 1230966 for that detail>

Version-Release number of selected component (if applicable):

[stack@instack ~]$ rpm -qa  | grep openstack
openstack-nova-console-2015.1.0-10.el7ost.noarch
openstack-neutron-2015.1.0-2.el7ost.noarch
openstack-ironic-conductor-2015.1.0-4.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-2.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-tuskar-ui-0.3.0-2.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-4.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-3.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-2.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-2.el7ost.noarch
openstack-nova-api-2015.1.0-10.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-1.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-2.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
openstack-tripleo-0.0.6-0.1.git812abe0.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-nova-common-2015.1.0-10.el7ost.noarch
openstack-heat-common-2015.1.0-3.el7ost.noarch
openstack-tuskar-0.4.18-2.el7ost.noarch
python-django-openstack-auth-1.2.0-2.el7ost.noarch
openstack-dashboard-theme-2015.1.0-9.el7ost.noarch
openstack-tuskar-ui-extras-0.0.3-3.el7ost.noarch
openstack-tempest-kilo-20150507.2.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-neutron-ml2-2015.1.0-2.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-10.el7ost.noarch
openstack-keystone-2015.1.0-1.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-0.git49b57eb.el7ost.noarch
openstack-neutron-common-2015.1.0-2.el7ost.noarch
openstack-heat-engine-2015.1.0-3.el7ost.noarch
openstack-ceilometer-common-2015.1.0-2.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-3.el7ost.noarch
openstack-ceilometer-api-2015.1.0-2.el7ost.noarch
openstack-ironic-api-2015.1.0-4.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-2.el7ost.noarch
openstack-ironic-common-2015.1.0-4.el7ost.noarch
openstack-selinux-0.6.31-1.el7ost.noarch
openstack-nova-compute-2015.1.0-10.el7ost.noarch
openstack-nova-conductor-2015.1.0-10.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-heat-api-2015.1.0-3.el7ost.noarch
openstack-ceilometer-central-2015.1.0-2.el7ost.noarch
openstack-puppet-modules-2015.1.4-1.el7ost.noarch
openstack-nova-scheduler-2015.1.0-10.el7ost.noarch
openstack-nova-cert-2015.1.0-10.el7ost.noarch
openstack-dashboard-2015.1.0-9.el7ost.noarch



How reproducible:
Always with the latest poodle; confirmed with two installs

Steps to Reproduce:
1. Install and set up virt env with bits from latest poodle (06/11)
2. Run instack-deploy-overcloud --tuskar
3. See failures/errors/CREATE_FAILED in heat stack-show overcloud (a sketch for drilling into the failure is under Additional info below)

Actual results:
Overcloud deploy is CREATE_FAILED

Expected results:
Should be CREATE_COMPLETE

Additional info:
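For anyone triaging a similar failure, a minimal sketch of how the CREATE_FAILED from step 3 can be narrowed down on the undercloud with the Kilo-era heat CLI. The resource and deployment identifiers below are placeholders, not values taken from this report:

# On the undercloud, as the stack user:
source ~/stackrc
heat stack-show overcloud | grep -iE 'status|reason'
# List only the failed resources of the overcloud stack:
heat resource-list overcloud | grep -i failed
# Drill into one failed resource for its status reason:
heat resource-show overcloud <FAILED_RESOURCE_NAME>
# For failed software deployments, show the deploy stdout/stderr:
heat deployment-show <DEPLOYMENT_ID>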

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-06-11 17:49:14 EDT ---

Since this issue was entered in bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Ronelle Landy on 2015-06-11 17:51:52 EDT ---

See virt job results:

https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RDO/view/rdo-manager/job/rdo_manager-periodic-rhos-7_director-poodle-rhel-7.1-nodes_virt-virt-instack-neutron-gre-rabbitmq-tempest-rpm-minimal/91/consoleFull

Note that the previous job - with the previous poodle - passed.

Confirmed failure on baremetal: 
https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/refactored-rdo_manager-url_trigger-none-rhos-7_director-poodle-rhel-7.1-baremetal-dell_pe_r630-minimal-neutron-gre/1/console

--- Additional comment from Ronelle Landy on 2015-06-11 18:37:07 EDT ---

A three-controller deploy showed some other issues:

[heat-admin@ov-ik3glkjldcc-0-bgdxz5dw33jc-controller-b6mgf742iqfj ~]$ sudo grep -i error /var/log/messages
Jun 11 17:51:39 localhost kdumpctl: cat: write error: Broken pipe
Jun 11 18:23:58 localhost pengine[17793]: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Jun 11 18:23:58 localhost pengine[17793]: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Jun 11 18:23:58 localhost pengine[17793]: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Jun 11 18:23:58 localhost pengine[17793]: notice: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
[heat-admin@ov-ik3glkjldcc-0-bgdxz5dw33jc-controller-b6mgf742iqfj ~]$ 
[heat-admin@ov-ik3glkjldcc-0-bgdxz5dw33jc-controller-b6mgf742iqfj ~]$ crm_verify -L
Live CIB query failed: Transport endpoint is not connected


The overcloud was still CREATE_IN_PROGRESS; assuming this will time out shortly.
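Not part of the original triage, but a minimal sketch of how the cluster and STONITH state can be checked directly on a controller while the deploy is in progress, assuming the standard pcs/pacemaker tooling already on the overcloud image:

# On a controller node, as heat-admin:
sudo pcs status                            # overall cluster and resource state
sudo pcs property show stonith-enabled     # is fencing currently required?
# The deployment is expected to disable fencing while no stonith
# devices are configured; done by hand it would look like:
sudo pcs property set stonith-enabled=false
sudo crm_verify -L -V                      # re-check the live CIB verbosely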

--- Additional comment from Mike Burns on 2015-06-12 07:01:32 EDT ---

Comment 4 appears to be a distinct issue from the redis VIP issue, so splitting that out into a separate bug.

Comment 3 Marios Andreou 2015-06-12 11:09:04 UTC
I don't think this is an error or connected to the overcloud CREATE_FAILED - I get this on most runs and it is rectified eventually (i.e. after some time crm_verify -L is clean):

Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: notice: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Jun 12 06:31:58 ov-lpfdo57qqbw-0-v5g4sd5x3xmk-controller-lz3nlrds2mtg.novalocal pengine[19805]: notice: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.

Comment 5 Giulio Fidente 2015-06-17 08:12:25 UTC
Due to hostname resolution failures, the cluster never moved from the initialization state into the configuration steps (during which stonith would have been disabled, given the missing fencing config). I think this can safely be considered a duplicate of bz #1232269.
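A quick way to confirm the hostname resolution problem described above on an affected controller; a sketch using standard tools, not commands taken from this report:

# On a controller node:
hostname -f                      # FQDN the cluster expects for this node
getent hosts $(hostname -s)      # does the short name resolve via hosts/DNS?
sudo pcs status corosync         # membership as corosync actually sees it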

*** This bug has been marked as a duplicate of bug 1232269 ***

