Bug 1483920 - Deployment of native fencing occasionally fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z3
Target Release: 11.0 (Ocata)
Assignee: Chris Jones
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks: 1444621
Reported: 2017-08-22 09:24 UTC by Tomas Jamrisko
Modified: 2017-10-31 17:37 UTC (History)

Fixed In Version: puppet-tripleo-6.5.1-1.el7ost
Doc Type: Bug Fix
Doc Text:
In the release version of OSP11, there was a bug that caused the generation of overcloud fencing configuration to occasionally fail. This update includes improvements to the generator so that overcloud fencing configuration generation is now reliable.
Clone Of:
Environment:
Last Closed: 2017-10-31 17:37:35 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1712605 0 None None None 2017-08-23 15:56:23 UTC
OpenStack gerrit 497732 0 None None None 2017-08-25 05:32:10 UTC
Red Hat Product Errata RHBA-2017:3098 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 11.0 director Bug Fix Advisory 2017-10-31 21:33:28 UTC

Description Tomas Jamrisko 2017-08-22 09:24:42 UTC
Description of problem:

Trying to enable native fencing can fail because of:
Error: Could not find resource 'Class[Pacemaker::Stonith]' for relationship from 'Class[Tripleo::Fencing]' on node controller-2.localdomain

if you need

How reproducible:
~ 50%

Steps to Reproduce:
1. Deploy overcloud
2. Try enabling native fencing

Comment 1 Michele Baldessari 2017-08-25 05:32:10 UTC
Linking only the stable/ocata review, as the master one has already merged.

Comment 4 pkomarov 2017-10-03 12:04:23 UTC
Verified. Controller fencing was enabled using overcloud deploy:

Verified the initial Pacemaker setup:

[root@controller-2 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Tue Oct  3 10:50:28 2017
Last change: Tue Oct  3 09:54:07 2017 by root via cibadmin on controller-2

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-192.168.24.8        (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-10.35.180.18        (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.17.0.18 (ocf::heartbeat:IPaddr2):       Started controller-2
 ip-172.17.0.14 (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-172.18.0.16 (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.19.0.19 (ocf::heartbeat:IPaddr2):       Started controller-2
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Verify STONITH is disabled:

[root@controller-2 ~]# sudo pcs property show
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: tripleo_cluster
 dc-version: 1.1.16-12.el7_4.2-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1507024323
 maintenance-mode: false
 redis_REPL_INFO: controller-2
 stonith-enabled: false
Node Attributes:
 controller-0: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@controller-0
 controller-1: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@controller-1
 controller-2: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@controller-2
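The STONITH check above can also be scripted. This is only a minimal sketch, assuming the `pcs property show` output keeps the "key: value" layout shown in this comment. The function name `stonith_enabled` is hypothetical, and the sample text is an excerpt of the output above.

```python
from typing import Optional


def stonith_enabled(property_output: str) -> Optional[bool]:
    """Return the value of 'stonith-enabled' from `pcs property show` text.

    Returns None when the property is not listed at all, so the caller
    can decide how to treat an unset property.
    """
    for line in property_output.splitlines():
        # Each property line looks like " key: value".
        key, sep, value = line.strip().partition(":")
        if sep and key == "stonith-enabled":
            return value.strip().lower() == "true"
    return None


# Excerpt of the output shown above (before fencing was deployed).
SAMPLE = """\
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: tripleo_cluster
 have-watchdog: false
 maintenance-mode: false
 stonith-enabled: false
"""

if __name__ == "__main__":
    print("stonith-enabled:", stonith_enabled(SAMPLE))
```

In practice the text would come from running `pcs property show` on a controller; here it is passed in as a string so the check can be run anywhere.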

Generate the fencing.yaml file:

openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output fencing.yaml instackenv.json

Update the overcloud with the fencing configuration:
openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/deployment_files/network/network-environment.yaml \
-e /home/stack/deployment_files/hostnames.yml \
-e /home/stack/deployment_files/nodes_data.yaml \
-e /home/stack/deployment_files/debug.yaml \
-e /home/stack/deployment_files/docker-images.yaml \
-e /home/stack/deployment_files/workaround_params.yaml \
-e /home/stack/fencing.yaml \
--log-file overcloud_deployment_95.log

...
Output:
 Stack overcloud UPDATE_COMPLETE

Check that the new STONITH resources were created:

[root@controller-2 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Tue Oct  3 11:55:05 2017
Last change: Tue Oct  3 11:35:15 2017 by root via cibadmin on controller-0

3 nodes configured
22 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-192.168.24.8        (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-10.35.180.18        (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.17.0.18 (ocf::heartbeat:IPaddr2):       Started controller-2
 ip-172.17.0.14 (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-172.18.0.16 (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.19.0.19 (ocf::heartbeat:IPaddr2):       Started controller-2
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started controller-0
 stonith-fence_ipmilan-441ea173385f     (stonith:fence_ipmilan):        Started controller-2
 stonith-fence_ipmilan-441ea1733d43     (stonith:fence_ipmilan):        Started controller-1
 stonith-fence_ipmilan-441ea1733991     (stonith:fence_ipmilan):        Started controller-1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
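The resource count rises from 19 to 22 between the two `pcs status` runs, matching the three `fence_ipmilan` entries. A rough sketch of how that check could be automated follows, assuming the resource line layout shown above. The helper name `stonith_resources` and the regular expression are illustrative, not part of any pcs tooling.

```python
import re

# Matches resource lines such as:
#  stonith-fence_ipmilan-441ea173385f  (stonith:fence_ipmilan):  Started controller-2
STONITH_LINE = re.compile(
    r"^\s*(\S+)\s+\(stonith:([^)\s]+)\):\s+(\S+)(?:\s+(\S+))?\s*$"
)


def stonith_resources(status_output):
    """Return a list of dicts describing stonith resources in `pcs status` text."""
    found = []
    for line in status_output.splitlines():
        match = STONITH_LINE.match(line)
        if match:
            name, agent, state, node = match.groups()
            found.append(
                {"name": name, "agent": agent, "state": state, "node": node}
            )
    return found


# Excerpt of the second `pcs status` output above.
SAMPLE = """\
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started controller-0
 stonith-fence_ipmilan-441ea173385f     (stonith:fence_ipmilan):        Started controller-2
 stonith-fence_ipmilan-441ea1733d43     (stonith:fence_ipmilan):        Started controller-1
 stonith-fence_ipmilan-441ea1733991     (stonith:fence_ipmilan):        Started controller-1
"""

if __name__ == "__main__":
    for res in stonith_resources(SAMPLE):
        print(res["name"], res["state"], res["node"])
```

Only lines whose resource class is `stonith` are collected, so the `systemd` and `ocf` resources are ignored.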

Comment 7 errata-xmlrpc 2017-10-31 17:37:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3098

