Bug 1483920

Summary: Deployment of native fencing occasionally fails
Product: Red Hat OpenStack Reporter: Tomas Jamrisko <tjamrisk>
Component: puppet-tripleo Assignee: Chris Jones <chjones>
Status: CLOSED ERRATA QA Contact: pkomarov
Severity: high Docs Contact:
Priority: high    
Version: 11.0 (Ocata) CC: aschultz, ccollett, chjones, fdinitto, jjoyce, jschluet, mburns, michele, rhel-osp-director-maint, slinaber, tvignaud, ushkalim
Target Milestone: z3 Keywords: Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: puppet-tripleo-6.5.1-1.el7ost Doc Type: Bug Fix
Doc Text:
In the release version of OSP11, a bug occasionally caused generation of the overcloud fencing configuration to fail. This update improves the generator so that overcloud fencing configuration generation is now reliable.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-10-31 17:37:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1444621    

Description Tomas Jamrisko 2017-08-22 09:24:42 UTC
Description of problem:

Enabling native fencing can fail with the following error:
Error: Could not find resource 'Class[Pacemaker::Stonith]' for relationship from 'Class[Tripleo::Fencing]' on node controller-2.localdomain

if you need

How reproducible:
~ 50%

Steps to Reproduce:
1. Deploy overcloud
2. Try enabling native fencing

Comment 1 Michele Baldessari 2017-08-25 05:32:10 UTC
Linking only the stable/ocata review, as the master one has already merged.

Comment 4 pkomarov 2017-10-03 12:04:23 UTC
Verified. Controller fencing was enabled through an overcloud deploy:

Verified the initial Pacemaker setup:

[root@controller-2 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Tue Oct  3 10:50:28 2017
Last change: Tue Oct  3 09:54:07 2017 by root via cibadmin on controller-2

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-192.168.24.8        (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-10.35.180.18        (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.17.0.18 (ocf::heartbeat:IPaddr2):       Started controller-2
 ip-172.17.0.14 (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-172.18.0.16 (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.19.0.19 (ocf::heartbeat:IPaddr2):       Started controller-2
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Verify STONITH is disabled:

[root@controller-2 ~]# sudo pcs property show
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: tripleo_cluster
 dc-version: 1.1.16-12.el7_4.2-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1507024323
 maintenance-mode: false
 redis_REPL_INFO: controller-2
 stonith-enabled: false
Node Attributes:
 controller-0: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@controller-0
 controller-1: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@controller-1
 controller-2: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@controller-2
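
The check above can also be scripted. The following is a minimal sketch, not part of the original verification: it assumes the text layout of `pcs property show` matches the transcript above, and the names `parse_cluster_properties`, `stonith_enabled`, and `SAMPLE_PROPERTIES` are made up for illustration. It treats a missing `stonith-enabled` property as enabled, since that is Pacemaker's default, and only recognises the literal value `true`.

```python
def parse_cluster_properties(text):
    """Return the 'Cluster Properties' section of `pcs property show` as a dict.

    Assumes the layout shown in the transcript: a 'Cluster Properties:' header,
    one 'key: value' pair per line, then a 'Node Attributes:' header.
    """
    props = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Cluster Properties:":
            in_section = True
            continue
        if stripped == "Node Attributes:":
            break  # node attributes are not cluster properties
        if in_section and ":" in stripped:
            key, _, value = stripped.partition(":")
            props[key.strip()] = value.strip()
    return props


def stonith_enabled(text):
    """True unless stonith-enabled is explicitly set to something other than 'true'."""
    value = parse_cluster_properties(text).get("stonith-enabled", "true")
    return value.lower() == "true"


# Abbreviated copy of the output above, used as sample input.
SAMPLE_PROPERTIES = """\
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: tripleo_cluster
 have-watchdog: false
 stonith-enabled: false
Node Attributes:
 controller-0: cinder-volume-role=true galera-role=true
"""

if __name__ == "__main__":
    print("stonith enabled:", stonith_enabled(SAMPLE_PROPERTIES))
```

In practice the text would come from running the command on a controller (for example via `subprocess.run`), which is left out here to keep the sketch self-contained.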

Generate the fencing.yaml file:

openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output fencing.yaml instackenv.json

Update the overcloud with the fencing configuration:
openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/deployment_files/network/network-environment.yaml \
-e /home/stack/deployment_files/hostnames.yml \
-e /home/stack/deployment_files/nodes_data.yaml \
-e /home/stack/deployment_files/debug.yaml \
-e /home/stack/deployment_files/docker-images.yaml \
-e /home/stack/deployment_files/workaround_params.yaml \
-e /home/stack/fencing.yaml \
--log-file overcloud_deployment_95.log

...
Output:
 Stack overcloud UPDATE_COMPLETE

Check that the new STONITH resources were created:

[root@controller-2 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Tue Oct  3 11:55:05 2017
Last change: Tue Oct  3 11:35:15 2017 by root via cibadmin on controller-0

3 nodes configured
22 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-192.168.24.8        (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-10.35.180.18        (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.17.0.18 (ocf::heartbeat:IPaddr2):       Started controller-2
 ip-172.17.0.14 (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-172.18.0.16 (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.19.0.19 (ocf::heartbeat:IPaddr2):       Started controller-2
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started controller-0
 stonith-fence_ipmilan-441ea173385f     (stonith:fence_ipmilan):        Started controller-2
 stonith-fence_ipmilan-441ea1733d43     (stonith:fence_ipmilan):        Started controller-1
 stonith-fence_ipmilan-441ea1733991     (stonith:fence_ipmilan):        Started controller-1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
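
For repeated runs, the same check could be automated. This is a minimal sketch under the assumption that STONITH resources appear in `pcs status` as single lines of the form `name (stonith:agent): State node`, as in the output above; `stonith_resources` and `SAMPLE_STATUS` are hypothetical names introduced for illustration only.

```python
import re

# Matches lines such as:
#  stonith-fence_ipmilan-441ea173385f     (stonith:fence_ipmilan):        Started controller-2
STONITH_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\(stonith:(?P<agent>[^)\s]+)\):\s+(?P<state>\S+)\s+(?P<node>\S+)\s*$"
)


def stonith_resources(status_text):
    """Return one dict per STONITH resource line found in `pcs status` output."""
    found = []
    for line in status_text.splitlines():
        match = STONITH_LINE.match(line)
        if match:
            found.append(match.groupdict())
    return found


# Excerpt of the resource list above, used as sample input.
SAMPLE_STATUS = """\
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started controller-0
 stonith-fence_ipmilan-441ea173385f     (stonith:fence_ipmilan):        Started controller-2
 stonith-fence_ipmilan-441ea1733d43     (stonith:fence_ipmilan):        Started controller-1
 stonith-fence_ipmilan-441ea1733991     (stonith:fence_ipmilan):        Started controller-1
"""

if __name__ == "__main__":
    for res in stonith_resources(SAMPLE_STATUS):
        print(res["name"], res["state"], res["node"])
```

A verification script could then compare the number of returned entries with the number of controllers (three in this deployment) and confirm that every entry is in the `Started` state.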

Comment 7 errata-xmlrpc 2017-10-31 17:37:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3098