Bug 1472928

Summary: On pacemaker remote node stonith is set unconditionally
Product: Red Hat OpenStack Reporter: Marian Krcmarik <mkrcmari>
Component: puppet-tripleo Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA QA Contact: Udi Shkalim <ushkalim>
Severity: high Docs Contact:
Priority: high    
Version: 11.0 (Ocata) CC: aschultz, emacchi, fdinitto, jjoyce, jschluet, slinaber, tvignaud
Target Milestone: z2 Keywords: Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: puppet-tripleo-6.5.0-6.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-13 21:43:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marian Krcmarik 2017-07-19 15:43:40 UTC
Description of problem:
We have the following code currently in the tripleo pacemaker_remote manifest:
class tripleo::profile::base::pacemaker_remote (
  $remote_authkey,
  $pcs_tries = hiera('pcs_tries', 20),
  $enable_fencing = hiera('enable_fencing', false),
  $step = hiera('step'),
) {
  class { '::pacemaker::remote':
    remote_authkey => $remote_authkey,
  }
  $enable_fencing_real = str2bool($enable_fencing) and $step >= 5

  class { '::pacemaker::stonith':
    disable => !$enable_fencing_real,
    tries => $pcs_tries,
  }
....

It makes no sense to enforce the stonith setting on the remote nodes; we should probably enforce
it only on $pacemaker_master. While this works in general, it creates extra CIB changes for nothing, and we did see an issue when working with container HA (due to the remotes not being up yet).
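
To make the suggestion above concrete, here is a minimal sketch of guarding the stonith class so that only one node applies it. This is not the change that shipped in the erratum. The class name example::stonith_on_master is hypothetical, and passing $pacemaker_master in as a boolean parameter is an assumption made for illustration; the remaining parameters and the $enable_fencing_real expression are copied from the manifest quoted above.

```puppet
# Sketch only. example::stonith_on_master is a hypothetical class name, and
# $pacemaker_master is assumed to be a boolean that is true on exactly one
# full cluster node and false everywhere else (including remote nodes).
class example::stonith_on_master (
  $pacemaker_master = false,
  $pcs_tries        = hiera('pcs_tries', 20),
  $enable_fencing   = hiera('enable_fencing', false),
  $step             = hiera('step'),
) {
  # Same expression as in the quoted pacemaker_remote manifest.
  $enable_fencing_real = str2bool($enable_fencing) and $step >= 5

  # stonith-enabled is a cluster-wide property stored in the CIB, so a
  # single node setting it is enough. Remote nodes skip this block and
  # therefore generate no extra CIB changes.
  if $pacemaker_master {
    class { '::pacemaker::stonith':
      disable => !$enable_fencing_real,
      tries   => $pcs_tries,
    }
  }
}
```

With a guard like this, the pacemaker_remote profile would keep only the ::pacemaker::remote declaration and would no longer declare ::pacemaker::stonith at all.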

Version-Release number of selected component (if applicable):


How reproducible:
"Always" in recent days

Steps to Reproduce:
1. Deploy OSP11 on pacemaker remote nodes with composable roles of rabbitmq and galera.

Actual results:
Error: Could not find dependency Exec[wait-for-settle] for Pcmk_property[property--stonith-enabled] at /etc/puppet/modules/pacemaker/manifests/property.pp:78

Expected results:
Successful deployment

Additional info:

Comment 6 Udi Shkalim 2017-08-31 09:59:06 UTC
Verified on: puppet-tripleo-6.5.0-8.el7ost.noarch

Deployment passed:
[root@controller-2 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Aug 30 16:30:07 2017
Last change: Wed Aug 30 16:24:51 2017 by root via cibadmin on controller-0

6 nodes configured
34 resources configured

Online: [ controller-0 controller-1 controller-2 ]
RemoteOnline: [ messaging-0 messaging-1 messaging-2 ]

Full list of resources:

 messaging-0    (ocf::pacemaker:remote):        Started controller-0
 messaging-1    (ocf::pacemaker:remote):        Started controller-1
 messaging-2    (ocf::pacemaker:remote):        Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ messaging-0 messaging-1 messaging-2 ]
     Stopped: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
     Stopped: [ messaging-0 messaging-1 messaging-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
     Stopped: [ messaging-0 messaging-1 messaging-2 ]
 ip-192.168.24.12       (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-10.0.0.107  (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.17.1.19 (ocf::heartbeat:IPaddr2):       Started controller-2
 ip-172.17.1.14 (ocf::heartbeat:IPaddr2):       Started controller-0
 ip-172.17.3.18 (ocf::heartbeat:IPaddr2):       Started controller-1
 ip-172.17.4.18 (ocf::heartbeat:IPaddr2):       Started controller-2
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
     Stopped: [ messaging-0 messaging-1 messaging-2 ]
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Comment 8 errata-xmlrpc 2017-09-13 21:43:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2721