Bug 1571765

Summary: Pending notify actions are recorded as complete [OpenStack-13.0]
Product: Red Hat OpenStack
Reporter: Andrew Beekhof <abeekhof>
Component: rhosp-director-images
Assignee: Jon Schlueter <jschluet>
Status: CLOSED ERRATA
QA Contact: Udi Shkalim <ushkalim>
Severity: urgent
Priority: urgent
Version: 13.0 (Queens)
CC: abeekhof, aherr, amit.banerjee, cluster-maint, cluster-qe, dbecker, jamsmith, jschluet, kgaillot, lmarsh, mburns, mnovacek, morazi, pkomarov, sclewis, toneata
Target Milestone: beta
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: All
OS: All
Fixed In Version: rhosp-director-images-13.0-20180425.2.el7ost
Doc Text:
When record-pending is true (the default as of RHEL 7.5), Pacemaker receives two notifications that a notify action has completed: one when the action is initiated and another when it actually completes. This confuses the cluster and triggers a recomputation of the cluster state, which produces more notifications and more recomputations. The issue was resolved by configuring Pacemaker to record only completed clone-notify actions as successful, not pending ones, which allows the cluster to finish recovering or starting services.
Clone Of: 1570618
Last Closed: 2018-06-27 13:24:35 UTC
Bug Depends On: 1570618
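
The feedback loop described in the Doc Text can be illustrated with a toy model. This is a hypothetical sketch, not Pacemaker's code: the names `OpUpdate`, `updates_for`, `counts_as_completed`, `completions_seen` and `run_transitions` are invented for illustration, and the rule that any unexpected extra result forces a new transition is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"  # recorded when the action starts (record-pending=true)
    DONE = "done"        # recorded when the action finishes


@dataclass
class OpUpdate:
    action: str  # e.g. "notify", "start"
    status: Status


def updates_for(action: str, record_pending: bool) -> list[OpUpdate]:
    """History updates a single action produces in this toy model."""
    updates = []
    if record_pending:
        updates.append(OpUpdate(action, Status.PENDING))
    updates.append(OpUpdate(action, Status.DONE))
    return updates


def counts_as_completed(update: OpUpdate, fixed: bool) -> bool:
    """Buggy rule: any notify update is taken as a completed result.
    Fixed rule: only a DONE update counts, whatever the action."""
    if update.action == "notify" and not fixed:
        return True
    return update.status is Status.DONE


def completions_seen(action: str, record_pending: bool, fixed: bool) -> int:
    """How many 'action completed' events the cluster believes it received."""
    return sum(counts_as_completed(u, fixed) for u in updates_for(action, record_pending))


def run_transitions(instances: int, record_pending: bool, fixed: bool,
                    max_transitions: int = 10) -> int:
    """Count transitions until every notify result matches what was scheduled.

    Assumption: an unexpected extra result forces a new transition that
    schedules the same notifications again. Returns max_transitions (the
    model's stand-in for a stalled recovery) if the loop never settles.
    """
    for transition in range(1, max_transitions + 1):
        expected = instances  # one notify scheduled per clone instance
        seen = instances * completions_seen("notify", record_pending, fixed)
        if seen == expected:
            return transition  # every result accounted for: cluster settles
    return max_transitions
```

With the buggy rule, `run_transitions(3, record_pending=True, fixed=False)` hits the cap of 10, while `fixed=True` (or `record_pending=False`) settles after one transition. The last case suggests why the problem surfaced once record-pending became the default.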

Comment 1 Andrew Beekhof 2018-04-25 11:49:36 UTC
Critical bugfix from RHEL-HA that prevents control plane recovery from stalling.

Comment 3 Jon Schlueter 2018-04-25 12:57:06 UTC
Confirmed: this needs to make it into the overcloud-full images for controller nodes.

Comment 4 Lon Hohberger 2018-04-25 13:08:07 UTC
According to the parent bug, this is resolved by pacemaker-1.1.18-11.el7_5.2.

Comment 10 Lon Hohberger 2018-04-26 11:59:49 UTC
pacemaker-1.1.18-11.el7_5.2 is in new images.

Comment 13 pkomarov 2018-04-30 10:56:24 UTC
Verified.

Tested on core_puddle_version: 2018-04-26.1

Verification of the pacemaker version on the overcloud images:

(undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'rpm -qa |grep pacemaker-1.1.18-11.el7_5.2'

controller-2 | SUCCESS | rc=0 >>
pacemaker-1.1.18-11.el7_5.2.x86_64

controller-1 | SUCCESS | rc=0 >>
pacemaker-1.1.18-11.el7_5.2.x86_64

controller-0 | SUCCESS | rc=0 >>
pacemaker-1.1.18-11.el7_5.2.x86_64

Functionality of the bug fix was verified on BZ1570130 and BZ1570618.
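
The ad-hoc check in Comment 13 relies on grep's exit status: a controller without the expected package would report a non-zero rc instead of SUCCESS. A hypothetical helper for checking such output mechanically is sketched below. It assumes the `host | STATUS | rc=N >>` header layout shown in the transcript above, and the function names `parse_adhoc_output` and `hosts_missing` are invented for illustration.

```python
import re

# Header line of Ansible ad-hoc output as it appears in the transcript,
# e.g. "controller-2 | SUCCESS | rc=0 >>"
HEADER = re.compile(r"^(?P<host>\S+) \| (?P<status>[A-Z]+) \| rc=(?P<rc>\d+) >>$")


def parse_adhoc_output(text: str) -> dict:
    """Map each host to its status, return code and non-empty stdout lines."""
    results = {}
    host = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            host = match.group("host")
            results[host] = {
                "status": match.group("status"),
                "rc": int(match.group("rc")),
                "stdout": [],
            }
        elif host is not None and line:
            # Lines before the first header (e.g. the shell prompt) are skipped.
            results[host]["stdout"].append(line)
    return results


def hosts_missing(results: dict, expected_nvr: str) -> list:
    """Hosts that failed or did not report the expected package NVR."""
    missing = []
    for host, res in results.items():
        # Accept "NVR" or "NVR.<arch>", but not a different release such as "...el7_5.20".
        has_pkg = any(l == expected_nvr or l.startswith(expected_nvr + ".")
                      for l in res["stdout"])
        if res["status"] != "SUCCESS" or res["rc"] != 0 or not has_pkg:
            missing.append(host)
    return sorted(missing)
```

Fed the three controller blocks from the transcript, `hosts_missing(parse_adhoc_output(text), "pacemaker-1.1.18-11.el7_5.2")` returns an empty list.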

Comment 16 errata-xmlrpc 2018-06-27 13:24:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2083