Bug 1503064

Summary: Resource deletion may fail when resource state is modified concurrently
Product: Red Hat OpenStack
Component: ansible-pacemaker
Version: 12.0 (Pike)
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Damien Ciabrini <dciabrin>
Assignee: mathieu bultel <mbultel>
QA Contact: Marius Cornea <mcornea>
CC: jschluet, mcornea, michele, tvignaud
Keywords: Triaged
Target Milestone: beta
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ansible-pacemaker-1.0.4-0.20171012091929.0e4d7c0.el7ost
Type: Bug
Last Closed: 2017-12-13 22:15:46 UTC
Bug Blocks: 1475404

Description Damien Ciabrini 2017-10-17 10:06:29 UTC
Description of problem:

During a major upgrade of the overcloud from OSP11 to OSP12, one of the upgrade tasks for HA services deletes existing pacemaker resources (e.g. galera-master).

We have noticed that in some cases (e.g. an overcloud with services split across dedicated servers), the resource deletion task is triggered and returns a successful rc, but the resource is not deleted from the CIB.

From the logs, we see that this happens when a concurrent operation, for instance a resource cleanup, is scheduled in pacemaker at the same time as the deletion.
This is because "pcs delete" is not an atomic action, so any concurrent action on the resource can affect whether the deletion succeeds.
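
Since the return code alone cannot be trusted here, one possible mitigation is to check the CIB after each deletion and retry while the resource is still present. The sketch below illustrates that idea only; it is not necessarily what the upstream fix does. The function name delete_until_gone and its parameters are hypothetical, and the caller supplies the two callables (for example, a wrapper around "pcs resource delete <name>" and a check that queries the CIB for the resource).

```python
import time


def delete_until_gone(delete, exists, retries=5, delay=1.0, sleep=time.sleep):
    """Call delete() until exists() reports that the resource is gone.

    delete and exists are caller-supplied callables (hypothetical names):
    delete() issues the deletion, exists() returns True while the
    resource is still defined in the CIB.

    Returns the number of deletion attempts that were needed, or raises
    RuntimeError if the resource is still present after `retries` attempts.
    """
    for attempt in range(1, retries + 1):
        # The deletion may report success even when a concurrent
        # operation (e.g. a cleanup) left the resource in place.
        delete()
        # Trust the observed state, not the return code.
        if not exists():
            return attempt
        # Give the concurrent operation time to finish before retrying.
        sleep(delay)
    raise RuntimeError(
        "resource still present after %d deletion attempts" % retries
    )
```

Injecting the callables keeps the retry logic independent of any particular CLI and makes it testable without a running cluster.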

Version-Release number of selected component (if applicable):


How reproducible:
Randomly

Steps to Reproduce:
1. Deploy OSP11 on composable HA (split services on specific nodes)
2. Upgrade to OSP12

Actual results:
Sometimes old OSP11 resources are not deleted, which breaks the creation of the new containerized resources, so the OSP12 upgrade fails.

Expected results:
The OSP12 upgrade should succeed.

Additional info:

Comment 1 Damien Ciabrini 2017-10-17 10:09:41 UTC
Fix [1] proposed and merged upstream

[1] https://review.gerrithub.io/#/c/382117/

Comment 6 errata-xmlrpc 2017-12-13 22:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462