Bug 1503064 - Resource deletion may fail when resource state is modified concurrently
Summary: Resource deletion may fail when resource state is modified concurrently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: ansible-pacemaker
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 12.0 (Pike)
Assignee: mathieu bultel
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks: 1475404
Reported: 2017-10-17 10:06 UTC by Damien Ciabrini
Modified: 2018-02-05 19:15 UTC

Fixed In Version: ansible-pacemaker-1.0.4-0.20171012091929.0e4d7c0.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:15:46 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gerrithub.io | 382117 | 0 | None | None | None | 2017-10-17 10:09:06 UTC
Red Hat Product Errata | RHEA-2017:3462 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-16 01:43:25 UTC

Description Damien Ciabrini 2017-10-17 10:06:29 UTC
Description of problem:

During major upgrades of the overcloud from OSP11 to OSP12, one of the upgrade tasks for HA services is to delete existing pacemaker resources (e.g. galera-master).

We have noticed that in some cases (e.g. an overcloud with services split across dedicated servers), the resource deletion task is triggered and returns a successful rc, but the resource is not deleted from the CIB.

From the logs we see that this happens when a concurrent operation, for instance a resource cleanup, is scheduled in pacemaker at the same time as the deletion.
This is because "pcs delete" is not an atomic action, so any concurrent action on the resource can affect whether the deletion succeeds.
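
To illustrate the failure mode, here is a minimal sketch of a deletion that does not trust the return code alone: it re-checks the CIB after each attempt and retries while the resource is still present. This is not the merged fix, only an illustration under stated assumptions. The helper names (run_command, resource_exists, delete_resource) and the retry/delay values are hypothetical, and the check assumes that cibadmin exits non-zero when the XPath query matches nothing.

```python
import subprocess
import time


def run_command(argv):
    """Run a command and return its exit code (output is discarded)."""
    return subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def resource_exists(name, run=run_command):
    """Return True if an element with this id is still present in the CIB."""
    # Assumption: cibadmin exits non-zero when the XPath matches nothing.
    xpath = "//*[@id='{}']".format(name)
    return run(["cibadmin", "--query", "--xpath", xpath]) == 0


def delete_resource(name, run=run_command, retries=5, delay=2.0,
                    sleep=time.sleep):
    """Delete a resource and confirm it left the CIB, retrying if needed.

    Returns the number of delete attempts that were issued.
    """
    for attempt in range(retries):
        if not resource_exists(name, run):
            return attempt
        # The exit code is deliberately ignored: per this report it can be 0
        # while the resource is still in the CIB.
        run(["pcs", "resource", "delete", name])
        sleep(delay)
    if resource_exists(name, run):
        raise RuntimeError(
            "resource {} still present after {} attempts".format(name, retries)
        )
    return retries
```

The run and sleep parameters are injectable only so the loop can be exercised without a cluster; on a real node, delete_resource("galera-master") would use the defaults.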

Version-Release number of selected component (if applicable):


How reproducible:
Randomly

Steps to Reproduce:
1. Deploy OSP11 on composable HA (split services on specific nodes)
2. Upgrade to OSP12
3.

Actual results:
Sometimes old OSP11 resources are not deleted, which breaks the creation of the new containerized resources, so the OSP12 upgrade fails.

Expected results:
The OSP12 upgrade should succeed.

Additional info:

Comment 1 Damien Ciabrini 2017-10-17 10:09:41 UTC
Fix [1] proposed and merged upstream

[1] https://review.gerrithub.io/#/c/382117/

Comment 6 errata-xmlrpc 2017-12-13 22:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

