Bug 1503064 - Resource deletion may fail when resource state is modified concurrently
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: ansible-pacemaker
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 12.0 (Pike)
Assigned To: mathieu bultel
QA Contact: Marius Cornea
Keywords: Triaged
Depends On:
Blocks: 1475404
Reported: 2017-10-17 06:06 EDT by Damien Ciabrini
Modified: 2018-02-05 14:15 EST (History)

See Also:
Fixed In Version: ansible-pacemaker-1.0.4-0.20171012091929.0e4d7c0.el7ost
Last Closed: 2017-12-13 17:15:46 EST
Type: Bug


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Gerrithub.io | 382117 | None | None | None | 2017-10-17 06:09 EDT
Red Hat Product Errata | RHEA-2017:3462 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-15 20:43:25 EST

Description Damien Ciabrini 2017-10-17 06:06:29 EDT
Description of problem:

During major upgrades of the overcloud from OSP11 to OSP12, one of the upgrade tasks for HA services is to delete the existing pacemaker resources (e.g. galera-master).

We have noticed that in some cases (e.g. an overcloud with services split across dedicated servers), the resource deletion task is triggered and returns a successful rc, but the resource is not deleted from the CIB.

From the logs we see that this happens when a concurrent operation, for instance a resource cleanup, is scheduled in pacemaker at the same time as the deletion.
This is because "pcs delete" is not an atomic action, so any concurrent action on the resource can affect whether the deletion succeeds.
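
Since a successful rc cannot be trusted on its own here, one possible guard is to re-check the resource state after each deletion attempt and retry while it is still defined. The following is only a minimal sketch of that idea, not necessarily what the proposed fix does. The names delete_and_verify and FakeCluster and the delete/exists hooks are hypothetical, and the cluster is simulated in memory rather than driven through pcs.

```python
import time
from typing import Callable


def delete_and_verify(
    delete: Callable[[], int],
    exists: Callable[[], bool],
    retries: int = 5,
    delay: float = 0.0,
) -> bool:
    """Delete a resource and confirm it is really gone.

    `delete` returns a process-style return code (0 means success).
    `exists` reports whether the resource is still defined.
    Both are hypothetical hooks, not part of pcs or ansible-pacemaker.
    """
    for _ in range(retries):
        rc = delete()
        # A zero rc is not trusted on its own: re-check the actual state.
        if rc == 0 and not exists():
            return True
        time.sleep(delay)
    # Out of attempts: report whatever the final state is.
    return not exists()


class FakeCluster:
    """Toy stand-in for a cluster whose first deletions are silently lost."""

    def __init__(self, lost_deletes: int) -> None:
        self.resources = {"galera-master"}
        self.lost_deletes = lost_deletes
        self.attempts = 0

    def delete(self) -> int:
        self.attempts += 1
        if self.lost_deletes > 0:
            # Simulated concurrent change: the deletion has no effect.
            self.lost_deletes -= 1
        else:
            self.resources.discard("galera-master")
        return 0  # always reports success, as described in this report

    def exists(self) -> bool:
        return "galera-master" in self.resources


cluster = FakeCluster(lost_deletes=2)
deleted = delete_and_verify(cluster.delete, cluster.exists)
print(deleted, cluster.attempts)
```

On a real cluster the two hooks would have to wrap the actual deletion command and a query of the CIB. How to do that reliably is an assumption left open here.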

Version-Release number of selected component (if applicable):


How reproducible:
Randomly

Steps to Reproduce:
1. Deploy OSP11 on composable HA (split services on specific nodes)
2. Upgrade to OSP12

Actual results:
Sometimes old OSP11 resources are not deleted, which breaks the creation of the new containerized resources, so the OSP12 upgrade fails.

Expected results:
OSP12 upgrade should succeed

Additional info:
Comment 1 Damien Ciabrini 2017-10-17 06:09:41 EDT
Fix [1] proposed and merged upstream

[1] https://review.gerrithub.io/#/c/382117/
Comment 6 errata-xmlrpc 2017-12-13 17:15:46 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
