1503064 – Resource deletion may fail when resource state is modified concurrently

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	ansible-pacemaker
Sub Component:
Version:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	beta
Target Release:	12.0 (Pike)
Assignee:	mathieu bultel
QA Contact:	Marius Cornea
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1475404

Reported:	2017-10-17 10:06 UTC by Damien Ciabrini
Modified:	2018-02-05 19:15 UTC
CC List:	4 users
Fixed In Version:	ansible-pacemaker-1.0.4-0.20171012091929.0e4d7c0.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-13 22:15:46 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gerrithub.io	382117	0	None	None	None	2017-10-17 10:09:06 UTC
Red Hat Product Errata	RHEA-2017:3462	0	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 12.0 Enhancement Advisory	2018-02-16 01:43:25 UTC

Description Damien Ciabrini 2017-10-17 10:06:29 UTC

Description of problem:

During major upgrades of the overcloud from OSP11 to OSP12, one of the upgrade tasks for HA services is to delete the existing pacemaker resources (e.g. galera-master).

We have noticed that in some cases (e.g. an overcloud with services split across dedicated servers), the resource deletion task is triggered and returns a successful rc, but the resource is not deleted from the CIB.

From the logs we see that this happens when a concurrent operation, for instance a resource cleanup, is scheduled in pacemaker at the same time as the deletion.
This is because "pcs delete" is not an atomic action, so any concurrent action on the resource can affect whether the deletion succeeds.
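
For illustration only (this is not the fix merged in [1]): since the rc of the delete command cannot be trusted, a caller can verify that the resource is really gone from the CIB and retry if it is not. The Python sketch below assumes that "pcs resource delete" is the deletion command being run and that "cibadmin --query --xpath" exits non-zero when nothing matches; the helper names, the XPath expression, and the retry parameters are hypothetical.

```python
import subprocess
import time


def run_cmd(argv):
    """Run a command and return its exit code; output is discarded."""
    return subprocess.call(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def resource_in_cib(name, run=run_cmd):
    """Return True if a resource with this id is still configured.

    Assumption: cibadmin exits non-zero when the XPath matches nothing.
    The query is limited to the resources section so that status
    entries with the same id are not counted.
    """
    xpath = "//resources//*[@id='%s']" % name
    return run(["cibadmin", "--query", "--xpath", xpath]) == 0


def delete_resource(name, retries=5, delay=2.0, run=run_cmd, sleep=time.sleep):
    """Delete a resource and confirm it left the CIB, retrying if needed.

    Returns True once the resource is absent, False if it is still
    present after all attempts.
    """
    for _ in range(retries):
        if not resource_in_cib(name, run):
            return True
        # The rc is deliberately ignored: a concurrent operation
        # (e.g. a cleanup) can make the delete report success while
        # the resource stays in the CIB.
        run(["pcs", "resource", "delete", name])
        sleep(delay)
    return not resource_in_cib(name, run)
```

With this approach the upgrade task would fail early when delete_resource("galera-master") returns False, instead of failing later when the containerized resource is created.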

Version-Release number of selected component (if applicable):


How reproducible:
Randomly

Steps to Reproduce:
1. Deploy OSP11 on composable HA (split services on specific nodes)
2. Upgrade to OSP12

Actual results:
Sometimes old OSP11 resources are not deleted, which breaks the creation of the new containerized resources, so the OSP12 upgrade fails.

Expected results:
The OSP12 upgrade should succeed.

Additional info:

Comment 1 Damien Ciabrini 2017-10-17 10:09:41 UTC

Fix [1] proposed and merged upstream

[1] https://review.gerrithub.io/#/c/382117/

Comment 6 errata-xmlrpc 2017-12-13 22:15:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
