Bug 1482116 - Ansible-pacemaker lacks retry logics to deal with CIB update concurrency
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: ansible-pacemaker
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: beta
Target Release: 12.0 (Pike)
Assigned To: mathieu bultel
QA Contact: Marius Cornea
Keywords: Triaged
Duplicates: 1481987
Depends On:
Blocks: 1481987
Reported: 2017-08-16 09:41 EDT by Damien Ciabrini
Modified: 2018-02-05 14:12 EST

See Also:
Fixed In Version: ansible-pacemaker-1.0.3-0.20170929170820.1279294.el7ost openstack-tripleo-heat-templates-7.0.1-0.20170927205938.el7ost
Last Closed: 2017-12-13 16:52:15 EST
Type: Bug


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 498499 | None | None | None | 2017-08-28 11:28 EDT
OpenStack gerrit | 504044 | None | None | None | 2017-10-10 10:16 EDT
Gerrithub.io | 375982 | None | None | None | 2017-08-28 11:18 EDT
Red Hat Product Errata | RHEA-2017:3462 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-15 20:43:25 EST

Description Damien Ciabrini 2017-08-16 09:41:32 EDT
Description of problem:
As detailed in https://bugzilla.redhat.com/show_bug.cgi?id=1481987, during an Ocata -> Pike upgrade, some of the Ansible steps implemented for the rabbitmq service upgrade involved deleting the existing rabbitmq-clone [1] resource and creating a new containerized resource, rabbitmq-bundle.

While the resource disabling task succeeded, the resource deletion task [2] did not, and the journal logged the following error:

Aug 15 22:48:24 messaging-0 ansible-pacemaker_resource[175956]: Invoked with check_mode=False state=delete resource=rabbitmq timeout=300 wait_for_resource=True
Aug 15 22:48:25 messaging-0 cib[16463]:    error: IDREF attribute rsc references an unknown ID "rabbitmq-clone"
Aug 15 22:48:25 messaging-0 cib[16463]:    error: IDREF attribute rsc references an unknown ID "rabbitmq-clone"
Aug 15 22:48:25 messaging-0 cib[16463]:  warning: Updated CIB does not validate against pacemaker-2.8 schema/dtd
Aug 15 22:48:25 messaging-0 cib[16463]:  warning: Local-only Change (client:cibadmin, call: 2): 0.78.0 (Update does not conform to the configured schema)
Aug 15 22:48:25 messaging-0 cib[16463]:  warning: Completed cib_delete operation for section //clone/primitive[@id="rabbitmq"]/..: Update does not conform to the configured schema (rc=-203, 

These log entries most likely indicate that another pcs command ran on the cluster and updated the CIB while the "resource delete" pcs command issued by the Ansible task was still in progress.

In that situation, ansible-pacemaker should retry the requested command (e.g. delete) until it succeeds. If the retries are exhausted, it should report the failure to Ansible.
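
The retry behaviour requested above could look roughly like the following Python sketch. It is not the code from the actual fix: run_with_retry, CommandFailed and looks_like_cib_conflict are hypothetical names, the attempt count and delay are arbitrary, and the predicate assumes that the schema error seen in the journal also shows up in the command's stderr, which this report does not confirm.

```python
import subprocess
import time


class CommandFailed(Exception):
    """Raised when the command still fails after the last attempt."""


def looks_like_cib_conflict(returncode, stderr):
    # Assumption: the message logged by cib in the journal is also
    # visible in the stderr of the failing command.
    return "does not conform to the configured schema" in (stderr or "")


def run_with_retry(cmd, attempts=3, delay=1.0, should_retry=None,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying failures that should_retry accepts.

    Returns the completed process on success and raises CommandFailed
    otherwise, so the caller can report the error to Ansible.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    if should_retry is None:
        # Default: treat every non-zero exit as possibly transient.
        should_retry = lambda returncode, stderr: True

    for attempt in range(1, attempts + 1):
        last = runner(cmd, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        if attempt == attempts or not should_retry(last.returncode, last.stderr):
            break
        sleep(delay * attempt)  # linear backoff before the next attempt

    raise CommandFailed(
        "%s failed after %d attempt(s), rc=%d: %s"
        % (" ".join(cmd), attempt, last.returncode, (last.stderr or "").strip())
    )


# Example call (not executed here):
# run_with_retry(["pcs", "resource", "delete", "rabbitmq"],
#                attempts=5, should_retry=looks_like_cib_conflict)
```

In an Ansible module, the CommandFailed exception would typically be caught and turned into a module failure, so that the task is marked as failed instead of silently continuing.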

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/rabbitmq.yaml#L175
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/rabbitmq.yaml#L182


Version-Release number of selected component (if applicable):


How reproducible:
Random (when pcs commands are competing to update the CIB)

Steps to Reproduce:
1. Install Ocata
2. Upgrade to Pike

Actual results:
Some upgrade tasks may fail if another pcs command updates the CIB at the same time.


Expected results:
The Ansible pacemaker module should retry the requested action when it detects that a concurrent update to the CIB prevented the action from completing.

Additional info:
Comment 1 Marius Cornea 2017-09-14 08:29:18 EDT
Cherry pick on stable/pike: https://review.openstack.org/#/c/504044/
Comment 4 Chris Jones 2017-10-25 05:43:41 EDT
*** Bug 1481987 has been marked as a duplicate of this bug. ***
Comment 7 errata-xmlrpc 2017-12-13 16:52:15 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
