Description of problem:
As detailed in https://bugzilla.redhat.com/show_bug.cgi?id=1481987, during an Ocata -> Pike upgrade, some of the ansible steps implemented for the rabbitmq service upgrade delete the existing rabbitmq-clone [1] resource and create a new containerized resource, rabbitmq-bundle. While the resource disabling task succeeded, the resource deletion task [2] did not, and the journal logged the following errors:

Aug 15 22:48:24 messaging-0 ansible-pacemaker_resource[175956]: Invoked with check_mode=False state=delete resource=rabbitmq timeout=300 wait_for_resource=True
Aug 15 22:48:25 messaging-0 cib[16463]: error: IDREF attribute rsc references an unknown ID "rabbitmq-clone"
Aug 15 22:48:25 messaging-0 cib[16463]: error: IDREF attribute rsc references an unknown ID "rabbitmq-clone"
Aug 15 22:48:25 messaging-0 cib[16463]: warning: Updated CIB does not validate against pacemaker-2.8 schema/dtd
Aug 15 22:48:25 messaging-0 cib[16463]: warning: Local-only Change (client:cibadmin, call: 2): 0.78.0 (Update does not conform to the configured schema)
Aug 15 22:48:25 messaging-0 cib[16463]: warning: Completed cib_delete operation for section //clone/primitive[@id="rabbitmq"]/..: Update does not conform to the configured schema (rc=-203,

These logs are most probably a symptom of another pcs command having run on the cluster and updated the CIB in the middle of the "resource delete" pcs command issued by the ansible task.

In such a situation, ansible-pacemaker should retry the requested command (e.g. delete) to make sure that it succeeds. It should also report the error properly to ansible if the retry logic still ends in failure.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/rabbitmq.yaml#L175
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/rabbitmq.yaml#L182

Version-Release number of selected component (if applicable):

How reproducible:
Random (when pcs commands are competing to update the CIB)

Steps to Reproduce:
1. Install Ocata
2. Upgrade to Pike

Actual results:
Some upgrade tasks may fail to execute properly if another pcs command updated the CIB in the meantime.

Expected results:
The ansible pacemaker module should retry the requested action if it detects that a concurrent update to the CIB prevented the action from finishing.

Additional info:
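
To illustrate the expected behaviour, here is a minimal Python sketch of the kind of retry loop the module could use. It is not the actual ansible-pacemaker code. The function name run_with_retry, its parameters, and the choice to retry on any non-zero exit code are assumptions made for illustration only; the real module may detect the concurrent CIB update differently.

```python
import subprocess
import time


def run_with_retry(cmd, retries=5, delay=2.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying while it exits non-zero (hypothetical helper).

    Returns (attempts_used, completed_process) on success.
    Raises RuntimeError carrying the last stderr once all attempts fail,
    so that the caller can report the failure instead of ignoring it.
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")

    last = None
    for attempt in range(1, retries + 1):
        # capture_output/text make stderr available as a string
        last = runner(cmd, capture_output=True, text=True)
        if last.returncode == 0:
            return attempt, last
        # Assumption: any failure may be caused by a competing CIB update,
        # so wait a little and try again unless this was the last attempt.
        if attempt < retries:
            sleep(delay)

    raise RuntimeError(
        "command %r failed after %d attempts (rc=%d): %s"
        % (cmd, retries, last.returncode, (last.stderr or "").strip())
    )
```

A caller could invoke it as run_with_retry(["pcs", "resource", "delete", "rabbitmq"]) and, inside an Ansible module, turn the RuntimeError into a module.fail_json(msg=...) call so that the task is reported as failed. The exact pcs command line that the module runs is assumed here, not taken from its source.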
Cherry pick on stable/pike: https://review.openstack.org/#/c/504044/
*** Bug 1481987 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462