Bug 1571549

Summary: Updating pcs resource bundle names during OSP13 upgrade fails if resource doesn't exist
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: beta
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Reporter: Marios Andreou <mandreou>
Assignee: Marios Andreou <mandreou>
QA Contact: Yurii Prokulevych <yprokule>
CC: ccamacho, dciabrin, mbultel, mburns, scohen
Keywords: Triaged
Flags: scohen: needinfo+
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-5.el7ost
Last Closed: 2018-06-27 13:53:37 UTC
Type: Bug

Description Marios Andreou 2018-04-25 06:20:46 UTC
Description of problem:
The fix for BZ 1564449 at https://review.openstack.org/#/q/Ic87a66753b104b9f15db70fdccbd66d88cef94df
allows us to update the container image name for pacemaker bundle resources if it is changed as part of the upgrade configuration. However, if the upgrade is interrupted, the target pacemaker resource bundle may not even have been created yet, and a re-run of the upgrade tasks may then fail because the cluster resource doesn't exist. The fix for this bug groups the stop/update/start tasks for the bundle resource and adds a new conditional that checks whether the cluster resource exists before trying to update the container image being used.
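
To illustrate the intended behaviour, here is a minimal sketch in Python. It is not the actual tripleo-heat-templates change, which lives in the upgrade tasks. It assumes that `pcs resource show <id>` exits non-zero for an unknown resource and that the image is changed with `pcs resource bundle update <id> container image=<image>`. The function names and the injectable `runner` argument are hypothetical and exist only for this example.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command, discard its output and return its exit code."""
    return subprocess.call(
        list(argv), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def resource_exists(name: str, runner: Runner = run_command) -> bool:
    # Assumption: `pcs resource show <id>` exits non-zero when the
    # resource is not defined in the cluster.
    return runner(["pcs", "resource", "show", name]) == 0


def update_bundle_image(
    name: str, image: str, runner: Runner = run_command
) -> List[List[str]]:
    """Stop, update and restart a bundle, but only if it exists.

    Returns the commands that were run; the list is empty when the
    resource is missing and the whole group is skipped.
    """
    if not resource_exists(name, runner):
        return []

    # The three steps are grouped so they are run or skipped together.
    steps = [
        ["pcs", "resource", "disable", name],
        # Assumption: this is the pcs syntax for changing the bundle image.
        ["pcs", "resource", "bundle", "update", name, "container", "image=" + image],
        ["pcs", "resource", "enable", name],
    ]
    executed: List[List[str]] = []
    for argv in steps:
        executed.append(argv)
        if runner(argv) != 0:
            raise RuntimeError("command failed: " + " ".join(argv))
    return executed
```

With a check like this, a re-run against a node where openstack-cinder-volume was never created would skip the disable/update/enable group instead of retrying the disable step until it fails.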

How reproducible:
Every time

Steps to Reproduce:
1. Deploy OSP12
2. Before starting the upgrade, take a bundle resource down (e.g. using pcs, or even remove or stop the cluster altogether). Note that resources could already be down if this is a re-run of the upgrade.
3. Upgrade to OSP13 and change name of container image for a bundle resource (e.g. rabbit). 


Actual results:

 u'TASK [Disable the cinder_volume cluster resource before container upgrade] *****',
 u'FAILED - RETRYING: Disable the cinder_volume cluster resource before container upgrade (5 retries left).']
[u'FAILED - RETRYING: Disable the cinder_volume cluster resource before container upgrade (4 retries left).',
 u'FAILED - RETRYING: Disable the cinder_volume cluster resource before container upgrade (3 retries left).',
 u'FAILED - RETRYING: Disable the cinder_volume cluster resource before container upgrade (2 retries left).',
 u'FAILED - RETRYING: Disable the cinder_volume cluster resource before container upgrade (1 retries left).']
[u'fatal: [192.168.24.14]: FAILED! => {"attempts": 5, "changed": false, "error": "Error: resource/clone/master/group/bundle \'openstack-cinder-volume\' does not exist\\n", "msg": "Failed, to set the resource openstack-cinder-volume to the state disable", "output": "", "rc": 1}',
 u'',
 u'PLAY RECAP *********************************************************************',
 u'192.168.24.14              : ok=124  changed=76   unreachable=0    failed=1   ',

more debug info: some selected tasks I could pick out of the logs:

Expected results:

The upgrade tasks shouldn't try to update a cluster resource if it isn't currently defined/active.

Comment 3 Marios Andreou 2018-04-27 13:28:20 UTC
Information for build openstack-tripleo-heat-templates-8.0.2-5.el7ost https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=679206

Comment 9 Yurii Prokulevych 2018-05-31 12:11:39 UTC
Verified with openstack-tripleo-heat-templates-8.0.2-28.el7ost.noarch

Resource rabbitmq-bundle was disabled during one run and completely removed during another one.

Comment 11 errata-xmlrpc 2018-06-27 13:53:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086