Bug 1475665 - OSP11 -> OSP12 upgrade: upgrade fails when cinder-volume runs on host because cinder-manage db sync runs when galera is unavailable
Summary: OSP11 -> OSP12 upgrade: upgrade fails when cinder-volume runs on host because...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 12.0 (Pike)
Assignee: Marios Andreou
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks: 1399762
Reported: 2017-07-27 06:46 UTC by Marius Cornea
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-7.0.1-0.20170927205938.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 21:45:43 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1706951 0 None None None 2017-07-27 11:35:36 UTC
OpenStack gerrit 505603 0 None None None 2017-09-22 11:21:56 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Marius Cornea 2017-07-27 06:46:31 UTC
Description of problem:
OSP11 -> OSP12 upgrade: upgrade fails when cinder-volume runs on host because cinder-manage db sync runs when galera is unavailable.

With https://review.openstack.org/#/c/486121/ the cinder-volume service runs on the host. When the upgrade_tasks for the cinder-volume puppet service[1] run, no database is available because the galera pcs resource has already been deleted so that it can be moved to a container[2].

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/pacemaker/cinder-volume.yaml#L69-L71
[2] https://review.openstack.org/#/c/480202/

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11
2. Upgrade to OSP12

Actual results:
Upgrade fails during major-upgrade-composable-steps-docker.yaml because cinder-manage db sync is unable to complete as there is no db available.

Expected results:
Upgrade moves forward.

Additional info:

Comment 1 Marios Andreou 2017-07-27 11:42:52 UTC
Just spent some time thinking about this and filed the upstream LP bug (attached to trackers). I just posted a review as a first step (also on trackers) which just moves the dbsync to step1, before we take away galera in step2.

There is quite a bit of history here... besides the commits mcornea points at in comment #0, the cinder-volume dbsync was added in https://review.openstack.org/#/c/467280/ 

If we can't run this in step1, the alternative is to explore doing it in puppet (though the commit message of /#/c/467280/ suggests this may not always be possible).

Comment 2 Marius Cornea 2017-07-28 08:26:34 UTC
Another scenario affected by this bug is rerunning major-upgrade-composable-steps after a failure. For example, major-upgrade-composable-steps can fail while pulling the container images because the nodes were unable to reach the registry. At that point the galera pcs resource has already been deleted, so if we rerun major-upgrade-composable-steps after fixing the connection to the registry, it fails while running cinder-manage db sync for the second time.
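
The ordering problem described so far can be illustrated with a small toy model. This is only a sketch, not the actual tripleo-heat-templates logic: the task names, the `run_upgrade` and `outcome` helpers, and the step5 placement of the original db sync are assumptions made for illustration. Only the idea of moving the db sync to step1, ahead of the galera removal in step2, comes from comment 1.

```python
# Toy model only: task names and the step5 placement are illustrative
# assumptions, not the real tripleo-heat-templates task list.

class DatabaseUnavailable(Exception):
    """Raised when a task needs the database but galera is not running."""


def run_upgrade(tasks, galera_running=True):
    """Run (step, name) tasks in step order against a simulated galera.

    Returns the final galera state. Raises DatabaseUnavailable if the
    db sync is attempted while galera is down.
    """
    for step, name in sorted(tasks):
        if name == "remove_galera_resource":
            galera_running = False
        elif name == "cinder_db_sync" and not galera_running:
            raise DatabaseUnavailable(f"step{step}: {name} has no database")
    return galera_running


def outcome(tasks, galera_running=True):
    """Return "ok" or "failed" for one simulated upgrade run."""
    try:
        run_upgrade(tasks, galera_running)
        return "ok"
    except DatabaseUnavailable:
        return "failed"


# Hypothetical original ordering: db sync runs after galera is removed.
late_sync = [(2, "remove_galera_resource"), (5, "cinder_db_sync")]
# First proposed fix: db sync moved to step1, before the galera removal.
early_sync = [(1, "cinder_db_sync"), (2, "remove_galera_resource")]

print(outcome(late_sync))                         # first run, sync too late
print(outcome(early_sync))                        # first run, sync in step1
print(outcome(early_sync, galera_running=False))  # rerun, galera already gone
```

In this model the step1 move fixes the first run, but a rerun that starts with galera already removed still fails at the db sync, which matches the rerun scenario above.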

Comment 3 Marios Andreou 2017-08-15 15:42:03 UTC
Update: as discussed on the initial proposal at https://review.openstack.org/#/c/487815/ and the Launchpad bug in trackers, this is slightly more complex than it first appears. We need to start the non-containerized cinder-volume only after the upgrade_tasks and then the docker/container deploy steps are executed.

Posted https://review.openstack.org/493878 as a proposal today (added to trackers)

Comment 7 Marios Andreou 2017-09-22 11:21:56 UTC
updated tracker to point to stable/pike which is merged so POST

Comment 13 errata-xmlrpc 2017-12-13 21:45:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

