Bug 1475665 - OSP11 -> OSP12 upgrade: upgrade fails when cinder-volume runs on host because cinder-manage db sync runs when galera is unavailable
Status: VERIFIED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified Unspecified
Priority: high Severity: urgent
Target Milestone: beta
Target Release: 12.0 (Pike)
Assigned To: Marios Andreou
QA Contact: Marius Cornea
Keywords: Triaged
Depends On:
Blocks: 1399762
Reported: 2017-07-27 02:46 EDT by Marius Cornea
Modified: 2017-11-08 13:36 EST (History)
Fixed In Version: openstack-tripleo-heat-templates-7.0.1-0.20170927205938.el7ost
Type: Bug

External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1706951 None None None 2017-07-27 07:35 EDT
OpenStack gerrit 505603 None None None 2017-09-22 07:21 EDT

Description Marius Cornea 2017-07-27 02:46:31 EDT
Description of problem:
OSP11 -> OSP12 upgrade: upgrade fails when cinder-volume runs on host because cinder-manage db sync runs when galera is unavailable.

With https://review.openstack.org/#/c/486121/ the cinder-volume service runs on the host. By the time the upgrade_tasks for the cinder-volume puppet service [1] run, no database is available, because the galera pcs resource has already been deleted so that it can be moved to a container [2].

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/pacemaker/cinder-volume.yaml#L69-L71
[2] https://review.openstack.org/#/c/480202/

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11
2. Upgrade to OSP12

Actual results:
Upgrade fails during major-upgrade-composable-steps-docker.yaml because cinder-manage db sync is unable to complete as there is no db available.

Expected results:
Upgrade moves forward.

Additional info:
Comment 1 Marios Andreou 2017-07-27 07:42:52 EDT
Just spent some time thinking about this and filed the upstream LP bug (attached to trackers). I just posted a review as a first step (also on trackers) which just moves the dbsync to step1, before we take away galera in step2.

There is quite a bit of history here... besides the commits mcornea points at in comment #0, the cinder-volume dbsync was added in https://review.openstack.org/#/c/467280/ 

If we can't run this in step1, then the alternative is to explore doing it in puppet (but judging by the commit message of /#/c/467280/, this may not always be possible).
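
To make the ordering argument concrete, here is a minimal sketch in Python. It is not the tripleo-heat-templates implementation: the function names, the task tuples, and step 5 as the original position of the dbsync are assumptions for illustration. Only "galera goes away in step2" and "move the dbsync to step1" come from this comment.

```python
# Hypothetical model of step-ordered upgrade tasks. Task names and the
# step-5 position of the dbsync are assumptions, not taken from the templates.

def delete_galera_resource(state):
    """Model the removal of the galera pcs resource (step2 in this bug)."""
    state["galera_available"] = False
    return True


def cinder_db_sync(state):
    """Model `cinder-manage db sync`: it only succeeds with a reachable DB."""
    return state["galera_available"]


def run_upgrade(tasks):
    """Run (step, name, action) tasks in step order; stop at the first failure."""
    state = {"galera_available": True}
    log = []
    for step, name, action in sorted(tasks, key=lambda task: task[0]):
        ok = action(state)
        log.append((step, name, "ok" if ok else "FAILED"))
        if not ok:
            break
    return log


# dbsync scheduled after galera is removed: the sync has no database to talk to.
failing = run_upgrade([
    (2, "delete galera pcs resource", delete_galera_resource),
    (5, "cinder-manage db sync", cinder_db_sync),
])

# dbsync moved to step1, as proposed in the first review.
proposed = run_upgrade([
    (1, "cinder-manage db sync", cinder_db_sync),
    (2, "delete galera pcs resource", delete_galera_resource),
])

print(failing)
print(proposed)
```

In this model the first run stops at the dbsync and the second completes, which is the whole idea behind the step1 proposal.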
Comment 2 Marius Cornea 2017-07-28 04:26:34 EDT
Another scenario affected by this bug is rerunning major-upgrade-composable-steps after a failure. For example, major-upgrade-composable-steps can fail while pulling the container images because the nodes were unable to reach the registry. At that point the galera pcs resource has already been deleted, so if we fix the connection to the registry and rerun major-upgrade-composable-steps, it will fail while running cinder-manage db sync for the second time.
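
The rerun case can be sketched the same way. This is a hypothetical Python model, not the actual upgrade code: the step names, their order, and the helper functions are assumptions. The point it illustrates is that the overcloud state survives between attempts, so a dbsync that succeeded on the first attempt fails on the second.

```python
# Hypothetical model of rerunning the upgrade after a mid-run failure.
# Step names and ordering are illustrative assumptions.

def db_sync(state):
    """Model `cinder-manage db sync`: needs galera to be reachable."""
    return state["galera"]


def delete_galera(state):
    """Model deleting the galera pcs resource; the change persists."""
    state["galera"] = False
    return True


def make_pull_images(registry_reachable):
    """Build a step that succeeds only if the registry can be reached."""
    def pull_images(state):
        return registry_reachable
    return pull_images


def run_steps(state, steps):
    """Run steps in order; return the name of the first failing step, or None."""
    for name, action in steps:
        if not action(state):
            return name
    return None


# The state is shared across attempts, like the real overcloud.
state = {"galera": True}

first_attempt = run_steps(state, [
    ("db sync", db_sync),
    ("delete galera", delete_galera),
    ("pull images", make_pull_images(registry_reachable=False)),
])

# Registry connectivity is fixed, but galera is already gone.
second_attempt = run_steps(state, [
    ("db sync", db_sync),
    ("delete galera", delete_galera),
    ("pull images", make_pull_images(registry_reachable=True)),
])

print(first_attempt, second_attempt)
```

Under these assumptions the first attempt fails at the image pull and the second fails earlier, at the dbsync, even though the original problem was fixed.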
Comment 3 Marios Andreou 2017-08-15 11:42:03 EDT
Update: as discussed on the initial proposal @ https://review.openstack.org/#/c/487815/ and the launchpad bug in trackers, this is slightly more complex than it first appears. We need to start the non-containerized cinder-volume only after the upgrade_tasks and then the docker/container deploy steps have executed.

Posted https://review.openstack.org/493878 as a proposal today (added to trackers)
Comment 7 Marios Andreou 2017-09-22 07:21:56 EDT
Updated the tracker to point to the stable/pike change, which is merged, so setting POST.
