Description of problem:
OSP11 -> OSP12 upgrade: the upgrade fails when cinder-volume runs on the host, because cinder-manage db sync runs while galera is unavailable.

With https://review.openstack.org/#/c/486121/ the cinder-volume service runs on the host. When the upgrade_tasks for the cinder-volume puppet service[1] run, no database is available, because the galera pcs resource has been deleted so that it can be moved to a container[2].

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/pacemaker/cinder-volume.yaml#L69-L71
[2] https://review.openstack.org/#/c/480202/

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11
2. Upgrade to OSP12

Actual results:
The upgrade fails during major-upgrade-composable-steps-docker.yaml because cinder-manage db sync cannot complete: no database is available.

Expected results:
The upgrade proceeds.

Additional info:
I spent some time thinking about this and filed the upstream LP bug (attached to trackers). As a first step I posted a review (also on trackers) that moves the dbsync to step1, before galera is taken away in step2. There is quite a bit of history here: besides the commits mcornea points to in comment #0, the cinder-volume dbsync was added in https://review.openstack.org/#/c/467280/. If we can't run it in step1, the alternative is to explore doing it in puppet, although the commit message of /#/c/467280/ suggests that may not always be possible.
Another scenario affected by this bug is rerunning major-upgrade-composable-steps after a failure. For example, major-upgrade-composable-steps can fail while pulling the container images if the nodes cannot reach the registry. At that point the galera pcs resource has already been deleted, so if we rerun major-upgrade-composable-steps after fixing the connection to the registry, it fails when running cinder-manage db sync a second time.
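A toy model may help illustrate the ordering problem discussed in the comments above. This is purely a hypothetical sketch: the Task class, the run_upgrade function and the step numbers are illustrative assumptions, not TripleO's actual upgrade_tasks mechanism (in particular, step 3 for the "late" dbsync is chosen arbitrarily). It only assumes that once the galera pcs resource is removed, no database is available for later tasks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical model of an upgrade task; not a TripleO API.
    name: str
    step: int
    needs_db: bool = False    # task requires a reachable database
    removes_db: bool = False  # task takes the database away (e.g. galera pcs resource deleted)


def run_upgrade(tasks, db_available=True):
    """Run tasks in ascending step order and return the names of tasks that failed."""
    failed = []
    # sorted() is stable, so tasks in the same step keep their listed order.
    for task in sorted(tasks, key=lambda t: t.step):
        if task.needs_db and not db_available:
            failed.append(task.name)
            continue
        if task.removes_db:
            db_available = False
    return failed


# dbsync scheduled after galera removal (step 3 is an illustrative guess).
late = [
    Task("remove galera pcs resource", 2, removes_db=True),
    Task("cinder-manage db sync", 3, needs_db=True),
]

# dbsync moved to step1, before galera is removed in step2.
early = [
    Task("cinder-manage db sync", 1, needs_db=True),
    Task("remove galera pcs resource", 2, removes_db=True),
]

print(run_upgrade(late))                       # ['cinder-manage db sync']
print(run_upgrade(early))                      # []
# Rerun after a failed first attempt: galera is already gone.
print(run_upgrade(early, db_available=False))  # ['cinder-manage db sync']
```

In this model, moving the dbsync earlier fixes a clean first run but not a rerun after galera has already been removed, which matches the rerun scenario described above.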
Update: as discussed on the initial proposal at https://review.openstack.org/#/c/487815/ and on the Launchpad bug in trackers, this is slightly more complex than it first appears. We need to start the non-containerized cinder-volume only after the upgrade_tasks and then the docker/container deploy steps have executed. I posted https://review.openstack.org/493878 as a proposal today (added to trackers).
Updated the tracker to point to stable/pike, which is merged, so moving to POST.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462