Bug 1554122
| Field | Value |
|---|---|
| Summary | FFU: post upgrade attaching cinder volume to instance fails with: ServiceTooOld: One of cinder-volume services is too old to accept attachment_update request. Required RPC API version is 3.9. Are you running mixed versions of cinder-volumes? |
| Product | Red Hat OpenStack |
| Reporter | Marius Cornea <mcornea> |
| Component | openstack-tripleo-heat-templates |
| Assignee | Lukas Bezdicka <lbezdick> |
| Status | CLOSED ERRATA |
| QA Contact | Marius Cornea <mcornea> |
| Severity | urgent |
| Priority | high |
| Version | 13.0 (Queens) |
| CC | dbecker, geguileo, jfrancoa, jschluet, lbezdick, mbracho, mburns, morazi, rhel-osp-director-maint |
| Target Milestone | beta |
| Target Release | 13.0 (Queens) |
| Keywords | Triaged |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | openstack-tripleo-heat-templates-8.0.2-0.20180416194362.29a5ad5.el7ost |
| Doc Type | If docs needed, set a value |
| Cloned to | 1557331 (view as bug list) |
| Last Closed | 2018-06-27 13:35:18 UTC |
| Type | Bug |
| Bug Depends On | 1557331 |
Description
Marius Cornea, 2018-03-11 16:22:16 UTC
This will be is_bootstrap|bool in cinder FFU tasks.

This is a problem with the start/restart of the services that comes from Cinder's rolling upgrade mechanism. Looking at the upstream documentation for upgrades [1], it looks like it's not 100% accurate, and following it will lead to this error. The issue is basically that you need to restart the API and Scheduler services twice, be it during a normal upgrade or a rolling upgrade, for the services to get the right RPC and Versioned Objects pinning. So you upgrade, start the APIs, then the Schedulers, then the Volume service, then restart the Schedulers and APIs (order not important); and if you have more than one volume service, you will have to restart all of them except the last one that was restarted.

If we don't want to go through all this trouble of restarting the services twice, we can use the `cinder-manage service remove` command to remove all the services that are present in the DB, and then we can restart all services in any order. We can even run a SQL command that deletes all entries in the Cinder services table: "delete from services;"

[1] https://docs.openstack.org/cinder/pike/upgrade.html

I have been thinking a bit more about this upgrade issue, and I think the removal of services from the table (using SQL or the cinder-manage command) is not the best solution, because doing this we could unintentionally re-enable a service that was disabled during the upgrade, and we would create problems if a volume service is in a failover state. And while the other solution, doing a second restart of the Cinder services, will work as expected (as long as we have purged any service that will no longer be working, and as long as the volume service doesn't have any problem on start), it is bad in terms of user experience and a big headache for the FFU flow. So I have created an upstream Cinder bug and proposed a solution that I believe will be easy to integrate into the FFU flow.
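The double-restart requirement described above can be illustrated with a small simulation. This is a hypothetical simplification, not Cinder's actual code: assume each service, on start, records its own RPC version in the services table and then pins itself to the minimum version found across all rows. The first post-upgrade start of the API still sees the old volume row, so it pins low; only a second restart picks up the new minimum.

```python
OLD, NEW = "3.5", "3.9"

def start(db, name, version):
    """Simulate a service start: report own RPC version, pin to the table minimum."""
    db[name] = version
    return min(db.values())

# Rows left over from before the upgrade:
db = {"cinder-api": OLD, "cinder-scheduler": OLD, "cinder-volume": OLD}

# First pass: start the upgraded services one by one.
api_pin = start(db, "cinder-api", NEW)        # volume row is still OLD, so API pins to 3.5
sched_pin = start(db, "cinder-scheduler", NEW)
vol_pin = start(db, "cinder-volume", NEW)     # all rows are NEW now, so volume pins to 3.9

assert api_pin == OLD and vol_pin == NEW      # API is stuck on the old pin

# Second pass: restart API and scheduler so they re-read the table.
api_pin = start(db, "cinder-api", NEW)
sched_pin = start(db, "cinder-scheduler", NEW)
assert api_pin == NEW and sched_pin == NEW    # everyone now agrees on 3.9
```

Deleting all rows first (the `delete from services;` workaround) makes the first pass pin correctly, which is why it avoids the second restart, at the cost of losing per-service state.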
The idea of this patch is that in the FFU flow we will just have to pass a new parameter to the db sync command, "cinder-manage db sync --bump-versions", and with that all services will run as expected when we start them after the upgrade, and we'll no longer see the ServiceTooOld issue.

Will clone this BZ to Cinder to keep track of the backport to OSP13.

https://review.openstack.org/#/q/I5676132be477695838c59a0d59c62e09e335a8f0

Workaround applied and fix tracked in a different bug. Now that we have the "--bump-versions" option in db sync (rhbz #1557331), we should replace the workaround and use it instead, as it has the added benefit of not losing the status of the services (disabled, failed over, etc.).

This item has been properly Triaged and planned for the OSP13 release, and is being tagged for tracking. For details, see https://url.corp.redhat.com/1851efd

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
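The advantage of bumping versions over deleting rows can be sketched as follows. This is a hypothetical simplification of what the "--bump-versions" sync achieves; the row layout and field names here are assumed for illustration, not Cinder's actual schema:

```python
NEW_RPC = "3.9"

def bump_versions(rows, new_version):
    """Bump only the recorded RPC version on every existing service row."""
    for row in rows:
        row["rpc_current_version"] = new_version
    return rows

rows = [
    # A volume service that was deliberately disabled before the upgrade;
    # this is the state a "delete from services;" workaround would lose.
    {"host": "ctrl-0", "binary": "cinder-volume",
     "rpc_current_version": "3.5", "disabled": True},
]
bump_versions(rows, NEW_RPC)

assert rows[0]["rpc_current_version"] == "3.9"  # first start pins correctly
assert rows[0]["disabled"] is True              # disabled flag survives
```

With the versions already bumped in the table, the first post-upgrade start of every service pins to the new RPC version, so no second restart is needed and per-service state (disabled, failed over) is preserved.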