Bug 1578901 - [UPGRADES] TempestFailure: One of cinder-scheduler services is too old to accept create_snapshot request
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 13.0 (Queens)
Assignee: Alan Bishop
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-16 14:59 UTC by Yurii Prokulevych
Modified: 2018-06-27 13:57 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-8.0.2-29.el7ost
Doc Type: Bug Fix
Doc Text:
After upgrading to a new release, Block Storage services (cinder) were stuck using the old RPC versions from the prior release. As a result, every cinder API request that required the latest RPC versions failed. With this fix, the upgrade updates all cinder RPC versions to match the latest release.
Clone Of:
Environment:
Last Closed: 2018-06-27 13:56:23 UTC
Target Upstream Version:



Links
System                 | ID             | Priority | Status | Summary                                | Last Updated
Launchpad              | 1774262        | None     | None   | None                                   | 2018-05-30 19:43:32 UTC
OpenStack gerrit       | 571291         | None     | MERGED | Reset Cinder RPC versions after upgrade | 2020-06-11 21:37:47 UTC
Red Hat Product Errata | RHEA-2018:2086 | None     | None   | None                                   | 2018-06-27 13:57:37 UTC

Description Yurii Prokulevych 2018-05-16 14:59:46 UTC
Description of problem:
-----------------------
A few tests from tempest's scenario suite fail after a major upgrade:
<testcase classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_create_ebs_image_and_check_boot[compute,id-36c34c67-7b54-4b59-b188-02a2f458a63b,image,volume]"

classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_create_server_from_volume_snapshot[compute,id-05795fb2-b2a7-4c9f-8fac-ff25aedb1489,image,slow,volume]"

classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,volume]"

...
traceback-1: {{{
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/utils/test_utils.py", line 84, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/volumes_client.py", line 103, in delete_volume
    resp, body = self.delete(url)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 310, in delete
    return self.request('DELETE', url, extra_headers, headers, body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/base_client.py", line 38, in request
    method, url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 668, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 779, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {u'message': u'Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer.', u'code': 400}
}}}

traceback-2: {{{
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 880, in wait_for_resource_deletion
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestVolumeBootPattern:_run_cleanups) Failed to delete volume 60dd4644-df86-4590-a885-faa9dd711b20 within the required time (300 s).
}}}

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 88, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 135, in test_volume_boot_pattern
    snapshot = self.create_volume_snapshot(volume_origin['id'], force=True)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 251, in create_volume_snapshot
    metadata=metadata)['snapshot']
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/snapshots_client.py", line 65, in create_snapshot
    resp, body = self.post('snapshots', post_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 279, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 668, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 779, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {u'message': u'One of cinder-scheduler services is too old to accept create_snapshot request. Required RPC API version is 3.9. Are you running mixed versions of cinder-schedulers?', u'code': 400}
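
For context, the error above can be read as a minimum-version check: the API side caps outgoing RPC messages at the lowest version recorded for any cinder-scheduler instance, and create_snapshot is refused when that cap is below 3.9. The following is only a simplified sketch of that idea, not cinder's actual code. The helper names and the sample version lists are hypothetical; only the required version 3.9 comes from the error message.

```python
def parse_version(version):
    """Turn an RPC version string such as '3.9' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def rpc_version_cap(recorded_versions):
    """Return the lowest version recorded across all instances of a service."""
    return min(recorded_versions, key=parse_version)


def can_send(required, recorded_versions):
    """True if the version cap is high enough for a call needing `required`."""
    cap = rpc_version_cap(recorded_versions)
    return parse_version(cap) >= parse_version(required)


# Hypothetical service records: stale entries left over from the old release,
# and the same entries after the versions have been bumped.
stale_records = ["3.5", "3.5", "3.5"]
bumped_records = ["3.10", "3.10", "3.10"]

print(can_send("3.9", stale_records))   # False
print(can_send("3.9", bumped_records))  # True
```

Versions are compared as integer tuples rather than strings, because a plain string comparison would rank "3.10" below "3.9".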



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-cinder-12.4.1-0.20180329071637.4011a82.el7ost.noarch
python-cinder-12.0.1-0.20180418194613.c476898.el7ost.noarch
python2-cinderclient-3.5.0-1.el7ost.noarch
openstack-cinder-12.0.1-0.20180418194613.c476898.el7ost.noarch

openstack-tripleo-heat-templates-8.0.2-19.el7ost.noarch

Steps to Reproduce:
-------------------
1. Run major upgrade of RHOS-12 to RHOS-13
2. Launch tempest scenarios suite after upgrade

Additional info:
----------------
Virtual setup: 3controllers + 3messaging + 3database + 3ceph + 2network + 2compute
               IPv6, custom overcloud name - 'qe-Cloud-0'

Related BZs for ffwd:
---------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1554122
https://bugzilla.redhat.com/show_bug.cgi?id=1557331

Comment 2 Alan Bishop 2018-05-16 15:42:40 UTC
This seems to be an upgrade issue similar to bug #1554122. That BZ contains a reference to a patch [1] that relates to sequencing the cinder-volume service restarts under pacemaker. This BZ describes a similar problem about mixed versions of the cinder-scheduler service, except that cinder-scheduler does not run under pacemaker.

Comment 5 Carlos Camacho 2018-05-28 14:11:52 UTC
Hey Alan,

In this case, THT has a dedicated upgrade_tasks section where you can restart any service you want. Let's sync up on a proper fix.

Comment 6 Alan Bishop 2018-05-29 21:16:17 UTC
Yurii, can you try a local patch to verify it works before I propose it upstream?

After upgrading the undercloud but before you upgrade the overcloud, patch the cinder-manage command at [1] to add the "--bump-versions" option, like this:

"su cinder -s /bin/bash -c 'cinder-manage db sync --bump-versions'"

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/docker/services/cinder-api.yaml#L139

Tzach, maybe you could also try this?
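
The local edit described above amounts to appending one option to the existing db sync command string. A minimal sketch of that edit follows, assuming the unpatched command is the quoted string from this comment without the option. The helper name add_bump_versions is hypothetical and is not part of tripleo-heat-templates.

```python
def add_bump_versions(command):
    """Append --bump-versions to a 'cinder-manage db sync' invocation.

    The command is returned unchanged if it already carries the flag or
    does not contain the db sync invocation at all.
    """
    target = "cinder-manage db sync"
    flag = "--bump-versions"
    if flag in command or target not in command:
        return command
    return command.replace(target, target + " " + flag, 1)


# Assumed form of the unpatched command (the quoted command minus the flag).
original = "su cinder -s /bin/bash -c 'cinder-manage db sync'"
print(add_bump_versions(original))
```

The helper is idempotent, so running it against an already patched template leaves the command as it is.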

Comment 7 Tzach Shefi 2018-05-30 17:46:51 UTC
FYI Alan, Alex, Yurii:
I manually added ("cherry-picked") --bump-versions on an upgraded undercloud before the overcloud upgrade started.

The suggested fix worked: I can run cinder create and create a cinder snapshot without the version conflict error that Yurii and I got before.

Before the fix, an upgraded system showed 19 Cinder-related failures due to the version issue; now only 3 fail (for a known reason).

This will be OK to verify once the fix lands in an RPM build/deployment.

Comment 8 Alan Bishop 2018-05-30 17:58:18 UTC
Thanks, Tzach! I will propose a patch upstream, and backport to OSP-13 ASAP.

Comment 9 Alan Bishop 2018-05-31 12:43:58 UTC
Patch has been approved upstream.

Comment 13 Tzach Shefi 2018-06-03 12:41:06 UTC
Verified on:
openstack-tripleo-heat-templates-8.0.2-29.el7ost.noarch

Upgraded a system from OSP12 to OSP13. 
Post upgrade ran some Cinder commands without errors: 
cinder create 
cinder snapshot-create .. 

The original error ("One of cinder-scheduler services is too old to accept create_snapshot request") no longer appears.
OK to verify.

Comment 16 errata-xmlrpc 2018-06-27 13:56:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

