Description of problem:
-----------------------
A few tests from tempest's scenario suite fail after major upgrade:

<testcase classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_create_ebs_image_and_check_boot[compute,id-36c34c67-7b54-4b59-b188-02a2f458a63b,image,volume]"/>
<testcase classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_create_server_from_volume_snapshot[compute,id-05795fb2-b2a7-4c9f-8fac-ff25aedb1489,image,slow,volume]"/>
<testcase classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,volume]"/>
...

traceback-1: {{{
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/utils/test_utils.py", line 84, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/volumes_client.py", line 103, in delete_volume
    resp, body = self.delete(url)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 310, in delete
    return self.request('DELETE', url, extra_headers, headers, body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/base_client.py", line 38, in request
    method, url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 668, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 779, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {u'message': u'Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer.', u'code': 400}
}}}

traceback-2: {{{
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 880, in wait_for_resource_deletion
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestVolumeBootPattern:_run_cleanups) Failed to delete volume 60dd4644-df86-4590-a885-faa9dd711b20 within the required time (300 s).
}}}

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 88, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 135, in test_volume_boot_pattern
    snapshot = self.create_volume_snapshot(volume_origin['id'], force=True)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 251, in create_volume_snapshot
    metadata=metadata)['snapshot']
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/snapshots_client.py", line 65, in create_snapshot
    resp, body = self.post('snapshots', post_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 279, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 668, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 779, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {u'message': u'One of cinder-scheduler services is too old to accept create_snapshot request. Required RPC API version is 3.9. Are you running mixed versions of cinder-schedulers?', u'code': 400}

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-cinder-12.4.1-0.20180329071637.4011a82.el7ost.noarch
python-cinder-12.0.1-0.20180418194613.c476898.el7ost.noarch
python2-cinderclient-3.5.0-1.el7ost.noarch
openstack-cinder-12.0.1-0.20180418194613.c476898.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-19.el7ost.noarch

Steps to Reproduce:
-------------------
1. Run major upgrade of RHOS-12 to RHOS-13
2. Launch tempest scenario suite after upgrade

Additional info:
----------------
Virtual setup: 3 controllers + 3 messaging + 3 database + 3 ceph + 2 network + 2 compute
IPv6, custom overcloud name - 'qe-Cloud-0'

Related BZs for ffwd:
---------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1554122
https://bugzilla.redhat.com/show_bug.cgi?id=1557331
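The 400 error in the last traceback comes from cinder's rolling-upgrade compatibility check: the API rejects a request whose required RPC API version (3.9 here) exceeds the version reported by the oldest registered cinder-scheduler. A minimal sketch of that kind of check, for illustration only (this is not cinder's actual code, and the version strings other than 3.9 are hypothetical):

```python
def as_tuple(version):
    """Parse a version string like '3.9' into (3, 9) for numeric comparison."""
    return tuple(int(part) for part in version.split('.'))

def compatible(required, reported_versions):
    """True only if every registered scheduler supports the required RPC API version."""
    return all(as_tuple(v) >= as_tuple(required) for v in reported_versions)

# After a clean upgrade, every scheduler reports a new-enough version:
print(compatible('3.9', ['3.16', '3.16', '3.16']))  # True

# A single stale service record pins the whole cluster, producing the
# "too old to accept create_snapshot request" error seen above:
print(compatible('3.9', ['3.16', '3.16', '3.5']))   # False
```

Note that a plain string comparison would get this wrong ('3.9' > '3.16' lexicographically), which is why the sketch compares numeric tuples.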
This seems to be an upgrade issue similar to bug #1554122. That BZ contains a reference to a patch [1] that relates to sequencing the cinder-volume service restarts under pacemaker. This BZ describes a similar problem about mixed versions of the cinder-scheduler service, except that cinder-scheduler does not run under pacemaker.
Hey Alan,

In this case, THT specifically has an upgrade_tasks section where you can restart any service you want. Let's sync up for a proper fix.
Yuri, can you try a local patch to verify it works before I propose it upstream? After upgrading the undercloud, but before you upgrade the overcloud, patch the cinder-manage command at [1] to add the "--bump-versions" option, like this:

    su cinder -s /bin/bash -c 'cinder-manage db sync --bump-versions'

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/docker/services/cinder-api.yaml#L139

Tzach, maybe you could also try this?
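To illustrate why bumping the recorded versions helps: clients pin RPC calls to the lowest version recorded for any instance of a service, so a stale scheduler record left over from before the upgrade keeps the pin at the old version even after every scheduler is actually upgraded. The sketch below is illustrative only — not cinder's real schema or code — and the field name and version numbers are hypothetical:

```python
LATEST_RPC = '3.16'  # hypothetical latest scheduler RPC API version

# Hypothetical service records, one per cinder-scheduler; the first row
# is a stale entry recorded before the upgrade.
services = [
    {'binary': 'cinder-scheduler', 'rpc_current_version': '3.5'},
    {'binary': 'cinder-scheduler', 'rpc_current_version': '3.16'},
]

def as_tuple(version):
    return tuple(int(part) for part in version.split('.'))

def pinned_version(rows):
    """Clients pin to the minimum RPC version recorded across all services."""
    oldest = min(rows, key=lambda r: as_tuple(r['rpc_current_version']))
    return oldest['rpc_current_version']

def bump_versions(rows, latest):
    """Conceptually what a version bump does: raise every recorded version."""
    for row in rows:
        row['rpc_current_version'] = latest
    return rows

assert pinned_version(services) == '3.5'       # stale row pins the API
bump_versions(services, LATEST_RPC)
assert pinned_version(services) == LATEST_RPC  # pin lifted after the bump
```

This is why the request for RPC API version 3.9 succeeds only after the recorded versions are bumped, even though the running schedulers already support it.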
FYI Alan, Alex, Yuri:

I "cherry-picked" (manually added) --bump-versions on an upgraded undercloud before the overcloud upgrade started. The suggested fix worked: I can run cinder create and cinder snapshot-create without the version conflict error Yuri and I got before. Before the fix, an upgraded system showed 19 Cinder-related failures due to the version issue; now only 3 fail (for a known reason). This would be OK to verify once the fix lands in an RPM build/deployment.
Thanks, Tzach! I will propose a patch upstream, and backport to OSP-13 ASAP.
Patch has been approved upstream.
Verified on:
openstack-tripleo-heat-templates-8.0.2-29.el7ost.noarch

Upgraded a system from OSP12 to OSP13. Post upgrade, ran some Cinder commands without errors:
cinder create
cinder snapshot-create
..

No sign of the original issue ("One of cinder-scheduler services is too old to accept create_snapshot").

OK to verify.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086