Bug 1578901

Summary: [UPGRADES] TempestFailure: One of cinder-scheduler services is too old to accept create_snapshot request
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: openstack-tripleo-heat-templates
Assignee: Alan Bishop <abishop>
Status: CLOSED ERRATA
QA Contact: Tzach Shefi <tshefi>
Severity: medium
Priority: medium
Version: 13.0 (Queens)
CC: abishop, augol, ccamacho, cschwede, jschluet, knylande, lbezdick, mbultel, mburns, mcornea, scohen, srevivo, tshefi, yprokule
Target Milestone: rc
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-29.el7ost
Doc Type: Bug Fix
Doc Text:
After upgrading to a new release, the Block Storage (cinder) services remained pinned to the RPC versions of the prior release. As a result, any cinder API request that required the latest RPC versions failed. With this fix, upgrading to a new release updates all cinder RPC versions to match the latest release.
Last Closed: 2018-06-27 13:56:23 UTC
Type: Bug

Description Yurii Prokulevych 2018-05-16 14:59:46 UTC
Description of problem:
-----------------------
A few tests from tempest's scenario suite fail after the major upgrade:
<testcase classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_create_ebs_image_and_check_boot[compute,id-36c34c67-7b54-4b59-b188-02a2f458a63b,image,volume]"

classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_create_server_from_volume_snapshot[compute,id-05795fb2-b2a7-4c9f-8fac-ff25aedb1489,image,slow,volume]"

classname="tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern" name="test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,volume]"

...
traceback-1: {{{
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/utils/test_utils.py", line 84, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/volumes_client.py", line 103, in delete_volume
    resp, body = self.delete(url)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 310, in delete
    return self.request('DELETE', url, extra_headers, headers, body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/base_client.py", line 38, in request
    method, url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 668, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 779, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {u'message': u'Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer.', u'code': 400}
}}}

traceback-2: {{{
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 880, in wait_for_resource_deletion
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestVolumeBootPattern:_run_cleanups) Failed to delete volume 60dd4644-df86-4590-a885-faa9dd711b20 within the required time (300 s).
}}}

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 88, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 135, in test_volume_boot_pattern
    snapshot = self.create_volume_snapshot(volume_origin['id'], force=True)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 251, in create_volume_snapshot
    metadata=metadata)['snapshot']
  File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/snapshots_client.py", line 65, in create_snapshot
    resp, body = self.post('snapshots', post_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 279, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 668, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 779, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {u'message': u'One of cinder-scheduler services is too old to accept create_snapshot request. Required RPC API version is 3.9. Are you running mixed versions of cinder-schedulers?', u'code': 400}



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-cinder-12.4.1-0.20180329071637.4011a82.el7ost.noarch
python-cinder-12.0.1-0.20180418194613.c476898.el7ost.noarch
python2-cinderclient-3.5.0-1.el7ost.noarch
openstack-cinder-12.0.1-0.20180418194613.c476898.el7ost.noarch

openstack-tripleo-heat-templates-8.0.2-19.el7ost.noarch

Steps to Reproduce:
-------------------
1. Run major upgrade of RHOS-12 to RHOS-13
2. Run the tempest scenario suite after the upgrade (see the example command below)
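
One way to re-run just the failing scenario tests is something along these lines (the regex and the use of 'tempest run' are illustrative; adjust to however tempest is invoked in your environment):

    tempest run --regex 'tempest.scenario.test_volume_boot_pattern'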

Additional info:
----------------
Virtual setup: 3 controllers + 3 messaging + 3 database + 3 ceph + 2 network + 2 compute
               IPv6, custom overcloud name - 'qe-Cloud-0'

Related BZs for ffwd:
---------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1554122
https://bugzilla.redhat.com/show_bug.cgi?id=1557331

Comment 2 Alan Bishop 2018-05-16 15:42:40 UTC
This seems to be an upgrade issue similar to bug #1554122. That BZ contains a reference to a patch [1] that relates to sequencing the cinder-volume service restarts under pacemaker. This BZ describes a similar problem about mixed versions of the cinder-scheduler service, except that cinder-scheduler does not run under pacemaker.
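
For reference, the per-service RPC pins can be inspected directly in the cinder database. This is only a hedged sketch: it assumes the services table still carries the rpc_current_version/object_current_version columns and that you have database access from a controller (credentials/socket details omitted):

    mysql cinder -e 'SELECT host, `binary`, rpc_current_version, object_current_version FROM services;'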

Comment 5 Carlos Camacho 2018-05-28 14:11:52 UTC
Hey Alan,

In this case, THT specifically has an upgrade_tasks section where you can restart any service you need. Let's sync up on a proper fix.

Comment 6 Alan Bishop 2018-05-29 21:16:17 UTC
Yuri, can you try a local patch to verify it works before I propose it upstream?

After upgrading the undercloud but before you upgrade the overcloud, patch the cinder-manage command at [1] to add the "--bump-versions" option, like this:

"su cinder -s /bin/bash -c 'cinder-manage db sync --bump-versions'"

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/docker/services/cinder-api.yaml#L139
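
In other words, the change amounts to something like the following (the unpatched line is assumed to be the same command without the option; surrounding YAML context in cinder-api.yaml is omitted):

    -  "su cinder -s /bin/bash -c 'cinder-manage db sync'"
    +  "su cinder -s /bin/bash -c 'cinder-manage db sync --bump-versions'"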

Tzach, maybe you could also try this?

Comment 7 Tzach Shefi 2018-05-30 17:46:51 UTC
FYI Alan, Alex, Yurii,
I "cherry picked" (manually added) --bump-versions
on an upgraded undercloud before the overcloud upgrade started.

The suggested fix worked: I can create volumes and snapshots, and I'm no longer getting the version conflict error Yurii and I hit before.

Before the fix, an upgraded system gave me 19 Cinder-related failures due to the version issue; now only 3 fail, for a known reason.

This will be OK to verify once the fix lands in an RPM build/deployment.

Comment 8 Alan Bishop 2018-05-30 17:58:18 UTC
Thanks, Tzach! I will propose a patch upstream, and backport to OSP-13 ASAP.

Comment 9 Alan Bishop 2018-05-31 12:43:58 UTC
Patch has been approved upstream.

Comment 13 Tzach Shefi 2018-06-03 12:41:06 UTC
Verified on:
openstack-tripleo-heat-templates-8.0.2-29.el7ost.noarch

Upgraded a system from OSP12 to OSP13. 
Post upgrade ran some Cinder commands without errors: 
cinder create 
cinder snapshot-create .. 
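
For example, a minimal post-upgrade sanity check along these lines (volume name and size are arbitrary):

    cinder create --name verify-vol 1
    cinder snapshot-create --name verify-snap verify-vol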

No sign of the original issue: "One of cinder-scheduler services is too old to accept create_snapshot request".
OK to verify.

Comment 16 errata-xmlrpc 2018-06-27 13:56:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086