Bug 1859370 - Retype of RBD snapshot volume is failing
Summary: Retype of RBD snapshot volume is failing
Keywords:
Status: CLOSED DUPLICATE of bug 1764324
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Cinder Bugs List
QA Contact: Tzach Shefi
Docs Contact: Chuck Copello
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-21 20:16 UTC by James Parker
Modified: 2021-04-01 09:48 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-01 09:48:23 UTC
Target Upstream Version:
Embargoed:



Description James Parker 2020-07-21 20:16:17 UTC
Description of problem:
RBD retype is failing when running tempest.api.volume.admin.test_volume_retype.VolumeRetypeWithMigrationTest.test_volume_from_snapshot_retype_with_migration from [1]. This is a multi-backend deployment consisting of RBD, NFS, NetApp, and iSCSI. The test case fails whenever it attempts to retype an RBD volume to another backend. Retyping from another backend to RBD does not show any issues. The results below are from RBD to NFS.

(overcloud) [stack@undercloud-0 tempest_workspace]$ tempest run --serial --regex tempest.api.volume.admin.test_volume_retype
{0} tempest.api.volume.admin.test_volume_retype.VolumeRetypeWithMigrationTest.test_available_volume_retype_with_migration [27.072103s] ... ok
{0} tempest.api.volume.admin.test_volume_retype.VolumeRetypeWithMigrationTest.test_volume_from_snapshot_retype_with_migration [304.284310s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    b'Traceback (most recent call last):'
    b'  File "/usr/lib/python3.6/site-packages/tempest/api/volume/admin/test_volume_retype.py", line 142, in test_volume_from_snapshot_retype_with_migration'
    b'    src_vol = self._create_volume_from_snapshot()'
    b'  File "/usr/lib/python3.6/site-packages/tempest/api/volume/admin/test_volume_retype.py", line 67, in _create_volume_from_snapshot'
    b"    self.snapshots_client.wait_for_resource_deletion(snapshot['id'])"
    b'  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 899, in wait_for_resource_deletion'
    b'    raise exceptions.TimeoutException(message)'
    b'tempest.lib.exceptions.TimeoutException: Request timed out'
    b'Details: (VolumeRetypeWithMigrationTest:test_volume_from_snapshot_retype_with_migration) Failed to delete volume-snapshot 81982518-e552-4f6e-a805-0647e9ea2cbf within the required time (300 s).'
    b''


Version-Release number of selected component (if applicable):
16.1

How reproducible:
100% reproducible

Steps to Reproduce:
1. Create a multi-backend deployment consisting of RBD backend and another backend
2. Execute tempest test tempest.api.volume.admin.test_volume_retype.VolumeRetypeWithMigrationTest.test_volume_from_snapshot_retype_with_migration

Actual results:
The test case times out while waiting for the snapshot to be deleted during cleanup.

Expected results:
The test should successfully retype the volume created from the snapshot, moving it from RBD to the destination backend.


Additional info:

[1] https://github.com/openstack/tempest/blob/master/tempest/api/volume/admin/test_volume_retype.py#L141

Comment 1 Alan Bishop 2020-07-21 20:40:25 UTC
We need to see the cinder logs (with DEBUG).

Comment 2 Alan Bishop 2020-07-22 14:12:18 UTC
I reviewed the logs (thanks for saving them on the hypervisor, James!), and see this in the cinder-volume log:

2020-07-21 14:07:15.071 79 INFO cinder.volume.drivers.rbd [req-a624464a-f0f3-41e7-b52e-54308ee5fccc 38438ec8d3d44978b417b1153327b587 8591c28be4224f9cbb6fb59556b50db8 - default default] Image volumes/volume-c5b26f69-87bc-494f-8d4f-9ea03c3a304d is dependent on the snapshot snapshot-81982518-e552-4f6e-a805-0647e9ea2cbf.
2020-07-21 14:07:15.079 79 ERROR cinder.volume.manager [req-a624464a-f0f3-41e7-b52e-54308ee5fccc 38438ec8d3d44978b417b1153327b587 8591c28be4224f9cbb6fb59556b50db8 - default default] Delete snapshot failed, due to snapshot busy.: cinder.exception.SnapshotIsBusy: deleting snapshot snapshot-81982518-e552-4f6e-a805-0647e9ea2cbf that has dependent volumes

This occurs because tempest is using this [1] sequence to create the volume it plans to retype.

[1] https://github.com/openstack/tempest/blob/6cb37d68b2cb40cec9dcbb9e26c0649c6e6c877a/tempest/api/volume/admin/test_volume_retype.py#L61-L67

The tempest test fails because the snapshot cannot be deleted, and this happens before the test attempts the actual migration/retype. The snapshot cannot be deleted because the RBD driver creates a fast copy-on-write (COW) clone of the snapshot, and that clone depends on the snapshot, which prevents the snapshot from being deleted.
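
For illustration only, the dependency described above can be modeled with a small in-memory sketch. Everything here (ToyRbdPool, wait_for_deletion, the short timeout) is hypothetical and is not the cinder RBD driver or tempest code; it only shows why a snapshot with a dependent clone cannot be removed, and why a caller that polls for the deletion ends up with a timeout rather than an immediate error.

```python
import time


class SnapshotIsBusy(Exception):
    """Raised when a snapshot still has dependent clones."""


class ToyRbdPool:
    """Hypothetical in-memory model; not the cinder RBD driver."""

    def __init__(self):
        # Maps each snapshot name to the set of clones that depend on it.
        self.snapshots = {}

    def create_snapshot(self, snap):
        self.snapshots[snap] = set()

    def clone(self, snap, volume):
        # A COW clone shares data with its parent snapshot.
        self.snapshots[snap].add(volume)

    def flatten(self, volume):
        # Flattening copies the data into the clone, dropping the dependency.
        for children in self.snapshots.values():
            children.discard(volume)

    def delete_snapshot(self, snap):
        if self.snapshots[snap]:
            raise SnapshotIsBusy(f"{snap} has dependent volumes")
        del self.snapshots[snap]


def wait_for_deletion(pool, snap, timeout=0.2, interval=0.05):
    """Poll until the snapshot is gone, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while snap in pool.snapshots:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{snap} not deleted within {timeout} s")
        time.sleep(interval)
```

In this toy model, calling delete_snapshot while a clone exists raises SnapshotIsBusy and the snapshot stays in place, so a later wait_for_deletion can only time out. Once the clone is flattened (or removed), the same delete succeeds.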

One solution is to configure the RBD driver with rbd_flatten_volume_from_snapshot=True, but a better solution is to rework the tempest test to defer deleting the snapshot until after the retype operation completes.
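
As a rough sketch of the reordering suggested above, the snippet below registers each cleanup right after its resource is created and lets them run in reverse order at the end, so the snapshot is only deleted after the retype and after the volume created from it is gone. The function name and the step strings are made up for illustration; this uses Python's contextlib.ExitStack and is not the tempest test code.

```python
from contextlib import ExitStack


def run_retype_scenario():
    """Return the order of operations when snapshot deletion is deferred."""
    log = []
    with ExitStack() as cleanups:
        log.append("create source volume")
        cleanups.callback(log.append, "delete source volume")

        log.append("create snapshot")
        # Registered now, but it runs only when the stack unwinds.
        cleanups.callback(log.append, "delete snapshot")

        log.append("create volume from snapshot")
        cleanups.callback(log.append, "delete volume from snapshot")

        log.append("retype volume from snapshot")
    # Callbacks ran in last-in, first-out order when the block exited.
    return log


if __name__ == "__main__":
    for step in run_retype_scenario():
        print(step)
```

Because the callbacks run last-in, first-out, "delete volume from snapshot" comes before "delete snapshot", which avoids deleting a snapshot that still has a dependent volume.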

Unless others object, I think this should be handled as a tempest bug.

Comment 3 Alan Bishop 2020-07-22 15:21:03 UTC
Ignore my previous comment about this being a tempest bug. Apparently the RBD driver is *not* supposed to behave this way, and the rbd_flatten_volume_from_snapshot parameter is not intended to address the behavior.

There are other open BZs covering this problem (e.g. bug #1437392), and the cinder squad needs to do some bz cleanup and determine a course of action.

Comment 4 Luigi Toscano 2021-04-01 09:48:23 UTC
This is going to be addressed in OSP 16.2 through the use of the RBD Clone v2 API. Please see bug 1764324.

*** This bug has been marked as a duplicate of bug 1764324 ***

