Bug 1287696 - [RBD] Failed to delete a detached volume - resource is busy
Summary: [RBD] Failed to delete a detached volume - resource is busy
Keywords:
Status: CLOSED DUPLICATE of bug 1293607
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assignee: Lee Yarwood
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks: 1405319
 
Reported: 2015-12-02 14:05 UTC by Yogev Rabl
Modified: 2019-09-09 17:13 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1405319 (view as bug list)
Environment:
Last Closed: 2016-02-22 20:29:52 UTC
Target Upstream Version:
Embargoed:


Attachments:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1464259 0 None None None 2015-12-22 19:25:57 UTC
Launchpad 1522036 0 None None None Never
OpenStack gerrit 258695 0 None MERGED Refresh stale volume BDMs in terminate_connection 2020-10-28 22:53:46 UTC

Description Yogev Rabl 2015-12-02 14:05:27 UTC
Description of problem:
A volume that was created from a snapshot was attached as a bootable volume to an instance. After terminating the instance, I tried to delete the volume. The deletion failed with the error:

http://pastebin.test.redhat.com/332225

The volume's status according to rbd info is: 
rbd image 'volume-e6ce4c0a-90a1-4e47-89dc-fc721429ac13':
        size 1024 MB in 128 objects
        order 23 (8192 kB objects)
        block_name_prefix: rbd_data.11b0f036b20dc2
        format: 2
        features: layering, striping
        parent: yrabl-cinder/volume-641c5ae4-e227-46b0-a93d-76de2448ec80@snapshot-902a826f-b616-4df4-b0e0-f3418fdd9186
        overlap: 1024 MB
        stripe unit: 4096 kB
        stripe count: 1
The volume has no snapshots of its own; the command
# rbd -p yrabl-cinder snap ls volume-e6ce4c0a-90a1-4e47-89dc-fc721429ac13
provided no output, as expected. 
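
Since the delete fails with "resource is busy", a likely culprit is a client that still has the image open. A minimal diagnostic sketch, assuming a format-2 image (so the header object is rbd_header.<id>, with <id> taken from the block_name_prefix above):

# Diagnostic sketch: list clients still watching the image header object.
# Assumes a format-2 image; the id 11b0f036b20dc2 comes from block_name_prefix above.
rados -p yrabl-cinder listwatchers rbd_header.11b0f036b20dc2
# A stale watcher (e.g. from the compute node) would explain "resource is busy".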

Version-Release number of selected component (if applicable):
openstack-cinder-7.0.0-2.el7ost.noarch
python-cinderclient-1.4.0-1.el7ost.noarch
python-cinder-7.0.0-2.el7ost.noarch


How reproducible:
unknown

Steps to Reproduce:
1. Create a volume from a Cirros image
# cinder create --image-id <cirros image uuid> --display-name base 1 
2. Create a snapshot of base volume
# cinder snapshot-create --name snap <base uuid>
3. Extend base volume to 10G
# cinder extend <base uuid> 10
4. Create a volume from snap
# cinder create --snapshot-id <snap uuid> --display-name from-snap 1 
5. Launch an instance with from-snap volume
# nova boot --flavor m1.small --boot-volume <from-snap uuid> --net... 
6. Terminate the instance 
# nova delete ... 
7. Delete from-snap volume
# cinder delete <from-snap uuid>

Actual results:
The volume's status changed to 'deleting' and then back to 'available'; it was not deleted.

Expected results:
The volume is deleted

Additional info:
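For reference, a minimal shell sketch of the steps above; the <...> placeholders, the --nic argument, the instance name and the sleep intervals are illustrative assumptions rather than the exact values used in the original run:

# Sketch of the reproduction sequence; fill in the <...> placeholders first.
cinder create --image-id <cirros image uuid> --display-name base 1
sleep 30                                   # wait for the volume to become available
cinder snapshot-create --name snap <base uuid>
cinder extend <base uuid> 10
cinder create --snapshot-id <snap uuid> --display-name from-snap 1
sleep 30
nova boot --flavor m1.small --boot-volume <from-snap uuid> --nic net-id=<net uuid> test-vm
sleep 60                                   # let the instance reach ACTIVE
nova delete test-vm
sleep 30                                   # let the instance be torn down
cinder delete <from-snap uuid>             # fails on the affected version
cinder show <from-snap uuid>               # status flips back to "available"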

Comment 2 Sergey Gotliv 2015-12-06 15:51:10 UTC
Yogev, is this a one time issue or something you can reproduce?
Did you try to wait 30+ seconds and delete it again?

Comment 3 Yogev Rabl 2015-12-07 08:08:44 UTC
Sergey, I can reproduce it easily

Comment 4 Yogev Rabl 2015-12-09 13:37:09 UTC
(In reply to Sergey Gotliv from comment #2)
> Yogev, is this a one time issue or something you can reproduce?
> Did you try to wait 30+ seconds and delete it again?

After further investigation it seems like the resource

Comment 5 Ariel Opincaru 2015-12-09 13:45:51 UTC
Yogev,

This issue also happens in tempest, not necessarily with a volume created from a snapshot!
For example, there is a tempest test [1] which creates a server, creates a volume, attaches and detaches the volume to/from the server, and then deletes it.
When cinder tries to delete the volume, the return code is zero but the volume is not deleted!
The test fails on a timeout while waiting for the volume to be deleted...

From what I tried, sometimes it is possible to delete the volume only after a few minutes, and sometimes it is not possible at all.
(using the RBD driver)

[1] tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments

From /var/log/cinder/volume.log: 2015-12-08 08:12:46.827 9222 WARNING cinder.volume.drivers.rbd [req-7dcb2245-7294-481f-9e19-8b89a6114915 a3a73b549cd54e8bbb4e0800e48e2b04 6755e3b38f09432b9a2cab05e7f73303 - - -] ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.
2015-12-08 08:12:46.834 9222 ERROR cinder.volume.manager [req-7dcb2245-7294-481f-9e19-8b89a6114915 a3a73b549cd54e8bbb4e0800e48e2b04 6755e3b38f09432b9a2cab05e7f73303 - - -] Cannot delete volume bab96bc0-d924-416c-8d66-7b7e8c5612f1: volume is busy

Please let me know if you need additional info.

Ariel.
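
A rough sketch of the 30+ second retry suggested by the driver warning above, usable as a temporary workaround while debugging (the UUID is the one from the log excerpt; the retry count and interval are arbitrary assumptions):

# Retry the delete a few times, ~30 seconds apart, per the RBD driver warning.
# The UUID is taken from the volume.log excerpt above; adjust as needed.
VOL=bab96bc0-d924-416c-8d66-7b7e8c5612f1
for attempt in 1 2 3 4 5; do
    cinder delete "$VOL" || true
    sleep 35
    cinder show "$VOL" >/dev/null 2>&1 || { echo "volume gone after attempt $attempt"; break; }
done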

Comment 6 Ariel Opincaru 2015-12-10 09:29:35 UTC
tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments[id-7fa563fe-f0f7-43eb-9e22-a1ece036b513]
---------------------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/home/stack/tempest/tempest/api/compute/volumes/test_attach_volume.py", line 58, in _delete_volume
        self.volumes_client.wait_for_resource_deletion(self.volume['id'])
      File "/usr/lib/python2.7/site-packages/tempest_lib/common/rest_client.py", line 771, in wait_for_resource_deletion
        raise exceptions.TimeoutException(message)
    TimeoutException: Request timed out
    Details: (AttachVolumeTestJSON:_run_cleanups) Failed to delete volume d11fa083-08b0-47fa-a38a-7ea9ddf57384 within the required time (300 s).
    Traceback (most recent call last):
    _StringException: Empty attachments:
      stderr
      stdout

Comment 7 Jon Bernard 2015-12-22 19:20:53 UTC
Are you scripting this?  Manually, these commands succeed.  If you're scripting this and don't give the instance enough time to reach active state, you might be hitting https://bugs.launchpad.net/cinder/+bug/1464259

Comment 8 Yogev Rabl 2015-12-29 11:50:46 UTC
(In reply to Jon Bernard from comment #7)
> Are you scripting this?  Manually, these commands succeed.  If you're
> scripting this and don't give the instance enough time to reach active
> state, you might be hitting https://bugs.launchpad.net/cinder/+bug/1464259

The bug was opened after a manual test; I didn't script it. Are you using the same version? 

I can't be sure that it is the same bug, because I can't see any logs there

Comment 9 Lee Yarwood 2016-01-22 15:53:12 UTC
BZ#1293607 pulls in the associated fix from c#7 with the rebase to 12.0.1.

Yogev, can you confirm if your issue still reproduces with this version?

Comment 10 Lee Yarwood 2016-02-22 20:29:52 UTC
Closing out as a duplicate of the rebase bug that pulls in the associated fix.

*** This bug has been marked as a duplicate of bug 1293607 ***

