Bug 1287696
| Summary: | [RBD] Failed to delete a detached volume - resource is busy | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yogev Rabl <yrabl> |
| Component: | openstack-nova | Assignee: | Lee Yarwood <lyarwood> |
| Status: | CLOSED DUPLICATE | QA Contact: | nlevinki <nlevinki> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | unspecified | CC: | aopincar, berrange, dasmith, eglynn, eharney, jobernar, kchamart, lyarwood, ndipanov, sbauza, scohen, sferdjao, sgordon, sgotliv, vromanso, yeylon, yrabl |
| Target Milestone: | --- | Keywords: | Automation |
| Target Release: | 8.0 (Liberty) | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1405319 (view as bug list) | Environment: | |
| Last Closed: | 2016-02-22 20:29:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1405319 | | |
Description (Yogev Rabl, 2015-12-02 14:05:27 UTC)
Comment 2 (Sergey Gotliv):

Yogev, is this a one-time issue or something you can reproduce? Did you try to wait 30+ seconds and delete it again?

Comment (Yogev Rabl):

Sergey, I can reproduce it easily.

Comment (Yogev Rabl):

(In reply to Sergey Gotliv from comment #2)
> Yogev, is this a one time issue or something you can reproduce?
> Did you try to wait 30+ seconds and delete it again?

After further investigation it seems like the resource

Comment (Ariel):

Yogev, this issue also happens in tempest, not necessarily from a snapshot! For example, there is a tempest test [1] which creates a server, creates a volume, attaches and detaches the volume to the server, and then deletes it. When Cinder tries to delete the volume, the return code is zero but the volume is not deleted! The test fails on a timeout waiting for the volume to be deleted. From what I tried, sometimes it is possible to delete the volume only after a few minutes, and sometimes it is not deleted at all (using the RBD driver).

[1] tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments

From /var/log/cinder/volume.log:

2015-12-08 08:12:46.827 9222 WARNING cinder.volume.drivers.rbd [req-7dcb2245-7294-481f-9e19-8b89a6114915 a3a73b549cd54e8bbb4e0800e48e2b04 6755e3b38f09432b9a2cab05e7f73303 - - -] ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.

2015-12-08 08:12:46.834 9222 ERROR cinder.volume.manager [req-7dcb2245-7294-481f-9e19-8b89a6114915 a3a73b549cd54e8bbb4e0800e48e2b04 6755e3b38f09432b9a2cab05e7f73303 - - -] Cannot delete volume bab96bc0-d924-416c-8d66-7b7e8c5612f1: volume is busy

Please let me know if you need additional info.
Ariel.

tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments[id-7fa563fe-f0f7-43eb-9e22-a1ece036b513]
```
---------------------------------------------------------------------------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/home/stack/tempest/tempest/api/compute/volumes/test_attach_volume.py", line 58, in _delete_volume
        self.volumes_client.wait_for_resource_deletion(self.volume['id'])
      File "/usr/lib/python2.7/site-packages/tempest_lib/common/rest_client.py", line 771, in wait_for_resource_deletion
        raise exceptions.TimeoutException(message)
    TimeoutException: Request timed out
    Details: (AttachVolumeTestJSON:_run_cleanups) Failed to delete volume d11fa083-08b0-47fa-a38a-7ea9ddf57384 within the required time (300 s).

Traceback (most recent call last):
_StringException: Empty attachments:
  stderr
  stdout
```
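The ImageBusy warning in the Cinder log above says the delete may succeed if retried after roughly 30 seconds, once the crashed client's watcher on the image has expired. As an illustrative sketch only (not Cinder's actual retry logic; `delete_volume_with_retry`, `delete_fn`, and the `ImageBusy` class here are hypothetical names), a retry wrapper around a backend delete call could look like:

```python
import time


class ImageBusy(Exception):
    """Stand-in for the busy error raised by the RBD bindings (hypothetical)."""


def delete_volume_with_retry(delete_fn, retries=3, wait=30.0):
    """Attempt delete_fn(), retrying while the backend reports the image busy.

    Sleeping `wait` seconds between attempts gives a crashed client's
    watcher time to expire, as the Cinder log message suggests.
    Returns True once the delete succeeds, False if every attempt was busy.
    """
    for attempt in range(1, retries + 1):
        try:
            delete_fn()
            return True
        except ImageBusy:
            if attempt < retries:
                time.sleep(wait)
    return False
```

With a delete call that is busy on the first two attempts, `delete_volume_with_retry(flaky_delete, retries=3, wait=30)` would succeed on the third try; if the watcher never goes away, it gives up and returns False so the caller can surface the "volume is busy" error.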
Comment 7 (Jon Bernard):

Are you scripting this? Manually, these commands succeed. If you're scripting this and don't give the instance enough time to reach the active state, you might be hitting https://bugs.launchpad.net/cinder/+bug/1464259

Comment (Yogev Rabl):

(In reply to Jon Bernard from comment #7)
> Are you scripting this? Manually, these commands succeed. If you're
> scripting this and don't give the instance enough time to reach active
> state, you might be hitting https://bugs.launchpad.net/cinder/+bug/1464259

The bug was opened after a manual test; I didn't script it. Are you using the same version? I can't be sure that it is the same bug, because I can't see any logs there.

Comment:

BZ#1293607 pulls in the associated fix from c#7 with the rebase to 12.0.1. Yogev, can you confirm whether your issue still reproduces with this version?

Closing out as a duplicate of the rebase bug that pulls in the associated fix.

*** This bug has been marked as a duplicate of bug 1293607 ***