Created attachment 1120039 [details]
Nova compute logs

Description of problem:
Detached a volume from a server and then tried to delete the volume. The delete failed with a "Volume is busy" error. The underlying problem appears to be the volume detach: it fails silently, without reporting any error.

Version-Release number of selected component (if applicable):
RHEL 7.2
ceph-common-0.94.5-3.el7cp.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.2.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7_2.4.x86_64
qemu-img-rhev-2.3.0-31.el7_2.4.x86_64
libvirt-daemon-1.2.17-13.el7_2.2.x86_64
openstack-nova-compute-2015.1.2-13.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Configure Ceph with OpenStack for Nova and Cinder.
2. Attach an RBD volume to a Nova instance.
3. Detach the volume.
4. Try to delete the volume.

Actual results:
Deleting the volume fails because the disk is still in use by the instance even after detaching it.

Expected results:
The detach should be successful. Currently the detach fails silently.

Additional info:

# nova volume-detach testvm bcc54e44-7f3d-41aa-af40-5d3b6108e7e3

# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Display Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| bcc54e44-7f3d-41aa-af40-5d3b6108e7e3 | detaching | test         | 1    | -           | false    | 6da5a34d-478c-4368-a703-01a6e087e0a4 |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+

# cinder delete testvol2

# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| ID                                   | Status    | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| bcc54e44-7f3d-41aa-af40-5d3b6108e7e3 | available | test         | 1    | -           | false    |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+

The volume is still there (status "available"), and the cinder-volume logs show that it was not deleted:

2016-02-01 02:27:59.748 15003 INFO cinder.volume.manager [req-67a6b442-42f0-4e97-926b-5bd3c0c9a14b ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] volume bcc54e44-7f3d-41aa-af40-5d3b6108e7e3: deleting
2016-02-01 02:27:59.853 15003 WARNING cinder.volume.drivers.rbd [req-67a6b442-42f0-4e97-926b-5bd3c0c9a14b ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.
2016-02-01 02:27:59.857 15003 ERROR cinder.volume.manager [req-67a6b442-42f0-4e97-926b-5bd3c0c9a14b ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] Cannot delete volume bcc54e44-7f3d-41aa-af40-5d3b6108e7e3: volume is busy

The nova-compute logs show that the volume was detached:

2016-01-29 23:02:29.085 31981 INFO nova.compute.manager [req-10857225-ed6f-48aa-8f7b-c7c6a92c76d9 ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] [instance: 01c1011e-5ec7-4e73-afa0-8f839bdfde59] Detach volume bcc54e44-7f3d-41aa-af40-5d3b6108e7e3 from mountpoint /dev/vdc

But in the dumpxml of the instance the disk is still attached:

    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='46ed6c5d-9ae0-44bd-94cc-15e3ec731f99'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-bcc54e44-7f3d-41aa-af40-5d3b6108e7e3'>
        <host name='10.x.x.x' port='6789'/>
        <host name='10.x.x.x' port='6789'/>
        <host name='10.x.x.x' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vdc' bus='virtio'/>
      <serial>bcc54e44-7f3d-41aa-af40-5d3b6108e7e3</serial>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>

On the Ceph side, QEMU logging was enabled with an extra debug level for rbd. The qemu guest logs show the following:

$ rados -p volumes listwatchers rbd_header.13b4554210e48
watcher=10.8.x.x:0/3011801 client.91015 cookie=140313423562112
watcher=10.8.x.x:0/2016911 client.91021 cookie=140553067563008

2016-01-29 23:00:27.069499 7fd50c7a2cc0 20 librbd: open_image: ictx = 0x7fd511233200 name = 'volume-bcc54e44-7f3d-41aa-af40-5d3b6108e7e3' id = '' snap_name = ''
2016-01-29 23:00:27.075416 7fd50c7a2cc0 10 librbd::ImageWatcher: 0x7fd50f974f40 registering image watcher
2016-01-29 23:00:27.075447 7fd50c7a2cc0 1 -- 10.8.x.x:0/2016911 --> 10.8.128.96:6808/104846 -- osd_op(client.91021.0:6 rbd_header.13b4554210e48 [watch watch cookie 140553067563008] 2.b0e12566 ondisk+write+known_if_redirected e250) v5 -- ?+0 0x7fd50faa0300 con 0x7fd510b24d60
2016-01-29 23:00:27.092122 7fd4618ee700 1 -- 10.8.x.x:0/2016911 <== osd.2 10.8.x.x:6808/104846 3 ==== osd_op_reply(6 rbd_header.13b4554210e48 [watch watch cookie 140553067563008] v250'7 uv1 ondisk = 0) v6 ==== 191+0+0 (844780309 0 0) 0x7fd50fa5b8c0 con 0x7fd510b24d60

2016-02-01 02:34:17.506005 7fd467bfd700 1 -- 10.8.x.x:0/2016911 --> 10.8.x.x:6808/104846 -- osd_op(client.91021.0:37134 rbd_header.13b4554210e48 [watch ping cookie 140553067563008] 2.b0e12566 ondisk+write+known_if_redirected e250) v5 -- ?+0 0x7fd50fa9ec80 con 0x7fd510b24d60
2016-02-01 02:34:17.506046 7fd467bfd700 1 -- 10.8.x.x:0/2016911 --> 10.8.x.x:6808/104846 -- ping magic: 0 v1 -- ?+0 0x7fd511309480 con 0x7fd510b24d60
2016-02-01 02:34:17.506525 7fd4618ee700 1 -- 10.8.x.x:0/2016911 <== osd.2 10.8.x.x:6808/104846 37130 ==== osd_op_reply(37134 rbd_header.13b4554210e48 [watch ping cookie 140553067563008] v0'0 uv1 ondisk = 0) v6 ==== 191+0+0 (370994486 0 0) 0x7fd50fa5b8c0 con 0x7fd510b24d60

So there is a watch that is still being kept alive (and the image is still open). It appears the volume detach did not close the rbd image.
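A possible manual workaround, sketched here but not verified on this setup: since the stale watch belongs to the qemu process that still has the disk in its domain XML, removing the disk from the running domain by hand should release the watch and allow the delete to succeed. Below, <instance> is a placeholder for the libvirt domain name or UUID of the affected instance (e.g. instance-000000NN), not the nova display name.

Confirm the disk is still present in the live domain:

# virsh domblklist <instance>

Detach it manually from the running guest:

# virsh detach-disk <instance> vdc --live

Then check that the watch on the header object is gone and retry the delete:

# rados -p volumes listwatchers rbd_header.13b4554210e48
# cinder delete bcc54e44-7f3d-41aa-af40-5d3b6108e7e3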
Created attachment 1120635 [details]
Libvirt debug logs during volume attach and detach
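For reference, a minimal sketch of the libvirtd settings assumed for capturing debug logs like the ones attached (example values only, not necessarily the exact filters used here). In /etc/libvirt/libvirtd.conf:

log_filters="1:qemu 1:libvirt"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

then restart the daemon so the settings take effect:

# systemctl restart libvirtd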
Please feel free to reopen with updated information if the problem happens again.
Closed without a fix, therefore QE won't automate this scenario.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days