Bug 1303549

Summary: Detaching rbd cinder volume from server fails silently.
Product: Red Hat OpenStack Reporter: shilpa <smanjara>
Component: openstack-novaAssignee: Sahid Ferdjaoui <sferdjao>
Status: CLOSED INSUFFICIENT_DATA QA Contact: nlevinki <nlevinki>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo)CC: berrange, dasmith, eglynn, kchamart, sbauza, sferdjao, sgordon, smanjara, srevivo, vromanso
Target Milestone: ---Keywords: ZStream
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-15 09:09:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Nova compute logs
none
Libvirt debug logs during attach and detach volume none

Description shilpa 2016-02-01 09:41:00 UTC
Created attachment 1120039 [details]
Nova compute logs

Description of problem:
Detached a volume from server and tried to delete the volume. Delete failed with "Volume is busy" error. The problem seems to be with detaching volume. It fails silently without any errors.

Version-Release number of selected component (if applicable):
RHEL 7.2
ceph-common-0.94.5-3.el7cp.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.2.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7_2.4.x86_64
qemu-img-rhev-2.3.0-31.el7_2.4.x86_64
libvirt-daemon-1.2.17-13.el7_2.2.x86_64
openstack-nova-compute-2015.1.2-13.el7ost.noarch


How reproducible:
Always

Steps to Reproduce:
1. Configure ceph with openstack for nova and cinder
2. Attach rbd volume to nova instance
3. Detach the volume.
4. Try to delete the volume.

Actual results:
Delete volume fails because the disk is still in use by the instance even after detaching it.

Expected results:
Detach should be successful. Currently detach fails silently.

Additional info:


# nova volume-detach testvm bcc54e44-7f3d-41aa-af40-5d3b6108e7e3
# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| bcc54e44-7f3d-41aa-af40-5d3b6108e7e3| detaching |     test     |  1   |      -      |  false   |  6da5a34d-478c-4368-a703-01a6e087e0a4
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+

# cinder delete testvol2
# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| bcc54e44-7f3d-41aa-af40-5d3b6108e7e3 | available |     test     |  1   |      -      |  false   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------

The volume is still available. But the logs show that the volume was not deleted:

2016-02-01 02:27:59.748 15003 INFO cinder.volume.manager [req-67a6b442-42f0-4e97-926b-5bd3c0c9a14b ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] volume bcc54e44-7f3d-41aa-af40-5d3b6108e7e3: deleting
2016-02-01 02:27:59.853 15003 WARNING cinder.volume.drivers.rbd [req-67a6b442-42f0-4e97-926b-5bd3c0c9a14b ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.
2016-02-01 02:27:59.857 15003 ERROR cinder.volume.manager [req-67a6b442-42f0-4e97-926b-5bd3c0c9a14b ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] Cannot delete volume bcc54e44-7f3d-41aa-af40-5d3b6108e7e3: volume is busy

The logs show that the volume is detached:

2016-01-29 23:02:29.085 31981 INFO nova.compute.manager [req-10857225-ed6f-48aa-8f7b-c7c6a92c76d9 ffb1c580e14b467dbf6f2b7f1e032504 71d9b72db9764dbda22dc115501dd459 - - -] [instance: 01c1011e-5ec7-4e73-afa0-8f839bdfde59] Detach volume bcc54e44-7f3d-41aa-af40-5d3b6108e7e3 from mountpoint /dev/vdc


But from the dumpxml, we can still see the disk attached.

 </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='46ed6c5d-9ae0-44bd-94cc-15e3ec731f99'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-bcc54e44-7f3d-41aa-af40-5d3b6108e7e3'>
        <host name='10.x.x.x' port='6789'/>
        <host name='10.x.x.x' port='6789'/>
        <host name='10.x.x.x' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vdc' bus='virtio'/>
      <serial>bcc54e44-7f3d-41aa-af40-5d3b6108e7e3</serial>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>

On the ceph side, enabled qemu logging with extra debug level for rbd:

The qemu-guest logs show the following:

$ rados -p volumes listwatchers rbd_header.13b4554210e48
watcher=10.8.x.x:0/3011801 client.91015 cookie=140313423562112
watcher=10.8.x.x:0/2016911 client.91021 cookie=140553067563008


2016-01-29 23:00:27.069499 7fd50c7a2cc0 20 librbd: open_image: ictx = 0x7fd511233200 name = 'volume-bcc54e44-7f3d-41aa-af40-5d3b6108e7e3' id = '' snap_name = ''

2016-01-29 23:00:27.075416 7fd50c7a2cc0 10 librbd::ImageWatcher: 0x7fd50f974f40 registering image watcher
2016-01-29 23:00:27.075447 7fd50c7a2cc0  1 -- 10.8.x.x:0/2016911 --> 10.8.128.96:6808/104846 -- osd_op(client.91021.0:6 rbd_header.13b4554210e48 [watch watch cookie 140553067563008] 2.b0e12566 ondisk+write+known_if_redirected e250) v5 -- ?+0 0x7fd50faa0300 con 0x7fd510b24d60
2016-01-29 23:00:27.092122 7fd4618ee700  1 -- 10.8.x.x:0/2016911 <== osd.2 10.8.x.x:6808/104846 3 ==== osd_op_reply(6 rbd_header.13b4554210e48 [watch watch cookie 140553067563008] v250'7 uv1 ondisk = 0) v6 ==== 191+0+0 (844780309 0 0) 0x7fd50fa5b8c0 con 0x7fd510b24d60
2016-02-01 02:34:17.506005 7fd467bfd700  1 -- 10.8.x.x:0/2016911 --> 10.8.x.x:6808/104846 -- osd_op(client.91021.0:37134 rbd_header.13b4554210e48 [watch ping cookie 140553067563008] 2.b0e12566 ondisk+write+known_if_redirected e250) v5 -- ?+0 0x7fd50fa9ec80 con 0x7fd510b24d60
2016-02-01 02:34:17.506046 7fd467bfd700  1 -- 10.8.x.x:0/2016911 --> 10.8.x.x:6808/104846 -- ping magic: 0 v1 -- ?+0 0x7fd511309480 con 0x7fd510b24d60
2016-02-01 02:34:17.506525 7fd4618ee700  1 -- 10.8.x.x:0/2016911 <== osd.2 10.8.x.x:6808/104846 37130 ==== osd_op_reply(37134 rbd_header.13b4554210e48 [watch ping cookie 140553067563008] v0'0 uv1 ondisk = 0) v6 ==== 191+0+0 (370994486 0 0) 0x7fd50fa5b8c0 con 0x7fd510b24d60
 

So there is a watch that is still being kept alive (and the image
is still open). Seems like a detach volume did not close the rbd volume image

Comment 2 shilpa 2016-02-03 06:27:22 UTC
Created attachment 1120635 [details]
Libvirt debug logs during attach and detach volume

Comment 4 Sahid Ferdjaoui 2017-02-15 09:09:53 UTC
Please feel free to reopen with updated information if the problem happen again.

Comment 5 awaugama 2017-09-07 19:13:02 UTC
Closed without a fix therefore QE won't automate

Comment 6 Red Hat Bugzilla 2023-09-14 03:17:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days