Bug 1256044

Summary: alias for virtio disk is not removed from internal structures right at the time the DEVICE_DELETED event is delivered
Product: Red Hat Enterprise Linux 7 Reporter: Amit Aviram <aaviram>
Component: qemu-kvm-rhevAssignee: Markus Armbruster <armbru>
Status: CLOSED DUPLICATE QA Contact: CongLi <coli>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1CC: aaviram, amureini, armbru, bmcclain, chayang, derez, dyuan, eblake, hhan, huding, juzhang, knoel, michen, pbonzini, pkrempa, pzhang, rbalakri, snagar, virt-maint, xfu, xuzhang, ylavi
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1262399 (view as bug list) Environment:
Last Closed: 2017-12-05 14:04:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1172230, 1288337, 1310118    
Attachments:
Description Flags
Scripts and logs none

Description Amit Aviram 2015-08-23 15:38:25 UTC
Description of problem:
Multiple detaching and attaching disk devices causes disks to be undetachable.
VM must be powered off for the disk to be detached again.

Version-Release number of selected component (if applicable):
libvirt version: 1.2.8, package: 16.el7_1.3 (x86-024.build.eng.bos.redhat.com)

How reproducible:
100%

Steps to Reproduce:
1. Have a VM with 2 disk devices
2. multiple attach & detach the disks simultaneously.

***
The bug is easily reproduces with 2 scripts, each attaches & detaches a specific disk. running these scripts together causes that error. the scripts used to reproduce the error are attached to the bug.

Actual results:
at some point, the disk can no longer be detached. it is shows in the XMLDesc(), detachDevice() return 0, but nothing happens, it remains attached.

Expected results:
The disk should be detached, or the operation should fail.

Additional info:
It seems that the following log is the one that is related to the origin of the bug:

2015-08-23 15:06:47.761+0000: 11702: error : qemuMonitorJSONCheckError:381 : internal error: unable to execute QEMU command '__com.redhat_drive_del': An undefined error has occurred

after this point, the disk can no longer be detached.

Comment 1 Amit Aviram 2015-08-23 16:08:17 UTC
Created attachment 1066062 [details]
Scripts and logs

Comment 2 Yaniv Lavi 2015-08-25 12:00:28 UTC
Can you add this to the 7.2\3.6 tracking?

Comment 5 Eric Blake 2015-08-27 14:44:00 UTC
Your script calls dom.detachDevice() but does not ensure that the detach completes.  Detaching a device requires guest cooperation, while the command dom.detachDevice() merely returns success if the detach was requested without waiting for it to be completed.  You MUST poll the domain XML to see if the guest has acted on the request before attempting to re-attach another device of the same name.

Comment 6 Eric Blake 2015-08-27 14:45:20 UTC
It might be nice if libvirt added a flag to dom.detachDevice() that waited for the guest to respond before returning control; but commands that require guest cooperation are generally risky as there is no telling how long they might take.  Also, it is possible to register to receive an event when the detach actually happens, instead of having to poll the domain XML.

Comment 7 Eric Blake 2015-08-27 14:45:46 UTC
At this point, I think it is a usage bug, not a libvirt problem.

Comment 8 Amit Aviram 2015-08-27 15:01:26 UTC
But the script does check if the device is still in the xml:

"""
while 'fc2a7a5e-6e9b-4765-82d3-8bf5f57a643b' in dom.XMLDesc():
   print ('2nd device is stuck.')
"""

Note that this checks if the device's serial is still in the domain's XML.

The script might not describe the error well.. at the point that this serial is actually in dom.XMLDesc(), it NEVER gets removed, and even when I kill the script I still see it at dom.XMLDesc()..

It may be that this bug is a qemu bug. maybe there is some problem with the guest and libvirt doesn't get an answer regarding the detach result?

Comment 9 Yang Yang 2015-09-11 09:23:37 UTC
I cannot reproduce it with libvirt-1.2.17-8.el7.x86_64, qemu-kvm-rhev-2.3.0-22.el7.x86_64.

Steps:
in terminal 1
for i in
{1..1000}; do echo try $i times
virsh attach-device virt-tests-vm1 /dev/stdin <<EOF
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/attach.img'/>
<target dev='vdb' bus='virtio'/>
</disk>
EOF

virsh detach-device virt-tests-vm1 /dev/stdin <<EOF
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/attach.img'/>
<target dev='vdb' bus='virtio'/>
</disk>
EOF
done

in terminal 2
for i in {1..1000};

do echo try $i times
virsh attach-device virt-tests-vm1 /dev/stdin <<EOF
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/attach2.img'/>
<target dev='vdc' bus='virtio'/>
</disk>
EOF

virsh detach-device virt-tests-vm1 /dev/stdin <<EOF
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/attach2.img'/>
<target dev='vdc' bus='virtio'/>
</disk>
EOF
done

Attach&detach device work well.

Comment 10 Yang Yang 2015-09-11 10:25:21 UTC
I also cannot reproduce it with the scripts reporter uploaded on libvirt-1.2.17-8.el7.x86_64, qemu-kvm-rhev-2.3.0-22.el7.x86_64.

Comment 11 Peter Krempa 2015-09-11 13:59:54 UTC
I've managed to reproduce the issue by running just one of the reproducer scripts so the issue is not bound to a race between two disks being attached/detached. The issue reproduces after a few attach/detach cycles of the script.

The following communication happens between qemu and libvirtd (annotated )

### one of the "good" attach/detach cycles:
 # add backend:
13:14:39.756+0000 L: buf={"execute":"__com.redhat_drive_add","arguments":{"file":"/var/lib/libvirt/images/a","id":"drive-virtio-disk0","format":"raw","serial":"9f09e0db-cc0b-4a93-826b-c4a3fda0b208","cache":"none","werror":"stop","rerror":"stop","aio":"threads"},"id":"libvirt-844"}
13:14:39.764+0000 Q: buf={"return": {}, "id": "libvirt-844"}
 # add frontend:
13:14:39.764+0000 L: buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"off","bus":"pci.0","addr":"0x7","drive":"drive-virtio-disk0","id":"virtio-disk0"},"id":"libvirt-845"}
13:14:39.774+0000 Q: buf={"return": {}, "id": "libvirt-845"}
 # query alias list:
13:14:39.775+0000 L: buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-846"}
13:14:39.778+0000 Q: buf={"return": [{"name": "virtio-disk0", "type": "child<virtio-blk-pci>"}, {"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "video0", "type": "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type": "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id": "libvirt-846"}
 # unplug frontend:
13:14:39.779+0000 L: buf={"execute":"device_del","arguments":{"id":"virtio-disk0"},"id":"libvirt-847"}
13:14:39.789+0000 Q: buf={"return": {}, "id": "libvirt-847"}
 # get DEVICE_DELETED event for the unplug:
13:14:39.815+0000 Q: buf={"timestamp": {"seconds": 1441890879, "microseconds": 815331}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-disk0/virtio-backend"}}
13:14:39.816+0000 Q: buf={"timestamp": {"seconds": 1441890879, "microseconds": 816058}, "event": "DEVICE_DELETED", "data": {"device": "virtio-disk0", "path": "/machine/peripheral/virtio-disk0"}}
 # try to unplug disk backend ...
13:14:39.816+0000 L: buf={"execute":"__com.redhat_drive_del","arguments":{"id":"drive-virtio-disk0"},"id":"libvirt-848"}
 # unplugging of backend fails, since frontend removes the backend too. Note that the downstream impl is broken since it tries to use the HMP implementation which does not properly report errors so that they are propagated via QMP. We ignore this error, and claim successful detach.
13:14:39.820+0000 Q: buf={"id": "libvirt-848", "error": {"class": "GenericError", "desc": "An undefined error has occurred"}}
 # refresh alias list
13:14:39.821+0000 L: buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-849"}
13:14:39.824+0000 Q: buf={"return": [{"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "video0", "type": "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type": "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id": "libvirt-849"}
 #done

 # NEXT CYCLE
13:14:39.831+0000 L: buf={"execute":"__com.redhat_drive_add","arguments":{"file":"/var/lib/libvirt/images/a","id":"drive-virtio-disk0","format":"raw","serial":"9f09e0db-cc0b-4a93-826b-c4a3fda0b208","cache":"none","werror":"stop","rerror":"stop","aio":"threads"},"id":"libvirt-850"}
13:14:39.839+0000 Q: buf={"return": {}, "id": "libvirt-850"}
13:14:39.840+0000 L: buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"off","bus":"pci.0","addr":"0x7","drive":"drive-virtio-disk0","id":"virtio-disk0"},"id":"libvirt-851"}
13:14:39.849+0000 Q: buf={"return": {}, "id": "libvirt-851"}
13:14:39.850+0000 L: buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-852"}
13:14:39.853+0000 Q: buf={"return": [{"name": "virtio-disk0", "type": "child<virtio-blk-pci>"}, {"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "video0", "type": "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type": "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id": "libvirt-852"}
13:14:39.854+0000 L: buf={"execute":"device_del","arguments":{"id":"virtio-disk0"},"id":"libvirt-853"}
13:14:39.863+0000 Q: buf={"return": {}, "id": "libvirt-853"}
13:14:39.882+0000 Q: buf={"timestamp": {"seconds": 1441890879, "microseconds": 882492}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-disk0/virtio-backend"}}
13:14:39.883+0000 Q: buf={"timestamp": {"seconds": 1441890879, "microseconds": 883033}, "event": "DEVICE_DELETED", "data": {"device": "virtio-disk0", "path": "/machine/peripheral/virtio-disk0"}}

 #Note that we've received the DEVICE_DELETED event ...

13:14:39.883+0000 L: buf={"execute":"__com.redhat_drive_del","arguments":{"id":"drive-virtio-disk0"},"id":"libvirt-854"}
13:14:39.887+0000 Q: buf={"id": "libvirt-854", "error": {"class": "GenericError", "desc": "An undefined error has occurred"}}
13:14:39.889+0000 L: buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-855"}
13:14:39.891+0000 Q: buf={"return": [{"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "video0", "type": "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type": "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id": "libvirt-855"}

 # and virtio-disk-0 is not among the devices, but ..

 # NEXT cycle (the one that fails)

13:14:39.899+0000 L: buf={"execute":"__com.redhat_drive_add","arguments":{"file":"/var/lib/libvirt/images/a","id":"drive-virtio-disk0","format":"raw","serial":"9f09e0db-cc0b-4a93-826b-c4a3fda0b208","cache":"none","werror":"stop","rerror":"stop","aio":"threads"},"id":"libvirt-856"}
13:14:39.907+0000 Q: buf={"return": {}, "id": "libvirt-856"}
13:14:39.907+0000 L: buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"off","bus":"pci.0","addr":"0x7","drive":"drive-virtio-disk0","id":"virtio-disk0"},"id":"libvirt-857"}
 # we try to add 'virtio-disk0' yet again but it fails!
13:14:39.912+0000 Q: buf={"id": "libvirt-857", "error": {"class": "GenericError", "desc": "Duplicate ID 'virtio-disk0' for device"}}
 # now we try to delete the backend, but due to a bug in libvirt we use invalid string, so the backend won't be removed.
13:14:39.912+0000 L: buf={"execute":"__com.redhat_drive_del","arguments":{"id":"file=/var/lib/libvirt/images/a,if=none,id=drive-virtio-disk0,format=raw,serial=9f09e0db-cc0b-4a93-826b-c4a3fda0b208,cache=none,werror=stop,rerror=stop,aio=threads"},"id":"libvirt-858"}
13:14:39.919+0000 Q: buf={"id": "libvirt-858", "error": {"class": "GenericError", "desc": "An undefined error has occurred"}}
13:20:51.685+0000 L: buf={"execute":"__com.redhat_drive_del","arguments":{"id":"drive-virtio-disk0"},"id":"libvirt-859"}
13:20:51.689+0000 Q: buf={"return": {}, "id": "libvirt-859"}
 # After a manual delete of the dangling backend, the script can be restarted and works for a few cycles again.

The conclusion from the above log is that QEMU has a race window where it fires the DEVICE_DELETED event but it does not yet free the device alias from it's internal structures. In a tight loop as used by the reproducer script (causeError.py, in the archive attached)

The libvirt bug described above is caused by bug in commit 8125113c.

Comment 12 Peter Krempa 2015-09-11 14:01:36 UTC
Tested with:
libvirt-1.2.17-8.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64

Comment 13 Peter Krempa 2015-09-11 14:15:36 UTC
Finishing the sentence from comment 11:
In a tight loop as used by the reproducer script (causeError.py, in the archive attached) libvirt sometimes hits the window and attach of a disk with same alias gets rejected.

Adding a short sleep to the original reproducer allows the script to complete all 1000 plug/unplug cycles. 

Reassigning to qemu and lowering severity since a tight loop of attach/detaching disks is not really a viable use case. I'll clone this bug to track the bug in the cleanup path in libvirt that does not correctly remove the backend so that after the failure to attach we will be able to recover.

Comment 15 Markus Armbruster 2015-09-18 17:33:52 UTC
Excellent analysis, Peter.

Clue: device_add fails with "Duplicate ID 'virtio-disk0' for device".
For this to happen, the QemuOpts with this ID must still exist.

Precondition for that: DEVICE_DELETED gets sent before the QemuOpts
gets deleted.  Is this the case?  Turns out it is.  Instead of
demonstrating it with virtio-blk-pci, let me demonstrate with scsi-hd
disk, because that's *much* simpler to understand: (1) it's only one
device (virtio-blk-pci is actually two below the hood, virtio-blk-pci
and virtio-blk-device), and (2) it doesn't do the PCI hot unplug ACPI
dance.  Here's my reproducer:

* Run QEMU under GDB.  I use latest upstream here, shouldn't matter.

* Set a breakpoint on qapi_event_send_device_deleted() and on
  drive_info_del().

* Start a guest.

* Once the guest has booted, run in the monitor:

    (qemu) device_add virtio-scsi-pci,id=scsi0
    (qemu) drive_add "" if=none,file=tmp.qcow2,id=dr0
    OK
    (qemu) device_add scsi-hd,drive=dr0,id=sd0
    (qemu) device_del sd0

  GDB breaks at qapi_event_send_device_deleted(), backtrace appended.

* Continue.  GDB breaks at drive_info_del(), backtrace appended.

  Note this is still within the same object_unparent() call.  Shows
  that we send the event *before* we delete the QemuOpts.  If you
  manage to get a 

Now we have to figure out:

* How can the next device_add get in between?  I suspect it can only
  with a PCI unplug.

* How can we fix this?

  - Can we prevent a device_add getting in between?

  - Can we delay the DEVICE_DELETED until the QemuOpts is gone?

  - Can we move deletion of QemuOpts back in time, before sending
    DEVICE_DELETED?


First backtrace, cut off after qmp_device_del():
#0  0x0000555555a57ef1 in qapi_event_send_device_deleted (has_device=true, device=0x5555575ecb40 "sd0", path=0x555557269b50 "/machine/peripheral/sd0", errp=0x555556411ea8 <error_abort>) at qapi-event.c:831
#1  0x0000555555816474 in device_unparent (obj=0x555556c41840)
    at /work/armbru/qemu/hw/core/qdev.c:1242
#2  0x000055555596549c in object_finalize_child_property (obj=0x5555564b5d00, name=0x555557269ad0 "sd0", opaque=0x555556c41840)
    at /work/armbru/qemu/qom/object.c:1234
#3  0x0000555555964955 in object_property_del (obj=0x5555564b5d00, name=0x555557269ad0 "sd0", errp=0x0) at /work/armbru/qemu/qom/object.c:928
#4  0x000055555596367f in object_property_del_child (obj=0x5555564b5d00, child=0x555556c41840, errp=0x0) at /work/armbru/qemu/qom/object.c:384
#5  0x00005555559636cc in object_unparent (obj=0x555556c41840)
    at /work/armbru/qemu/qom/object.c:393
#6  0x0000555555813f05 in qdev_simple_device_unplug_cb (hotplug_dev=0x5555564b2360, dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/core/qdev.c:344
#7  0x00005555556b0699 in virtio_scsi_hotunplug (hotplug_dev=0x5555564b2360, dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/scsi/virtio-scsi.c:809
#8  0x0000555555819908 in hotplug_handler_unplug (plug_handler=0x5555564b2360, plugged_dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/core/hotplug.c:44
#9  0x0000555555813dc4 in qdev_unplug (dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/core/qdev.c:303
#10 0x000055555576be85 in qmp_device_del (id=0x555557962100 "sd0", errp=0x7fffffffc9a8) at /work/armbru/qemu/qdev-monitor.c:801

Second backtrace, cut off after qmp_device_del():
#0  0x0000555555a026e4 in drive_info_del (dinfo=0x555557814860)
    at /work/armbru/qemu/block/block-backend.c:159
#1  0x0000555555a026c9 in blk_delete (blk=0x555556b545a0)
    at /work/armbru/qemu/block/block-backend.c:153
#2  0x0000555555a0279d in blk_unref (blk=0x555556b545a0)
    at /work/armbru/qemu/block/block-backend.c:186
#3  0x0000555555762560 in blockdev_auto_del (blk=0x555556b545a0)
    at /work/armbru/qemu/blockdev.c:144
#4  0x0000555555820a8b in release_drive (obj=0x555556c41840, name=0x5555564b2600 "drive", opaque=0x555555f446c0 <scsi_hd_properties>)
    at /work/armbru/qemu/hw/core/qdev-properties-system.c:105
#5  0x00005555559635d6 in object_property_del_all (obj=0x555556c41840)
    at /work/armbru/qemu/qom/object.c:368
#6  0x000055555596375c in object_finalize (data=0x555556c41840)
    at /work/armbru/qemu/qom/object.c:413
#7  0x000055555596460f in object_unref (obj=0x555556c41840)
    at /work/armbru/qemu/qom/object.c:848
#8  0x00005555559654b4 in object_finalize_child_property (obj=0x5555564b5d00, name=0x555557269ad0 "sd0", opaque=0x555556c41840)
    at /work/armbru/qemu/qom/object.c:1237
#9  0x0000555555964955 in object_property_del (obj=0x5555564b5d00, name=0x555557269ad0 "sd0", errp=0x0) at /work/armbru/qemu/qom/object.c:928
x555556c41840, errp=0x0) at /work/armbru/qemu/qom/object.c:384
#11 0x00005555559636cc in object_unparent (obj=0x555556c41840)
    at /work/armbru/qemu/qom/object.c:393
#12 0x0000555555813f05 in qdev_simple_device_unplug_cb (hotplug_dev=0x5555564b2360, dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/core/qdev.c:344
#13 0x00005555556b0699 in virtio_scsi_hotunplug (hotplug_dev=0x5555564b2360, dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/scsi/virtio-scsi.c:809
#14 0x0000555555819908 in hotplug_handler_unplug (plug_handler=0x5555564b2360, plugged_dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/core/hotplug.c:44
#15 0x0000555555813dc4 in qdev_unplug (dev=0x555556c41840, errp=0x7fffffffc9a8)
    at /work/armbru/qemu/hw/core/qdev.c:303
#16 0x000055555576be85 in qmp_device_del (id=0x555557962100 "sd0", errp=0x7fffffffc9a8) at /work/armbru/qemu/qdev-monitor.c:801

Comment 16 Han Han 2015-09-24 02:52:39 UTC
(In reply to yangyang from comment #10)
> I also cannot reproduce it with the scripts reporter uploaded on
> libvirt-1.2.17-8.el7.x86_64, qemu-kvm-rhev-2.3.0-22.el7.x86_64.

The python scripts in attachment seems reproduce the error more easily. I try python scripts, in average every 1000 attach-detach loops can get a 'Duplicate ID' error. But for shell scripts with the same device xml, I have never reproduce it with latest libvirt.

Comment 20 Paolo Bonzini 2015-10-05 11:40:47 UTC
Michael had reported this to me and we had started discussing it, so I can take it too.

Comment 26 Ademar Reis 2016-03-16 16:49:49 UTC
Latest upstream patches, waiting for review: http://lists.gnu.org/archive/html/qemu-devel/2016-02/msg04978.html

Comment 27 Markus Armbruster 2016-03-23 15:10:49 UTC
Patches have been reviewed, general approach looks good, but v2 needed.

Peter Krempa thinks our fixing of libvirt bug 1262399 makes hitting this bug unlikely.

Comment 28 Peter Krempa 2016-03-24 09:07:06 UTC
(In reply to Markus Armbruster from comment #27)
> Patches have been reviewed, general approach looks good, but v2 needed.
> 
> Peter Krempa thinks our fixing of libvirt bug 1262399 makes hitting this bug
> unlikely.

I think that the bug itself is very unlikely to cause problems as it's very unlikely for users to repeatedly unplug and plug disks in a very tight loop.

The libvirt fix allows the operation to be retried "later" after qemu rejected one of the hotplug attempts. Previously we would desync the state and the operation would fail forever after the first occurence.

In conclusion, the bug is VERY unlikely to be hit in the first place, and the operation can be successfuly finished by retrying now that the libvirt bug mentioned above is fixed.

Comment 29 Yaniv Lavi 2016-03-24 09:52:27 UTC
(In reply to Peter Krempa from comment #28)
> (In reply to Markus Armbruster from comment #27)
> > Patches have been reviewed, general approach looks good, but v2 needed.
> > 
> > Peter Krempa thinks our fixing of libvirt bug 1262399 makes hitting this bug
> > unlikely.
> 
> I think that the bug itself is very unlikely to cause problems as it's very
> unlikely for users to repeatedly unplug and plug disks in a very tight loop.

This came up from backup partner which does exactly this.

> 
> The libvirt fix allows the operation to be retried "later" after qemu
> rejected one of the hotplug attempts. Previously we would desync the state
> and the operation would fail forever after the first occurence.
> 
> In conclusion, the bug is VERY unlikely to be hit in the first place, and
> the operation can be successfuly finished by retrying now that the libvirt
> bug mentioned above is fixed.

Comment 30 Amit Aviram 2016-03-27 06:40:23 UTC
(In reply to Peter Krempa from comment #28)
> (In reply to Markus Armbruster from comment #27)
> > Patches have been reviewed, general approach looks good, but v2 needed.
> > 
> > Peter Krempa thinks our fixing of libvirt bug 1262399 makes hitting this bug
> > unlikely.
> 
> I think that the bug itself is very unlikely to cause problems as it's very
> unlikely for users to repeatedly unplug and plug disks in a very tight loop.

> 
> The libvirt fix allows the operation to be retried "later" after qemu
> rejected one of the hotplug attempts. Previously we would desync the state
> and the operation would fail forever after the first occurence.

As Allon said before, the loop that is being demonstrated here just helps us to have this bug reproduced easily- but the real issue is that once in a while our partners experiencing the race described above, when performing a bunch of attaching and detaching actions. In big setups of RHEV, which are typical in partner's environments- this scenario is not that unlikely, unfortunately..

> 
> In conclusion, the bug is VERY unlikely to be hit in the first place, and
> the operation can be successfuly finished by retrying now that the libvirt
> bug mentioned above is fixed.

Comment 31 Amit Aviram 2016-03-27 09:09:38 UTC
(In reply to Peter Krempa from comment #11)
> I've managed to reproduce the issue by running just one of the reproducer
> scripts so the issue is not bound to a race between two disks being
> attached/detached. The issue reproduces after a few attach/detach cycles of
> the script.
> 
> The following communication happens between qemu and libvirtd (annotated )
> 
> ### one of the "good" attach/detach cycles:
>  # add backend:
> 13:14:39.756+0000 L:
> buf={"execute":"__com.redhat_drive_add","arguments":{"file":"/var/lib/
> libvirt/images/a","id":"drive-virtio-disk0","format":"raw","serial":
> "9f09e0db-cc0b-4a93-826b-c4a3fda0b208","cache":"none","werror":"stop",
> "rerror":"stop","aio":"threads"},"id":"libvirt-844"}
> 13:14:39.764+0000 Q: buf={"return": {}, "id": "libvirt-844"}
>  # add frontend:
> 13:14:39.764+0000 L:
> buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":
> "off","bus":"pci.0","addr":"0x7","drive":"drive-virtio-disk0","id":"virtio-
> disk0"},"id":"libvirt-845"}
> 13:14:39.774+0000 Q: buf={"return": {}, "id": "libvirt-845"}
>  # query alias list:
> 13:14:39.775+0000 L:
> buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":
> "libvirt-846"}
> 13:14:39.778+0000 Q: buf={"return": [{"name": "virtio-disk0", "type":
> "child<virtio-blk-pci>"}, {"name": "balloon0", "type":
> "child<virtio-balloon-pci>"}, {"name": "video0", "type":
> "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"},
> {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name":
> "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type":
> "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id":
> "libvirt-846"}
>  # unplug frontend:
> 13:14:39.779+0000 L:
> buf={"execute":"device_del","arguments":{"id":"virtio-disk0"},"id":"libvirt-
> 847"}
> 13:14:39.789+0000 Q: buf={"return": {}, "id": "libvirt-847"}
>  # get DEVICE_DELETED event for the unplug:
> 13:14:39.815+0000 Q: buf={"timestamp": {"seconds": 1441890879,
> "microseconds": 815331}, "event": "DEVICE_DELETED", "data": {"path":
> "/machine/peripheral/virtio-disk0/virtio-backend"}}
> 13:14:39.816+0000 Q: buf={"timestamp": {"seconds": 1441890879,
> "microseconds": 816058}, "event": "DEVICE_DELETED", "data": {"device":
> "virtio-disk0", "path": "/machine/peripheral/virtio-disk0"}}
>  # try to unplug disk backend ...
> 13:14:39.816+0000 L:
> buf={"execute":"__com.redhat_drive_del","arguments":{"id":"drive-virtio-
> disk0"},"id":"libvirt-848"}
>  # unplugging of backend fails, since frontend removes the backend too. Note
> that the downstream impl is broken since it tries to use the HMP
> implementation which does not properly report errors so that they are
> propagated via QMP. We ignore this error, and claim successful detach.
> 13:14:39.820+0000 Q: buf={"id": "libvirt-848", "error": {"class":
> "GenericError", "desc": "An undefined error has occurred"}}
>  # refresh alias list
> 13:14:39.821+0000 L:
> buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":
> "libvirt-849"}
> 13:14:39.824+0000 Q: buf={"return": [{"name": "balloon0", "type":
> "child<virtio-balloon-pci>"}, {"name": "video0", "type":
> "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"},
> {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name":
> "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type":
> "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id":
> "libvirt-849"}
>  #done
> 
>  # NEXT CYCLE
> 13:14:39.831+0000 L:
> buf={"execute":"__com.redhat_drive_add","arguments":{"file":"/var/lib/
> libvirt/images/a","id":"drive-virtio-disk0","format":"raw","serial":
> "9f09e0db-cc0b-4a93-826b-c4a3fda0b208","cache":"none","werror":"stop",
> "rerror":"stop","aio":"threads"},"id":"libvirt-850"}
> 13:14:39.839+0000 Q: buf={"return": {}, "id": "libvirt-850"}
> 13:14:39.840+0000 L:
> buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":
> "off","bus":"pci.0","addr":"0x7","drive":"drive-virtio-disk0","id":"virtio-
> disk0"},"id":"libvirt-851"}
> 13:14:39.849+0000 Q: buf={"return": {}, "id": "libvirt-851"}
> 13:14:39.850+0000 L:
> buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":
> "libvirt-852"}
> 13:14:39.853+0000 Q: buf={"return": [{"name": "virtio-disk0", "type":
> "child<virtio-blk-pci>"}, {"name": "balloon0", "type":
> "child<virtio-balloon-pci>"}, {"name": "video0", "type":
> "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"},
> {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name":
> "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type":
> "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id":
> "libvirt-852"}
> 13:14:39.854+0000 L:
> buf={"execute":"device_del","arguments":{"id":"virtio-disk0"},"id":"libvirt-
> 853"}
> 13:14:39.863+0000 Q: buf={"return": {}, "id": "libvirt-853"}
> 13:14:39.882+0000 Q: buf={"timestamp": {"seconds": 1441890879,
> "microseconds": 882492}, "event": "DEVICE_DELETED", "data": {"path":
> "/machine/peripheral/virtio-disk0/virtio-backend"}}
> 13:14:39.883+0000 Q: buf={"timestamp": {"seconds": 1441890879,
> "microseconds": 883033}, "event": "DEVICE_DELETED", "data": {"device":
> "virtio-disk0", "path": "/machine/peripheral/virtio-disk0"}}
> 
>  #Note that we've received the DEVICE_DELETED event ...
> 
> 13:14:39.883+0000 L:
> buf={"execute":"__com.redhat_drive_del","arguments":{"id":"drive-virtio-
> disk0"},"id":"libvirt-854"}
> 13:14:39.887+0000 Q: buf={"id": "libvirt-854", "error": {"class":
> "GenericError", "desc": "An undefined error has occurred"}}
> 13:14:39.889+0000 L:
> buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":
> "libvirt-855"}
> 13:14:39.891+0000 Q: buf={"return": [{"name": "balloon0", "type":
> "child<virtio-balloon-pci>"}, {"name": "video0", "type":
> "child<cirrus-vga>"}, {"name": "ide0-1-0", "type": "child<ide-cd>"},
> {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name":
> "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "usb", "type":
> "child<piix3-usb-uhci>"}, {"name": "type", "type": "string"}], "id":
> "libvirt-855"}
> 
>  # and virtio-disk-0 is not among the devices, but ..
> 
>  # NEXT cycle (the one that fails)
> 
> 13:14:39.899+0000 L:
> buf={"execute":"__com.redhat_drive_add","arguments":{"file":"/var/lib/
> libvirt/images/a","id":"drive-virtio-disk0","format":"raw","serial":
> "9f09e0db-cc0b-4a93-826b-c4a3fda0b208","cache":"none","werror":"stop",
> "rerror":"stop","aio":"threads"},"id":"libvirt-856"}
> 13:14:39.907+0000 Q: buf={"return": {}, "id": "libvirt-856"}
> 13:14:39.907+0000 L:
> buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":
> "off","bus":"pci.0","addr":"0x7","drive":"drive-virtio-disk0","id":"virtio-
> disk0"},"id":"libvirt-857"}
>  # we try to add 'virtio-disk0' yet again but it fails!
> 13:14:39.912+0000 Q: buf={"id": "libvirt-857", "error": {"class":
> "GenericError", "desc": "Duplicate ID 'virtio-disk0' for device"}}
>  # now we try to delete the backend, but due to a bug in libvirt we use
> invalid string, so the backend won't be removed.
> 13:14:39.912+0000 L:
> buf={"execute":"__com.redhat_drive_del","arguments":{"id":"file=/var/lib/
> libvirt/images/a,if=none,id=drive-virtio-disk0,format=raw,serial=9f09e0db-
> cc0b-4a93-826b-c4a3fda0b208,cache=none,werror=stop,rerror=stop,aio=threads"},
> "id":"libvirt-858"}
> 13:14:39.919+0000 Q: buf={"id": "libvirt-858", "error": {"class":
> "GenericError", "desc": "An undefined error has occurred"}}
> 13:20:51.685+0000 L:
> buf={"execute":"__com.redhat_drive_del","arguments":{"id":"drive-virtio-
> disk0"},"id":"libvirt-859"}
> 13:20:51.689+0000 Q: buf={"return": {}, "id": "libvirt-859"}
>  # After a manual delete of the dangling backend, the script can be
> restarted and works for a few cycles again.
> 
> The conclusion from the above log is that QEMU has a race window where it
> fires the DEVICE_DELETED event but it does not yet free the device alias
> from it's internal structures. In a tight loop as used by the reproducer
> script (causeError.py, in the archive attached)
> 
> The libvirt bug described above is caused by bug in commit 8125113c.


Peter, Thanks for the deep analysis.

IIUC, after solving Bug 1262399, libvirt won't block a new try for plugging a disk after failed attempt, as before the fix the internal structures in libvirt regarding this disk was not cleaned as it should.
This is a great fix- however, reading your analysis I'm understanding that detaching a disk will always work, even if QEMU will fail removing the disk- as any error is being ignored.
If that is the case, that means inconsistency, where libvirt is not aware of a plugged disk in qemu. shouldn't a failed return value from detaching, prevent libvirt from deleting the disk in the first place? keeping it in its structures?

Comment 32 Peter Krempa 2016-04-12 13:16:08 UTC
(In reply to Amit Aviram from comment #31)

[...]

> IIUC, after solving Bug 1262399, libvirt won't block a new try for plugging
> a disk after failed attempt, as before the fix the internal structures in
> libvirt regarding this disk was not cleaned as it should.

Yes a new attempt to plug the disk after qemu clears the old device from it's structures will now work after the race condition in qemu clears.

The issue with libvirt was that it did not clean up after the failed hotplug attempt as disk hotplug is a two step operation and the failure happened in the second step. The first step that adds the backend for the disk was not cleaned up.

> This is a great fix- however, reading your analysis I'm understanding that
> detaching a disk will always work, even if QEMU will fail removing the disk-
> as any error is being ignored.

That is not true. Libvirt will remove the disk from the internal structures only after it receives the DEVICE_DELETED event from qemu. This approach was present even prior to this BZ. Until the event is received, libvirt still keeps the disk in the definition.

The issue is just wrong ordering of the DEVICE_DELETED event from qemu and qemu actually clearing the device name thus after it was ejected from the guest. The issue happened as libvirt tried to reuse the device name used previously.

> If that is the case, that means inconsistency, where libvirt is not aware of
> a plugged disk in qemu. shouldn't a failed return value from detaching,
> prevent libvirt from deleting the disk in the first place? keeping it in its
> structures?

If I understand this correctly then the previous paragraph should clear that. The only place where errors are ignored are in the cleanup path of the first step of the hotplug operation if the second step fails. This was the reason why this was not caught in the first place. Since a failure in a cleanup path after a failed operation can't make anything worse the semantics is unfortunately correct.

Comment 33 Markus Armbruster 2016-04-12 14:06:51 UTC
Peter is correct: this bug is about slightly premature emission of DEVICE_DELETED, which opens a race window where an attempt to reuse the same ID fails.  Workaround: try again, possibly after waiting a little.  See comment#19.

Comment 41 Markus Armbruster 2017-11-28 16:33:08 UTC
Some work around DEVICE_DELETED landed in 2.11 (commit f7b879e07), which we backported to fix bug 1445460.  I suspect this bug shares the same root cause, and got fixed, too.  Can you still reproduce it with qemu-kvm-rhev-2.10.0-8.el7 or later?  If yes, I'm wrong.

Comment 42 Allon Mureinik 2017-11-28 16:37:21 UTC
Amit is no longer with Red Har, unfortunately.
Daniel - can you address Markus' needinfo please?

Comment 43 Daniel Erez 2017-12-05 11:29:03 UTC
(In reply to Markus Armbruster from comment #41)
> Some work around DEVICE_DELETED landed in 2.11 (commit f7b879e07), which we
> backported to fix bug 1445460.  I suspect this bug shares the same root
> cause, and got fixed, too.  Can you still reproduce it with
> qemu-kvm-rhev-2.10.0-8.el7 or later?  If yes, I'm wrong.

IIUC the scenario, seems to work with qemu-kvm-ev-2.9.0-16.el7_4.8.1.x86_64

Comment 44 Markus Armbruster 2017-12-05 14:04:27 UTC
Thanks, Daniel!  Closing as duplicate.

*** This bug has been marked as a duplicate of bug 1445460 ***