Bug 1262399

Summary: disk backend is not removed properly when disk frontent hotplug fails
Product: Red Hat Enterprise Linux 7 Reporter: Peter Krempa <pkrempa>
Component: libvirtAssignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.1CC: aaviram, amureini, bmcclain, dyuan, eblake, hhan, huding, juzhang, ovasik, pkrempa, pzhang, rbalakri, virt-bugs, virt-maint, xfu, xuzhang, yanyang, ylavi
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-1.2.17-9.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1256044
: 1265968 (view as bug list) Environment:
Last Closed: 2015-11-19 06:54:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1172230    

Description Peter Krempa 2015-09-11 14:27:15 UTC
+++ This bug was initially created as a clone of Bug #1256044 +++

Description of problem:
After disk frontend attach fails libvirt uses an incorrect string to unplug the backend that was added previously.

Version-Release number of selected component (if applicable):
libvirt-1.2.17-8.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64

How reproducible:
100% given that qemu disk hotplug fails

--- Additional comment from Peter Krempa on 2015-09-11 15:59:54 CEST ---

I've managed to reproduce the issue by running just one of the reproducer scripts so the issue is not bound to a race between two disks being attached/detached. The issue reproduces after a few attach/detach cycles of the script.

The following communication happens between qemu and libvirtd (annotated )


13:14:39.899+0000 L: buf={"execute":"__com.redhat_drive_add","arguments":{"file":"/var/lib/libvirt/images/a","id":"drive-virtio-disk0","format":"raw","serial":"9f09e0db-cc0b-4a93-826b-c4a3fda0b208","cache":"none","werror":"stop","rerror":"stop","aio":"threads"},"id":"libvirt-856"}
13:14:39.907+0000 Q: buf={"return": {}, "id": "libvirt-856"}
13:14:39.907+0000 L: buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"off","bus":"pci.0","addr":"0x7","drive":"drive-virtio-disk0","id":"virtio-disk0"},"id":"libvirt-857"}
 # we try to add 'virtio-disk0' yet again but it fails!
13:14:39.912+0000 Q: buf={"id": "libvirt-857", "error": {"class": "GenericError", "desc": "Duplicate ID 'virtio-disk0' for device"}}
 # now we try to delete the backend, but due to a bug in libvirt we use invalid string, so the backend won't be removed.
13:14:39.912+0000 L: buf={"execute":"__com.redhat_drive_del","arguments":{"id":"file=/var/lib/libvirt/images/a,if=none,id=drive-virtio-disk0,format=raw,serial=9f09e0db-cc0b-4a93-826b-c4a3fda0b208,cache=none,werror=stop,rerror=stop,aio=threads"},"id":"libvirt-858"}

The above command uses wrong parameters.

13:14:39.919+0000 Q: buf={"id": "libvirt-858", "error": {"class": "GenericError", "desc": "An undefined error has occurred"}}
13:20:51.685+0000 L: buf={"execute":"__com.redhat_drive_del","arguments":{"id":"drive-virtio-disk0"},"id":"libvirt-859"}
13:20:51.689+0000 Q: buf={"return": {}, "id": "libvirt-859"}
 # After a manual delete of the dangling backend, the script can be restarted and works for a few cycles again.

The libvirt bug described above is caused by bug in commit 8125113c.

Comment 1 Peter Krempa 2015-09-14 07:48:10 UTC
Fixed upstream:

commit 4b4aade59a206b6146d90d87d591386f8fbd68d1
Author: Peter Krempa <pkrempa>
Date:   Fri Sep 11 17:34:18 2015 +0200

    qemu: hotplug: Properly clean up drive backend if frontend hotplug fails
    
    Commit 8125113c added code that should remove the disk backend if the
    fronted hotplug failed for any reason. The code had a bug though as it
    used the disk string for unplug rather than the backend alias. Fix the
    code by pre-creating an alias string and using it instead of the disk
    string. In cases where qemu does not support QEMU_CAPS_DEVICE, we ignore
    the unplug of the backend since we can't really create an alias in that
    case.

Comment 4 Han Han 2015-09-24 02:54:04 UTC
Hi, Peter. 
I test in latest libvirt. After the Error 'unable to execute QEMU command 'device_add': Duplicate ID 'virtio-disk1' occurs, the next attach-device&detach-device is successful. Is it mean the bug solved?

Comment 5 Peter Krempa 2015-09-24 06:16:55 UTC
Yes that is the expected behaviour.

Comment 6 Han Han 2015-09-24 07:05:45 UTC
I can reproduce it in libvirt-1.2.17-8.el7.x86_64.
Verify it in libvirt-1.2.17-10.el7.x86_64:
1. Preparing a healthy running guest
# virsh list 
 Id    Name                           State
----------------------------------------------------
 2     t65                            running

Preparimg a disk xml:
<disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/var/lib/libvirt/images/disk.img'>
        <seclabel model='selinux' labelskip='yes'/>
      </source>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>fc2a7a5e-6e9b-4765-82d3-8bf5f57a643b</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
    </disk>


2. Attach&Detach disk device for thousands of loops until error occurs
# for i in {1..10000};do virsh attach-device t65 disk.xml; virsh detach-device t65 disk.xml; done
error: Failed to attach device from disk.xml
error: internal error: unable to execute QEMU command 'device_add': Duplicate ID 'virtio-disk1' for device
error: Failed to detach device from disk.xml
error: operation failed: disk vdb not found

After the error occurs, try to Attach&Detach the disk xml, the error shouldn't appear continuously.
# virsh attach-device t65 disk.xml; virsh detach-device t65 disk.xml;
Device attached successfully

Device detached successfully

Comment 7 Allon Mureinik 2015-09-24 08:25:27 UTC
Peter, I see this patch in the upstream v1.2.18-maint branch, but not in the v1.2.17-maint branch.

Can we please get it backported there too so oVirt on Fedora can use it? Do you need a separate bug for it?
Thanks!

Comment 8 Peter Krempa 2015-09-24 08:33:31 UTC
Please file a fedora bug if you want to backport it to certain fedora versions, since the fedora maintainer needs to pick up the changes.

Comment 10 errata-xmlrpc 2015-11-19 06:54:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html