Red Hat Bugzilla – Bug 807023
libvirt does not check for successful device_del
Last modified: 2016-04-26 11:15:21 EDT
Description of problem:
libvirt appears to always assume a device_del is successful. This is not always the case. Issuing a device_del simply registers a request with the guest to eject the device; it doesn't guarantee removal. There are numerous reasons a guest may fail to release the device, but when this happens libvirt does not report an error and continues as if the removal was successful.

In my case I see this with a tg3 device assigned to the guest. The guest driver attempts to do a power state transition on the device, which doesn't work, and gets stuck. The guest never calls the eject method for the device, but libvirt continues to unbind the device from pci-stub and pretends the removal was successful. We need to come up with some strategy for libvirt to poll the devices attached to the guest after the device_del, checking whether the removal was successful.

Version-Release number of selected component (if applicable):
libvirt-0.9.10-5.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Assign a tg3 device to a guest
2. Bring up the interface in the guest
3. Try to remove the device

Actual results:
libvirt reports success and unbinds the device from pci-stub; meanwhile the guest still owns the device (/proc/pid/fd still reports all the device file descriptors open).

Expected results:
libvirt checks for device removal, reports an error after some timeout, and doesn't unbind the device from pci-stub.

Additional info:
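The check the reporter performs by hand (inspecting /proc/pid/fd for still-open device descriptors) can be sketched as a polling helper. This is only an illustration of that verification step, not libvirt code; the function name and the `proc_root` parameter are invented for the sketch.

```python
import os

def guest_still_holds_device(pid, pci_addr, proc_root="/proc"):
    # True if the process still has fds pointing into the device's sysfs
    # directory (resource0, config, ...), i.e. qemu has not released it.
    # pci_addr is a full address like '0000:05:00.0'.
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        entries = os.listdir(fd_dir)
    except OSError:
        return False  # process is gone, so nothing is held
    for fd in entries:
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd closed between listdir and readlink
        if "/" + pci_addr + "/" in target:
            return True
    return False
```

A management tool could call this in a loop after device_del and only unbind from pci-stub once it returns False (or give up after a timeout).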
This seems like a situation that's going to be difficult for libvirt to resolve without additional information from qemu. Jiri, what do you think?
It won't be easy even with additional information from qemu. We would need to change the detach API to be just a request for detaching, and generate an event (emitted by qemu) when the device actually gets detached. However, this would mean the semantics of the API changed, which might be difficult to deal with, although one can argue that it didn't work with the current semantics anyway.
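The changed semantics Jiri describes, detach as an asynchronous request whose completion is signalled by an event, can be sketched roughly. Everything below (the `request_detach` callback, the event queue, the tuple format) is a stand-in for the real monitor plumbing, not libvirt's implementation.

```python
import queue
import time

def detach_and_wait(request_detach, events, device_id, timeout=5.0):
    # Fire the asynchronous detach request; qemu only registers it.
    request_detach(device_id)
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # guest never released the device: report failure
        try:
            ev = events.get(timeout=remaining)
        except queue.Empty:
            return False  # timed out waiting for the completion event
        if ev == ("DEVICE_DELETED", device_id):
            return True   # removal confirmed: now safe to unbind from pci-stub
```

The point of the sketch is the return value: the caller learns whether the guest cooperated, instead of being told "success" unconditionally.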
Reproduced with these steps:

1. Define a guest with the following hostdev device
...
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='5' slot='0' function='0'/>
  </source>
</hostdev>
...
2. Start the guest
3. At the same time as step 2, detach the device from the guest asap
# virsh detach-device guest pci.xml
# cat pci.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='5' slot='0' function='0'/>
  </source>
</hostdev>
4. ll /proc/[pid of guest]/fd

Result:
In step 3, libvirt reports that detach-device succeeded, but in step 4 we can still see the device info:
...
13 -> /sys/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/resource0
24 -> /sys/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/config
...
also related: Bug 848955
Until the guest acts upon the device_del command, qemu keeps the device listed. If a managing process wants to poll for the success of the asynchronous command, it can:

# virsh -c qemu+tcp://localhost/system qemu-monitor-command vm1 --hmp info network
Devices not on any VLAN:
net0: model=virtio-net-pci,macaddr=00:1a:4a:16:01:b0 peer=hostnet0
hostnet1: fd=33 peer=net1
net1: model=virtio-net-pci,macaddr=00:1a:4a:16:01:60 peer=hostnet1

However, libvirt hides the fact that the updateDevice command hasn't reached completion, and drops the still-resident device from the domxml. That's bad, as it withholds crucial information from vdsm.

# virsh -c qemu+tcp://localhost/system dumpxml vm1
...
  <devices>
    ...
    <interface type='bridge'>
      <mac address='00:1a:4a:16:01:60'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <link state='up'/>
      <alias name='net1'/>
    </interface>
    ...
  </devices>

Libvirt should let its client check whether the async command has finished. A nice bonus would be an event emitted when the device is released in qemu.
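A client that wants to notice completion on its own could watch the raw monitor stream for the completion event. A hedged sketch: the helper name is invented, but the event and field names (`DEVICE_DELETED`, `data.device`) follow the QMP wire format shown in the transcripts later in this bug.

```python
import json

def device_deleted(qmp_lines, device_id):
    # Scan QMP monitor output (one JSON object per line) for the
    # DEVICE_DELETED event carrying the given device alias.
    for line in qmp_lines:
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # skip greeting fragments or other non-JSON noise
        if (msg.get("event") == "DEVICE_DELETED"
                and msg.get("data", {}).get("device") == device_id):
            return True
    return False
```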
*** Bug 958786 has been marked as a duplicate of this bug. ***
This is not implemented upstream (v1.1.0-278-g0dfb8a1).
s/not/now/ :-P
I seem to be experiencing the same issue with our OpenStack deployment. If we don't take the disk "offline" before detaching it, then after some attach/detach iterations it simply refuses to attach the volume until the qemu process is restarted. I've reliably recreated this issue many times with Windows 7.
*** Bug 1088508 has been marked as a duplicate of this bug. ***
I can still reproduce it with the package versions below; after step 3, the hostdev cannot be found in the domain XML, but it is still present in the guest OS.

Version:
libvirt-0.10.2-38.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.427.el6.x86_64

1. Define a guest with the following hostdev device
[root@sriov2 ~]# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x01' slot='0' function='0'/>
  </source>
</hostdev>
[root@sriov2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
 -     rhel63                         shut off
[root@sriov2 ~]# virsh dumpxml rhel6 | grep -A10 hostdev
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>

2. Start the guest; at the same time, detach that device in another terminal.
[root@sriov2 ~]# virsh start rhel6
Domain rhel6 started
In another terminal, execute the command below immediately:
[root@sriov2 ~]# virsh detach-device rhel6 hostdev.xml
Device detached successfully

3. Dump the domain XML and check the process fds.
[root@sriov2 ~]# virsh dumpxml rhel6 | grep -A10 hostdev
[root@sriov2 ~]#        <===== incorrect!
[root@sriov2 ~]# virsh dumpxml rhel6 --inactive | grep -A10 hostdev
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
[root@sriov2 ~]# ll /proc/`pidof qemu-kvm`/fd | grep "/sys/devices/"
lrwx------. 1 qemu qemu 64 Jun 12 17:02 20 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource0
lrwx------. 1 qemu qemu 64 Jun 12 17:02 21 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource2
lrwx------. 1 qemu qemu 64 Jun 12 17:02 22 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource3
lrwx------. 1 qemu qemu 64 Jun 12 17:02 25 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/config

[In guest OS]
[root@localhost ~]# lspci | grep Et
00:08.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
That's correct; the new behavior can only be observed with a qemu-kvm that supports the DEVICE_DEL event. So the good thing is, you verified that with an older qemu-kvm, libvirt works the way it used to. To actually test the new behavior, you need to wait for a new qemu-kvm package, specifically one that contains the patches for bug 813748 (the patches are ACKed, so it should not take long for them to be built into a new package).
(In reply to Jiri Denemark from comment #27)
> That's correct, the new behavior can only be observed with qemu-kvm which
> supports DEVICE_DEL event. So the good thing is, you verified that with an
> older qemu-kvm, libvirt works in the way it used to work.
>
> To actually test the new behavior, you need to wait for a new qemu-kvm
> package, specifically for the one which will contain patches for bug 813748
> (the patches are ACKed so it should not take a long time for them to be
> built into a new package).

Great! Thanks for your information.
I cannot reproduce it with the packages below.

Version:
libvirt-0.10.2-38.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.428.el6.x86_64

Scenario 1:
[root@sriov2 ~]# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x01' slot='0' function='0'/>
  </source>
</hostdev>
[root@sriov2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
 -     rhel63                         shut off
 -     rhel65                         shut off
[root@sriov2 ~]# virsh dumpxml rhel6 --inactive | grep -A10 hostdev
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
[root@sriov2 ~]# virsh start rhel6
Domain rhel6 started
In another terminal, execute the command below immediately:
[root@sriov2 ~]# virsh detach-device rhel6 hostdev.xml
Device detached successfully
[root@sriov2 ~]# virsh dumpxml rhel6 | grep -A10 hostdev
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
[root@sriov2 ~]# ll /proc/`pidof qemu-kvm`/fd | grep "/sys/devices/"
lrwx------. 1 qemu qemu 64 Jun 18 11:47 20 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource0
lrwx------. 1 qemu qemu 64 Jun 18 11:47 21 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource2
lrwx------. 1 qemu qemu 64 Jun 18 11:47 22 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource3
lrwx------. 1 qemu qemu 64 Jun 18 11:47 25 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/config
[root@sriov2 ~]# virsh console rhel6
[root@localhost ~]# lspci | grep Et
00:08.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
[root@localhost ~]#

Scenario 2:
1. Start a domain with a hostdev device.
[root@sriov2 ~]# virsh start rhel6
Domain rhel6 started
[root@sriov2 ~]# virsh dumpxml rhel6 | grep hostdev -aA 8
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>

2. Leave the OS in the grub menu, and detach that hostdev device. (It won't actually detach the device since the guest OS is not responding; checking the domain XML shows it's still there.)
[root@sriov2 ~]# vncviewer 127.0.0.1
[root@sriov2 ~]# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x01' slot='0' function='0'/>
  </source>
</hostdev>
[root@sriov2 ~]# virsh detach-device rhel6 hostdev.xml
Device detached successfully
[root@sriov2 ~]# virsh dumpxml rhel6 | grep hostdev -aA 8
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>

3. Stop the libvirtd service.
[root@sriov2 ~]# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]
[root@sriov2 ~]# service libvirtd status
libvirtd is stopped

4. Continue to boot the domain, and check the interface in the domain; it is still present.

5. Connect to the qemu monitor manually and remove the PCI device.
[root@sriov2 ~]# socat stdin unix-connect:/var/lib/libvirt/qemu/rhel6.monitor
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 12, "major": 0}, "package": "(qemu-kvm-0.12.1.2)"}, "capabilities": []}}
{"execute":"qmp_capabilities","id":"libvirt-1"}
{"return": {}, "id": "libvirt-1"}
{"execute":"device_del","arguments":{"id":"hostdev0"}}
{"return": {}}
{"timestamp": {"seconds": 1403063829, "microseconds": 94636}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0"}}

6. Start the libvirtd service.
[root@sriov2 ~]# service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]

7. Check the domain XML; libvirtd updated it automatically after reconnecting to the running domain.
[root@sriov2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 28    rhel6                          running
 -     rhel63                         shut off
 -     rhel65                         shut off
[root@sriov2 ~]# virsh dumpxml rhel6 | grep hostdev -aA 8
[root@sriov2 ~]#        <==== no hostdev device
[root@sriov2 ~]# virsh dumpxml rhel6 --inactive | grep hostdev -aA 8
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>

Also tested with an Intel 82576 NIC (on pf/vf); the results are the same as with a common NIC. We get the expected results in both scenarios, so the bug is changed to verified.
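The manual exchange in step 5 can be expressed as the JSON payloads a client would send over the monitor socket. This sketch covers serialisation only; the socket/socat handling is omitted, and `libvirt-1` is simply the id seen in the transcript.

```python
import json

def qmp_device_del_messages(device_id):
    # The same two commands sent by hand through socat in step 5:
    # first negotiate capabilities, then request the device removal.
    # Completion is reported later via the DEVICE_DELETED event.
    return [
        json.dumps({"execute": "qmp_capabilities", "id": "libvirt-1"}),
        json.dumps({"execute": "device_del",
                    "arguments": {"id": device_id}}),
    ]
```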
Added testing results for the backend driver binding of the interface on the host OS.

Version:
libvirt-0.10.2-43.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.428.el6.x86_64

For scenario 1 in comment 29:
[root@rhel6 ~]# virsh start r7
Domain r7 started
In another terminal, execute the command below immediately:
[root@rhel6 ~]# virsh detach-device r7 hostdev.xml
Device detached successfully
(Note: there is a rhel7 bug for this, Bug 993631 - Libvirt should report failure when detach hostdev unsuccessfully, but no rhel6 bug at present.)
[root@rhel6 ~]# virsh dumpxml r7 | grep -A10 hostdev
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </hostdev>
...
[root@rhel6 ~]# ll /proc/`pidof qemu-kvm`/fd | grep "/sys/devices/"
lrwx------. 1 qemu qemu 64 Aug 19 11:39 23 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/resource0
lrwx------. 1 qemu qemu 64 Aug 19 11:39 24 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/config
lrwx------. 1 qemu qemu 64 Aug 19 11:39 25 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/resource1
lrwx------. 1 qemu qemu 64 Aug 19 11:39 26 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/resource2
[root@rhel6 ~]# lspci -s 02:00.0 -vv
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
	Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32 (63750ns min), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at f7c40000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at f7c20000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at d000 [size=64]
	Expansion ROM at cf300000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4] PCI-X non-bridge device
		Command: DPERE- ERO+ RBC=512 OST=1
		Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
	Kernel driver in use: pci-stub
	Kernel modules: e1000
[root@rhel6 ~]# virsh destroy r7
Domain r7 destroyed
[root@rhel6 ~]# lspci -s 02:00.0 -vv
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
	Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32 (63750ns min), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at f7c40000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at f7c20000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at d000 [size=64]
	Expansion ROM at cf300000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4] PCI-X non-bridge device
		Command: DPERE- ERO+ RBC=512 OST=1
		Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
	Kernel driver in use: e1000
	Kernel modules: e1000

For scenario 2 in comment 29:
[root@rhel6 ~]# virsh start r7
Domain r7 started
Leave the OS in the grub menu, and detach that hostdev device. (It won't actually detach the device since the guest OS is not responding; checking the domain XML shows it's still there.)
[root@rhel6 ~]# virsh detach-device r7 hostdev.xml
Device detached successfully
[root@rhel6 ~]# virsh dumpxml r7 | grep hostdev -aA 8
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </hostdev>
...
[root@rhel6 ~]# lspci -s 02:00.0 -vv | grep Kernel
	Kernel driver in use: pci-stub
	Kernel modules: e1000
[root@rhel6 ~]# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]
[root@rhel6 ~]# lspci -s 02:00.0 -vv | grep Kernel
	Kernel driver in use: pci-stub
	Kernel modules: e1000
[root@rhel6 ~]# socat stdin unix-connect:/var/lib/libvirt/qemu/r7.monitor
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 12, "major": 0}, "package": "(qemu-kvm-0.12.1.2)"}, "capabilities": []}}
{"execute":"qmp_capabilities","id":"libvirt-1"}
{"return": {}, "id": "libvirt-1"}
{"execute":"device_del","arguments":{"id":"hostdev0"}}
{"return": {}}
{"timestamp": {"seconds": 1408420401, "microseconds": 322202}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0"}}
^C
[root@rhel6 ~]# lspci -s 02:00.0 -vv | grep Kernel
	Kernel driver in use: pci-stub
	Kernel modules: e1000
[root@rhel6 ~]# service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]
[root@rhel6 ~]# virsh dumpxml r7 | grep hostdev -aA 8
[root@rhel6 ~]# lspci -s 02:00.0 -vv | grep Kernel
	Kernel driver in use: e1000
	Kernel modules: e1000
Hi Alex,

Could you confirm my testing results in comments 29 and 30? Thanks.
(In reply to Hu Jianwei from comment #31)
> Hi Alex,
>
> Could you agree my testing results in comment 29 and 30?

It looks correct to me. Bug 993631 should probably be cloned to rhel6, but can be resolved separately.
> It looks correct to me. Bug 993631 should probably be cloned to rhel6, but
> can be resolved separately.

Thanks, the bug has been cloned to rhel6: Bug 1133443 - Libvirt should report failure when detach hostdev unsuccessfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1374.html