Bug 807023
Summary: | libvirt does not check for successful device_del | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Alex Williamson <alex.williamson> |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 6.3 | CC: | ajia, alex.williamson, areis, armbru, cwei, danken, dhill, dron, dyuan, honzhang, iheim, jdenemar, jiahu, juzhang, lpeer, lsu, mzhan, pankaj.kapila, rbalakri, shyu, weizhan, xuzhang, ykawada |
Target Milestone: | rc | Keywords: | Upstream |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | libvirt-0.10.2-38.el6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 840450 984112 | Environment: | |
Last Closed: | 2014-10-14 04:13:51 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 813748, 1090918, 1093033 | | |
Bug Blocks: | 1044466, 1056252 | | |
Description
Alex Williamson
2012-03-26 19:19:28 UTC
This seems like a situation that's going to be difficult for libvirt to resolve without additional information from qemu. Jiri, what do you think?

It won't be easy even with additional information from qemu. We would need to change the detach API to be just a request for detaching, and generate an event (emitted by qemu) when the device actually gets detached. However, this would change the semantics of the API, so it might be difficult to deal with, although one can argue that it didn't work with the current semantics anyway.

Steps to reproduce:

1. Define a guest with the following hostdev device
...
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='5' slot='0' function='0'/>
  </source>
</hostdev>
...
2. Start the guest
3. At the same time as step 2, detach the device from the guest as soon as possible
# virsh detach-device guest pci.xml
# cat pci.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='5' slot='0' function='0'/>
  </source>
</hostdev>
4. ll /proc/[pid of guest]/fd

Result:
In step 3, libvirt reports that detach-device succeeded, but in step 4 we can still see the device info
...
13 -> /system/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/resource0
24 -> /system/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/config
...

Also related: Bug 848955

Until the guest acts upon the device_del command, qemu keeps the device listed. A managing process that wants to poll for the success of the asynchronous command can do so:

# virsh -c qemu+tcp://localhost/system qemu-monitor-command vm1 --hmp info network
Devices not on any VLAN:
  net0: model=virtio-net-pci,macaddr=00:1a:4a:16:01:b0 peer=hostnet0
  hostnet1: fd=33 peer=net1
  net1: model=virtio-net-pci,macaddr=00:1a:4a:16:01:60 peer=hostnet1

However, libvirt hides the fact that the updateDevice command hasn't reached completion, and drops the still-residing device from the domxml. That's bad, as it withholds crucial information from vdsm.

# virsh -c qemu+tcp://localhost/system dumpxml vm1
...
<devices>
...
<interface type='bridge'>
  <mac address='00:1a:4a:16:01:60'/>
  <source bridge='ovirtmgmt'/>
  <target dev='vnet0'/>
  <model type='virtio'/>
  <link state='up'/>
  <alias name='net1'/>
</interface>
...
</devices>

Libvirt should let its client check whether the async command has finished. A nice bonus would be an event emitted when the device is released in qemu.

*** Bug 958786 has been marked as a duplicate of this bug. ***

This is not implemented upstream (v1.1.0-278-g0dfb8a1).

s/not/now/ :-P

I seem to be experiencing the same issue with our OpenStack implementation. If we don't set the disk "offline" before detaching it, then after some attach/detach iterations it simply refuses to attach the volume until the qemu process is restarted. I've successfully recreated this issue many times with Windows 7.

*** Bug 1088508 has been marked as a duplicate of this bug. ***

I can still reproduce it with the package versions below: after step 3, the hostdev can no longer be found in the domain XML, but it is still present in the guest OS.

Version:
libvirt-0.10.2-38.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.427.el6.x86_64

1. Define a guest with the following hostdev device
[root@sriov2 ~]# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x01' slot='0' function='0'/>
  </source>
</hostdev>
[root@sriov2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
 -     rhel63                         shut off
[root@sriov2 ~]# virsh dumpxml rhel6 | grep -A10 hostdev
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>

2. Start the guest and, at the same time, detach that device in another terminal.
[root@sriov2 ~]# virsh start rhel6
Domain rhel6 started

In another terminal, execute the command below immediately:
[root@sriov2 ~]# virsh detach-device rhel6 hostdev.xml
Device detached successfully

3. Dump the domain XML and check the process fds.
[root@sriov2 ~]# virsh dumpxml rhel6 | grep -A10 hostdev
[root@sriov2 ~]#           <===== incorrect!
[root@sriov2 ~]# virsh dumpxml rhel6 --inactive| grep -A10 hostdev
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>
[root@sriov2 ~]# ll /proc/`pidof qemu-kvm`/fd | grep "/sys/devices/"
lrwx------. 1 qemu qemu 64 Jun 12 17:02 20 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource0
lrwx------. 1 qemu qemu 64 Jun 12 17:02 21 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource2
lrwx------. 1 qemu qemu 64 Jun 12 17:02 22 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource3
lrwx------. 1 qemu qemu 64 Jun 12 17:02 25 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/config

[In guest OS]
[root@localhost ~]# lspci | grep Et
00:08.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

That's correct, the new behavior can only be observed with qemu-kvm which supports DEVICE_DELETED event. So the good thing is, you verified that with an older qemu-kvm, libvirt works in the way it used to work.

To actually test the new behavior, you need to wait for a new qemu-kvm package, specifically for the one which will contain patches for bug 813748 (the patches are ACKed so it should not take a long time for them to be built into a new package).

(In reply to Jiri Denemark from comment #27)
> That's correct, the new behavior can only be observed with qemu-kvm which
> supports DEVICE_DELETED event. So the good thing is, you verified that with an
> older qemu-kvm, libvirt works in the way it used to work.
>
> To actually test the new behavior, you need to wait for a new qemu-kvm
> package, specifically for the one which will contain patches for bug 813748
> (the patches are ACKed so it should not take a long time for them to be
> built into a new package).

Great! Thanks for the information.

I cannot reproduce it with the packages below.

Version:
libvirt-0.10.2-38.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.428.el6.x86_64

Scenario 1:
[root@sriov2 ~]# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x01' slot='0' function='0'/>
  </source>
</hostdev>
[root@sriov2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
 -     rhel63                         shut off
 -     rhel65                         shut off
[root@sriov2 ~]# virsh dumpxml rhel6 --inactive| grep -A10 hostdev
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>
[root@sriov2 ~]# virsh start rhel6
Domain rhel6 started

In another terminal, execute the command below immediately:
[root@sriov2 ~]# virsh detach-device rhel6 hostdev.xml
Device detached successfully
[root@sriov2 ~]# virsh dumpxml rhel6 | grep -A10 hostdev
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>
[root@sriov2 ~]# ll /proc/`pidof qemu-kvm`/fd | grep "/sys/devices/"
lrwx------. 1 qemu qemu 64 Jun 18 11:47 20 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource0
lrwx------. 1 qemu qemu 64 Jun 18 11:47 21 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource2
lrwx------. 1 qemu qemu 64 Jun 18 11:47 22 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/resource3
lrwx------. 1 qemu qemu 64 Jun 18 11:47 25 -> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/config
[root@sriov2 ~]# virsh console rhel6
[root@localhost ~]# lspci | grep Et
00:08.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
[root@localhost ~]#

Scenario 2:
1. Start the domain with the hostdev device.
[root@sriov2 ~]# virsh start rhel6
Domain rhel6 started
[root@sriov2 ~]# virsh dumpxml rhel6| grep hostdev -aA 8
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>

2. Leave the OS at the grub menu and detach the hostdev device. (The device won't actually be detached, since the guest OS is not responding; the domain XML still lists it.)
[root@sriov2 ~]# vncviewer 127.0.0.1
[root@sriov2 ~]# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x01' slot='0' function='0'/>
  </source>
</hostdev>
[root@sriov2 ~]# virsh detach-device rhel6 hostdev.xml
Device detached successfully
[root@sriov2 ~]# virsh dumpxml rhel6| grep hostdev -aA 8
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>

3. Stop the libvirtd service.
[root@sriov2 ~]# service libvirtd stop
Stopping libvirtd daemon: [ OK ]
[root@sriov2 ~]# service libvirtd status
libvirtd is stopped

4. Continue booting the domain and check the interface inside the guest; it is still there.

5. Connect to the qemu monitor manually and remove the PCI device.
[root@sriov2 ~]# socat stdin unix-connect:/var/lib/libvirt/qemu/rhel6.monitor
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 12, "major": 0}, "package": "(qemu-kvm-0.12.1.2)"}, "capabilities": []}}
{"execute":"qmp_capabilities","id":"libvirt-1"}
{"return": {}, "id": "libvirt-1"}
{"execute":"device_del","arguments":{"id":"hostdev0"}}
{"return": {}}
{"timestamp": {"seconds": 1403063829, "microseconds": 94636}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0"}}

6. Start the libvirtd service.
[root@sriov2 ~]# service libvirtd start
Starting libvirtd daemon: [ OK ]

7. Check the domain XML; libvirtd updated it automatically after reconnecting to the running domain.
[root@sriov2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 28    rhel6                          running
 -     rhel63                         shut off
 -     rhel65                         shut off
[root@sriov2 ~]# virsh dumpxml rhel6| grep hostdev -aA 8    <==== no hostdev device
[root@sriov2 ~]# virsh dumpxml rhel6 --inactive| grep hostdev -aA 8
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>

I also tested with an Intel 82576 NIC (on PF/VF); the results are the same as with a common NIC.

We get the expected results in both scenarios, so I changed the status to verified.

Added testing results for the backend driver of the interface on the host OS.

Version:
libvirt-0.10.2-43.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.428.el6.x86_64

For scenario 1 in comment 29
[root@rhel6 ~]# virsh start r7
Domain r7 started

In another terminal, execute the command below immediately:
[root@rhel6 ~]# virsh detach-device r7 hostdev.xml
Device detached successfully
(Note: there is a rhel7 bug for this, Bug 993631 - Libvirt should report failure when detach hostdev unsuccessfully, but no rhel6 bug at present.)
[root@rhel6 ~]# virsh dumpxml r7 | grep -A10 hostdev
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</hostdev>
...
[root@rhel6 ~]# ll /proc/`pidof qemu-kvm`/fd | grep "/sys/devices/"
lrwx------. 1 qemu qemu 64 Aug 19 11:39 23 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/resource0
lrwx------. 1 qemu qemu 64 Aug 19 11:39 24 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/config
lrwx------. 1 qemu qemu 64 Aug 19 11:39 25 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/resource1
lrwx------. 1 qemu qemu 64 Aug 19 11:39 26 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/resource2
[root@rhel6 ~]# lspci -s 02:00.0 -vv
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
    Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 32 (63750ns min), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 20
    Region 0: Memory at f7c40000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at f7c20000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at d000 [size=64]
    Expansion ROM at cf300000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [e4] PCI-X non-bridge device
        Command: DPERE- ERO+ RBC=512 OST=1
        Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
    Kernel driver in use: pci-stub
    Kernel modules: e1000
[root@rhel6 ~]# virsh destroy r7
Domain r7 destroyed
[root@rhel6 ~]# lspci -s 02:00.0 -vv
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
    Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 32 (63750ns min), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 20
    Region 0: Memory at f7c40000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at f7c20000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at d000 [size=64]
    Expansion ROM at cf300000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [e4] PCI-X non-bridge device
        Command: DPERE- ERO+ RBC=512 OST=1
        Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
    Kernel driver in use: e1000
    Kernel modules: e1000

For scenario 2 in comment 29
[root@rhel6 ~]# virsh start r7
Domain r7 started

Leave the OS at the grub menu and detach the hostdev device. (The device won't actually be detached, since the guest OS is not responding; the domain XML still lists it.)
[root@rhel6 ~]# virsh detach-device r7 hostdev.xml
Device detached successfully
[root@rhel6 ~]# virsh dumpxml r7| grep hostdev -aA 8
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</hostdev>
...
[root@rhel6 ~]# lspci -s 02:00.0 -vv| grep Kernel
    Kernel driver in use: pci-stub
    Kernel modules: e1000
[root@rhel6 ~]# service libvirtd stop
Stopping libvirtd daemon: [ OK ]
[root@rhel6 ~]# lspci -s 02:00.0 -vv| grep Kernel
    Kernel driver in use: pci-stub
    Kernel modules: e1000
[root@rhel6 ~]# socat stdin unix-connect:/var/lib/libvirt/qemu/r7.monitor
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 12, "major": 0}, "package": "(qemu-kvm-0.12.1.2)"}, "capabilities": []}}
{"execute":"qmp_capabilities","id":"libvirt-1"}
{"return": {}, "id": "libvirt-1"}
{"execute":"device_del","arguments":{"id":"hostdev0"}}
{"return": {}}
{"timestamp": {"seconds": 1408420401, "microseconds": 322202}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0"}}
^C
[root@rhel6 ~]# lspci -s 02:00.0 -vv| grep Kernel
    Kernel driver in use: pci-stub
    Kernel modules: e1000
[root@rhel6 ~]# service libvirtd start
Starting libvirtd daemon: [ OK ]
[root@rhel6 ~]# virsh dumpxml r7| grep hostdev -aA 8
[root@rhel6 ~]# lspci -s 02:00.0 -vv| grep Kernel
    Kernel driver in use: e1000
    Kernel modules: e1000

Hi Alex,

Could you confirm my testing results in comments 29 and 30?

Thanks.

(In reply to Hu Jianwei from comment #31)
> Hi Alex,
>
> Could you confirm my testing results in comments 29 and 30?

It looks correct to me. Bug 993631 should probably be cloned to rhel6, but can be resolved separately.

> It looks correct to me. Bug 993631 should probably be cloned to rhel6, but
> can be resolved separately.

Thanks, the bug has been cloned to rhel6.
Bug 1133443 - Libvirt should report failure when detach hostdev unsuccessfully.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html
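
For anyone who talks to the monitor directly, as in the socat sessions in the verification comments, the immediate {"return": {}} reply to device_del only means the request was accepted; the device is gone once the DEVICE_DELETED event for its id shows up. Below is a minimal Python sketch of that check. It assumes QMP messages arrive as one JSON object per line; the function name device_deleted and the sample list are illustrative only and are not part of libvirt or qemu.

```python
import json


def device_deleted(qmp_lines, device_id):
    """Return True if a DEVICE_DELETED event for device_id appears in qmp_lines.

    qmp_lines is any iterable of text lines read from a QMP monitor
    (assumption: one JSON message per line).
    """
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # skip anything that is not a JSON message
        if not isinstance(msg, dict):
            continue
        data = msg.get("data")
        if (
            msg.get("event") == "DEVICE_DELETED"
            and isinstance(data, dict)
            and data.get("device") == device_id
        ):
            return True
    return False


# Replies and event copied from the QMP session in the verification comments.
session = [
    '{"return": {}, "id": "libvirt-1"}',
    '{"return": {}}',
    '{"timestamp": {"seconds": 1403063829, "microseconds": 94636}, '
    '"event": "DEVICE_DELETED", "data": {"device": "hostdev0"}}',
]

print(device_deleted(session, "hostdev0"))      # True: the event was seen
print(device_deleted(session[:2], "hostdev0"))  # False: only the command replies
```

The second call illustrates the original problem in this bug: with only the command replies in hand, there is no evidence yet that the guest released the device.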