Bug 781985

Summary: When detach PCI device from guest, unknown error occurs.
Product: Red Hat Enterprise Linux 6
Reporter: hongming <honzhang>
Component: libvirt
Assignee: Osier Yang <jyang>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.3
CC: acathrow, ajia, dallan, eblake, mzhan, rwu, weizhan, zhpeng
Target Milestone: rc
Keywords: Regression
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-0.9.10-1.el6
Doc Type: Bug Fix
Doc Text:
No documentation needed.
Last Closed: 2012-06-20 06:46:35 UTC
Attachments:
libvirt debug log (flags: none)

Description hongming 2012-01-16 09:14:19 UTC
Created attachment 555455: libvirt debug log

Description of problem:

Running "virsh detach-device domain xmlfile" fails with the errors below, even though the PCI device is actually detached from the guest. The bug is always reproducible with libvirt-0.9.9-1 and is not reproducible with libvirt-0.9.8-1.
error: Failed to detach device from hostdev.xml
error: An error occurred, but the cause is unknown




Version-Release number of selected component (if applicable):

-kernel-2.6.32-220.el6.x86_64
-libvirt-0.9.9-1.el6.x86_64
-qemu-kvm-0.12.1.2-2.213.el6.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Enable the kernel IOMMU: edit grub.conf and add intel_iommu=on at the end of the kernel line.
2. On a platform that supports only VT-d1, with a host kernel later than the 171 kernel, reload the KVM modules:
      modprobe -r kvm_intel
      modprobe -r kvm
      modprobe kvm allow_unsafe_assigned_interrupts=1
      modprobe kvm_intel
3. Check the device list and choose the network device to hot-plug from the host into the guest.
computer
  |
  +- pci_0000_00_19_0
  |   |
  |   +- net_eth0_44_37_e6_67_11_a2
4. # virsh nodedev-dumpxml pci_0000_00_19_0 
5. # readlink /sys/bus/pci/devices/0000\:00\:19.0/driver/ -f
/sys/bus/pci/drivers/e1000e
6. # virsh nodedev-dettach pci_0000_00_19_0
7. # readlink /sys/bus/pci/devices/0000\:00\:19.0/driver/ -f
/sys/bus/pci/drivers/pci-stub
8. # virsh nodedev-reset pci_0000_00_19_0
Device pci_0000_00_19_0 reset
9. # virsh attach-device rhel6 hostdev.xml

hostdev.xml looks like the following:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
      </source>
    </hostdev> 
  
10. In the guest, use lspci and ping to check that the network device works.
11. # virsh detach-device rhel6 hostdev.xml
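
Steps 5 and 7 check which host driver the device is bound to by resolving its sysfs "driver" symlink. A minimal C sketch of the same check follows, assuming the sysfs layout shown in the readlink output above. The function names are hypothetical helpers written for this illustration and are not part of libvirt.

```c
#define _XOPEN_SOURCE 700   /* for realpath() and PATH_MAX under strict -std modes */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the sysfs path of the "driver" symlink for a
 * PCI address. Returns 0 on success, -1 if the buffer is too small. */
int pci_driver_link_path(unsigned domain, unsigned bus, unsigned slot,
                         unsigned function, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%x/driver",
                     domain, bus, slot, function);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: return the last component of a resolved driver path,
 * e.g. "pci-stub" for "/sys/bus/pci/drivers/pci-stub". */
const char *driver_name_from_target(const char *target)
{
    assert(target != NULL);
    const char *slash = strrchr(target, '/');
    return slash ? slash + 1 : target;
}

/* Hypothetical helper: resolve the symlink (as "readlink -f" does) and copy
 * the driver name into name. Returns -1 if the link cannot be resolved
 * (for example, no driver is bound) or the name does not fit. */
int pci_bound_driver(unsigned domain, unsigned bus, unsigned slot,
                     unsigned function, char *name, size_t len)
{
    char link[PATH_MAX];
    char resolved[PATH_MAX];

    if (pci_driver_link_path(domain, bus, slot, function, link, sizeof link) < 0)
        return -1;
    if (realpath(link, resolved) == NULL)
        return -1;

    int n = snprintf(name, len, "%s", driver_name_from_target(resolved));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the device in this report, pci_bound_driver(0x0000, 0x00, 0x19, 0x0, ...) would be expected to yield e1000e before step 6 and pci-stub after it, matching the readlink output in steps 5 and 7.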

Actual results:
error: Failed to detach device from **.xml
error: An error occurred, but the cause is unknown
The PCI device is actually detached from the guest. Unless the guest is destroyed, nodedev-reattach cannot reattach the PCI device to the host.

Expected results:
The device is detached from the guest successfully, with no error reported.


Additional info:

Comment 4 Alex Jia 2012-01-16 10:25:10 UTC
The issue was introduced in commit a0aec36. The following comment is from that patch:

This patch fixes two problems:
        1) The device will be reattached to host even if it's not
           managed, as there is a "pciDeviceSetManaged".

And in the code:

1960 static int
1961 qemuDomainDetachHostPciDevice(struct qemud_driver *driver,
1962                               virDomainObjPtr vm,
1963                               virDomainDeviceDefPtr dev,
1964                               virDomainHostdevDefPtr *detach_ret)
1965 {
......
2026     pci = pciGetDevice(detach->source.subsys.u.pci.domain,
2027                        detach->source.subsys.u.pci.bus,
2028                        detach->source.subsys.u.pci.slot,
2029                        detach->source.subsys.u.pci.function);
2030     if (pci) {
2031         activePci = pciDeviceListSteal(driver->activePciHostdevs, pci);
2032         if (pciResetDevice(activePci, driver->activePciHostdevs, NULL))
2033             qemuReattachPciDevice(activePci, driver);
2034         else
2035             ret = -1;
2036         pciFreeDevice(pci);
2037         pciFreeDevice(activePci);
2038     } else {
2039         ret = -1;
2040     }
......

In fact, qemuReattachPciDevice calls pciDeviceGetManaged; if the PCI device is not in managed mode, it returns immediately, so the PCI device cannot be reattached to the host either.

In addition, line 2032 should also check the return value of pciResetDevice: if the return value is < 0, then XXXX and set ret = -1, etc.
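
The early return described above can be sketched in isolation. This is a simplified model only: struct fake_pci_dev and fake_reattach are hypothetical stand-ins invented for this illustration, not libvirt's types or the real body of qemuReattachPciDevice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for libvirt's PCI device object. */
struct fake_pci_dev {
    bool managed;        /* models managed='yes' / managed='no' in the hostdev XML */
    const char *driver;  /* name of the host driver the device is bound to */
};

/* Models the guard described above: an unmanaged device is left alone,
 * so it stays bound to whatever driver it had (pci-stub in this report). */
void fake_reattach(struct fake_pci_dev *dev, const char *host_driver)
{
    assert(dev != NULL);
    if (!dev->managed)
        return;                 /* unmanaged: nothing is rebound */
    dev->driver = host_driver;  /* managed: hand the device back to the host driver */
}
```

With this guard in place, only a device whose managed flag is set ends up back on its host driver; an unmanaged one remains on pci-stub until it is reattached by other means.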

Comment 5 Osier Yang 2012-01-18 03:41:16 UTC
Upstream commit 6be610bfaae08655eaf93f9638d4c6636c00343f fixed the problem incidentally.

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index dc40d2f..4b60839 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -2029,7 +2029,8 @@ qemuDomainDetachHostPciDevice(struct qemud_driver *driver,
                        detach->source.subsys.u.pci.function);
     if (pci) {
         activePci = pciDeviceListSteal(driver->activePciHostdevs, pci);
-        if (pciResetDevice(activePci, driver->activePciHostdevs, NULL))
+        if (pciResetDevice(activePci, driver->activePciHostdevs,
+                           driver->inactivePciHostdevs) == 0)
             qemuReattachPciDevice(activePci, driver);
         else
             ret = -1;
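
The effect of the "== 0" change in the hunk above can be reproduced in isolation. The sketch below assumes, as the fixed condition implies, that pciResetDevice returns 0 on success and a negative value on failure. fake_reset, detach_before_fix and detach_after_fix are hypothetical stand-ins, and the change to the third argument (inactivePciHostdevs) is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stub for the reset call: 0 on success, -1 on failure. */
int fake_reset(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Control flow before the fix: a non-zero (failing) reset is treated as the
 * go-ahead to reattach, and a successful reset falls into ret = -1. */
int detach_before_fix(int reset_fails, int *reattached)
{
    int ret = 0;

    assert(reattached != NULL);
    *reattached = 0;
    if (fake_reset(reset_fails))
        *reattached = 1;
    else
        ret = -1;   /* failure returned although the reset succeeded */
    return ret;
}

/* Control flow after the fix: reattach only when the reset returned 0. */
int detach_after_fix(int reset_fails, int *reattached)
{
    int ret = 0;

    assert(reattached != NULL);
    *reattached = 0;
    if (fake_reset(reset_fails) == 0)
        *reattached = 1;
    else
        ret = -1;
    return ret;
}
```

In the pre-fix flow a successful reset yields ret = -1 without any error having been set, which would be consistent with virsh printing "An error occurred, but the cause is unknown" while the device is in fact detached.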

Comment 6 Alex Jia 2012-01-18 05:48:45 UTC
Hi Osier,
Although the device detached successfully, the PCI device can't be returned to the host in managed mode. Is this an expected result? I don't think so. As I remember it, the original design is that a PCI device in managed mode is automatically returned to the host when a hot-plugged PCI device is detached from a running guest, or when a running guest with an attached PCI device is shut down.

In addition, I tried to manually reattach the PCI device to the host. Although virsh nodedev-reattach said "Device pci_0000_00_19_0 re-attached", in fact the pci device is /sys/devices/pci0000:00/0000:00:19.0/driver not original e1000e, so I still can't use the NIC on the host.

Thanks,
Alex

Comment 7 Alex Jia 2012-01-18 05:50:20 UTC
(In reply to comment #6)

> pci device is /sys/devices/pci0000:00/0000:00:19.0/driver not original e1000e,
s/device/driver/.

Comment 8 Alex Jia 2012-01-18 06:09:15 UTC
Hi Osier, 
I saw that your v2 patch "qemu: Introduce inactive PCI device list" removes the 'if (!pciDeviceGetManaged(dev))' check from the 'qemuReattachPciDevice' function. I think the patch will fix some of the issues in comment 6, but not all: if 'pciResetDevice != 0', we should also do some cleanup work, such as returning the PCI device to the host, right?

Alex

Comment 11 zhpeng 2012-02-15 05:55:08 UTC
Following the steps in comment 0 on libvirt-0.9.10-1.el6.x86_64, the results are:

[root@zhpeng ~]# virsh attach-device kvm1 nodedev.xml 
Device attached successfully

[root@zhpeng ~]# virsh detach-device kvm1 nodedev.xml 
Device detached successfully

So it's verified.

Comment 21 zhpeng 2012-02-22 03:23:29 UTC
libvirt-0.9.4-23.el6_2.6 test passed.

Comment 22 Osier Yang 2012-05-04 10:14:56 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No documentation needed.

Comment 24 errata-xmlrpc 2012-06-20 06:46:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html