Bug 500217 - RHEL5.4 vt-d: libvirt does not automatically re-attach an assigned device in the host after guest shutdown
Summary: RHEL5.4 vt-d: libvirt does not automatically re-attach an assigned device in ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Chris Lalancette
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 558779
Depends On:
Blocks: 516837 533941
Reported: 2009-05-11 17:29 UTC by Mark McLoughlin
Modified: 2010-07-19 13:44 UTC
CC List: 12 users

Fixed In Version: libvirt-0.6.3-31.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 08:09:23 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Upstream libvirt PCI hotplug code (35.57 KB, patch)
2010-01-07 18:09 UTC, Chris Lalancette
no flags
steps to reproduce (6.05 KB, text/plain)
2010-01-08 07:56 UTC, Alex Jia
no flags
libvirt-0.6.3-23.el5 (5.44 KB, text/plain)
2010-01-17 09:10 UTC, Alex Jia
no flags
libvirt-0.6.3-30.el5 (5.11 KB, text/plain)
2010-01-17 09:10 UTC, Alex Jia
no flags
the detail (6.86 KB, text/plain)
2010-01-20 06:54 UTC, Alex Jia
no flags
dmesg information (32.07 KB, text/plain)
2010-01-21 06:51 UTC, Alex Jia
no flags
verified information (2.67 KB, text/plain)
2010-01-21 06:53 UTC, Alex Jia
no flags
details (2.87 KB, text/plain)
2010-01-29 02:31 UTC, Alex Jia
no flags


Links
Red Hat Product Errata RHBA-2010:0205 (priority: normal, status: SHIPPED_LIVE): libvirt bug fix and enhancement update, last updated 2010-03-29 12:27:37 UTC

Description Mark McLoughlin 2009-05-11 17:29:12 UTC
Clone of a Fedora 11 bug. Not a critical issue, but would improve user experience.

+++ This bug was initially created as a clone of Bug #499561 +++

See:

https://fedoraproject.org/w/index.php?title=QA:Testcase_Virtualization_KVM_PCI_Device_Assignment_assign_using_libvirt

If you attach a device to a guest in "managed" mode, then libvirt automatically detaches the device from the host before starting the guest, but it doesn't automatically re-attach it to the host when the guest shuts down.

This would be particularly noticeable if you were using virt-manager: you'd have to drop to the shell and run "virsh nodedev-reattach" before you could use the device in the host again.

Granted, libvirtd might not even be running when the guest dies, but is there anything we can do here for the common case?
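To make the expected behaviour concrete, here is a toy Python model of the managed-mode lifecycle described above. The device is detached before the guest starts, and on shutdown it is reset and then re-attached; that second half is the missing step. This is not libvirt code. `HostDev`, `Host`, `on_guest_start` and `on_guest_shutdown` are hypothetical names, and driver bindings are kept in a dictionary instead of real sysfs.

```python
from dataclasses import dataclass, field


@dataclass
class HostDev:
    """Hypothetical stand-in for a <hostdev type='pci'> entry."""
    address: str        # sysfs-style PCI address, e.g. "0000:00:19.0"
    managed: bool       # value of the managed='yes'/'no' attribute
    host_driver: str    # driver bound on the host before the guest starts


@dataclass
class Host:
    """Toy model of host state: PCI address -> currently bound driver."""
    bindings: dict = field(default_factory=dict)
    resets: list = field(default_factory=list)

    def detach(self, dev: HostDev) -> None:
        self.bindings[dev.address] = "pci-stub"

    def reset(self, dev: HostDev) -> None:
        self.resets.append(dev.address)

    def reattach(self, dev: HostDev) -> None:
        self.bindings[dev.address] = dev.host_driver


def on_guest_start(host: Host, hostdevs) -> None:
    for dev in hostdevs:
        if dev.managed:
            host.detach(dev)


def on_guest_shutdown(host: Host, hostdevs) -> None:
    for dev in hostdevs:
        if not dev.managed:
            continue  # unmanaged: the admin re-attaches it manually
        host.reset(dev)     # quiesce the device (stop DMA) first...
        host.reattach(dev)  # ...then hand it back to the host driver


if __name__ == "__main__":
    nic = HostDev("0000:00:19.0", managed=True, host_driver="e1000e")
    host = Host({nic.address: "e1000e"})
    on_guest_start(host, [nic])
    print(host.bindings[nic.address])  # pci-stub
    on_guest_shutdown(host, [nic])
    print(host.bindings[nic.address])  # e1000e
```

The reset comes before the re-attach so that the host driver never binds to a device that may still be doing DMA on the guest's behalf.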

--- Additional comment from berrange on 2009-05-07 05:12:43 EDT ---

Agreed, it should reset the device after guest shutdown and then re-attach the host OS drivers. 

We should assume libvirtd is running at time of guest shutdown. Any scenario where it wouldn't be running is a bug.

--- Additional comment from berrange on 2009-05-07 05:15:11 EDT ---

NB, we need to be careful in the reset to avoid outstanding DMA operations clobbering host OS memory; see this note from Xen:

 http://xenbits.xensource.com/xen-unstable.hg?rev/026957d523f9

This might mean we need KVM to do an FLR (Function Level Reset) in the guest cleanup path?
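Whether a device can do the FLR suggested above can be checked through its PCI Express capability. The PCIe specification defines bit 28 of the Device Capabilities register as "Function Level Reset capable". The sketch below walks the standard capability list in a raw config-space dump, such as the `config` file under `/sys/bus/pci/devices/<address>/`. That file exposes the full 256 bytes only to root.

The helper names are made up for this example. The sketch also ignores the separate Advanced Features capability that some conventional PCI devices use to advertise FLR.

```python
import struct
from typing import Optional

PCI_STATUS = 0x06             # 16-bit Status register
PCI_STATUS_CAP_LIST = 0x10    # bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34    # byte: offset of the first capability
PCI_CAP_ID_EXP = 0x10         # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04         # Device Capabilities, relative to the PCIe capability
PCI_EXP_DEVCAP_FLR = 1 << 28  # Function Level Reset capable


def find_capability(config: bytes, cap_id: int) -> Optional[int]:
    """Return the config-space offset of capability cap_id, or None."""
    if len(config) < 0x40:
        return None
    status = struct.unpack_from("<H", config, PCI_STATUS)[0]
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = config[PCI_CAPABILITY_LIST] & 0xFC  # low two bits are reserved
    seen = set()
    while pos and pos not in seen and pos + 1 < len(config):
        seen.add(pos)  # guard against malformed, looping lists
        if config[pos] == cap_id:
            return pos
        pos = config[pos + 1] & 0xFC
    return None


def supports_pcie_flr(config: bytes) -> bool:
    """True if the PCIe Device Capabilities register advertises FLR."""
    pos = find_capability(config, PCI_CAP_ID_EXP)
    if pos is None or pos + PCI_EXP_DEVCAP + 4 > len(config):
        return False
    devcap = struct.unpack_from("<I", config, pos + PCI_EXP_DEVCAP)[0]
    return bool(devcap & PCI_EXP_DEVCAP_FLR)


def read_config(address: str, sysfs_root: str = "/sys") -> bytes:
    """Read a device's config space, e.g. read_config("0000:00:19.0")."""
    with open(f"{sysfs_root}/bus/pci/devices/{address}/config", "rb") as f:
        return f.read()
```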

Comment 2 Mark McLoughlin 2009-08-14 07:50:14 UTC
Upstream fix for this:

  http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=4035152a87

Comment 3 Daniel Veillard 2009-12-04 15:33:57 UTC
libvirt-0.6.3-23.el5 has been built in dist-5E-qu-candidate with
the patches,

Daniel

Comment 5 Alex Jia 2009-12-30 05:26:22 UTC
This bug still exists with libvirt-0.6.3-23.el5 on RHEL 5.5, and setting managed mode to 'yes' gives different results under the Xen and KVM hypervisors:

1. Xen hypervisor
Managed mode is not supported by the Xen hypervisor; starting the guest raises this error:

error: Failed to start domain rhel5u5_x86_64_xenfv
error: POST operation failed: xend_post: error from xen daemon: (xend.err 'Error creating domain: pci: PCI Backend does not own device 0000:00:19.0\nSee the pciback.hide kernel command-line parameter or\nbind your slot/device to the PCI backend using sysfs')


2. KVM hypervisor
The assigned device is still not re-attached automatically after the guest is shut down. Steps to reproduce:

[root@dhcp-66-70-62 ~]# virsh nodedev-list --tree
computer
  |
  +-pci_8086_10bd
  |   |
  |   +-net_00_23_ae_6f_f1_d7
  |
  +-pci_8086_244e
  +-pci_8086_2914
......

[root@dhcp-66-70-62 ~]# virsh nodedev-dumpxml pci_8086_10bd
<device>
  <name>pci_8086_10bd</name>
  <parent>computer</parent>
  <capability type='pci'>
    <domain>0</domain>
    <bus>0</bus>
    <slot>25</slot>
    <function>0</function>
    <product id='0x10bd'>82566DM-2 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
  </capability>
</device>

[root@dhcp-66-70-62 ~]# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver/ -f
/sys/bus/pci/drivers/e1000e

[root@dhcp-66-70-62 ~]# virsh edit rhel5u5
Domain rhel5u5 XML configuration edited.

[root@dhcp-66-70-62 ~]# virsh dumpxml rhel5u5
<domain type='qemu'>
  <name>rhel5u5</name>
  <uuid>c0e805d9-e288-e4e8-2357-13b39d57cb6e</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/rhel5u5.img'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <interface type='network'>
      <mac address='54:52:00:49:37:65'/>
      <source network='default'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>

[root@dhcp-66-70-62 ~]# virsh start rhel5u5
Domain rhel5u5 started

[root@dhcp-66-70-62 ~]# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver/ -f
/sys/bus/pci/drivers/pci-stub

[root@dhcp-66-70-62 ~]# ifconfig eth0
eth0: error fetching interface information: Device not found

[root@dhcp-66-70-62 ~]# virsh shutdown rhel5u5
Domain rhel5u5 is being shutdown

[root@dhcp-66-70-62 ~]# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver/ -f
/sys/bus/pci/drivers/pci-stub

[root@dhcp-66-70-62 ~]# ifconfig eth0
eth0: error fetching interface information: Device not found

Note that the host NIC becomes available again if we explicitly re-attach the device:
[root@dhcp-66-70-62 ~]# virsh nodedev-reattach pci_8086_10bd
Device pci_8086_10bd re-attached

[root@dhcp-66-70-62 ~]# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver/ -f
/sys/bus/pci/drivers/e1000e

[root@dhcp-66-70-62 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:23:AE:6F:F1:D7  
          inet addr:10.66.70.62  Bcast:10.66.70.255  Mask:255.255.255.0
          inet6 addr: fe80::223:aeff:fe6f:f1d7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1807 errors:0 dropped:0 overruns:0 frame:0
          TX packets:75 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:259894 (253.8 KiB)  TX bytes:20171 (19.6 KiB)
          Memory:febe0000-fec00000 

[root@dhcp-66-70-62 ~]# ping 10.66.70.161
PING 10.66.70.161 (10.66.70.161) 56(84) bytes of data.
64 bytes from 10.66.70.161: icmp_seq=1 ttl=64 time=0.616 ms
64 bytes from 10.66.70.161: icmp_seq=2 ttl=64 time=0.259 ms
......
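The `readlink` checks in the steps above show the driver bound at each stage: e1000e before start, pci-stub while the guest runs and (the bug) after shutdown, then e1000e again after `virsh nodedev-reattach`. These checks can be scripted when re-verifying a fix. The Python sketch below is minimal. `bound_driver` and its `sysfs_root` parameter are hypothetical; the parameter exists only so the function can be tested against a fake directory tree.

```python
import os
from typing import Optional


def bound_driver(address: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the driver bound to a PCI device, or None if it is unbound.

    Equivalent to `readlink -f /sys/bus/pci/devices/<address>/driver`
    followed by taking the last path component.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", address, "driver")
    if not os.path.islink(link):
        return None  # no driver symlink: nothing is bound
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    print(bound_driver("0000:00:19.0"))  # e.g. "e1000e" or "pci-stub"
```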


Version-Release number of selected component (if applicable):
[root@dhcp-66-70-62 libvirt]# uname -a
Linux dhcp-66-70-62.nay.redhat.com 2.6.18-183.el5xen #1 SMP Mon Dec 21 18:46:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-66-70-62 libvirt]# rpm -qa|grep libvirt
libvirt-debuginfo-0.6.3-23.el5
libvirt-0.6.3-23.el5
libvirt-python-0.6.3-23.el5
[root@dhcp-66-70-62 ~]# rpm -qa|grep kvm
kvm-tools-83-140.el5
kvm-qemu-img-83-140.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-83-140.el5
etherboot-roms-kvm-5.4.4-13.el5
kmod-kvm-83-140.el5
[root@dhcp-66-70-62 ~]# lsmod|grep kvm
kvm_intel              86664  0 
kvm                   223648  2 ksm,kvm_intel
[root@dhcp-66-70-62 ~]#
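The same NIC appears under three spellings in the reproduction above:

- `<slot>25</slot>` (decimal) in the node-device XML.
- `slot='0x19'` (hex) in the domain's `<hostdev>`.
- `0000:00:19.0` in sysfs.

The sketch below uses Python's standard `xml.etree.ElementTree` and hypothetical helper names. It normalizes the first two forms to the sysfs form and lists the managed PCI hostdevs that a fix would need to re-attach at shutdown.

```python
import xml.etree.ElementTree as ET
from typing import List


def _fmt(domain: int, bus: int, slot: int, function: int) -> str:
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def nodedev_pci_address(nodedev_xml: str) -> str:
    """Address from `virsh nodedev-dumpxml`; its fields are decimal."""
    cap = ET.fromstring(nodedev_xml).find("capability[@type='pci']")
    return _fmt(*(int(cap.findtext(t)) for t in ("domain", "bus", "slot", "function")))


def managed_pci_hostdevs(domain_xml: str) -> List[str]:
    """Addresses of <hostdev type='pci' managed='yes'>; attributes are hex.

    A hostdev without managed='yes' is treated here as unmanaged.
    """
    found = []
    for hd in ET.fromstring(domain_xml).iterfind("devices/hostdev[@type='pci']"):
        if hd.get("managed") != "yes":
            continue
        a = hd.find("source/address")
        found.append(_fmt(*(int(a.get(k), 16) for k in ("domain", "bus", "slot", "function"))))
    return found
```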

Comment 6 Chris Lalancette 2010-01-06 20:50:29 UTC
(In reply to comment #5)
> This bug still exist with libvirt-0.6.3-23.el5 on rhel5u5,and it will raise
> different result if setting managed mode to 'yes' for xen and hypervisor:
> 
> 1.xen hypervisor
> The function can't be supported by the xen hypervisor,it will raise a error
> message:

OK, that part is expected; managed mode is not supported under Xen.  You always have to use unmanaged mode.  So we can ignore this part of the problem.

> 2.kvm hypervisor
> It can't still automatically re-attach the assigned device after guest be shut
> down,see following steps to reproduce:
> 
> ......
>
> [root@dhcp-66-70-62 ~]# virsh shutdown rhel5u5
> Domain rhel5u5 is being shutdown

Actually, I ran into a problem right here.  At the end of shutdown, libvirtd crashes on me with this stack trace:

#0  qemuCheckPciHostDevice (conn=0x0, owner_vm=0x17a75f60, dev=0x17a7a6d0)
    at qemu_driver.c:1233
#1  0x00002b4f28ec906d in pciResetDevice (conn=0x0, vm=0x17a75f60, 
    dev=0x17a7a6d0, check=0x41f6a0 <qemuCheckPciHostDevice>) at pci.c:647
#2  0x0000000000420b68 in qemuDomainReAttachHostDevices (
    vm=<value optimized out>, conn=<value optimized out>) at qemu_driver.c:1378
#3  qemudShutdownVMDaemon (vm=<value optimized out>, 
    conn=<value optimized out>) at qemu_driver.c:1711
#4  0x000000000042cd23 in qemudDispatchVMEvent (watch=10, fd=16, events=12, 
    opaque=<value optimized out>) at qemu_driver.c:1779
#5  0x000000000040e10f in virEventDispatchHandles (fds=<value optimized out>, 
    nfds=<value optimized out>) at event.c:451
#6  virEventRunOnce (fds=<value optimized out>, nfds=<value optimized out>)
    at event.c:578
#7  0x000000000040f478 in qemudOneLoop () at qemud.c:2079
#8  qemudRunLoop () at qemud.c:2184
#9  0x0000000000413981 in main (argc=396645872, argv=<value optimized out>)
    at qemud.c:2956

And looking at the code, it makes sense.  qemudShutdownVMDaemon() calls qemuDomainReAttachHostDevices() with a NULL conn parameter, but qemuCheckPciHostDevice() dereferences conn to get at the device list, which causes the crash.  Upstream reverted the fix that's currently in RHEL-5 libvirt in favor of another approach.  I'm going to attempt to backport the new upstream code and see if that works better.

Chris Lalancette
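The analysis above finds that the shutdown path runs from an event callback with no client connection. Any check that reaches the device list through `conn` therefore dereferences NULL. One general remedy is to keep the table of in-use PCI devices in daemon-wide state, so the reset and re-attach path never needs a connection.

The Python sketch below only illustrates that idea. It is not the upstream patch, and every name in it is hypothetical.

```python
from typing import Callable, Dict, List


class ActivePciDevices:
    """Hypothetical daemon-wide table: PCI address -> name of the guest using it.

    It lives in global driver state rather than in a client connection, so
    code running from an event callback (no connection) can still use it.
    """

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def claim(self, address: str, guest: str) -> None:
        current = self._owner.get(address)
        if current is not None and current != guest:
            raise RuntimeError(f"{address} is already assigned to {current}")
        self._owner[address] = guest

    def in_use_by_other(self, address: str, guest: str) -> bool:
        owner = self._owner.get(address)
        return owner is not None and owner != guest

    def release_all(self, guest: str) -> List[str]:
        freed = [addr for addr, owner in self._owner.items() if owner == guest]
        for addr in freed:
            del self._owner[addr]
        return freed


def shutdown_cleanup(table: ActivePciDevices, guest: str,
                     reset: Callable[[str], None],
                     reattach: Callable[[str], None]) -> None:
    """Reset and re-attach a stopped guest's devices; no connection needed."""
    for address in table.release_all(guest):
        reset(address)
        reattach(address)
```

Passing `reset` and `reattach` in as callables keeps the sketch testable; a real daemon would do the sysfs work in their place.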

Comment 7 Chris Lalancette 2010-01-07 18:09:02 UTC
Created attachment 382295
Upstream libvirt PCI hotplug code

This is the backport of the upstream code that I'm currently testing.  Mark, can you please give it a once over and make sure that it makes sense?

Thanks,
Chris Lalancette

Comment 8 Chris Lalancette 2010-01-07 18:12:42 UTC
Also, I've uploaded test packages here:

http://people.redhat.com/clalance/bz500217

Ajia, can you test out those packages and see if it works in your scenario?

Thanks,
Chris Lalancette

Comment 9 Alex Jia 2010-01-08 07:53:29 UTC
Hi Chris, I have installed these packages and retested, but got the following errors:

[root@dhcp-66-70-62 ~]# virsh start rhel5u5_x86_64_qemu
error: Failed to start domain rhel5u5_x86_64_qemu
error: this function is not supported by the hypervisor: Failed to find parent device for 0000:00:19.0

virt-manager also has a problem: the "Device" drop-down list is empty, even after restarting the libvirtd service. The issue is only resolved by rebooting the host.

Comment 10 Alex Jia 2010-01-08 07:56:18 UTC
Created attachment 382409
steps to reproduce

Comment 11 Chris Lalancette 2010-01-08 21:56:24 UTC
(In reply to comment #9)
> Hi,Chris,I has already installed these packages and retest it,but some error
> messages were raised:
> 
> [root@dhcp-66-70-62 ~]# virsh start rhel5u5_x86_64_qemu
> error: Failed to start domain rhel5u5_x86_64_qemu
> error: this function is not supported by the hypervisor: Failed to find parent
> device for 0000:00:19.0

Hm, OK, odd.  I'm not seeing that on my machine, so there's probably something different.  Can you try some things for me:

1)  Start up libvirtd and virsh with more debugging.  In particular, do this:
    # service libvirtd stop
    # LIBVIRT_DEBUG=1 /usr/sbin/libvirtd --verbose
    <in another terminal>
    # LIBVIRT_DEBUG=1 virsh start rhel5u5_x86_64_qemu

    Then attach the output from both of those commands to this bug.

2)  Can you give me information so that I can access the machine you are working with?  If I can work directly on that machine, I can probably solve it a bit faster.

Thanks,
Chris Lalancette

Comment 12 Alex Jia 2010-01-11 03:11:17 UTC
(In reply to comment #11)
> (In reply to comment #9)
> > Hi,Chris,I has already installed these packages and retest it,but some error
> > messages were raised:
> > 
> > [root@dhcp-66-70-62 ~]# virsh start rhel5u5_x86_64_qemu
> > error: Failed to start domain rhel5u5_x86_64_qemu
> > error: this function is not supported by the hypervisor: Failed to find parent
> > device for 0000:00:19.0
> 
> Hm, OK, odd.  I'm not seeing that on my machine, so there's probably something
> different.  Can you try some things for me:
> 
> 1)  Start up libvirtd and virsh with more debugging.  In particular, do this:
>     # service libvirtd stop
>     # LIBVIRT_DEBUG=1 /usr/sbin/libvirtd --verbose
>     <in another terminal>
>     # LIBVIRT_DEBUG=1 virsh start rhel5u5_x86_64_qemu
> 
>     Then attach the output from both of those commands to this bug.
Hi Chris, I retested following the steps above, but the guest definition was oddly destroyed, and the output is too long to attach. Instead, I will give you access to my machine and keep it running so that you can log in at any time.
> 
> 2)  Can you give me information so that I can access the machine you are
> working with?  If I can work directly on that machine, I can probably solve it
> a bit faster.
> 
> Thanks,
> Chris Lalancette   

Chris, no problem. I will give you access to my test environment so that you can solve the issue quickly. Please check your email for the login information.

Comment 13 Chris Lalancette 2010-01-15 22:07:18 UTC
OK.  I went through this issue with Don Dutile, and after explaining to him the problem I've been having along with showing him the data sheet for this particular machine (Dell Optiplex 755) he told me that this particular platform does *not* support PCI passthrough.  That is, the BIOS has fake DMAR tables that don't actually work, and therefore when trying to do the PCI passthrough I get the crashes that I'm seeing.

Alex, are you sure that PCI passthrough has ever worked?  It seems like it won't work with this BIOS/chipset, which would explain all of the problems I've been having trying to reproduce the issue on your machine.

If you are sure it has worked, then I absolutely need for you to setup a serial console attached to that machine so I can figure out why it is crashing.  I cannot make progress otherwise.

Chris Lalancette

Comment 14 Alex Jia 2010-01-17 05:16:16 UTC
I am not sure PCI passthrough has ever worked, but the Dell Optiplex 755 should support the feature: the machine has VT-d support, and I enabled the option in the BIOS. I also added the intel_iommu=on parameter to the kernel command line, but the same error is still raised:
[root@dhcp-66-70-62 ~]# virsh start rhel5u5_x86_64_kvm
error: Failed to start domain rhel5u5_x86_64_kvm
error: this function is not supported by the hypervisor: Failed to find parent device for 0000:00:19.0


Additional information:
[root@dhcp-66-70-62 ~]# uname -a
Linux dhcp-66-70-62.nay.redhat.com 2.6.18-185.el5 #1 SMP Thu Jan 14 16:44:40 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

[root@dhcp-66-70-62 ~]# lsmod|grep kvm
kvm_intel              86664  0 
kvm                   223648  2 ksm,kvm_intel

[root@dhcp-66-70-62 ~]# rpm -qa|grep libvirt
libvirt-python-0.6.3-29.el5bz500217
libvirt-0.6.3-29.el5bz500217

[root@dhcp-66-70-62 ~]# dmesg|grep -i iommu
Command line: ro root=LABEL=/ intel_iommu=on
Kernel command line: ro root=LABEL=/ intel_iommu=on
Intel-IOMMU: enabled
IOMMU fedae000: ver 1:0 cap c9008020a30270 ecap 1000
IOMMU fedb0000: ver 1:0 cap c0000020230270 ecap 1000
IOMMU fedb1000: ver 1:0 cap c9008020230270 ecap 1000
IOMMU 0xfedb0000: using Register based invalidation
IOMMU 0xfedae000: using Register based invalidation
IOMMU 0xfedb1000: using Register based invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbfe58000 - 0xbfe70000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xbfe58000 - 0xbfe70000]
IOMMU: Setting identity map for device 0000:00:1d.2 [0xbfe58000 - 0xbfe70000]
IOMMU: Setting identity map for device 0000:00:1d.7 [0xbfe58000 - 0xbfe70000]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbfe58000 - 0xbfe70000]
IOMMU: Setting identity map for device 0000:00:1a.1 [0xbfe58000 - 0xbfe70000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0xbfe58000 - 0xbfe70000]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]

[root@dhcp-66-70-62 ~]# cat /boot/grub/grub.conf |grep iommu
        kernel /boot/vmlinuz-2.6.18-185.el5 ro root=LABEL=/ intel_iommu=on
        kernel /boot/vmlinuz-2.6.18-183.el5 ro root=LABEL=/ rhgb quiet intel_iommu=on

Comment 15 Alex Jia 2010-01-17 09:09:23 UTC
(In reply to comment #13)
> OK.  I went through this issue with Don Dutile, and after explaining to him the
> problem I've been having along with showing him the data sheet for this
> particular machine (Dell Optiplex 755) he told me that this particular platform
> does *not* support PCI passthrough.  That is, the BIOS has fake DMAR tables
> that don't actually work, and therefore when trying to do the PCI passthrough I
> get the crashes that I'm seeing.
> 
> Alex, are you sure that PCI passthrough has ever worked?  It seems like it
> won't work with this BIOS/chipset, which would explain all of the problems I've
> been having trying to reproduce the issue on your machine.
> 
> If you are sure it has worked, then I absolutely need for you to setup a serial
> console attached to that machine so I can figure out why it is crashing.  I
> cannot make progress otherwise.
> 
> Chris Lalancette    

Hi Chris, PCI passthrough works if we downgrade libvirt to 0.6.3-23.el5 (please refer to bug 517465).

There are two problems at present:
1. With libvirt-0.6.3-23.el5, PCI passthrough works, but when the guest is shut down, the assigned device is not re-attached automatically.

2. PCI passthrough fails with libvirt-0.6.3-29.el5/libvirt-0.6.3-30.el5. This may be a regression, and it blocks me from verifying the current bug, so I cannot tell whether the assigned device is re-attached automatically.

(please see the attached information)

Comment 16 Alex Jia 2010-01-17 09:10:10 UTC
Created attachment 384894
libvirt-0.6.3-23.el5

Comment 17 Alex Jia 2010-01-17 09:10:53 UTC
Created attachment 384895
libvirt-0.6.3-30.el5

Comment 18 Alex Jia 2010-01-19 03:51:38 UTC
My machine's chipset is the "Intel Q35 Express Chipset". It does have VT-d support, and I have passed through NICs successfully with libvirt-0.6.3-23.el5 on RHEL 5.5.

[root@dhcp-66-70-62 ~]# lspci
00:00.0 Host bridge: Intel Corporation 82Q35 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82Q35 Express PCI Express Root Port (rev 02)
00:03.0 Communication controller: Intel Corporation 82Q35 Express MEI Controller (rev 02)
00:03.2 IDE interface: Intel Corporation 82Q35 Express PT IDER Controller (rev 02)
00:03.3 Serial controller: Intel Corporation 82Q35 Express Serial KT Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IO (ICH9DO) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV610 video device [Radeon HD 2400 PRO]

I will switch to another machine, an HP z800, to verify the issue again.

Comment 19 Chris Lalancette 2010-01-19 21:59:42 UTC
Alex,
     I've found out quite a few new things today.  Let me tell you what I've found:

1)  You are right, that Dell Optiplex 755 *does* support device passthrough.  So that is a fine box to continue testing with.
2)  The reason that I kept seeing the box "crash" when I tried to test out your steps is that you were assigning the one and only NIC to the guest.  While that's a fine test for you to do locally, it doesn't work over ssh :).
3)  There was a bug in the libvirt code, not related to my latest patch, that was causing your issues.  I've put together another patch to fix that bug now.

So, with that being said, please download the new packages from:

http://people.redhat.com/clalance/bz500217

And install them on the Dell Optiplex 755.  Don't forget to do "service libvirtd restart" or reboot the machine after you have installed them.  Once they are installed, please repeat your steps from Comment #5 and see if it works for you.

Thanks,
Chris Lalancette
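Point 2) in this comment is easy to trip over when testing remotely. One hypothetical pre-flight check is to refuse to assign a PCI device that backs the interface carrying the host's default IPv4 route. The Python sketch below does this by reading the text of `/proc/net/route` and the `/sys/class/net/<iface>/device` symlink. It looks only at IPv4 and the default route, so treat it as a rough guard rather than a complete check. All function names are made up for the example.

```python
import os
from typing import Optional


def default_route_iface(route_table: str) -> Optional[str]:
    """Interface of the IPv4 default route, given /proc/net/route text."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "00000000":  # Destination 0.0.0.0
            return fields[0]
    return None


def iface_pci_address(iface: str, sysfs_root: str = "/sys") -> Optional[str]:
    """PCI address behind a network interface, via its 'device' symlink."""
    link = os.path.join(sysfs_root, "class", "net", iface, "device")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.path.realpath(link))


def safe_to_assign(address: str, route_table: str, sysfs_root: str = "/sys") -> bool:
    """False if the device backs the interface carrying the default route."""
    iface = default_route_iface(route_table)
    return iface is None or iface_pci_address(iface, sysfs_root) != address
```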

Comment 20 Alex Jia 2010-01-20 05:38:49 UTC
Hi Chris, I retested with libvirt-0.6.3-30.el5bz500217.x86_64 on the Dell Optiplex 755 machine. NIC passthrough succeeds, but the assigned NIC is not re-attached automatically after guest shutdown. I think libvirt does not check the hostdev "managed" mode when the guest shuts down, so it never calls the qemudNodeDeviceReAttach and qemudNodeDeviceReset functions to return the device to the host. Please refer to the attachment.

Alex

Comment 23 Alex Jia 2010-01-20 06:54:40 UTC
Created attachment 385597
the detail

Comment 24 Chris Lalancette 2010-01-20 21:04:07 UTC
Alex,
     Thanks for the additional testing.  I'd like to ask you to do two things:

1)  Re-run the test exactly as you did in Comment #23.  When you reach the end of the test, collect the output from "dmesg" and attach it to this bug.

2)  Once you've done step 1), I have another test package for you to try out.  This one is at:

http://people.redhat.com/clalance/bz500217v2

Please download those packages, install them on your test machine, restart libvirtd, and then re-run the test in Comment #23 again.

Thanks,
Chris Lalancette

Comment 25 Alex Jia 2010-01-21 06:49:51 UTC
Chris, I retested with libvirt-0.6.3-30.el5bz500217v2.x86_64 on the Dell Optiplex 755 machine. Everything is OK: the assigned NICs are re-attached automatically after guest shutdown. If there are no objections, I will set the bug status to VERIFIED via errata.

Comment 26 Alex Jia 2010-01-21 06:51:57 UTC
Created attachment 385866
dmesg information

Comment 27 Alex Jia 2010-01-21 06:53:16 UTC
Created attachment 385868
verified information

Comment 28 Chris Lalancette 2010-01-21 15:52:23 UTC
(In reply to comment #25)
> Chris,I retest it with libvirt-0.6.3-30.el5bz500217v2.x86_64 on Dell Optiplex
> 755 machine,everything is ok,the assigned NICs also can re-attach automatically
> after guest shutdown.if no question,I will set the bug status to VERIFIED via
> errata.    

No, we can't set it to VERIFIED yet; the patches I have in here are proposed, and not yet built into the official package.  I'm going to be posting the patches for review and discussion today, and then hopefully we can get it into RHEL-5 soon.

Chris Lalancette

Comment 29 Alex Jia 2010-01-22 04:34:19 UTC
Chris, understood. I will wait for the libvirt respin and retest it.

Alex Jia

Comment 30 Jiri Denemark 2010-01-27 09:03:39 UTC
*** Bug 558779 has been marked as a duplicate of this bug. ***

Comment 31 Jiri Denemark 2010-01-28 16:17:50 UTC
Fix built in libvirt-0.6.3-31.el5

Comment 33 Alex Jia 2010-01-29 02:30:24 UTC
This bug has been fixed in libvirt-0.6.3-31.el5 on RHEL 5.5; setting the bug status to VERIFIED (see the attachment).

Comment 34 Alex Jia 2010-01-29 02:31:10 UTC
Created attachment 387476
details

Comment 37 errata-xmlrpc 2010-03-30 08:09:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0205.html

