Bug 513317 - PCI passthrough with kvm guest cause libvirtd dead
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Daniel Berrangé
QA Contact: Virtualization Bugs
Reported: 2009-07-23 02:05 UTC by zhanghaiyan
Modified: 2009-12-14 21:17 UTC (History)
Doc Type: Bug Fix
Last Closed: 2009-09-02 09:22:39 UTC


Attachments
gdb.log (3.77 KB, text/plain), 2009-07-24 05:23 UTC, zhanghaiyan
test1-kvm.xml (1.30 KB, text/plain), 2009-07-24 05:24 UTC, zhanghaiyan
test1.log (12.19 KB, text/plain), 2009-07-24 05:24 UTC, zhanghaiyan
nodedev-list (2.40 KB, text/plain), 2009-07-24 05:25 UTC, zhanghaiyan
nodedev-dumpxml (328 bytes, text/plain), 2009-07-24 05:25 UTC, zhanghaiyan


Links
System: Red Hat Product Errata
ID: RHEA-2009:1269
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2009-09-01 09:31:21 UTC

Description zhanghaiyan 2009-07-23 02:05:26 UTC
Description of problem:
PCI passthrough to a KVM guest fails and can cause libvirtd to die.

Version-Release number of selected component (if applicable):
- libvirt-0.6.3-15.el5
- xen-3.0.3-90.el5
- kvm-83-90.el5
- rhel-5.4 (2.6.18-158.el5)

How reproducible:
100%

Steps to Reproduce:
1. # virsh nodedev-dettach pci_8086_10bd
Device pci_8086_10bd dettached

2. # virsh nodedev-reset pci_8086_10bd
Device pci_8086_10bd reset

3. # virsh edit demo
Domain demo XML configuration edited.

        <hostdev mode='subsystem' type='pci' managed='no'>
          <source>
            <address bus='0x00' slot='0x25' function='0x00'/>
          </source>
        </hostdev>

4. # virsh start demo
error: Failed to start domain demo
error: server closed connection

5. # virsh
error: unable to connect to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor

6. # service libvirtd status
libvirtd dead but pid file exists

Actual results:
PCI passthrough fails and causes libvirtd to die.

Expected results:
PCI passthrough succeeds.

Additional info:

Comment 1 Daniel Berrangé 2009-07-23 10:03:44 UTC
I can't reproduce this. Can you attempt to capture a stack trace:

- Install libvirt-debuginfo RPM
- Run 'service libvirtd start'
- Run 'ps -auxfw | grep libvirtd' to find the PID of the libvirtd process
- Start 'gdb'
- In the gdb console, type 'attach <PID-OF-LIBVIRTD>' and then 'cont'


Now in another console attempt to run your test to make libvirtd crash.
When it crashes, go back to the GDB console and type

 'thread apply all backtrace'

And then upload all the data from that as an attachment to this bug.


Can you also provide the output of

 'virsh nodedev-list --tree'

And 

  'virsh nodedev-dumpxml pci_8086_10bd'

And finally, the full XML config of the guest, and any /var/log/libvirt/qemu/demo.log that may exist
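
The attach, 'cont' and 'thread apply all backtrace' steps above could also be run non-interactively. The following is only a sketch: it assumes a gdb build that supports the -batch, -p and -ex options, and the helper name gdb_backtrace_command and the PID 12345 are made up for illustration.

```python
import shlex


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, resumes it, and
    prints every thread's backtrace once the process stops (for example
    when it receives SIGSEGV)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "gdb",
        "-batch",            # run the -ex commands below, then exit
        "-p", str(pid),      # attach to the running libvirtd process
        "-ex", "cont",       # same as typing 'cont' at the gdb prompt
        "-ex", "thread apply all backtrace",
    ]


if __name__ == "__main__":
    # 12345 is a placeholder PID, not a value taken from this report.
    print(shlex.join(gdb_backtrace_command(12345)))
```

Redirecting the output of the printed command to a file would give a log suitable for attaching to the bug.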

Comment 2 zhanghaiyan 2009-07-24 05:21:57 UTC
The test result is now slightly different: after step 4 (# virsh start demo), it hangs.

Attached: gdb.log
          nodedev-list
          nodedev-dumpxml
          test1-kvm.xml
          test1.log

Comment 3 zhanghaiyan 2009-07-24 05:23:42 UTC
Created attachment 354972
gdb.log

Comment 4 zhanghaiyan 2009-07-24 05:24:12 UTC
Created attachment 354973
test1-kvm.xml

Comment 5 zhanghaiyan 2009-07-24 05:24:33 UTC
Created attachment 354974
test1.log

Comment 6 zhanghaiyan 2009-07-24 05:25:02 UTC
Created attachment 354975
nodedev-list

Comment 7 zhanghaiyan 2009-07-24 05:25:26 UTC
Created attachment 354976
nodedev-dumpxml

Comment 8 Mark McLoughlin 2009-07-24 11:23:37 UTC
excerpt from stack trace:

#0  0x000000386747268e in free () from /lib64/libc.so.6
#1  0x0000003a66e1865c in virFree (ptrptr=<value optimized out>)
    at memory.c:177
#2  0x0000003a66e1890a in pciReadDeviceID (dev=<value optimized out>, 
    id_name=<value optimized out>) at pci.c:839
#3  0x0000003a66e18a01 in pciGetDevice (conn=<value optimized out>, 
    domain=<value optimized out>, bus=<value optimized out>, 
    slot=<value optimized out>, function=<value optimized out>) at pci.c:875
#4  0x0000000000420d15 in qemudStartVMDaemon (conn=0x5e30f50, 
    driver=0x5d8f950, vm=0x5e31090, migrateFrom=0x0, stdin_fd=-1)
    at qemu_driver.c:1251

Comment 9 Daniel Berrangé 2009-07-24 11:28:44 UTC
Ah, this is probably the crash addressed by this upstream bug fix:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=4a7acedd3c59a6a750576cb8680bc3f08fe0b52c


IIRC, it triggers if you configure a PCI device that does not actually exist on the host.
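
One way to check whether a configured address exists on the host is to look for it under sysfs, where PCI devices are listed as domain:bus:slot.function in hexadecimal. This is a sketch and not how libvirt does it; the helper names and the overridable root parameter are assumptions made for illustration.

```python
import os

SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def pci_sysfs_path(domain, bus, slot, function, root=SYSFS_PCI_DEVICES):
    """Return the sysfs entry for a PCI address (fields are printed in hex)."""
    name = f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"
    return os.path.join(root, name)


def pci_device_exists(domain, bus, slot, function, root=SYSFS_PCI_DEVICES):
    """True if the host exposes a PCI device at this address."""
    return os.path.exists(pci_sysfs_path(domain, bus, slot, function, root))


if __name__ == "__main__":
    # Address taken from the guest XML in the description:
    # bus='0x00' slot='0x25' function='0x00'
    print(pci_sysfs_path(0, 0x00, 0x25, 0x00))
    print(pci_device_exists(0, 0x00, 0x25, 0x00))
```

Running this on the host before starting the guest would show whether the hostdev address points at a real device.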

Comment 10 Daniel Berrangé 2009-07-24 12:17:20 UTC
Yep, confirmed here

The device being attached has slot '25' (decimal):

<device>
  <name>pci_8086_10bd</name>
  <parent>computer</parent>
  <capability type='pci'>
    <domain>0</domain>
    <bus>0</bus>
    <slot>25</slot>
    <function>0</function>
    <product id='0x10bd'>82566DM-2 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
  </capability>
</device>


The guest XML has been configured with slot 0x25 (hexadecimal), which is 37 in decimal and therefore a different slot:

    <source>
        <address bus='0x00' slot='0x25' function='0x00'/>
    </source>

It is a shame that the node-device XML prints these values in decimal rather than hexadecimal when dumping XML, but that's life. When reading XML, both the domain and node-device parsers accept any number base.


So changing the XML to

    <source>
        <address bus='00' slot='25' function='0'/>
    </source>

should avoid the crash, but clearly we should still fix this.
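
The base mismatch described in this comment is easy to check by hand. The snippet below is a minimal sketch: parse_pci_field is a hypothetical helper that accepts decimal or 0x-prefixed hexadecimal text, and it is not libvirt's actual parser.

```python
def parse_pci_field(text):
    """Parse a PCI address field written in decimal or 0x-prefixed hex."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


if __name__ == "__main__":
    node_device_slot = parse_pci_field("25")   # value from nodedev-dumpxml
    guest_slot = parse_pci_field("0x25")       # value typed into the guest XML
    print(node_device_slot, guest_slot)        # 25 37
    print(hex(node_device_slot))               # 0x19
```

So the same device could be written in the guest XML either as slot='25' or as slot='0x19', but not as slot='0x25'.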

Comment 11 zhanghaiyan 2009-07-27 09:32:40 UTC
I tried with this XML:
    <source>
        <address bus='00' slot='25' function='0'/>
    </source>

Yes, PCI passthrough now succeeds.
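
To avoid copying the numbers by hand, the hostdev address could be derived from the 'virsh nodedev-dumpxml' output. This is a sketch that assumes the node-device XML prints bus, slot and function in decimal, as it does in comment 10; the function name hostdev_address is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Node-device XML as quoted in comment 10.
NODEDEV_XML = """
<device>
  <name>pci_8086_10bd</name>
  <parent>computer</parent>
  <capability type='pci'>
    <domain>0</domain>
    <bus>0</bus>
    <slot>25</slot>
    <function>0</function>
    <product id='0x10bd'>82566DM-2 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
  </capability>
</device>
"""


def hostdev_address(nodedev_xml):
    """Build a hostdev <address/> element from node-device XML.

    Assumes bus, slot and function are decimal in the input and writes
    them out as 0x-prefixed hexadecimal.
    """
    cap = ET.fromstring(nodedev_xml).find("capability[@type='pci']")
    if cap is None:
        raise ValueError("not a PCI node device")
    bus = int(cap.findtext("bus"))
    slot = int(cap.findtext("slot"))
    function = int(cap.findtext("function"))
    return (f"<address bus='0x{bus:02x}' slot='0x{slot:02x}' "
            f"function='0x{function:x}'/>")


if __name__ == "__main__":
    print(hostdev_address(NODEDEV_XML))
```

For the device in this report the result uses slot='0x19', the hexadecimal form of decimal 25.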

Comment 13 Daniel Veillard 2009-07-28 16:03:26 UTC
libvirt-0.6.3-17.el5 has been built in dist-5E-qu-candidate with the fix

Daniel

Comment 16 Yewei Shao 2009-07-29 07:32:09 UTC
Verified on libvirt-0.6.3-15.el5, cannot reproduce this bug

Comment 17 zhanghaiyan 2009-07-29 07:33:17 UTC
Correction to comment #16:
Verified on libvirt-0.6.3-17.el5; cannot reproduce this bug.

Comment 19 errata-xmlrpc 2009-09-02 09:22:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1269.html

