Bug 957416 - Libvirt daemon crash while attaching VF interface or dumping PF interface using nodedev-dumpxml.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Laine Stump
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-04-27 23:49 EDT by Hu Jianwei
Modified: 2014-06-17 20:49 EDT
CC List: 8 users

See Also:
Fixed In Version: libvirt-1.0.5-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 09:06:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Hu Jianwei 2013-04-27 23:49:56 EDT
Description of problem:
Libvirt daemon crash while attaching VF interface or dumping PF interface using nodedev-dumpxml.

Version-Release number of selected component (if applicable):
libvirt-1.0.4-1.1.el7.x86_64
qemu-kvm-1.4.0-3.el7.x86_64
kernel-3.9.0-0.rc8.54.el7.x86_64

How reproducible:
100%

Steps:

Setup
On the SR-IOV1 or SR-IOV2 server:

1. Reload the KVM modules with allow_unsafe_assigned_interrupts=1:
modprobe -r kvm_intel
modprobe -r kvm
modprobe kvm allow_unsafe_assigned_interrupts=1
modprobe kvm_intel

2. Generate VFs
modprobe -r igb
modprobe igb max_vfs=2

3. [root@#localhost ~]# lspci |grep Ethernet
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
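
The VFs created in step 2 can also be checked through sysfs, where the kernel exposes each VF of a PF as a virtfnN symlink under the PF's device directory. The following is a minimal Python sketch under that assumption; the function name list_vfs and the sysfs_root parameter are illustrative and not part of libvirt.

```python
import os
import re


def list_vfs(pf_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return {vf_index: vf_pci_address} for the PF at pf_addr.

    Assumes the usual SR-IOV sysfs layout, where the PF directory
    contains symlinks named virtfn0, virtfn1, ... pointing at the VFs.
    """
    pf_dir = os.path.join(sysfs_root, pf_addr)
    vfs = {}
    for entry in os.listdir(pf_dir):
        match = re.fullmatch(r"virtfn(\d+)", entry)
        if not match:
            continue
        link = os.path.join(pf_dir, entry)
        if os.path.islink(link):
            # The link target ends in the VF's PCI address, e.g. ../0000:03:10.1
            vfs[int(match.group(1))] = os.path.basename(os.readlink(link))
    return vfs
```

On the host above, list_vfs("0000:03:00.1") would be expected to report the VFs that belong to the second port of the 82576 adapter.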

Steps to reproduce:
1. Create a domain with the following VF interface.

<interface type='hostdev' managed='yes'>
      <mac address='52:54:00:e0:2c:31'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>


2. [root@#localhost ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6_local                    shut off
 

3. [root@#localhost ~]# virsh start rhel6_local
error: Failed to start domain rhel6_local
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor

4. [root@#localhost ~]# virsh list --all
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

5. [root@SRIOV2 ~]# systemctl status libvirtd
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: failed (Result: signal) since Sat 2013-04-27 05:12:34 EDT; 5min ago
  Process: 5738 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=killed, signal=SEGV)
   CGroup: name=systemd:/system/libvirtd.service
           └─1200 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf

Apr 27 05:12:23 SRIOV2.qe.lab.eng.nay.redhat.com systemd[1]: Started Virtualization daemon.
Apr 27 05:12:30 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1200]: read /etc/hosts - 3 addresses
Apr 27 05:12:30 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1200]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Apr 27 05:12:30 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq-dhcp[1200]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Apr 27 05:12:34 SRIOV2.qe.lab.eng.nay.redhat.com systemd[1]: libvirtd.service: main process exited, code=killed, status=11/SEGV
Apr 27 05:12:34 SRIOV2.qe.lab.eng.nay.redhat.com systemd[1]: MESSAGE=Unit libvirtd.service entered failed state.
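
When repeating this reproduction, the crash can be confirmed from the saved `systemctl status libvirtd` text rather than by eye. The sketch below only matches the "(code=killed, signal=...)" fragment shown in step 5; the function name killed_by_signal is illustrative, and other systemd versions may format this line differently.

```python
import re


def killed_by_signal(status_text):
    """Return the signal name if the status text reports a signal kill, else None."""
    match = re.search(r"\(code=killed, signal=(\w+)\)", status_text)
    return match.group(1) if match else None
```

For the output in step 5 this returns "SEGV"; for a healthy daemon it returns None.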


Alternative steps to reproduce:

1. [root@#localhost ~]# virsh nodedev-list --tree
...
  |   |
  |   +- pci_0000_03_00_0
  |   |   |
  |   |   +- net_eth1_00_1b_21_39_8b_18
  |   |    
  |   +- pci_0000_03_00_1
  |   |   |
  |   |   +- net_eth2_00_1b_21_39_8b_19
  |   |    
  |   +- pci_0000_03_10_0
  |   +- pci_0000_03_10_1
  |   +- pci_0000_03_10_2
  |   |   |
  |   |   +- net_p1p1_1_8a_e9_09_2b_1a_93
  |   |    
  |   +- pci_0000_03_10_3
  |       |
  |       +- net_p1p2_1_0a_6f_3b_d0_c0_a7
  |        
  +- pci_0000_00_03_0
  |   |
  |   +- pci_0000_0f_00_0
  |    
...  
    
2. [root@#localhost ~]# virsh nodedev-dumpxml pci_0000_03_00_0
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
[root@#localhost ~]#
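
The node device names in the tree above follow the PCI address of each device (for example, 03:10.1 appears as pci_0000_03_10_1). As a sketch of that mapping, the Python helper below derives the node device name from an <address> element like the one in the interface XML; the function name nodedev_name is illustrative, and the naming pattern is inferred from the listings in this report.

```python
import xml.etree.ElementTree as ET


def nodedev_name(address_xml):
    """Map a PCI <address .../> element to a name like pci_0000_03_10_1."""
    addr = ET.fromstring(address_xml)
    # Attribute values are hex strings such as '0x10'; int(..., 16) accepts the 0x prefix.
    domain, bus, slot, function = (
        int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
    )
    return "pci_%04x_%02x_%02x_%x" % (domain, bus, slot, function)
```

The result can then be passed to `virsh nodedev-dumpxml` to inspect the device that a domain interface refers to.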

Actual results:
The libvirt daemon crashes; restarting libvirtd is the only way to recover.

Expected results:
The domain with the VF interface should boot normally and the libvirt daemon should not crash. The nodedev-dumpxml command should dump the PF information without affecting libvirtd. Note that nodedev-dumpxml dumps the XML for a VF normally in rhel7.
Comment 2 Alex Jia 2013-04-28 03:25:20 EDT
The device appears to have no PM reset capability; see the following debug output for details:


Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: internal error Unable to reset PCI device 0000:00:07.0: no FLR, PM reset or bus reset available
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: Caught Segmentation violation dumping internal log buffer:

<ignore/>

Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceConfigOpen:207 : 8086 10ca 0000:03:10.1: opened /sys/bus/pci/devices/0000:03:10.1/config
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:395 : 8086 10ca 0000:03:10.1: found cap 0x10 at 0xa0
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:402 : 8086 10ca 0000:03:10.1: failed to find cap 0x01
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceDetectFunctionLevelReset:453 : 8086 10ca 0000:03:10.1: detected PCIe FLR capability
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceDetectPowerManagementReset:513 : 8086 10ca 0000:03:10.1: no PM reset capability found
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virFileClose:72 : Closed fd 22
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceConfigOpen:207 : 8086 340e 0000:00:07.0: opened /sys/bus/pci/devices/0000:00:07.0/config
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:395 : 8086 340e 0000:00:07.0: found cap 0x10 at 0x90
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:395 : 8086 340e 0000:00:07.0: found cap 0x01 at 0xe0
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:402 : 8086 340e 0000:00:07.0: failed to find cap 0x13
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceDetectFunctionLevelReset:490 : 8086 340e 0000:00:07.0: no FLR capability found
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceDetectPowerManagementReset:513 : 8086 340e 0000:00:07.0: no PM reset capability found
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: error : virPCIDeviceReset:829 : internal error Unable to reset PCI device 0000:00:07.0: no FLR, PM reset or bus reset available
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virFileClose:72 : Closed fd 22

<ignore/>
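
To make the log above easier to read, the reset capabilities can be summarized per device. This is a rough Python sketch that matches only the message shapes visible in this log; treating any message that starts with "detected" as a positive result is an assumption, and the function name reset_capabilities is illustrative.

```python
import re

# Matches e.g. "virPCIDeviceDetectFunctionLevelReset:453 : 8086 10ca 0000:03:10.1: <message>"
LINE = re.compile(
    r"virPCIDeviceDetect(FunctionLevelReset|PowerManagementReset):\d+ : "
    r"[0-9a-f]{4} [0-9a-f]{4} (\S+): (.+)"
)


def reset_capabilities(log_text):
    """Return {pci_address: {"flr": bool, "pm": bool}} for the detections found in the log."""
    caps = {}
    for match in LINE.finditer(log_text):
        kind = "flr" if match.group(1) == "FunctionLevelReset" else "pm"
        # Assumption: positive detections start with "detected", negatives with "no ...".
        caps.setdefault(match.group(2), {})[kind] = match.group(3).startswith("detected")
    return caps
```

Applied to the lines above, this shows FLR available on the VF 0000:03:10.1 but neither FLR nor PM reset on 0000:00:07.0, which matches the "Unable to reset PCI device 0000:00:07.0" error.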
Comment 3 Alex Jia 2013-04-28 04:02:36 EDT
I will try to fix this NULL pointer dereference issue.


(gdb) bt
#0  virPCIGetVirtualFunctionIndex (pf_sysfs_device_link=0x7fc04400f470 "/sys/bus/pci/devices/0000:03:00.1", vf_sysfs_device_link=<optimized out>, vf_index=vf_index@entry=0x7fc06897b8f4)
    at util/virpci.c:2107
#1  0x00007fc0785dcacf in virPCIGetVirtualFunctionInfo (vf_sysfs_device_path=<optimized out>, pfname=pfname@entry=0x7fc06897b8f8, vf_index=vf_index@entry=0x7fc06897b8f4) at util/virpci.c:2217
#2  0x00007fc062672ed0 in qemuDomainHostdevNetDevice (hostdev=hostdev@entry=0x7fc05c345048, linkdev=linkdev@entry=0x7fc06897b8f8, vf=vf@entry=0x7fc06897b8f4) at qemu/qemu_hostdev.c:257
#3  0x00007fc0626735a1 in qemuDomainHostdevNetConfigRestore (hostdev=0x7fc05c345048, stateDir=0x7fc05c012250 "/var/run/libvirt/qemu") at qemu/qemu_hostdev.c:400
#4  0x00007fc0626746ba in qemuDomainReAttachHostdevDevices (driver=driver@entry=0x7fc05c104860, name=<optimized out>, hostdevs=0x7fc05c344fa0, nhostdevs=2) at qemu/qemu_hostdev.c:922
#5  0x00007fc0626748c3 in qemuDomainReAttachHostDevices (driver=0x7fc05c104860, def=0x7fc05c332160) at qemu/qemu_hostdev.c:1018
#6  0x00007fc0626842fc in qemuProcessStop (driver=driver@entry=0x7fc05c104860, vm=vm@entry=0x7fc05c335e70, reason=reason@entry=VIR_DOMAIN_SHUTOFF_FAILED, flags=flags@entry=2) at qemu/qemu_process.c:4051
#7  0x00007fc0626860c6 in qemuProcessStart (conn=conn@entry=0x7fc058000ad0, driver=driver@entry=0x7fc05c104860, vm=vm@entry=0x7fc05c335e70, migrateFrom=migrateFrom@entry=0x0, stdin_fd=stdin_fd@entry=-1, 
    stdin_path=stdin_path@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=<optimized out>, flags@entry=1) at qemu/qemu_process.c:3859
#8  0x00007fc0626ccae6 in qemuDomainObjStart (conn=0x7fc058000ad0, driver=driver@entry=0x7fc05c104860, vm=vm@entry=0x7fc05c335e70, flags=flags@entry=0) at qemu/qemu_driver.c:5458
#9  0x00007fc0626cd09a in qemuDomainStartWithFlags (dom=0x7fc0440008c0, flags=0) at qemu/qemu_driver.c:5514
#10 0x00007fc07865de07 in virDomainCreate (domain=domain@entry=0x7fc0440008c0) at libvirt.c:8450
#11 0x00007fc0790440d7 in remoteDispatchDomainCreate (server=<optimized out>, msg=<optimized out>, args=<optimized out>, rerr=0x7fc06897cc90, client=0x7fc0796bbcd0) at remote_dispatch.h:1066
#12 remoteDispatchDomainCreateHelper (server=<optimized out>, client=0x7fc0796bbcd0, msg=<optimized out>, rerr=0x7fc06897cc90, args=<optimized out>, ret=<optimized out>) at remote_dispatch.h:1044
#13 0x00007fc0786b1527 in virNetServerProgramDispatchCall (msg=0x7fc0796bae90, client=0x7fc0796bbcd0, server=0x7fc0796b0400, prog=0x7fc0796b66e0) at rpc/virnetserverprogram.c:439
#14 virNetServerProgramDispatch (prog=0x7fc0796b66e0, server=server@entry=0x7fc0796b0400, client=0x7fc0796bbcd0, msg=0x7fc0796bae90) at rpc/virnetserverprogram.c:305
#15 0x00007fc0786ac738 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fc0796b0400) at rpc/virnetserver.c:162
#16 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fc0796b0400) at rpc/virnetserver.c:183
#17 0x00007fc0785e5235 in virThreadPoolWorker (opaque=opaque@entry=0x7fc07968af70) at util/virthreadpool.c:144
#18 0x00007fc0785e4cc1 in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:161
#19 0x00007fc075e98c53 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fc0757beecd in clone () from /lib64/libc.so.6
Comment 4 Alex Jia 2013-04-28 06:12:29 EDT
Patch on upstream:
http://www.redhat.com/archives/libvir-list/2013-April/msg01995.html
Comment 5 Laine Stump 2013-04-29 11:03:36 EDT
(In reply to comment #4)
> Patch on upstream:
> http://www.redhat.com/archives/libvir-list/2013-April/msg01995.html

That patch is incorrect (reasons in my reply to the email).

The problem was already found and solved upstream post-1.0.4. Here is the correct patch:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=9579b6bc209b46a0f079b21455b598c817925b48
Comment 6 Hu Jianwei 2013-05-06 04:26:20 EDT
I can reproduce the bug with libvirt-1.0.4-1.1.el7.x86_64, but not with libvirt-1.0.5-1.el7.x86_64.

1. Create a domain with the following VF XML.
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:84:33:71'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x0a' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>


2. Start the rhel7 domain.
[root@SRIOV2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 14    rhel7                          running

[root@SRIOV2 ~]# virsh dumpxml rhel7
...
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:84:33:71'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x0a' slot='0x10' function='0x1'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
...

3. Dump the PCI nodes.
[root@SRIOV2 ~]# virsh nodedev-list --tree
...
 +- pci_0000_00_1c_6
  |   |
  |   +- pci_0000_07_00_0
  |       |
  |       +- pci_0000_08_02_0
  |       |   |
  |       |   +- pci_0000_09_00_0
  |       |   |   |
  |       |   |   +- net_p1p1_00_1b_21_55_b3_b8
  |       |   |     
  |       |   +- pci_0000_09_00_1
  |       |   |   |
  |       |   |   +- net_eth3_00_1b_21_55_b3_b9
  |       |   |     
  |       |   +- pci_0000_0a_10_0
  |       |   |   |
  |       |   |   +- net_p1p1_0_d2_23_65_b4_70_44
...
[root@SRIOV2 ~]# virsh nodedev-dumpxml pci_0000_09_00_1
<device>
  <name>pci_0000_09_00_1</name>
  <parent>pci_0000_08_02_0</parent>
  <driver>
    <name>igb</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x10e8'>82576 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='virt_functions'>
      <address domain='0x0000' bus='0x0a' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0a' slot='0x10' function='0x3'/>
    </capability>
  </capability>
</device>
[root@SRIOV2 ~]# virsh nodedev-dumpxml pci_0000_0a_10_0
<device>
  <name>pci_0000_0a_10_0</name>
  <parent>pci_0000_08_02_0</parent>
  <driver>
    <name>igbvf</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>10</bus>
    <slot>16</slot>
    <function>0</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </capability>
  </capability>
</device>
4. After the above commands, check the libvirtd status.

[root@SRIOV2 ~]# systemctl status libvirtd
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Mon 2013-05-06 02:01:15 EDT; 1h 33min ago
 Main PID: 6917 (libvirtd)
   CGroup: name=systemd:/system/libvirtd.service
           ├─1778 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           ├─6565 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf
           ├─6917 /usr/sbin/libvirtd
           └─8461 /usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu qemu64,-kvmclock -bios /usr/share...

May 06 03:29:25 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using local addresses only for unqualified names
May 06 03:29:25 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using local addresses only for unqualified names
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: reading /etc/resolv.conf
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: reading /etc/resolv.conf
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using nameserver 10.68.5.26#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using nameserver 10.66.127.17#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using local addresses only for unqualified names
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using nameserver 10.68.5.26#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using nameserver 10.66.127.17#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using local addresses only for unqualified names

We get the expected results: the libvirt daemon does not crash and keeps running normally.
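
The PF and VF relationship shown by nodedev-dumpxml in step 3 can also be extracted programmatically. The Python sketch below parses the XML text printed by virsh and collects the virt_functions and phys_function addresses; the function names sriov_summary and _pci_addr are illustrative, and only the element and attribute names visible in the output above are assumed.

```python
import xml.etree.ElementTree as ET


def _pci_addr(elem):
    """Format an <address> element as dddd:bb:ss.f."""
    parts = [int(elem.get(key), 16) for key in ("domain", "bus", "slot", "function")]
    return "%04x:%02x:%02x.%x" % tuple(parts)


def sriov_summary(nodedev_xml):
    """Summarize the SR-IOV capabilities found in nodedev-dumpxml output."""
    root = ET.fromstring(nodedev_xml)
    summary = {"name": root.findtext("name"), "virt_functions": [], "phys_function": None}
    for cap in root.iter("capability"):
        addrs = [_pci_addr(a) for a in cap.findall("address")]
        if cap.get("type") == "virt_functions":
            summary["virt_functions"] = addrs
        elif cap.get("type") == "phys_function" and addrs:
            summary["phys_function"] = addrs[0]
    return summary
```

For the PF pci_0000_09_00_1 above, the summary lists the two VFs 0000:0a:10.1 and 0000:0a:10.3; for the VF pci_0000_0a_10_0 it reports the PF 0000:09:00.0.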
Comment 7 Huang Wenlong 2013-05-06 04:30:19 EDT
Setting to Verified according to comment 6.
Comment 8 Ludek Smid 2014-06-13 09:06:08 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
