Bug 957416

Summary: Libvirt daemon crash while attaching VF interface or dumping PF interface using nodedev-dumpxml.
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.0
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Target Milestone: rc
Reporter: Hu Jianwei <jiahu>
Assignee: Laine Stump <laine>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, ajia, bili, cwei, dyuan, honzhang, laine, mzhan
Fixed In Version: libvirt-1.0.5-1.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-13 13:06:08 UTC

Description Hu Jianwei 2013-04-28 03:49:56 UTC
Description of problem:
The libvirt daemon crashes while attaching a VF interface or dumping a PF interface with nodedev-dumpxml.

Version-Release number of selected component (if applicable):
libvirt-1.0.4-1.1.el7.x86_64
qemu-kvm-1.4.0-3.el7.x86_64
kernel-3.9.0-0.rc8.54.el7.x86_64

How reproducible:
100%

Steps:

Setup
On SR-IOV1 or SR-IOV2 server:

1. Reload the KVM modules with allow_unsafe_assigned_interrupts enabled:
modprobe -r kvm_intel
modprobe -r kvm
modprobe kvm allow_unsafe_assigned_interrupts=1
modprobe kvm_intel

2. Generate VFs
modprobe -r igb
modprobe igb max_vfs=2

3. [root@#localhost ~]# lspci |grep Ethernet
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

Steps to reproduce:
1. Create a domain with the VF interface below.

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:e0:2c:31'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
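
For reference, the same VF shows up in this report in three spellings: `03:10.1` in the lspci output, `pci_0000_03_10_1` in the nodedev tree, and the hex attributes of the `<address>` element above. The following is a minimal Python sketch of that mapping. The helper names are hypothetical (they are not part of libvirt or virsh), and the sketch assumes PCI domain 0000 whenever the lspci-style address omits it.

```python
import re

# Hypothetical helpers for illustration only; not a libvirt API.
# Accepts 'DDDD:BB:SS.F' or the short lspci form 'BB:SS.F'.
_BDF = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text):
    """Return (domain, bus, slot, function) as integers."""
    match = _BDF.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, slot, function = match.groups()
    # Assumption: a missing domain means domain 0000.
    return (int(domain or "0", 16), int(bus, 16), int(slot, 16), int(function, 16))


def nodedev_name(text):
    """Spell the address the way 'virsh nodedev-list' names PCI devices here."""
    domain, bus, slot, function = parse_bdf(text)
    return f"pci_{domain:04x}_{bus:02x}_{slot:02x}_{function:x}"


def source_address_xml(text):
    """Spell the address as the <address> element used inside <source>."""
    domain, bus, slot, function = parse_bdf(text)
    return (
        f"<address type='pci' domain='0x{domain:04x}' bus='0x{bus:02x}' "
        f"slot='0x{slot:02x}' function='0x{function:x}'/>"
    )


if __name__ == "__main__":
    print(nodedev_name("03:10.1"))
    print(source_address_xml("03:10.1"))
```

Generating the element from the lspci address avoids hand-copying hex fields when switching between VFs.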


2. [root@#localhost ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6_local                    shut off
 

3. [root@#localhost ~]# virsh start rhel6_local
error: Failed to start domain rhel6_local
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor

4. [root@#localhost ~]# virsh list --all
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

5. [root@SRIOV2 ~]# systemctl status libvirtd
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: failed (Result: signal) since Sat 2013-04-27 05:12:34 EDT; 5min ago
  Process: 5738 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=killed, signal=SEGV)
   CGroup: name=systemd:/system/libvirtd.service
           └─1200 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf

Apr 27 05:12:23 SRIOV2.qe.lab.eng.nay.redhat.com systemd[1]: Started Virtualization daemon.
Apr 27 05:12:30 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1200]: read /etc/hosts - 3 addresses
Apr 27 05:12:30 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1200]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Apr 27 05:12:30 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq-dhcp[1200]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Apr 27 05:12:34 SRIOV2.qe.lab.eng.nay.redhat.com systemd[1]: libvirtd.service: main process exited, code=killed, status=11/SEGV
Apr 27 05:12:34 SRIOV2.qe.lab.eng.nay.redhat.com systemd[1]: MESSAGE=Unit libvirtd.service entered failed state.


Alternative steps to reproduce:

1. [root@#localhost ~]# virsh nodedev-list --tree
...
  |   |
  |   +- pci_0000_03_00_0
  |   |   |
  |   |   +- net_eth1_00_1b_21_39_8b_18
  |   |    
  |   +- pci_0000_03_00_1
  |   |   |
  |   |   +- net_eth2_00_1b_21_39_8b_19
  |   |    
  |   +- pci_0000_03_10_0
  |   +- pci_0000_03_10_1
  |   +- pci_0000_03_10_2
  |   |   |
  |   |   +- net_p1p1_1_8a_e9_09_2b_1a_93
  |   |    
  |   +- pci_0000_03_10_3
  |       |
  |       +- net_p1p2_1_0a_6f_3b_d0_c0_a7
  |        
  +- pci_0000_00_03_0
  |   |
  |   +- pci_0000_0f_00_0
  |    
...  
    
2. [root@#localhost ~]# virsh nodedev-dumpxml pci_0000_03_00_0
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
[root@#localhost ~]#

Actual results:
The libvirt daemon crashes; the only way to recover is to restart libvirtd.

Expected results:
The domain with the VF interface should boot normally, and the libvirt daemon should not crash. The nodedev-dumpxml command should dump the PF information without affecting libvirtd. BTW, nodedev-dumpxml dumps the XML for a VF normally in rhel7.

Comment 2 Alex Jia 2013-04-28 07:25:20 UTC
This device appears to have no PM reset capability. For details, see the following debug output:


Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: internal error Unable to reset PCI device 0000:00:07.0: no FLR, PM reset or bus reset available
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: Caught Segmentation violation dumping internal log buffer:

<ignore/>

Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceConfigOpen:207 : 8086 10ca 0000:03:10.1: opened /sys/bus/pci/devices/0000:03:10.1/config
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:395 : 8086 10ca 0000:03:10.1: found cap 0x10 at 0xa0
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:402 : 8086 10ca 0000:03:10.1: failed to find cap 0x01
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceDetectFunctionLevelReset:453 : 8086 10ca 0000:03:10.1: detected PCIe FLR capability
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceDetectPowerManagementReset:513 : 8086 10ca 0000:03:10.1: no PM reset capability found
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virFileClose:72 : Closed fd 22
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceConfigOpen:207 : 8086 340e 0000:00:07.0: opened /sys/bus/pci/devices/0000:00:07.0/config
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.533+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:395 : 8086 340e 0000:00:07.0: found cap 0x10 at 0x90
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:395 : 8086 340e 0000:00:07.0: found cap 0x01 at 0xe0
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceFindCapabilityOffset:402 : 8086 340e 0000:00:07.0: failed to find cap 0x13
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceDetectFunctionLevelReset:490 : 8086 340e 0000:00:07.0: no FLR capability found
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virPCIDeviceDetectPowerManagementReset:513 : 8086 340e 0000:00:07.0: no PM reset capability found
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: error : virPCIDeviceReset:829 : internal error Unable to reset PCI device 0000:00:07.0: no FLR, PM reset or bus reset available
Apr 28 03:19:27 dhcp-66-72-126 libvirtd[6362]: 2013-04-28 07:19:27.534+0000: 6364: debug : virFileClose:72 : Closed fd 22

<ignore/>
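
The reset-capability lines in the log above can be summarized per device with a short script. This is a rough Python sketch that relies only on the message formats visible in this excerpt. It treats any detection message that does not start with "no " as a positive result, which is an assumption, since the excerpt shows no positive PM reset message. The function name is hypothetical.

```python
import re

# Matches the two detection functions seen in the log excerpt, e.g.
# "virPCIDeviceDetectFunctionLevelReset:453 : 8086 10ca 0000:03:10.1: <message>"
_DETECT = re.compile(
    r"virPCIDeviceDetect(FunctionLevelReset|PowerManagementReset):\d+ : "
    r"[0-9a-f]{4} [0-9a-f]{4} (\S+): (.*)$"
)


def reset_capabilities(log_text):
    """Map each PCI address to {'flr': bool, 'pm_reset': bool} as logged."""
    caps = {}
    for line in log_text.splitlines():
        match = _DETECT.search(line)
        if match is None:
            continue
        kind, device, message = match.groups()
        key = "flr" if kind == "FunctionLevelReset" else "pm_reset"
        # Assumption: negative results always start with "no ".
        caps.setdefault(device, {})[key] = not message.startswith("no ")
    return caps


SAMPLE = """\
debug : virPCIDeviceDetectFunctionLevelReset:453 : 8086 10ca 0000:03:10.1: detected PCIe FLR capability
debug : virPCIDeviceDetectPowerManagementReset:513 : 8086 10ca 0000:03:10.1: no PM reset capability found
debug : virPCIDeviceDetectFunctionLevelReset:490 : 8086 340e 0000:00:07.0: no FLR capability found
debug : virPCIDeviceDetectPowerManagementReset:513 : 8086 340e 0000:00:07.0: no PM reset capability found
"""

if __name__ == "__main__":
    for device, found in reset_capabilities(SAMPLE).items():
        print(device, found)
```

On the sample lines, the VF 0000:03:10.1 reports FLR but no PM reset, while 0000:00:07.0 reports neither, which matches the "Unable to reset PCI device 0000:00:07.0" error.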

Comment 3 Alex Jia 2013-04-28 08:02:36 UTC
I will try to fix this NULL pointer dereference.


(gdb) bt
#0  virPCIGetVirtualFunctionIndex (pf_sysfs_device_link=0x7fc04400f470 "/sys/bus/pci/devices/0000:03:00.1", vf_sysfs_device_link=<optimized out>, vf_index=vf_index@entry=0x7fc06897b8f4)
    at util/virpci.c:2107
#1  0x00007fc0785dcacf in virPCIGetVirtualFunctionInfo (vf_sysfs_device_path=<optimized out>, pfname=pfname@entry=0x7fc06897b8f8, vf_index=vf_index@entry=0x7fc06897b8f4) at util/virpci.c:2217
#2  0x00007fc062672ed0 in qemuDomainHostdevNetDevice (hostdev=hostdev@entry=0x7fc05c345048, linkdev=linkdev@entry=0x7fc06897b8f8, vf=vf@entry=0x7fc06897b8f4) at qemu/qemu_hostdev.c:257
#3  0x00007fc0626735a1 in qemuDomainHostdevNetConfigRestore (hostdev=0x7fc05c345048, stateDir=0x7fc05c012250 "/var/run/libvirt/qemu") at qemu/qemu_hostdev.c:400
#4  0x00007fc0626746ba in qemuDomainReAttachHostdevDevices (driver=driver@entry=0x7fc05c104860, name=<optimized out>, hostdevs=0x7fc05c344fa0, nhostdevs=2) at qemu/qemu_hostdev.c:922
#5  0x00007fc0626748c3 in qemuDomainReAttachHostDevices (driver=0x7fc05c104860, def=0x7fc05c332160) at qemu/qemu_hostdev.c:1018
#6  0x00007fc0626842fc in qemuProcessStop (driver=driver@entry=0x7fc05c104860, vm=vm@entry=0x7fc05c335e70, reason=reason@entry=VIR_DOMAIN_SHUTOFF_FAILED, flags=flags@entry=2) at qemu/qemu_process.c:4051
#7  0x00007fc0626860c6 in qemuProcessStart (conn=conn@entry=0x7fc058000ad0, driver=driver@entry=0x7fc05c104860, vm=vm@entry=0x7fc05c335e70, migrateFrom=migrateFrom@entry=0x0, stdin_fd=stdin_fd@entry=-1, 
    stdin_path=stdin_path@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=<optimized out>, flags@entry=1) at qemu/qemu_process.c:3859
#8  0x00007fc0626ccae6 in qemuDomainObjStart (conn=0x7fc058000ad0, driver=driver@entry=0x7fc05c104860, vm=vm@entry=0x7fc05c335e70, flags=flags@entry=0) at qemu/qemu_driver.c:5458
#9  0x00007fc0626cd09a in qemuDomainStartWithFlags (dom=0x7fc0440008c0, flags=0) at qemu/qemu_driver.c:5514
#10 0x00007fc07865de07 in virDomainCreate (domain=domain@entry=0x7fc0440008c0) at libvirt.c:8450
#11 0x00007fc0790440d7 in remoteDispatchDomainCreate (server=<optimized out>, msg=<optimized out>, args=<optimized out>, rerr=0x7fc06897cc90, client=0x7fc0796bbcd0) at remote_dispatch.h:1066
#12 remoteDispatchDomainCreateHelper (server=<optimized out>, client=0x7fc0796bbcd0, msg=<optimized out>, rerr=0x7fc06897cc90, args=<optimized out>, ret=<optimized out>) at remote_dispatch.h:1044
#13 0x00007fc0786b1527 in virNetServerProgramDispatchCall (msg=0x7fc0796bae90, client=0x7fc0796bbcd0, server=0x7fc0796b0400, prog=0x7fc0796b66e0) at rpc/virnetserverprogram.c:439
#14 virNetServerProgramDispatch (prog=0x7fc0796b66e0, server=server@entry=0x7fc0796b0400, client=0x7fc0796bbcd0, msg=0x7fc0796bae90) at rpc/virnetserverprogram.c:305
#15 0x00007fc0786ac738 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fc0796b0400) at rpc/virnetserver.c:162
#16 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fc0796b0400) at rpc/virnetserver.c:183
#17 0x00007fc0785e5235 in virThreadPoolWorker (opaque=opaque@entry=0x7fc07968af70) at util/virthreadpool.c:144
#18 0x00007fc0785e4cc1 in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:161
#19 0x00007fc075e98c53 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fc0757beecd in clone () from /lib64/libc.so.6
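
For context on frame #0: on Linux, a PF's sysfs directory contains `virtfnN` symlinks that point at its VFs, and a VF's index is the N of the link that resolves to it. The sketch below illustrates that lookup in Python, returning None instead of failing when the PF directory is unreadable or no link matches. It is not libvirt's implementation and not the upstream fix; the root cause of the crash is not stated in this comment, and the function name merely echoes the backtrace.

```python
import os
import re

_VIRTFN = re.compile(r"^virtfn(\d+)$")


def vf_index(pf_sysfs_path, vf_sysfs_path):
    """Return N such that <pf>/virtfnN resolves to the VF, or None if absent."""
    try:
        entries = os.listdir(pf_sysfs_path)
    except OSError:
        # PF directory missing or unreadable: report "not found", do not crash.
        return None

    target = os.path.realpath(vf_sysfs_path)
    for entry in entries:
        match = _VIRTFN.match(entry)
        if match is None:
            continue  # not a virtfnN link
        link = os.path.join(pf_sysfs_path, entry)
        if os.path.realpath(link) == target:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # Paths taken from the backtrace and the domain XML in this report.
    print(vf_index("/sys/bus/pci/devices/0000:03:00.1",
                   "/sys/bus/pci/devices/0000:03:10.1"))
```

The point of the sketch is that every "nothing found" path yields an explicit result that the caller has to check, rather than a pointer that may be dereferenced later.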

Comment 4 Alex Jia 2013-04-28 10:12:29 UTC
Patch on upstream:
http://www.redhat.com/archives/libvir-list/2013-April/msg01995.html

Comment 5 Laine Stump 2013-04-29 15:03:36 UTC
(In reply to comment #4)
> Patch on upstream:
> http://www.redhat.com/archives/libvir-list/2013-April/msg01995.html

That patch is incorrect (reasons in my reply to the email).

The problem was already found and solved upstream post-1.0.4. Here is the correct patch:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=9579b6bc209b46a0f079b21455b598c817925b48

Comment 6 Hu Jianwei 2013-05-06 08:26:20 UTC
I can reproduce the bug with libvirt-1.0.4-1.1.el7.x86_64, but not with libvirt-1.0.5-1.el7.x86_64.

1. Create a domain with the following VF XML.
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:84:33:71'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x0a' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>


2. Start the rhel7 domain.
[root@SRIOV2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 14    rhel7                          running

[root@SRIOV2 ~]# virsh dumpxml rhel7
...
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:84:33:71'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x0a' slot='0x10' function='0x1'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
...

3. Dump the PCI nodes.
[root@SRIOV2 ~]# virsh nodedev-list --tree
...
 +- pci_0000_00_1c_6
  |   |
  |   +- pci_0000_07_00_0
  |       |
  |       +- pci_0000_08_02_0
  |       |   |
  |       |   +- pci_0000_09_00_0
  |       |   |   |
  |       |   |   +- net_p1p1_00_1b_21_55_b3_b8
  |       |   |     
  |       |   +- pci_0000_09_00_1
  |       |   |   |
  |       |   |   +- net_eth3_00_1b_21_55_b3_b9
  |       |   |     
  |       |   +- pci_0000_0a_10_0
  |       |   |   |
  |       |   |   +- net_p1p1_0_d2_23_65_b4_70_44
...
[root@SRIOV2 ~]# virsh nodedev-dumpxml pci_0000_09_00_1
<device>
  <name>pci_0000_09_00_1</name>
  <parent>pci_0000_08_02_0</parent>
  <driver>
    <name>igb</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x10e8'>82576 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='virt_functions'>
      <address domain='0x0000' bus='0x0a' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0a' slot='0x10' function='0x3'/>
    </capability>
  </capability>
</device>
[root@SRIOV2 ~]# virsh nodedev-dumpxml pci_0000_0a_10_0
<device>
  <name>pci_0000_0a_10_0</name>
  <parent>pci_0000_08_02_0</parent>
  <driver>
    <name>igbvf</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>10</bus>
    <slot>16</slot>
    <function>0</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </capability>
  </capability>
</device>
4. After the above commands, check the libvirtd status.

[root@SRIOV2 ~]# systemctl status libvirtd
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Mon 2013-05-06 02:01:15 EDT; 1h 33min ago
 Main PID: 6917 (libvirtd)
   CGroup: name=systemd:/system/libvirtd.service
           ├─1778 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           ├─6565 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf
           ├─6917 /usr/sbin/libvirtd
           └─8461 /usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu qemu64,-kvmclock -bios /usr/share...

May 06 03:29:25 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using local addresses only for unqualified names
May 06 03:29:25 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using local addresses only for unqualified names
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: reading /etc/resolv.conf
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: reading /etc/resolv.conf
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using nameserver 10.68.5.26#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using nameserver 10.66.127.17#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[1778]: using local addresses only for unqualified names
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using nameserver 10.68.5.26#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using nameserver 10.66.127.17#53
May 06 03:29:48 SRIOV2.qe.lab.eng.nay.redhat.com dnsmasq[6565]: using local addresses only for unqualified names

We get the expected results: the libvirt daemon does not crash and keeps running normally.
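
The PF check in step 3 can also be done mechanically by parsing the nodedev-dumpxml output and listing the VF addresses under the `virt_functions` capability. Below is a minimal Python sketch using only the standard library. The function name is hypothetical, and the sample XML is a trimmed copy of the PF output shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the 'virsh nodedev-dumpxml pci_0000_09_00_1' output above.
PF_XML = """
<device>
  <name>pci_0000_09_00_1</name>
  <capability type='pci'>
    <domain>0</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>1</function>
    <capability type='virt_functions'>
      <address domain='0x0000' bus='0x0a' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0a' slot='0x10' function='0x3'/>
    </capability>
  </capability>
</device>
"""


def virt_function_addresses(nodedev_xml):
    """Return the VF addresses of a PF as 'DDDD:BB:SS.F' strings."""
    root = ET.fromstring(nodedev_xml)
    addresses = []
    for cap in root.iter("capability"):
        if cap.get("type") != "virt_functions":
            continue
        for addr in cap.findall("address"):
            # The attributes are hex strings with a 0x prefix.
            addresses.append("{:04x}:{:02x}:{:02x}.{:x}".format(
                int(addr.get("domain"), 16),
                int(addr.get("bus"), 16),
                int(addr.get("slot"), 16),
                int(addr.get("function"), 16),
            ))
    return addresses


if __name__ == "__main__":
    print(virt_function_addresses(PF_XML))
```

Note that the `<address>` attributes are hexadecimal, while the `<bus>`, `<slot>` and `<function>` elements in the same output are decimal (bus 10 and slot 16 for the VF at 0a:10.0).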

Comment 7 Huang Wenlong 2013-05-06 08:30:19 UTC
Setting to VERIFIED according to comment 6.

Comment 8 Ludek Smid 2014-06-13 13:06:08 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.