Bug 1692053

Summary: PCI hostdev interface segfault
Product: [Fedora] Fedora
Reporter: Attila Fazekas <afazekas>
Component: libvirt
Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 29
CC: agedosier, berrange, clalancette, crobinso, itamar, jforbes, laine, libvirt-maint, phrdina, veillard, virt-maint
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-4.7.0-5.fc29
Last Closed: 2019-07-09 02:24:26 UTC
Type: Bug

Description Attila Fazekas 2019-03-23 19:46:48 UTC
I tried to add one of my devices to a new domain,
but the libvirt thread handling the request crashed.

My configuration was wrong,
but a segfault in libvirt is not expected.

The device I tried:
02:00.0 Network controller: Mellanox Technologies MT25408A0-FCC-QI ConnectX, Dual Port 40Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s In... (rev b0)

Kernel:
4.20.16-200.fc29.x86_64

CPU:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Chipset:
Z77  (old)

iommu enabled.

interface_xml:
   <interface type='hostdev' managed='yes'>
     <source>
       <address type='pci' domain='0x0' bus='0x02' slot='0x00' function='0x0'/>
     </source>
     <mac address='52:54:00:6d:90:02'/>
     <virtualport type='802.1Qbh'>
        <parameters profileid='finance'/>
     </virtualport>
   </interface>
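
A hostdev interface like the one above only makes sense if the PCI device exposes a network interface on the host. The following is a minimal sketch, not part of the original report, of how one might check that before starting the domain. The helper names `pci_address` and `netdev_names` are hypothetical, and the sketch assumes the address attributes are hexadecimal (as in this report) and that the host uses the usual Linux sysfs layout `/sys/bus/pci/devices/<address>/net/`.

```python
import os
import xml.etree.ElementTree as ET

# Trimmed copy of the interface definition from this report.
INTERFACE_XML = """
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0' bus='0x02' slot='0x00' function='0x0'/>
  </source>
  <mac address='52:54:00:6d:90:02'/>
</interface>
"""


def pci_address(interface_xml):
    """Return the PCI address of a hostdev interface as DDDD:BB:SS.F.

    Assumption: attribute values are hexadecimal, as in the report.
    """
    addr = ET.fromstring(interface_xml).find("./source/address[@type='pci']")
    if addr is None:
        raise ValueError("no <address type='pci'> element under <source>")
    domain, bus, slot, function = (
        int(addr.get(key, "0"), 16)
        for key in ("domain", "bus", "slot", "function")
    )
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def netdev_names(address, sysfs_root="/sys/bus/pci/devices"):
    """List the network interface names sysfs exposes for a PCI device.

    Returns an empty list when the device has no 'net' directory,
    for example because no network driver is bound to it.
    """
    net_dir = os.path.join(sysfs_root, address, "net")
    if not os.path.isdir(net_dir):
        return []
    return sorted(os.listdir(net_dir))


if __name__ == "__main__":
    address = pci_address(INTERFACE_XML)
    print(address)
    print(netdev_names(address))
```

An empty list from `netdev_names` would indicate that the device currently has no interface name on the host, which may be the situation that led to the crash below.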


Version-Release number of selected component (if applicable):
libvirt-4.7.0-1.fc29.x86_64 

How reproducible:
always


Actual results:
virsh -c qemu:///system create ./domxml_mlx
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to create domain from ./domxml_mlx
error: internal error: connection closed due to keepalive timeout


#0  0x00007f966aad4cfa in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f966b461b90 in virStrncpy () from /lib64/libvirt.so.0
#2  0x00007f966b42c9fb in virNetDevGetIndex () from /lib64/libvirt.so.0
#3  0x00007f966b43c16d in ?? () from /lib64/libvirt.so.0
#4  0x00007f966b43d265 in virNetDevVPortProfileAssociate () from /lib64/libvirt.so.0
#5  0x00007f966b418e53 in ?? () from /lib64/libvirt.so.0
#6  0x00007f966b419d51 in virHostdevPreparePCIDevices () from /lib64/libvirt.so.0
#7  0x00007f96524f2c02 in qemuHostdevPrepareDomainDevices () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007f965250d665 in qemuProcessPrepareHost () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#9  0x00007f9652513c2f in qemuProcessStart () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#10 0x00007f9652563dcf in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#11 0x00007f966b60fdaa in virDomainCreateXML () from /lib64/libvirt.so.0
#12 0x000055ab33505082 in remoteDispatchDomainCreateXML (server=0x55ab34c75aa0, msg=0x55ab34cea060, args=0x7f9654000b60, args=0x7f9654000b60, ret=0x7f9654000b80, rerr=0x7f965c255960, client=0x55ab34ce9c70)
    at remote/remote_daemon_dispatch_stubs.h:4575
#13 remoteDispatchDomainCreateXMLHelper (server=0x55ab34c75aa0, client=0x55ab34ce9c70, msg=0x55ab34cea060, rerr=0x7f965c255960, args=0x7f9654000b60, ret=0x7f9654000b80)
    at remote/remote_daemon_dispatch_stubs.h:4553
#14 0x00007f966b533cc4 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#15 0x00007f966b53a1cc in ?? () from /lib64/libvirt.so.0
#16 0x00007f966b469b70 in ?? () from /lib64/libvirt.so.0
#17 0x00007f966b468e7c in ?? () from /lib64/libvirt.so.0
#18 0x00007f966ac4858e in start_thread () from /lib64/libpthread.so.0
#19 0x00007f966ab376a3 in clone () from /lib64/libc.so.6

The daemon stays alive; only one thread received SIGSEGV (signal 11).


Expected results:
No segfault in libvirt.


The domain should refuse to start with a proper error,
since the given interface definition cannot be used with this PCI address/device.

Additional info:
I did not make any attempt to stop the host system from using the device.
This is with the current stable Fedora 29 packages as of today.

Comment 1 Cole Robinson 2019-03-28 21:28:49 UTC
Thanks for the report. Can you see if this also reproduces with newer libvirt from virt-preview repo?

https://fedoraproject.org/wiki/Virtualization_Preview_Repository

Comment 2 Cole Robinson 2019-05-02 18:30:50 UTC
I can't reproduce this because it requires SR-IOV hardware. Since there has been no response for a month, I'm closing this. If you can still reproduce it, please reopen and check:

- whether the virt-preview repo fixes it, which would be a useful piece of data
- whether you can reproduce the crash after a full dnf debuginfo-install libvirt-daemon\* to get a complete backtrace

Comment 3 Pavel Hrdina 2019-05-03 10:52:58 UTC
I have access to a few machines with SR-IOV hardware, and I was able to reproduce the crash and figure out the issue.

It has been fixed upstream since libvirt 5.1.0:

commit 04983c3c6a821f67994b1c65d4d6175f3ac49d69
Author: Radoslaw Biernacki <radoslaw.biernacki>
Date:   Tue Jan 22 12:26:15 2019 -0700

    util: Fixing invalid error checking from virPCIGetNetname()

In Fedora 30 and virt-preview it should be fixed.
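
The commit subject and the strlen crash in the backtrace are consistent with a lookup that reports success while leaving the interface name unset, with the caller checking only the return code. The following is an illustrative model of that class of bug in Python, not libvirt's actual code. All names (`lookup_netname`, `associate_buggy`, `associate_fixed`) are hypothetical, and the assumption that the lookup can return success with no name is an inference, not something stated in this report.

```python
def lookup_netname(netdevs):
    """Hypothetical lookup returning (rc, name).

    rc < 0         : real failure (modelled here by netdevs being None)
    rc == 0, None  : success, but the device has no network interface name
    rc == 0, name  : success with a usable name
    """
    if netdevs is None:
        return -1, None
    if not netdevs:
        return 0, None
    return 0, netdevs[0]


def associate_buggy(netdevs):
    """Checks only the return code, then uses the name unconditionally."""
    rc, name = lookup_netname(netdevs)
    if rc < 0:
        raise RuntimeError("net name lookup failed")
    # With name == None this raises TypeError; the C analogue would be
    # calling strlen() on a NULL pointer.
    return len(name)


def associate_fixed(netdevs):
    """Checks the return code and also that a name was actually produced."""
    rc, name = lookup_netname(netdevs)
    if rc < 0:
        raise RuntimeError("net name lookup failed")
    if name is None:
        raise ValueError("PCI device has no network interface name")
    return len(name)
```

With an empty device list, `associate_buggy([])` fails inside `len`, while `associate_fixed([])` rejects the request with a clear error, which matches the behaviour the reporter expected.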

Comment 4 Cole Robinson 2019-05-03 14:33:04 UTC
Thanks Pavel. Reopening and setting to POST

Comment 5 Fedora Update System 2019-06-20 22:38:54 UTC
FEDORA-2019-9210998aaa has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-9210998aaa

Comment 6 Fedora Update System 2019-06-22 02:46:08 UTC
libvirt-4.7.0-5.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-9210998aaa

Comment 7 Fedora Update System 2019-07-09 02:24:26 UTC
libvirt-4.7.0-5.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.