Red Hat Bugzilla – Bug 980339
libvirtd crashes when starting a guest that uses a hostdev network specifying a nonexistent PF
Last modified: 2013-11-21 04:04:53 EST
This bug also exists in RHEL6.4 and is a trivial backport.

+++ This bug was initially created as a clone of Bug #971325 +++

Description of problem:
libvirtd crashes when starting a guest that uses an inactive network with a wrong pf value.

Version-Release number of selected component (if applicable):
libvirt-1.0.6-1.el7.x86_64
qemu-kvm-1.5.0-2.el7.x86_64
3.9.0-0.55.el7.x86_64

How reproducible:
100%

Steps to Reproduce:

# virsh nodedev-list --tree
computer
  |
  ......
  +- pci_0000_00_1c_6
  |   |
  |   +- pci_0000_07_00_0
  |       |
  |       +- pci_0000_08_02_0
  |       |   |
  |       |   +- pci_0000_09_00_0
  |       |   |   |
  |       |   |   +- net_p1p1_00_1b_21_55_b3_b8
  |       |   |
  |       |   +- pci_0000_09_00_1
  |       |   |   |
  |       |   |   +- net_eth3_00_1b_21_55_b3_b9   <======== right device name
  |       |   |
  |       |   +- pci_0000_0a_10_0
  |       |   |   |
  |       |   |   +- net_p1p1_0_5a_12_12_ed_a5_1b
  ......

# cat passthrough.xml
<network>
  <name>passthrough</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth1'/>   <======== wrong device name
  </forward>
</network>

# virsh net-define passthrough.xml
Network passthrough defined from passthrough.xml

# virsh net-list --all
Name                 State      Autostart     Persistent
----------------------------------------------------------
default              active     yes           yes
passthrough          inactive   no            yes

# virsh net-dumpxml passthrough
<network>
  <name>passthrough</name>
  <uuid>3be5c577-c1bb-4a6d-8641-adda7b2b9b16</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth1'/>
  </forward>
</network>

Add the passthrough interface to the guest:

# virsh edit rhel7
Domain rhel7 XML configuration edited.

# virsh dumpxml rhel7
<domain type='kvm'>
  <name>rhel7</name>
  ......
  <interface type='network'>
    <source network='passthrough'/>
  </interface>

# virsh start rhel7
error: Failed to start domain rhel7
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor

Actual results:
libvirtd crashes.

Expected results:
An error is reported and libvirtd keeps running.

Additional info:

--- Additional comment from hongming on 2013-06-06 05:42:56 EDT ---

The device name in the log is different from the device name in the bug description.

--- Additional comment from hongming on 2013-06-06 05:55:33 EDT ---

(In reply to hongming from comment #1)
> Created attachment 757587 [details]
> libvirt debug log
>
> the device name in log is different from the device name in the bug
> description.

I mean they are two different tests.

--- Additional comment from Laine Stump on 2013-07-01 00:04:15 EDT ---

I have reproduced this crash and posted a fix upstream:

https://www.redhat.com/archives/libvir-list/2013-July/msg00002.html

For reference when testing for this fix - note that it would only crash if a *nonexistent* interface was specified (it wasn't enough to specify an interface that had no SRIOV capabilities; that is yet another failure path that should be in the regression tests to prevent future breakage).

--- Additional comment from Laine Stump on 2013-07-01 00:31:58 EDT ---

The fix was pushed upstream and will be in libvirt-1.1.0:

commit 2c2525ab6a6f0ad5d75a6c60711e2e28cb1cebe9
Author: Laine Stump <laine@laine.org>
Date: Sun Jun 30 23:52:43 2013 -0400

pci: initialize virtual_functions array pointer to avoid segfault

This fixes https://bugzilla.redhat.com/show_bug.cgi?id=971325

The problem was that if virPCIGetVirtualFunctions was given the name of a non-existent interface, it would return to its caller without initializing the pointer to the array of virtual functions to NULL, and the caller (virNetDevGetVirtualFunctions) would try to VIR_FREE() the invalid pointer.
The final error message before the crash would be:

virPCIGetVirtualFunctions:2088 : Failed to open dir '/sys/class/net/eth2/device': No such file or directory

In this patch I move the initialization in virPCIGetVirtualFunctions() to the beginning of the function, and also do an explicit initialization in virNetDevGetVirtualFunctions, just in case someone in the future adds code into that function prior to the call to virPCIGetVirtualFunctions.
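For readers unfamiliar with this failure mode, the pattern described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the libvirt code: get_virtual_functions() and count_vfs() are hypothetical stand-ins for virPCIGetVirtualFunctions() and virNetDevGetVirtualFunctions(), and the directory scan is left out. It only shows the two initializations the patch describes: the callee clears its output arguments before anything can fail, and the caller initializes its own pointer as well.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for virPCIGetVirtualFunctions(). The outputs are
 * cleared at the very beginning, before any early return can happen. */
static int get_virtual_functions(const char *sysfs_path,
                                 char ***vfs, size_t *nvfs)
{
    DIR *dir;

    assert(vfs != NULL && nvfs != NULL);

    *vfs = NULL;
    *nvfs = 0;

    dir = opendir(sysfs_path);
    if (dir == NULL) {
        fprintf(stderr, "Failed to open dir '%s'\n", sysfs_path);
        return -1;              /* caller still sees *vfs == NULL */
    }

    /* A real implementation would collect the VF entries here (omitted). */
    closedir(dir);
    return 0;
}

/* Hypothetical stand-in for virNetDevGetVirtualFunctions(). */
static int count_vfs(const char *ifname, size_t *count)
{
    char path[256];
    char **vfs = NULL;          /* explicit initialization in the caller too */
    size_t nvfs = 0;
    size_t i;
    int ret = -1;

    snprintf(path, sizeof(path), "/sys/class/net/%s/device", ifname);

    if (get_virtual_functions(path, &vfs, &nvfs) < 0)
        goto cleanup;

    *count = nvfs;
    ret = 0;

 cleanup:
    /* Safe on the error path because vfs is NULL and nvfs is 0. */
    for (i = 0; i < nvfs; i++)
        free(vfs[i]);
    free(vfs);
    return ret;
}
```

With either initialization missing, the cleanup code in the caller would pass an indeterminate pointer to free() whenever the interface does not exist, which matches the crash reported here.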
The patch libvirt-pci-initialize-virtual_functions-array-pointer-to-avoid-segfault.patch does not completely fix this bug: it does not initialize pciConfigAddr to NULL. The bug still exists in libvirt-0.10.2-19.el6, so it cannot be verified.

My reproduction steps:

# virsh nodedev-list --tree
computer
  |
  ...
  +- pci_0000_00_16_0
  +- pci_0000_00_16_3
  +- pci_0000_00_19_0
  |   |
  |   +- net_eth0_10_60_4b_78_2a_74   <== this is my network interface, named eth0
  |
  +- pci_0000_00_1a_0
  |   |
  ...

# cat passthrough.xml
<network>
  <name>passthrough</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth99'/>   <======== wrong interface
  </forward>
</network>

# virsh net-define passthrough.xml
Network passthrough defined from passthrough.xml

# virsh net-list --all
Name                 State      Autostart     Persistent
----------------------------------------------------------
default              active     yes           yes
passthrough          inactive   no            yes

# virsh edit a

Add the following to domain a's XML:

<interface type='network'>
  <source network='passthrough'/>
</interface>

# virsh start a
error: Failed to start domain a
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor
It turns out there was an additional problem that had already been silently fixed upstream several months prior to the original bug (Bug 971325) being filed.

commit ac5cb26a32300d03517692cd15a604dd0517fbd6
Author: John Ferlan <jferlan@redhat.com>
Date: Tue Jan 22 09:15:41 2013 -0500

virnetdev: Need to initialize 'pciConfigAddr'

It was possible to call VIR_FREE in cleanup prior to initialization
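The second problem is the local-variable form of the same mistake: a pointer that is only assigned partway through a function, while an earlier "goto cleanup" can already reach the code that frees it. The sketch below is an assumption-laden illustration, not the virnetdev.c code. format_vf_addr() is a hypothetical helper, and only the variable name pciConfigAddr is taken from the commit subject.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libvirt function): writes "<ifname>/virtfn<vf>"
 * into buf. It exists only to show the cleanup pattern. */
static int format_vf_addr(const char *ifname, int vf,
                          char *buf, size_t buflen)
{
    char *pciConfigAddr = NULL; /* initialized, so cleanup is safe on every path */
    size_t len;
    int ret = -1;

    assert(buf != NULL);

    if (ifname == NULL || vf < 0)
        goto cleanup;           /* taken before pciConfigAddr is allocated */

    len = strlen(ifname) + 32;  /* room for "/virtfn", the digits and NUL */
    pciConfigAddr = malloc(len);
    if (pciConfigAddr == NULL)
        goto cleanup;

    snprintf(pciConfigAddr, len, "%s/virtfn%d", ifname, vf);
    if (strlen(pciConfigAddr) >= buflen)
        goto cleanup;

    strcpy(buf, pciConfigAddr);
    ret = 0;

 cleanup:
    free(pciConfigAddr);        /* free(NULL) is a no-op */
    return ret;
}
```

Without the "= NULL" initializer, the first "goto cleanup" would hand an indeterminate pointer to free(), which is the kind of failure the commit message describes.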
I backported and posted this additional patch to rhvirt-patches. http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-July/msg00176.html Additionally, I tested and it does eliminate this slightly different crash.
This bug fix is verified. The verification steps are as follows:

# rpm -q libvirt
libvirt-0.10.2-20.el6.x86_64

# virsh nodedev-list --tree
computer
  |
  ...
  +- pci_0000_00_16_0
  +- pci_0000_00_16_3
  +- pci_0000_00_19_0
  |   |
  |   +- net_eth0_10_60_4b_78_2a_74   <== this is my network interface, named eth0
  |
  +- pci_0000_00_1a_0
  |   |
  ...

# cat passthrough.xml
<network>
  <name>passthrough</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth99'/>   <======== wrong interface
  </forward>
</network>

# virsh net-define passthrough.xml
Network passthrough defined from passthrough.xml

# virsh net-list --all
Name                 State      Autostart     Persistent
----------------------------------------------------------
default              active     yes           yes
passthrough          inactive   no            yes

# virsh edit r6

Add the following to domain r6's XML:

<interface type='network'>
  <source network='passthrough'/>
</interface>

# virsh start r6
error: Failed to start domain r6
error: internal error Could not get Virtual functions on eth99

# service libvirtd status
libvirtd (pid 3408) is running...

So, change the status to VERIFIED.
In addition, for a network card that has no SRIOV capability, the verification steps for this fix look like this:

# vim network.xml
<network>
  <name>passthrough</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth1'/>   <======== no SRIOV interface
  </forward>
</network>

# virsh net-define network.xml
Network passthrough defined from network.xml

# virsh net-list --all
Name                 State      Autostart     Persistent
--------------------------------------------------
default              active     yes           yes
passthrough          inactive   no            yes

# virsh edit r6m
Domain r6m XML configuration edited.

Add the following to domain r6m's XML:

<interface type='network'>
  <source network='passthrough'/>
</interface>

# virsh start r6m
error: Failed to start domain r6m
error: internal error No Vf's present on SRIOV PF eth1
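The two verification runs exercise different failure paths: a PF name that does not exist at all (eth99) and an interface that exists but exposes no virtual functions (eth1). A regression test might tell these cases apart before starting the guest with something like the sketch below. It is a rough illustration under stated assumptions, not how libvirt decides. classify_pf() and enum pf_state are hypothetical names, and the check relies on the Linux sysfs convention that an SR-IOV PF with enabled VFs has virtfnN links (starting at virtfn0) in its device directory.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical classification of a <pf dev='...'/> interface name. */
enum pf_state { PF_MISSING, PF_NO_VFS, PF_HAS_VFS };

/* net_root is normally "/sys/class/net"; it is a parameter so the function
 * can be pointed at a fake directory tree in tests. */
static enum pf_state classify_pf(const char *net_root, const char *ifname)
{
    char path[512];
    int n;

    assert(net_root != NULL && ifname != NULL);

    /* No device directory: the interface does not exist (or has no
     * backing device), as in the eth99 case. */
    n = snprintf(path, sizeof(path), "%s/%s/device", net_root, ifname);
    if (n < 0 || (size_t)n >= sizeof(path) || access(path, F_OK) != 0)
        return PF_MISSING;

    /* Device exists but has no virtfn0 link: no VFs are present, as in
     * the eth1 case. An over-long path is treated the same way. */
    n = snprintf(path, sizeof(path), "%s/%s/device/virtfn0", net_root, ifname);
    if (n < 0 || (size_t)n >= sizeof(path) || access(path, F_OK) != 0)
        return PF_NO_VFS;

    return PF_HAS_VFS;
}
```

Calling classify_pf("/sys/class/net", "eth99") on the test host above would be expected to take the first branch, since the crash log shows that /sys/class/net/<name>/device could not be opened for the nonexistent interface.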
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1581.html