Bug 980339 - libvirtd crashes when starting a guest that uses a hostdev network specifying a nonexistent PF
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Laine Stump
QA Contact: Virtualization Bugs
Reported: 2013-07-02 02:28 EDT by Laine Stump
Modified: 2013-11-21 04:04 EST
CC List: 7 users

See Also:
Fixed In Version: libvirt-0.10.2-20.el6
Doc Type: Bug Fix
Doc Text:
Cause: If an incorrect device name was given in the <pf> element of a libvirt network definition, libvirt would crash when a guest attempted to create an interface using that network.
Fix: libvirt now validates the pf device name to verify that it exists and that it is an SR-IOV-capable network device.
Result: libvirt no longer crashes when a network with an incorrect <pf> is referenced. Instead, it logs an appropriate error message and prevents the operation.
Clone Of: 971325
Last Closed: 2013-11-21 04:04:53 EST
Type: Bug


Description Laine Stump 2013-07-02 02:28:06 EDT
This bug also exists in RHEL6.4 and is a trivial backport.

+++ This bug was initially created as a clone of Bug #971325 +++

Description of problem:
libvirtd crashes when starting a guest that uses an inactive network whose definition has a wrong pf value.


Version-Release number of selected component (if applicable):
libvirt-1.0.6-1.el7.x86_64
qemu-kvm-1.5.0-2.el7.x86_64
kernel-3.9.0-0.55.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
# virsh nodedev-list --tree

computer
  |
......
  +- pci_0000_00_1c_6
  |   |
  |   +- pci_0000_07_00_0
  |       |
  |       +- pci_0000_08_02_0
  |       |   |
  |       |   +- pci_0000_09_00_0
  |       |   |   |
  |       |   |   +- net_p1p1_00_1b_21_55_b3_b8
  |       |   |    
  |       |   +- pci_0000_09_00_1
  |       |   |   |
  |       |   |   +- net_eth3_00_1b_21_55_b3_b9  <======== right device name
  |       |   |    
  |       |   +- pci_0000_0a_10_0
  |       |   |   |
  |       |   |   +- net_p1p1_0_5a_12_12_ed_a5_1b
......

# cat passthrough.xml
<network>
   <name>passthrough</name>
   <forward mode='hostdev' managed='yes'>
     <pf dev='eth1'/>                            <======== wrong device name 
   </forward>
</network>

# virsh net-define passthrough.xml
Network passthrough defined from passthrough.xml


# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 passthrough          inactive   no            yes


# virsh net-dumpxml passthrough
<network>
  <name>passthrough</name>
  <uuid>3be5c577-c1bb-4a6d-8641-adda7b2b9b16</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth1'/>
  </forward>
</network>


Add a passthrough interface to the guest:
# virsh edit rhel7 
Domain rhel7 XML configuration edited.
# virsh dumpxml rhel7
<domain type='kvm'>
  <name>rhel7</name>
  ......

    <interface type='network'>
      <source network='passthrough'/>
    </interface>

# virsh start rhel7
error: Failed to start domain rhel7
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor



Actual results:
libvirtd crashes.

Expected results:
libvirtd reports an error instead of crashing.

Additional info:

--- Additional comment from hongming on 2013-06-06 05:42:56 EDT ---

The device name in the log is different from the device name in the bug description.

--- Additional comment from hongming on 2013-06-06 05:55:33 EDT ---

(In reply to hongming from comment #1)
> Created attachment 757587
> libvirt debug log
> 
> the device name in log is different from the device name in the bug
> description.

I mean that the log and the description come from two different tests.

--- Additional comment from Laine Stump on 2013-07-01 00:04:15 EDT ---

I have reproduced this crash and posted a fix upstream:

https://www.redhat.com/archives/libvir-list/2013-July/msg00002.html

For reference when testing for this fix - note that it would only crash if a *nonexistent* interface was specified (it wasn't enough to specify an interface that had no SRIOV capabilities; that is yet another failure path that should be in the regression tests to prevent future breakage).

--- Additional comment from Laine Stump on 2013-07-01 00:31:58 EDT ---

The fix was pushed upstream and will be in libvirt-1.1.0:

commit 2c2525ab6a6f0ad5d75a6c60711e2e28cb1cebe9
Author: Laine Stump <laine@laine.org>
Date:   Sun Jun 30 23:52:43 2013 -0400

    pci: initialize virtual_functions array pointer to avoid segfault
    
    This fixes https://bugzilla.redhat.com/show_bug.cgi?id=971325
    
    The problem was that if virPCIGetVirtualFunctions was given the name
    of a non-existent interface, it would return to its caller without
    initializing the pointer to the array of virtual functions to NULL,
    and the caller (virNetDevGetVirtualFunctions) would try to VIR_FREE()
    the invalid pointer.
    
    The final error message before the crash would be:
    
     virPCIGetVirtualFunctions:2088 :
      Failed to open dir '/sys/class/net/eth2/device':
      No such file or directory
    
    In this patch I move the initialization in virPCIGetVirtualFunctions()
    to the beginning of the function, and also do an explicit
    initialization in virNetDevGetVirtualFunctions, just in case someone
    in the future adds code into that function prior to the call to
    virPCIGetVirtualFunctions.
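
For anyone reproducing this outside libvirt, the lookup that fails in the error message above can be approximated in a few lines of Python. This is only an illustrative sketch, not libvirt's code. It assumes that VFs appear as virtfn* entries under /sys/class/net/<dev>/device (the directory named in the error message). The function name list_virtual_functions and the sysfs_root parameter are made up for this sketch.

```python
import os


def list_virtual_functions(pf_name, sysfs_root="/sys/class/net"):
    """Return the virtfn* entry names found for pf_name (hypothetical helper).

    Raises FileNotFoundError when the device directory is missing, which is
    the nonexistent-interface case. Returns an empty list when the directory
    exists but holds no virtfn* entries.
    """
    device_dir = os.path.join(sysfs_root, pf_name, "device")
    if not os.path.isdir(device_dir):
        raise FileNotFoundError(
            "Failed to open dir '%s': No such file or directory" % device_dir
        )
    # Names are sorted lexically, so virtfn10 sorts before virtfn2.
    return sorted(
        entry for entry in os.listdir(device_dir) if entry.startswith("virtfn")
    )
```

A missing directory corresponds to the nonexistent-interface case that triggered the crash. An empty list corresponds to an interface that exists but exposes no VFs, which is the separate failure path mentioned in the testing note above.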
Comment 5 Jincheng Miao 2013-07-09 05:39:45 EDT
The patch libvirt-pci-initialize-virtual_functions-array-pointer-to-avoid-segfault.patch does not completely fix this bug: it does not initialize pciConfigAddr to NULL.

The bug still reproduces with libvirt-0.10.2-19.el6, so it is not verified.

My reproduction steps:
# virsh nodedev-list --tree
computer
  |
...        
  +- pci_0000_00_16_0
  +- pci_0000_00_16_3
  +- pci_0000_00_19_0
  |   |
  |   +- net_eth0_10_60_4b_78_2a_74  <== this is my network interface, named eth0
  |     
  +- pci_0000_00_1a_0
  |   |
...

# cat passthrough.xml
<network>
   <name>passthrough</name>
   <forward mode='hostdev' managed='yes'>
     <pf dev='eth99'/>                            <======== wrong interface 
   </forward>
</network>

# virsh net-define passthrough.xml
Network passthrough defined from passthrough.xml


# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 passthrough          inactive   no            yes


# virsh edit a
Add the following to domain a's XML:
    <interface type='network'>
      <source network='passthrough'/>
    </interface>

# virsh start a
error: Failed to start domain a
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor
Comment 6 Laine Stump 2013-07-09 14:03:30 EDT
It turns out there was an additional problem that had already been silently fixed upstream several months prior to the original bug (Bug 971325) being filed.

commit ac5cb26a32300d03517692cd15a604dd0517fbd6
Author: John Ferlan <jferlan@redhat.com>
Date:   Tue Jan 22 09:15:41 2013 -0500

    virnetdev: Need to initialize 'pciConfigAddr'
    
    It was possible to call VIR_FREE in cleanup prior to initialization
Comment 7 Laine Stump 2013-07-09 14:04:40 EDT
I backported and posted this additional patch to rhvirt-patches.

  http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-July/msg00176.html

Additionally, I tested and it does eliminate this slightly different crash.
Comment 9 Jincheng Miao 2013-07-15 23:37:12 EDT
This bug fix is verified. The verification steps are as follows:

# rpm -q libvirt
libvirt-0.10.2-20.el6.x86_64

# virsh nodedev-list --tree
computer
  |
...        
  +- pci_0000_00_16_0
  +- pci_0000_00_16_3
  +- pci_0000_00_19_0
  |   |
  |   +- net_eth0_10_60_4b_78_2a_74  <== this is my network interface, named eth0
  |     
  +- pci_0000_00_1a_0
  |   |
...

# cat passthrough.xml
<network>
   <name>passthrough</name>
   <forward mode='hostdev' managed='yes'>
     <pf dev='eth99'/>                            <======== wrong interface 
   </forward>
</network>

# virsh net-define passthrough.xml
Network passthrough defined from passthrough.xml


# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 passthrough          inactive   no            yes


# virsh edit r6
Add the following to domain r6's XML:
    <interface type='network'>
      <source network='passthrough'/>
    </interface>

# virsh start r6
error: Failed to start domain r6
error: internal error Could not get Virtual functions on eth99

# service libvirtd status
libvirtd (pid  3408) is running...

So I am changing the status to VERIFIED.
Comment 10 Jincheng Miao 2013-07-29 04:25:44 EDT
In addition, for a network card that has no SRIOV capability, the verification steps for this fix are:

# vim network.xml
<network>
   <name>passthrough</name>
   <forward mode='hostdev' managed='yes'>
     <pf dev='eth1'/>                            <======== no SRIOV interface 
   </forward>
</network>

# virsh net-define network.xml 
Network passthrough defined from network.xml

# virsh net-list --all
Name                 State      Autostart     Persistent
--------------------------------------------------
default              active     yes           yes
passthrough          inactive   no            yes

# virsh edit r6m
Domain r6m XML configuration edited.
Add the following to domain r6m's XML:
    <interface type='network'>
      <source network='passthrough'/>
    </interface>

# virsh start r6m
error: Failed to start domain r6m
error: internal error No Vf's present on SRIOV PF eth1
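
In the transcripts in this bug, virsh net-define accepts the <pf> name without complaint and the error only appears at guest start. A rough pre-flight check for the nonexistent-interface case could look like the sketch below. It assumes that the network XML has the same shape as the examples in this bug and that a host interface exists exactly when /sys/class/net/<dev> exists. find_missing_pfs and sysfs_root are hypothetical names for illustration, not libvirt or virsh features.

```python
import os
import xml.etree.ElementTree as ET


def find_missing_pfs(network_xml, sysfs_root="/sys/class/net"):
    """Return the <pf dev='...'> names that have no entry under sysfs_root.

    network_xml is the text of a libvirt <network> definition. This is a
    hypothetical helper; it only checks existence, not SR-IOV capability.
    """
    root = ET.fromstring(network_xml)
    missing = []
    for pf in root.findall("./forward/pf"):
        dev = pf.get("dev")
        if dev and not os.path.exists(os.path.join(sysfs_root, dev)):
            missing.append(dev)
    return missing
```

Running it on the contents of passthrough.xml before net-define would flag a name such as eth99 on a host that has only eth0. It would not flag an interface that exists but has no VFs.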
Comment 12 errata-xmlrpc 2013-11-21 04:04:53 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html
