Bug 893131

Summary: support for IFLA_EXT_MASK and RTEXT_FILTER_VF needs to be added to lib
Product: [Fedora] Fedora Reporter: Laine Stump <laine>
Component: libvirtAssignee: Laine Stump <laine>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 18CC: berrange, clalancette, crobinso, itamar, jforbes, jyang, laine, libvirt-maint, veillard, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 889319 Environment:
Last Closed: 2013-03-04 12:16:37 EST Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Laine Stump 2013-01-08 11:39:23 EST
+++ This bug was initially created as a clone of Bug #889319 +++

+++ This bug was initially created as a clone of Bug #889249 +++
(IFLA_VFINFO_LIST missing in netlink response messages after upgrade from kernel 2.6.32-339 to 2.6.32-349 - leads to failure of libvirt to get VF info & failure to attach PCI netdev to vm)

After upgrading the RHEL6 kernel to 2.6.32-349 from 2.6.32-349, libvirt was no longer able to perform "intelligent PCI passthrough" of a network device (this is a mode of PCI passthrough where libvirt first retrieves the current MAC address and VLAN tag for an SR-IOV VF, then sets the mac and vlan tag to values in libvirt's configuration, and then sends the address of the VF to qemu for attachment).

Since the failure did not produce a useful error message (the author of the code had missed adding an error log, and this error had never before been encountered), tracing with gdb was required to find the source of the problem. It turns out that the cause was that, with kernel -349, libvirt is failing to get the current VFINFO for the VF.

In particular, libvirt sends an NLM_F_REQUEST for the PF's ifindex over a NETLINK_ROUTE socket, waits for the response packet, then looks at tb[IFLA_VFINFO_LIST] in the response for the VFINFO of the VF. WIth kernel -339, tb[IFLA_VFINFO_LIST] was a valid pointer pointing to info for all the VFs of the PF. But with kernel -349, tb[IFLA_VINFO_LIST] is NULL.

The application code in question is available in the source for the libvirt package, in src/util/virnetdev.c and src/util/virnetlink.c. The toplevel function is virNetDevGetVfConfig(). It calls virNetDevLinkDump() to send the NLM_F_REQUEST and receive the response, and virNetDevParseVfConfig() to extract the info for the requested VF from the response.

It is fairly easy to trace through this with gdb by installing the libvirt-debuginfo package and attaching gdb to the system's libvirtd process prior to attempting to start a domain with an <interface type='hostdev'> device defined. I can help in setting this up, or provide access to my machine which is already setup to reproduce the error. Alternately the virt QE team has at least one host that is properly setup to reproduce this error (see https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 - it also provides the libvirt log when the problem is encountered, although that is unfortunately of little use)

--- Additional comment from Thomas Graf on 2012-12-20 11:08:41 EST ---

You need to add the following U32 attribute to your request message:

  IFLA_EXT_MASK

and set it to

  RTEXT_FILTER_VF

This was introduced due to the RTM_NEWLINK message exceeding a page in size which userspace was unprepared to handle.

commit 115c9b81928360d769a76c632bae62d15206a94a
Author: Greg Rose <gregory.v.rose@intel.com>
Date:   Tue Feb 21 16:54:48 2012 -0500

    rtnetlink: Fix problem with buffer allocation
    
    Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
    is a 32 bit value that can be used to indicate to the kernel that
    certain extended ifinfo values are requested by the user application.
    At this time the only mask value defined is RTEXT_FILTER_VF to
    indicate that the user wants the ifinfo dump to send information
    about the VFs belonging to the interface.
    
    This patch fixes a bug in which certain applications do not have
    large enough buffers to accommodate the extra information returned
    by the kernel with large numbers of SR-IOV virtual functions.
    Those applications will not send the new netlink attribute with
    the interface info dump request netlink messages so they will
    not get unexpectedly large request buffers returned by the kernel.
    
    Modifies the rtnl_calcit function to traverse the list of net
    devices and compute the minimum buffer size that can hold the
    info dumps of all matching devices based upon the filter passed
    in via the new netlink attribute filter mask.  If no filter
    mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
    
    With this change it is possible to add yet to be defined netlink
    attributes to the dump request which should make it fairly extensible
    in the future.
    
    Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

--- Additional comment from Laine Stump on 2012-12-20 16:35:37 EST ---

The following were committed upstream:

commit ac2797cf2af2fd0e64c58a48409a8175d24d6f86
Author: Laine Stump <laine@laine.org>
Date:   Thu Dec 20 13:22:17 2012 -0500

    util: fix functions that retrieve SRIOV VF info
    
    This patch resolves:
    
      https://bugzilla.redhat.com/show_bug.cgi?id=889319
    
    When assigning an SRIOV virtual function to a guest using "intelligent
    PCI passthrough" (<interface type='hostdev'>, which sets the MAC
    address and vlan tag of the VF before passing its info to qemu),
    libvirt first learns the current MAC address and vlan tag by sending
    an NLM_F_REQUEST message for the VF's PF (physical function) to the
    kernel via a NETLINK_ROUTE socket (see virNetDevLinkDump()); the
    response message's IFLA_VFINFO_LIST section is examined to extract the
    info for the particular VF being assigned.
    
    This worked fine with kernels up until kernel commit
    115c9b81928360d769a76c632bae62d15206a94a (first appearing in upstream
    kernel 3.3) which changed the ABI to not return IFLA_VFINFO_LIST in
    the response until a newly introduced IFLA_EXT_MASK field was included
    in the request, with the (newly introduced, of course) RTEXT_FILTER_VF
    flag set.
    
    The justification for this ABI change was that new fields had been
    added to the VFINFO, causing NLM_F_REQUEST messages to fail on systems
    with large numbers of VFs if the requesting application didn't have a
    large enough buffer for all the info. The idea is that most
    applications doing an NLM_F_REQUEST don't care about VFINFO anyway, so
    eliminating it from the response would lower the requirements on
    buffer size. Apparently, the people who pushed this patch made the
    mistaken assumption that iproute2 (the "ip" command) was the only
    package that used IFLA_VFINFO_LIST, so it wouldn't break anything else
    (and they made sure that iproute2 was fixed.
    
    The logic of this "fix" is debatable at best (one could claim that the
    proper fix would be for the applications in question to be fixed so
    that they properly sized the buffer, which is what libvirt does
    (purely by virtue of using libnl), but it is what it is and we have to
    deal with it.
    
    In order for <interface type='hostdev'> to work properly on systems
    with a kernel 3.3 or later, libvirt needs to add the afore-mentioned
    IFLA_EXT_MASK field with RTEXT_FILTER_VF set.
    
    Of course we also need to continue working on systems with older
    kernels, so that one bit of code is compiled conditionally. The one
    time this could cause problems is if the libvirt binary was built on a
    system without IFLA_EXT_MASK which was subsequently updated to a
    kernel that *did* have it. That could be solved by manually providing
    the values of IFLA_EXT_MASK and RTEXT_FILTER_VF and adding it to the
    message anyway, but I'm uncertain what that might actually do on a
    system that didn't support the message, so for the time being we'll
    just fail in that case (which will very likely never happen anyway).

commit 846770e5ff959f7819e2c32857598cb88e2e2f0e
Author: Laine Stump <laine@laine.org>
Date:   Thu Dec 20 14:52:41 2012 -0500

    util: add missing error log messages when failing to get netlink VFINFO
    
    This patch fixes the lack of error messages when libvirt fails to find
    VFINFO in a returned netlinke response message.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 is an example
    of the error message that was previously logged when the
    IFLA_VFINFO_LIST object was missing from the netlink response. The
    reason for this failure is detailed in
    
       https://bugzilla.redhat.com/show_bug.cgi?id=889319
    
    Even though that root problem has been fixed, the experience of
    finding the root cause shows us how important it is to properly log an
    error message in these cases. This patch *seems* to replace the entire
    function, but really most of the changes are due to moving code that
    was previously inside an if() statement out to the top level of the
    function (the original if() was reversed and made to log an error and
    return).


--- Additional comment from Laine Stump on 2012-12-21 16:25:12 EST ---

The first patch above was incomplete/incorrect, requiring a 3rd patch, which has been committed upstream:

commit 7c36650699f33e54361720f824efdf164bc6e65d
Author: Laine Stump <laine@laine.org>
Date:   Fri Dec 21 15:09:33 2012 -0500

    util: fix botched check for new netlink request filters
    
    This is an adjustment to the fix for
    
      https://bugzilla.redhat.com/show_bug.cgi?id=889319
    
    to account for two bonehead mistakes I made.
    
    commit ac2797cf2af2fd0e64c58a48409a8175d24d6f86 attempted to fix a
    problem with netlink in newer kernels requiring an extra attribute
    with a filter flag set in order to receive an IFLA_VFINFO_LIST from
    netlink. Unfortunately, the #ifdef that protected against compiling it
    in on systems without the new flag went a bit too far, assuring that
    the new code would *never* be compiled, and even if it had, the code
    was incorrect.
    
    The first problem was that, while some IFLA_* enum values are also
    their existence at compile time, IFLA_EXT_MASK *isn't* #defined, so
    checking to see if it's #defined is not a valid method of determining
    whether or not to add the attribute. Fortunately, the flag that is
    being set (RTEXT_FILTER_VF) *is* #defined, and it is never present if
    IFLA_EXT_MASK isn't, so it's sufficient to just check for that flag.
    
    And to top it off, due to the code not actually compiling when I
    thought it did, I didn't realize that I'd been given the wrong arglist
    to nla_put() - you can't just send a const value to nla_put, you have
    to send it a pointer to memory containing what you want to add to the
    message, along with the length of that memory.

--- Additional comment from hongming on 2013-01-05 04:19:14 EST ---

Verify it as follows. the result is expected. Change its status to VERIFIED.

# uname -r
2.6.32-349.el6.x86_64

# rpm -q libvirt qemu-kvm
libvirt-0.10.2-14.el6.x86_64
qemu-kvm-0.12.1.2-2.344.el6.x86_64

# lspci |grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 05)
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0a:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0a:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0a:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
.......

Add the following xml to guest
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:60:6e:2d'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x10' function='0x0'/>
      </source>
    </interface>

# virsh start rhel63
Domain rhel63 started

# virsh dumpxml rhel63
<domain type='kvm' id='4'>
  <name>rhel63</name>
  ......
  <devices>
  ......
    <interface type='network'>
      <mac address='52:54:00:d7:f0:f2'/>
      <source network='default'/>
      <target dev='vnet0'/>
      <model type='rtl8139'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:60:6e:2d'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x0a' slot='0x10' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
 ......
  </devices>
 ......
</domain>


Login guest , Check the Virtual Function using lspci.it exists.


# virsh detach-interface rhel63 --type hostdev --mac 52:54:00:60:6e:2d
Interface detached successfully


The VF has been detached from guest.

# virsh dumpxml rhel63
<domain type='kvm' id='5'>
 ....... 
    <interface type='network'>
      <mac address='52:54:00:d7:f0:f2'/>
      <source network='default'/>
      <target dev='vnet0'/>
      <model type='rtl8139'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
 .......
</domain>
Comment 1 Laine Stump 2013-01-08 13:04:53 EST
I pushed these three patches to both the v0.9.11-maint and v0.10.2-maint branches, since both of those branches have support for <interface type='hostdev'> and they are used on Fedora 17 and 18 respectively, both of which use kernels new enough to require the patch (3.6.10 and 3.6.11).
Comment 2 Fedora Update System 2013-01-28 15:46:33 EST
libvirt-0.10.2.3-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/libvirt-0.10.2.3-1.fc18
Comment 3 Fedora Update System 2013-02-05 11:53:39 EST
libvirt-0.10.2.3-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.