Bug 1380803 - "ip link show" command reports Message Truncated on a system with a large number of VF interfaces
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: iproute
Version: 7.2
Hardware: x86_64 Unspecified
Priority: medium Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Hangbin Liu
QA Contact: Jaroslav Aster
Depends On:
Blocks:
Reported: 2016-09-30 11:03 EDT by Alan Griffiths
Modified: 2018-04-10 10:30 EDT

See Also:
Fixed In Version: iproute-4.11.0-9.el7
Doc Type: Bug Fix
Doc Text:
Cause: Due to fixed receive buffer lengths, 'ip link show' command on a system with a large number of virtual functions could fail with error "Message truncated" resulting in incomplete output.
Consequence: Command 'ip link show' did not correctly reflect system state if too many virtual functions were present.
Fix: Receive buffer sizes are dynamically allocated depending on expected message size.
Result: On systems with arbitrary numbers of virtual functions 'ip link show' command now correctly reflects system state.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 10:28:47 EDT
Type: Bug

External Trackers:
Tracker: Red Hat Product Errata
ID: RHEA-2018:0815
Priority: None
Status: None
Summary: None
Last Updated: 2018-04-10 10:30 EDT

Description Alan Griffiths 2016-09-30 11:03:37 EDT
Description of problem:
Running "ip link show" outputs Message Truncated several times and the resulting interface list is incomplete.

The issue occurs when the number of VF interfaces is over ~50.

This is impacting applications/scripts which rely on parsing the output of ip to ascertain the current network configuration (e.g. ovirt/vdsm).

Version-Release number of selected component (if applicable):
iproute-3.10.0-54.el7_2.1.x86_64


This appears to be related to https://bugzilla.redhat.com/show_bug.cgi?id=1086512
Comment 2 Alan Griffiths 2016-10-26 06:59:36 EDT
This has been fixed upstream in either 4.5 or 4.6.

Rebuilding the SRPM from Fedora 25 (iproute-4.6.0-1.fc25.src.rpm) fixes the problem, but rebuilding iproute-4.4.0-3.fc24.src.rpm from Fedora 24 does not.
Comment 3 Hangbin Liu 2017-01-10 21:42:06 EST
commit 72b365e8e0fd5efe1d5c05d04c25950736635cfb
Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Mar 4 19:57:28 2016 +0100

    libnetlink: Double the dump buffer size

    There have been reports about 'ip addr' printing "Message truncated" on
    systems with large numbers of VFs. Although I haven't been able to get
    my hands on hardware suitable to reproduce this, increasing the dump
    buffer has been reported to resolve the issue. For want of a better
    idea, just double the buffer size to 32k.

    Feels like this opportunistic buffer size selection is rather
    workarounding a design flaw in libnetlink or maybe even the netlink
    protocol itself.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
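
The fixed-buffer behaviour described in this commit message can be reproduced without any VF hardware. The following is only a minimal sketch, assuming Linux and Python 3: it uses an AF_UNIX datagram socketpair as a stand-in for the netlink socket, and the names recv_fixed and demo_truncation as well as the 40000-byte message size are hypothetical, not taken from libnetlink. It shows that a datagram larger than the receive buffer is cut off and flagged with MSG_TRUNC, and that doubling the buffer from 16k to 32k only helps while the message still fits.

```python
import socket


def recv_fixed(sock, buflen):
    """Receive one datagram into a buffer of at most buflen bytes.

    Returns (data, truncated). truncated is True when the kernel set
    MSG_TRUNC in the returned flags, meaning the datagram was larger
    than buflen and the excess bytes were discarded.
    """
    data, _ancdata, msg_flags, _addr = sock.recvmsg(buflen)
    return data, bool(msg_flags & socket.MSG_TRUNC)


def demo_truncation(msglen, buflen):
    """Send one msglen-byte datagram and receive it with a fixed buffer."""
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.send(b"x" * msglen)
        return recv_fixed(rx, buflen)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    # 40000 bytes is an arbitrary size chosen to exceed both 16k and 32k.
    for buflen in (16384, 32768, 65536):
        data, truncated = demo_truncation(40000, buflen)
        status = "Message truncated" if truncated else "ok"
        print(buflen, len(data), status)
```

Any fixed size has the same weakness: once the reply grows past it, the tail of the message is lost and the caller only learns about it from the flag.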
Comment 4 Alan Griffiths 2017-03-24 06:32:00 EDT
If there's any interest I could probably provide access to a system exhibiting the issue (Cisco B200 with VIC1340).
Comment 5 Alan Griffiths 2017-07-05 11:37:23 EDT
It seems that increasing the buffer size has simply pushed the problem further up. With ~95 VFs the message truncated warnings return.

On my Cisco UCS platform it's possible to go as high as 114 VFs, while other configurations support up to 223.
Comment 6 Phil Sutter 2017-07-21 08:23:31 EDT
Hi Alan,

(In reply to Alan Griffiths from comment #5)
> It seems that increasing the buffer size has simply pushed the problem
> further up. With ~95 VFs the message truncated warnings return.
> 
> On my Cisco UCS platform it's possible to go as high as 114 VFs, while other
> configurations support up to 223.

So you see the problem even with iproute-4.6.0-1.fc25?

Thanks, Phil
Comment 7 Alan Griffiths 2017-07-25 07:17:34 EDT
Yes, so the behaviour seems to be the same across the releases now.

I've tested:

iproute-3.10.0-74 (EL7)
iproute-4.6.0-1 (FC25)
iproute-4.11.0-1 (FC26)

The test system has 2 PFs. With 90 VFs all of the above versions work; with 91 VFs I start seeing "Message truncated" errors.
Comment 8 Jan Gutter 2017-10-10 07:53:00 EDT
Just a note:

There are theoretical and practical upper limits to the number of Virtual Functions per PF (256 from some sources)[1], so setting a static maximum buffer size is a justifiable option. That moves the burden over to ensuring that the response per VF doesn't grow in the future.

Libvirt had similar issues (it used libnl) until it enabled message peeking by default. Unfortunately the only way to solve this issue in all cases without performance penalty is to alter the kernel's message truncation handling mechanism to not free the skb if MSG_TRUNC occurred. [2]

It's possible to put in a retry on truncation, resizing the receive buffer to fit the response, BUT since that's two unrelated syscalls, it leaves me feeling a bit queasy.

[1] http://windowsitpro.com/systems-management/q-what-sr-iov
[2] http://man7.org/linux/man-pages/man2/recv.2.html
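
The message-peeking idea from this comment can be sketched as follows. This is an illustration under stated assumptions, not libvirt's, libnl's or iproute2's actual code: it assumes Linux, where recv() with MSG_PEEK | MSG_TRUNC leaves the datagram queued and returns its real length, and it again uses an AF_UNIX datagram socketpair in place of a netlink socket. The names recv_whole and demo_peek and the 32768-byte minimum are hypothetical.

```python
import socket


def recv_whole(sock, min_buflen=32768):
    """Receive one datagram in full by peeking at its length first."""
    probe = bytearray(1)
    # MSG_PEEK keeps the datagram in the queue; MSG_TRUNC makes the call
    # return the real datagram length (on Linux) although only one byte
    # is copied into the probe buffer.
    needed = sock.recv_into(probe, 1, socket.MSG_PEEK | socket.MSG_TRUNC)
    # Allocate a buffer that is certain to hold the queued datagram.
    buf = bytearray(max(needed, min_buflen))
    got = sock.recv_into(buf)
    return bytes(buf[:got])


def demo_peek(msglen):
    """Send one msglen-byte datagram and receive it without truncation."""
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.send(b"x" * msglen)
        return recv_whole(rx)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print(len(demo_peek(40000)))
```

The cost is one extra syscall per message, which is the performance penalty the comment refers to. Unlike a retry after truncation, nothing is lost, because the peek does not consume the message.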
Comment 11 Phil Sutter 2017-11-07 11:37:21 EST
A real fix for this issue has been accepted upstream and was applied to net-next branch:

commit 2d34851cd341f0e1b3fc17ca3e6e874229f3a1f8
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Oct 26 09:41:46 2017 +0800

    lib/libnetlink: re malloc buff if size is not enough

commit 86bf43c7c2fdc33d7c021b4a1add1c8facbca51c
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Oct 26 09:41:47 2017 +0800

    lib/libnetlink: update rtnl_talk to support malloc buff at run time


Alan, are you able to compile iproute2 by yourself? If so, could you please
give upstream's 'net-next' branch a try? It should solve the issue you are
seeing.

Thanks, Phil
Comment 13 Alan Griffiths 2017-11-08 10:54:15 EST
I can confirm it now works with 112 VF. This is the most my hardware will support.

Thanks,

Alan
Comment 17 errata-xmlrpc 2018-04-10 10:28:47 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0815
