Bug 1380803 - "ip link show" command reports Message Truncated on a system with a large number of VF interfaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: iproute
Version: 7.2
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Hangbin Liu
QA Contact: Jaroslav Aster
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-30 15:03 UTC by Alan Griffiths
Modified: 2018-04-10 14:30 UTC (History)
CC List: 10 users

Fixed In Version: iproute-4.11.0-9.el7
Doc Type: Bug Fix
Doc Text:
Cause: Due to fixed receive buffer lengths, 'ip link show' command on a system with a large number of virtual functions could fail with error "Message truncated" resulting in incomplete output. Consequence: Command 'ip link show' did not correctly reflect system state if too many virtual functions were present. Fix: Receive buffer sizes are dynamically allocated depending on expected message size. Result: On systems with arbitrary numbers of virtual functions 'ip link show' command now correctly reflects system state.
Clone Of:
Environment:
Last Closed: 2018-04-10 14:28:47 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHEA-2018:0815  0        None      None    None     2018-04-10 14:30:42 UTC

Description Alan Griffiths 2016-09-30 15:03:37 UTC
Description of problem:
Running "ip link show" outputs Message Truncated several times and the resulting interface list is incomplete.

The issue occurs when the number of VF interfaces is over ~50.

This is impacting applications/scripts which rely on parsing the output of ip to ascertain the current network configuration (e.g. ovirt/vdsm)

Version-Release number of selected component (if applicable):
iproute-3.10.0-54.el7_2.1.x86_64


This appears to be related to https://bugzilla.redhat.com/show_bug.cgi?id=1086512

Comment 2 Alan Griffiths 2016-10-26 10:59:36 UTC
This has been fixed upstream in either 4.5 or 4.6.

Rebuilding the SRPM from Fedora 25 (iproute-4.6.0-1.fc25.src.rpm) fixes the problem. But iproute-4.4.0-3.fc24.src.rpm from Fedora 24 does not.

Comment 3 Hangbin Liu 2017-01-11 02:42:06 UTC
commit 72b365e8e0fd5efe1d5c05d04c25950736635cfb
Author: Phil Sutter <phil>
Date:   Fri Mar 4 19:57:28 2016 +0100

    libnetlink: Double the dump buffer size

    There have been reports about 'ip addr' printing "Message truncated" on
    systems with large numbers of VFs. Although I haven't been able to get
    my hands on hardware suitable to reproduce this, increasing the dump
    buffer has been reported to resolve the issue. For want of a better
    idea, just double the buffer size to 32k.

    Feels like this opportunistic buffer size selection is rather
    workarounding a design flaw in libnetlink or maybe even the netlink
    protocol itself.

    Signed-off-by: Phil Sutter <phil>
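
For readers unfamiliar with how a fixed receive buffer produces this symptom, the following is a minimal sketch, not the libnetlink code. It assumes Linux and uses a local AF_UNIX datagram socketpair as a stand-in for the netlink socket; the helper name fixed_buffer_truncates is hypothetical. It shows that when a datagram is larger than the buffer passed to recvmsg(), the kernel sets MSG_TRUNC in msg_flags and drops the excess bytes.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one msg_len-byte datagram over a local
 * socketpair (a stand-in for the netlink socket) and receive it into a
 * fixed buf_len-byte buffer.
 * Returns 1 if the kernel flagged the message as truncated, 0 if it fit,
 * -1 on error. */
int fixed_buffer_truncates(size_t msg_len, size_t buf_len)
{
	int sv[2];
	int result = -1;
	struct iovec iov;
	struct msghdr msg;
	char *out = malloc(msg_len);
	char *in = malloc(buf_len);

	if (!out || !in)
		goto free_bufs;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		goto free_bufs;

	memset(out, 'x', msg_len);
	if (send(sv[0], out, msg_len, 0) != (ssize_t)msg_len)
		goto close_socks;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = in;
	iov.iov_len = buf_len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	if (recvmsg(sv[1], &msg, 0) < 0)
		goto close_socks;

	/* MSG_TRUNC in msg_flags means the datagram was larger than the
	 * buffer; the excess bytes have already been discarded. */
	result = (msg.msg_flags & MSG_TRUNC) ? 1 : 0;

close_socks:
	close(sv[0]);
	close(sv[1]);
free_bufs:
	free(out);
	free(in);
	return result;
}
```

With these assumptions, fixed_buffer_truncates(4096, 1024) should report truncation and fixed_buffer_truncates(4096, 8192) should not. Enlarging the buffer only raises the message size at which the flag appears.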

Comment 4 Alan Griffiths 2017-03-24 10:32:00 UTC
If there's any interest I could probably provide access to a system exhibiting the issue (Cisco B200 with VIC1340).

Comment 5 Alan Griffiths 2017-07-05 15:37:23 UTC
It seems that increasing the buffer size has simply pushed the problem further up. With ~95 VFs the message truncated warnings return.

On my Cisco UCS platform it's possible to go as high as 114 VFs, while other configurations support up to 223.

Comment 6 Phil Sutter 2017-07-21 12:23:31 UTC
Hi Alan,

(In reply to Alan Griffiths from comment #5)
> It seems that increasing the buffer size has simply pushed the problem
> further up. With ~95 VFs the message truncated warnings return.
> 
> On my Cisco UCS platform it's possible to go as high as 114 VFs, while other
> configurations support up to 223.

So you see the problem even with iproute-4.6.0-1.fc25?

Thanks, Phil

Comment 7 Alan Griffiths 2017-07-25 11:17:34 UTC
Yes, so the behaviour seems to be the same across the releases now.

I've tested: -

iproute-3.10.0-74 (EL7)
iproute-4.6.0-1 (FC25)
iproute-4.11.0-1 (FC26)

The test system has 2 PFs. With 90 VFs all of the above versions work; at 91 VFs I start seeing "Message truncated" errors.

Comment 8 Jan Gutter (Netronome) 2017-10-10 11:53:00 UTC
Just a note:

There are theoretical and practical upper limits to the number of Virtual Functions per PF (256 from some sources)[1], so setting a static maximum buffer size is a justifiable option. That moves the burden over to ensuring that the response per VF doesn't grow in the future.

Libvirt had similar issues (it used libnl) until it enabled message peeking by default. Unfortunately the only way to solve this issue in all cases without performance penalty is to alter the kernel's message truncation handling mechanism to not free the skb if MSG_TRUNC occurred. [2]

It's possible to put in a retry on truncation, resizing the receive buffer to fit the response, BUT since that's two unrelated syscalls, it leaves me feeling a bit queasy.

[1] http://windowsitpro.com/systems-management/q-what-sr-iov
[2] http://man7.org/linux/man-pages/man2/recv.2.html
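
The message-peeking idea mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not libnl's or iproute2's actual code: it assumes Linux, where recv() with MSG_PEEK | MSG_TRUNC returns the real datagram length without dequeuing it (see recv(2)), and it uses an AF_UNIX datagram socketpair as a stand-in for a netlink socket. The names recv_whole and peek_demo are hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: receive one whole datagram from fd, sizing the
 * buffer from a peek. On success returns the message length and stores
 * a malloc()ed buffer in *out (caller frees); returns -1 on error. */
ssize_t recv_whole(int fd, char **out)
{
	char probe;
	ssize_t need, got;
	char *buf;

	/* MSG_PEEK leaves the datagram queued; MSG_TRUNC makes recv()
	 * return its real length even though the probe is one byte. */
	need = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
	if (need < 0)
		return -1;

	buf = malloc(need > 0 ? (size_t)need : 1);
	if (!buf)
		return -1;

	/* Second syscall: actually dequeue the message into the buffer. */
	got = recv(fd, buf, (size_t)need, 0);
	if (got != need) {
		free(buf);
		return -1;
	}
	*out = buf;
	return got;
}

/* Demo: push msg_len bytes through a socketpair and read them back.
 * Returns the number of bytes received intact, or -1 on failure. */
ssize_t peek_demo(size_t msg_len)
{
	int sv[2];
	ssize_t got = -1;
	char *sent = malloc(msg_len);
	char *recvd = NULL;

	if (!sent)
		return -1;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
		free(sent);
		return -1;
	}

	memset(sent, 'x', msg_len);
	if (send(sv[0], sent, msg_len, 0) == (ssize_t)msg_len) {
		got = recv_whole(sv[1], &recvd);
		/* Treat a short or altered payload as failure. */
		if (got != (ssize_t)msg_len || memcmp(sent, recvd, msg_len) != 0)
			got = -1;
	}

	free(recvd);
	free(sent);
	close(sv[0]);
	close(sv[1]);
	return got;
}
```

Under these assumptions, peek_demo(40000) should return 40000 even though the message exceeds the 32k buffer discussed earlier in this bug. The peek and the real read are still two separate syscalls, which is the concern raised in this comment; the sketch is only safe when a single reader owns the socket.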

Comment 11 Phil Sutter 2017-11-07 16:37:21 UTC
A real fix for this issue has been accepted upstream and was applied to net-next branch:

commit 2d34851cd341f0e1b3fc17ca3e6e874229f3a1f8
Author: Hangbin Liu <liuhangbin>
Date:   Thu Oct 26 09:41:46 2017 +0800

    lib/libnetlink: re malloc buff if size is not enough

commit 86bf43c7c2fdc33d7c021b4a1add1c8facbca51c
Author: Hangbin Liu <liuhangbin>
Date:   Thu Oct 26 09:41:47 2017 +0800

    lib/libnetlink: update rtnl_talk to support malloc buff at run time


Alan, are you able to compile iproute2 by yourself? If so, could you please
give upstream's 'net-next' branch a try? It should solve the issue you are
seeing.

Thanks, Phil

Comment 13 Alan Griffiths 2017-11-08 15:54:15 UTC
I can confirm it now works with 112 VF. This is the most my hardware will support.

Thanks,

Alan

Comment 17 errata-xmlrpc 2018-04-10 14:28:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0815

