Red Hat Bugzilla – Bug 1380803
"ip link show" command reports Message Truncated on a system with a large number of VF interfaces
Last modified: 2018-04-10 10:30:43 EDT
Description of problem:
Running "ip link show" prints "Message truncated" several times and the resulting interface list is incomplete. The issue occurs when the number of VF interfaces exceeds ~50. This impacts applications and scripts that rely on parsing the output of ip to ascertain the current network configuration (e.g. ovirt/vdsm).

Version-Release number of selected component (if applicable):
iproute-3.10.0-54.el7_2.1.x86_64

This appears to be related to https://bugzilla.redhat.com/show_bug.cgi?id=1086512
This has been fixed upstream in either 4.5 or 4.6. Rebuilding the SRPM from Fedora 25 (iproute-4.6.0-1.fc25.src.rpm) fixes the problem, but rebuilding iproute-4.4.0-3.fc24.src.rpm from Fedora 24 does not.
commit 72b365e8e0fd5efe1d5c05d04c25950736635cfb
Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Mar 4 19:57:28 2016 +0100

    libnetlink: Double the dump buffer size

    There have been reports about 'ip addr' printing "Message truncated" on systems with large numbers of VFs. Although I haven't been able to get my hands on hardware suitable to reproduce this, increasing the dump buffer has been reported to resolve the issue. For want of a better idea, just double the buffer size to 32k. Feels like this opportunistic buffer size selection is rather workarounding a design flaw in libnetlink or maybe even the netlink protocol itself.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
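To illustrate why a fixed receive buffer only moves the threshold, here is a minimal Python sketch. It is not iproute2 code: an AF_UNIX datagram socket pair stands in for the netlink socket (an assumption made so the example runs without special hardware or privileges on Linux), and the names recv_fixed and demo are hypothetical. As with netlink, a datagram that does not fit the buffer is cut short, the kernel flags the read with MSG_TRUNC, and the remainder is discarded.

```python
import socket


def recv_fixed(sock, bufsize):
    """Read one datagram into a fixed-size buffer.

    Returns (bytes_received, truncated). The truncated flag comes from
    the msg_flags value reported by recvmsg().
    """
    data, _ancdata, msg_flags, _addr = sock.recvmsg(bufsize)
    return len(data), bool(msg_flags & socket.MSG_TRUNC)


def demo(message_len, bufsize):
    """Send one message of message_len bytes and read it with bufsize."""
    # Stand-in for a netlink socket (assumption, see the text above).
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.send(b"x" * message_len)
        return recv_fixed(rx, bufsize)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    # A 20000-byte reply overflows a 16k buffer but fits a 32k one;
    # a 40000-byte reply overflows both.
    for message_len in (20_000, 40_000):
        for bufsize in (16 * 1024, 32 * 1024):
            print(message_len, bufsize, demo(message_len, bufsize))
```

Doubling the buffer makes the 20000-byte case pass, but any reply larger than the new size is truncated in exactly the same way.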
If there's any interest I could probably provide access to a system exhibiting the issue (Cisco B200 with VIC1340).
It seems that increasing the buffer size has simply pushed the problem further up. With ~95 VFs the message truncated warnings return. On my Cisco UCS platform it's possible to go as high as 114 VFs, while other configurations support up to 223.
Hi Alan,

(In reply to Alan Griffiths from comment #5)
> It seems that increasing the buffer size has simply pushed the problem
> further up. With ~95 VFs the message truncated warnings return.
>
> On my Cisco UCS platform it's possible to go as high as 114 VFs, while other
> configurations support up to 223.

So you see the problem even with iproute-4.6.0-1.fc25?

Thanks, Phil
Yes, so the behaviour seems to be the same across the releases now. I've tested:

- iproute-3.10.0-74 (EL7)
- iproute-4.6.0-1 (FC25)
- iproute-4.11.0-1 (FC26)

The test system has 2 PFs. With 90 VFs all of the above versions work; with 91 VFs I start seeing "Message truncated" errors.
Just a note: There are theoretical and practical upper limits to the number of Virtual Functions per PF (256 according to some sources) [1], so setting a static maximum buffer size is a justifiable option. That moves the burden over to ensuring that the response per VF doesn't grow in the future.

Libvirt had similar issues (it used libnl) until it enabled message peeking by default.

Unfortunately the only way to solve this issue in all cases without a performance penalty is to alter the kernel's message truncation handling so that it does not free the skb when MSG_TRUNC occurs [2]. It's possible to put in a retry on truncation, resizing the receive buffer to fit the response, BUT since that's two unrelated syscalls, it leaves me feeling a bit queasy.

[1] http://windowsitpro.com/systems-management/q-what-sr-iov
[2] http://man7.org/linux/man-pages/man2/recv.2.html
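For readers unfamiliar with message peeking, the idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the libnl or iproute2 implementation: an AF_UNIX datagram socket pair stands in for the netlink socket, and recv_whole is a hypothetical helper name. The first call combines MSG_PEEK (leave the datagram queued) with MSG_TRUNC (return the real length even if the buffer is too small) to learn the size, and the second call reads the datagram into a buffer that fits.

```python
import socket


def recv_whole(sock):
    """Return the next datagram in full, whatever its size.

    Step 1 peeks with a one-byte probe buffer. MSG_PEEK keeps the
    datagram in the queue and MSG_TRUNC makes the call return the
    datagram's real length rather than the number of bytes copied.
    Step 2 allocates a buffer of that length and reads for real.
    """
    probe = bytearray(1)
    needed = sock.recv_into(probe, 1, socket.MSG_PEEK | socket.MSG_TRUNC)
    buf = bytearray(needed)
    got = sock.recv_into(buf, needed)
    return bytes(buf[:got])


if __name__ == "__main__":
    # Stand-in for a netlink socket (assumption, see the text above).
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    payload = b"v" * 50_000  # larger than a 32k fixed buffer
    tx.send(payload)
    print(len(recv_whole(rx)) == len(payload))
    tx.close()
    rx.close()
```

The cost is the one the comment above is uneasy about: every message takes two separate syscalls, one to learn the size and one to read the data.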
A real fix for this issue has been accepted upstream and was applied to net-next branch:

commit 2d34851cd341f0e1b3fc17ca3e6e874229f3a1f8
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Oct 26 09:41:46 2017 +0800

    lib/libnetlink: re malloc buff if size is not enough

commit 86bf43c7c2fdc33d7c021b4a1add1c8facbca51c
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Oct 26 09:41:47 2017 +0800

    lib/libnetlink: update rtnl_talk to support malloc buff at run time

Alan, are you able to compile iproute2 by yourself? If so, could you please give upstream's 'net-next' branch a try? It should solve the issue you are seeing.

Thanks, Phil
I can confirm it now works with 112 VF. This is the most my hardware will support. Thanks, Alan
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0815