Bug 1932502 - Setting MTU for a bond interface using Kernel arguments is not working
Summary: Setting MTU for a bond interface using Kernel arguments is not working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Micah Abbott
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-24 18:16 UTC by Mario Vázquez
Modified: 2022-01-05 07:46 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a VLAN subinterface was defined, the MTU specified for a bonded interface was incorrectly assigned to the bond ports rather than to the bonded interface itself. Consequence: The MTU setting on the bonded interface was not respected. Fix: The NetworkManager initrd parser was changed to correctly detect an MTU set on the bonded interface when VLAN subinterfaces are used. Result: MTU settings on bonded interfaces are correctly respected.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:48:13 UTC
Target Upstream Version:
Embargoed:
miabbott: needinfo-


Attachments
boot log (127.62 KB, text/plain)
2021-02-25 18:53 UTC, Mario Vázquez
no flags


Links
Github coreos/fedora-coreos-config pull 1401: networking: configure MTU on a VLAN subinterface for the bond works (status: open, last updated 2022-01-05 07:46:45 UTC)
Red Hat Product Errata RHSA-2021:2438 (last updated 2021-07-27 22:48:34 UTC)

Description Mario Vázquez 2021-02-24 18:16:53 UTC
Description of problem:
We want to deploy some RHCOS nodes with a bond interface with jumbo frames enabled.

We're using Dracut kernel command line options to configure the bond at boot time.

The bonding configuration works, but the MTU is not properly set on the bond interface.


Version-Release number of selected component (if applicable):

4.6


How reproducible:

Always

Steps to Reproduce:
1. Run an RHCOS image and include the following kernel args:

bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0:none:9000:
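
For reference, these map onto the dracut.cmdline(7) fields roughly as follows (a sketch; see the man page linked under Additional info for the authoritative syntax):

  # bond=<bondname>:<bondslaves>:<options>:<mtu>
  bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000

  # ip=<client-IP>:[<peer>]:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>[:[<mtu>][:<macaddr>]]
  ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0:none:9000: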

Actual results:

Interfaces get the proper MTU set (9000) but the bond interface does not (1500).

$ ip a l                                                                                                                                                               
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:d5:60:91 brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:d5:60:91 brd ff:ff:ff:ff:ff:ff
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:d5:60:91 brd ff:ff:ff:ff:ff:ff
    inet 192.168.125.10/24 brd 192.168.125.255 scope global noprefixroute bond0
       valid_lft forever preferred_lft forever

Expected results:

Bond interface gets its MTU configured to 9000

Additional info:

We used the dracut kernel args as documented in the following link:

https://man7.org/linux/man-pages/man7/dracut.cmdline.7.html

Comment 1 Jonathan Lebon 2021-02-25 18:02:31 UTC
This is likely more an NM question/bug. RHCOS just takes the NM connection files that nm-initrd-generator created and propagates those to the real root.

As a sanity check, you could see if it works correctly on traditional RHEL (there's no NM keyfile propagation there, but AFAIK the network doesn't get taken down before switchroot, so it should remain in the state NM set it up in from the initrd).

Comment 2 Micah Abbott 2021-02-25 18:22:51 UTC
I agree with Jonathan's assessment.  Pinging @thaller for more eyes.

Could you provide the full journal from the node showing the configuration + activation of the network interfaces?

Could you provide the contents of `/etc/NetworkManager/system-connections`?

Comment 3 Mario Vázquez 2021-02-25 18:52:51 UTC
/etc/NetworkManager/system-connections/ens3.nmconnection
[connection]
id=ens3
uuid=8ceff32e-caad-43b4-aa9f-b71a270e3353
type=ethernet
interface-name=ens3
master=78fd2db5-22e7-4c80-9993-62285f6c5a95
multi-connect=1
permissions=
slave-type=bond

[ethernet]
mac-address-blacklist=
mtu=9000

----------------------------------------------------------

/etc/NetworkManager/system-connections/ens4.nmconnection
[connection]
id=ens4
uuid=6791a31d-19d6-4f66-b8cc-f6c1ca029fc9
type=ethernet
interface-name=ens4
master=78fd2db5-22e7-4c80-9993-62285f6c5a95
multi-connect=1
permissions=
slave-type=bond

[ethernet]
mac-address-blacklist=
mtu=9000

----------------------------------------------------------

/etc/NetworkManager/system-connections/bond0.nmconnection
[connection]
id=bond0
uuid=78fd2db5-22e7-4c80-9993-62285f6c5a95
type=bond
interface-name=bond0
multi-connect=1
permissions=

[bond]
lacp_rate=0
miimon=100
mode=802.3ad

[ipv4]
address1=192.168.125.10/24,192.168.125.1
dhcp-hostname=testvm.local
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=testvm.local
dns-search=
method=disabled

[proxy]
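
Note that the bond0 profile above has no [ethernet] section and therefore no mtu= key, while both port profiles carry mtu=9000; that matches the behaviour in the initial report (ports at 9000, bond left at 1500). A quick way to confirm this on a booted node (a sketch, assuming shell access):

  $ grep -H '^mtu=' /etc/NetworkManager/system-connections/*.nmconnection
  # expected: only ens3.nmconnection and ens4.nmconnection report mtu=9000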

Comment 4 Mario Vázquez 2021-02-25 18:53:55 UTC
Created attachment 1759372 [details]
boot log

Comment 5 Micah Abbott 2021-02-25 19:54:25 UTC
Looking at the journal, I see this message:

`Feb 25 18:45:47 localhost nm-initrd-gener[454]: <warn>  [1614278747.4742] cmdline-reader: 'bond' does not support setting mtu`

I am not a networking expert, but I believe this is the correct behavior.  (Which raises the question of why the dracut man page claims you can configure the MTU on the logical bond...)

My theory: since the logical bonded interface can be made up of multiple physical interfaces connected to multiple L2 devices, it is not reasonable to enforce an MTU on the logical bonded interface. Put another way, if you have multiple underlying interfaces configured with different MTUs, the egress traffic from the bonded interface is going to use the MTU of the primary physical interface that the bond is currently using.


You should be able to confirm this by doing a similar test on a RHEL 8 system where two physical interfaces are configured with different MTUs, and inspecting the packet sizes from the bonded interface while switching the underlying primary interface.  (Not sure this is possible, but it would be a neat experiment!)


I'm going to send this over to the NetworkManager folks to have a look at this and correct any misconceptions I have.

Comment 7 Beniamino Galvani 2021-02-25 20:57:56 UTC
Hi,

the bond driver should propagate the MTU from the bond interface to the ports [1]:

 [root@localhost ~]# ip link add dummy1 type dummy
 [root@localhost ~]# ip link add dummy2 type dummy
 [root@localhost ~]# ip link set dummy1 mtu 1300
 [root@localhost ~]# ip link set dummy2 mtu 1400
 [root@localhost ~]# ip link add bond1 type bond
 [root@localhost ~]# ip link set dummy1 master bond1
 [root@localhost ~]# ip link set dummy2 master bond1
 [root@localhost ~]# ip link set bond1 mtu 2000
 [root@localhost ~]# ip -o link
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group ...
 6938: dummy1: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 2000 qdisc noqueue master bond1  ...
 6939: dummy2: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 2000 qdisc noqueue master bond1  ...
 6940: bond1: <BROADCAST,MULTICAST,MASTER> mtu 2000 qdisc noop state DOWN mode DEFAULT  ...

So, it makes sense to set the MTU on a bond connection.

On RHEL 8.2, the NM initrd generator couldn't set the MTU for bonds/teams/bridges due to a bug.

That was fixed in commit [2] and it should work in RHEL 8.3 (NM 1.26).

[1] https://access.redhat.com/solutions/64136
[2] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/79f70bf5d62213e7e6ce2c5e15fdf6981dc19ef0

Comment 8 Micah Abbott 2021-02-25 22:10:39 UTC
(In reply to Beniamino Galvani from comment #7)
> Hi,
> 
> the bond driver should propagate the MTU from the bond interface to the
> ports [1]:
> 
>  [root@localhost ~]# ip link add dummy1 type dummy
>  [root@localhost ~]# ip link add dummy2 type dummy
>  [root@localhost ~]# ip link set dummy1 mtu 1300
>  [root@localhost ~]# ip link set dummy2 mtu 1400
>  [root@localhost ~]# ip link add bond1 type bond
>  [root@localhost ~]# ip link set dummy1 master bond1
>  [root@localhost ~]# ip link set dummy2 master bond1
>  [root@localhost ~]# ip link set bond1 mtu 2000
>  [root@localhost ~]# ip -o link
>  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
> DEFAULT group ...
>  6938: dummy1: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 2000 qdisc noqueue
> master bond1  ...
>  6939: dummy2: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 2000 qdisc noqueue
> master bond1  ...
>  6940: bond1: <BROADCAST,MULTICAST,MASTER> mtu 2000 qdisc noop state DOWN
> mode DEFAULT  ...
> 
> So, it makes sense to set the MTU on a bond connection.
> 
> On RHEL 8.2, the NM initrd generator couldn't set the MTU for
> bonds/teams/bridges due to a bug.
> 
> That was fixed in commit [2] and it should work in RHEL 8.3 (NM 1.26).
> 
> [1] https://access.redhat.com/solutions/64136
> [2]
> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/
> 79f70bf5d62213e7e6ce2c5e15fdf6981dc19ef0

Thanks for the correction and additional information, Beniamino!

I'll pull this back to RHCOS so we can track the inclusion of NM 1.26 in our builds.  It is included as part of OCP/RHCOS 4.7, but doesn't seem likely to get included in OCP 4.6.z, as we are using RHEL 8.2 EUS as the base content set in RHCOS 4.6.

@Mario can you try configuring the MTU on the bond with RHCOS 4.7?

Comment 9 Mario Vázquez 2021-03-01 16:01:20 UTC
(In reply to Micah Abbott from comment #8)
> @Mario can you try configuring the MTU on the bond with RHCOS 4.7?

Configuring MTU on the bond with RHCOS 4.7 worked with these kernel args: "bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0:none:9000:"

The next step was trying to set up the MTU on a VLAN subinterface for the bond, which is the real use case; that failed, though.

These are the kernel args used: "bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0.19:none:9000: vlan=bond0.19:bond0"


We get 1500 MTU configured on bond0 and bond0.19 interfaces.

The system-connections look like this:


/etc/NetworkManager/system-connections/bond0.nmconnection
[connection]
id=bond0
uuid=78fd2db5-22e7-4c80-9993-62285f6c6a96
type=bond
interface-name=bond0
multi-connect=1
permissions=

[bond]
lacp_rate=0
miimon=100
mode=802.3ad

[ipv4]
dns-search=
method=disabled

[ipv6]
addr-gen-mode=eui64
dns-search=
method=disabled

[proxy]

-----------

/etc/NetworkManager/system-connections/bond0.19.nmconnection
[connection]
id=bond0.19
uuid=78fd2db5-22e7-4c80-9993-62285f6c2acf
type=vlan
interface-name=bond0.19
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=
mtu=9000

[vlan]
egress-priority-map=
flags=1
id=19
ingress-priority-map=
parent=bond0

[ipv4]
address1=192.168.125.10/24,192.168.125.1
dhcp-hostname=testvm.local
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=testvm.local
dns-search=
method=disabled

[proxy]
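
Worth noting: the bond0 profile above again has no mtu= key, and a VLAN interface cannot be given an MTU larger than its parent's, so bond0.19 stays at 1500 even though its own profile requests 9000. Assuming shell access to the booted node, the parent/MTU relationship can be inspected with:

  $ ip -d link show bond0.19   # shows the VLAN id, its parent (bond0.19@bond0) and the effective MTU
  $ ip link show bond0         # parent MTU; the VLAN cannot exceed this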



Versions tested: 
- nmcli tool: 1.26.0-12.1.rhaos4.7.el8
- NetworkManager: 1.26.0-12.1.rhaos4.7.el8.x86_64

Comment 10 Beniamino Galvani 2021-03-01 20:31:46 UTC
(In reply to Mario Vázquez from comment #9)
> The next step was trying to setup MTU on a VLAN subinterface for the bond
> which is the real use case, that failed tho.
> 
> These are the kernel args used:
> "bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000
> ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0.19:none:
> 9000: vlan=bond0.19:bond0"
> 
> 
> We get 1500 MTU configured on bond0 and bond0.19 interfaces.

You are right, this command line doesn't work properly because the MTU specified in

  bond=<bondname>[:<bondslaves>:[:<options>[:<mtu>]]]

gets wrongly assigned to the bond ports, instead of the bond itself. I opened a merge request [1] to fix this issue.

For now you can use the ip= argument to specify the MTU for the bond. Just add:

  ip=bond0:none:9000

to the command line.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/767
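
For the VLAN-on-bond case, the full command line with this workaround applied would look roughly like the following (an untested sketch, simply combining the arguments quoted above from comment 9 with the extra ip= entry):

  bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0.19:none:9000: vlan=bond0.19:bond0 ip=bond0:none:9000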

Comment 11 Mario Vázquez 2021-03-05 12:03:11 UTC
@bgalvani I can see the MR on GitLab has passed the required CI, but it requires a rebase. Once that MR is merged, when can we expect to get that extra fix into RHCOS?

Comment 12 Beniamino Galvani 2021-03-08 09:42:21 UTC
The fix is now merged upstream.

> Once that MR is merged when can we expect to get that extra fix into RHCOS?

I think this is more a question for the RHCOS team.

Comment 13 Micah Abbott 2021-03-08 14:52:43 UTC
(In reply to Beniamino Galvani from comment #12)

> > Once that MR is merged when can we expect to get that extra fix into RHCOS?
> 
> I think this is more a question for the RHCOS team.

It'll depend on when the fix lands in RHEL.  Once that fix is included in a build of NetworkManager in RHEL 8.3, we can include it as part of RHCOS 4.7.  Though, since this is a boot-time problem, we would have to build a new set of boot images for RHCOS 4.7 and go through the process of releasing them.

However, since there is a workaround available, we would prefer not to build new boot images, as doing so is quite expensive from a process perspective.

Beniamino, do you know when the fix would land in a RHEL build?

Comment 14 Beniamino Galvani 2021-03-08 16:27:16 UTC
At this time we need to ask exception+ to get the fix into RHEL 8.4. Since the fix seems important, I would do it.

For RHEL 8.3 we need z-stream approval, but since there is a workaround there is probably less justification to have the fix backported.

What do you think?

Comment 16 Micah Abbott 2021-03-08 19:55:08 UTC
(In reply to Beniamino Galvani from comment #14)
> At this time we need to ask exception+ to get the fix into RHEL 8.4. Since
> the fix seems important, I would do it.
> 
> For RHEL 8.3 we need z-stream approval but since there is a workaround
> probably there is less justification to have the fix backported.
> 
> What do you think?

Agreed; I will file a BZ against NM for 8.4, but I think the backport to 8.3.z can be omitted.

Comment 17 Micah Abbott 2021-03-08 20:01:18 UTC
RHEL BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1936610

Comment 18 Gris Ge 2021-03-09 13:11:19 UTC
Thanks for the bug report. I will collect test feedback and request an exception for it there.

Comment 19 Dusty Mabe 2021-06-01 14:14:21 UTC
(In reply to Micah Abbott from comment #17)
> RHEL BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1936610

Looks like that was fixed in NetworkManager-1.32.0-0.1.el8 and newer (not in RHCOS yet).

Comment 20 Micah Abbott 2021-06-24 17:25:55 UTC
I'm not sure why I set this to CLOSED between comment #8 and comment #9, but I am going to change the state so it can correctly be attached to the OCP 4.8 errata.

Comment 22 Micah Abbott 2021-06-24 17:28:54 UTC
(In reply to Dusty Mabe from comment #19)
> (In reply to Micah Abbott from comment #17)
> > RHEL BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1936610
> 
> Looks like that was fixed in NetworkManager-1.32.0-0.1.el8 and newer (not in
> RHCOS yet).

Latest RHCOS 4.8 builds include `NetworkManager-1.30.0-8.el8_4` and the RHEL BZ#1936610 is marked as VERIFIED, so moving this to VERIFIED as well.
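
For anyone re-verifying: on a node booted with the comment 9 arguments, the quickest check is simply:

  $ ip link show bond0      # should now report mtu 9000
  $ ip link show bond0.19   # the VLAN subinterface should report mtu 9000 as well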

Comment 24 errata-xmlrpc 2021-07-27 22:48:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

