Description of problem:

We want to deploy some RHCOS nodes with a bond interface with jumbo frames enabled. We're using dracut kernel command line options to configure the bond at boot time. The bonding configuration works, but the MTU is not properly set on the bond interface.

Version-Release number of selected component (if applicable): 4.6

How reproducible: Always

Steps to Reproduce:
1. Boot an RHCOS image with the following kernel args:

bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0:none:9000:

Actual results:

The slave interfaces get the proper MTU set (9000) but the bond interface does not (1500).

$ ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:d5:60:91 brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:d5:60:91 brd ff:ff:ff:ff:ff:ff
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:d5:60:91 brd ff:ff:ff:ff:ff:ff
    inet 192.168.125.10/24 brd 192.168.125.255 scope global noprefixroute bond0
       valid_lft forever preferred_lft forever

Expected results:

The bond interface gets its MTU configured to 9000.

Additional info:

We used the dracut kernel args as documented in the following link: https://man7.org/linux/man-pages/man7/dracut.cmdline.7.html
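For reference, the dracut man page linked above documents the bond argument as `bond=<bondname>[:<bondslaves>:[:<options>[:<mtu>]]]`. The argument used in the reproducer breaks down as follows (annotations are mine, field names per the man page):

```
bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000
#    bond0                           -> <bondname>: bond interface name
#    ens3,ens4                       -> <bondslaves>: slave interfaces
#    mode=802.3ad,lacp_rate=0,...    -> <options>: comma-separated bonding options
#    9000                            -> <mtu>: requested MTU (the field at issue here)
```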
This is likely more an NM question/bug. RHCOS just takes the NM connection files that nm-initrd-generator created and propagates those to the real root. As a sanity-check, you could see if it works correctly on traditional RHEL (there's no NM keyfile propagation there, but AFAIK the network doesn't get taken down before switchroot, so it should remain in the same state it was set up by NM in the initrd).
I agree with Jonathan's assessment. Pinging @thaller for more eyes. Could you provide the full journal from the node showing the configuration + activation of the network interfaces? Could you provide the contents of `/etc/NetworkManager/system-connections`?
/etc/NetworkManager/system-connections/ens3.nmconnection

[connection]
id=ens3
uuid=8ceff32e-caad-43b4-aa9f-b71a270e3353
type=ethernet
interface-name=ens3
master=78fd2db5-22e7-4c80-9993-62285f6c5a95
multi-connect=1
permissions=
slave-type=bond

[ethernet]
mac-address-blacklist=
mtu=9000

----------------------------------------------------------

/etc/NetworkManager/system-connections/ens4.nmconnection

[connection]
id=ens4
uuid=6791a31d-19d6-4f66-b8cc-f6c1ca029fc9
type=ethernet
interface-name=ens4
master=78fd2db5-22e7-4c80-9993-62285f6c5a95
multi-connect=1
permissions=
slave-type=bond

[ethernet]
mac-address-blacklist=
mtu=9000

----------------------------------------------------------

/etc/NetworkManager/system-connections/bond0.nmconnection

[connection]
id=bond0
uuid=78fd2db5-22e7-4c80-9993-62285f6c5a95
type=bond
interface-name=bond0
multi-connect=1
permissions=

[bond]
lacp_rate=0
miimon=100
mode=802.3ad

[ipv4]
address1=192.168.125.10/24,192.168.125.1
dhcp-hostname=testvm.local
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=testvm.local
dns-search=
method=disabled

[proxy]
Created attachment 1759372 [details]
boot log
Looking at the journal, I see this message:

`Feb 25 18:45:47 localhost nm-initrd-gener[454]: <warn> [1614278747.4742] cmdline-reader: 'bond' does not support setting mtu`

I am not a networking expert, but I believe this is the correct behavior. (Which raises the question of why the dracut man page claims you can configure the MTU on the logical bond...)

My theory: since the logical bonded interface can be made up of multiple physical interfaces connected to multiple L2 devices, it is not reasonable to enforce an MTU on the logical bonded interface. Put another way, if the underlying interfaces are configured with different MTUs, the egress traffic from the bonded interface is going to use the MTU of the physical interface the bond is currently using as primary.

You should be able to confirm this on a RHEL 8 system by configuring two physical interfaces with different MTUs and inspecting the packet sizes from the bonded interface while switching the underlying primary interface. (Not sure this is possible, but it would be a neat experiment!)

I'm going to send this over to the NetworkManager folks to have a look and correct any misconceptions I have.
Hi,

the bond driver should propagate the MTU from the bond interface to the ports [1]:

[root@localhost ~]# ip link add dummy1 type dummy
[root@localhost ~]# ip link add dummy2 type dummy
[root@localhost ~]# ip link set dummy1 mtu 1300
[root@localhost ~]# ip link set dummy2 mtu 1400
[root@localhost ~]# ip link add bond1 type bond
[root@localhost ~]# ip link set dummy1 master bond1
[root@localhost ~]# ip link set dummy2 master bond1
[root@localhost ~]# ip link set bond1 mtu 2000
[root@localhost ~]# ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group ...
6938: dummy1: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 2000 qdisc noqueue master bond1 ...
6939: dummy2: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 2000 qdisc noqueue master bond1 ...
6940: bond1: <BROADCAST,MULTICAST,MASTER> mtu 2000 qdisc noop state DOWN mode DEFAULT ...

So, it makes sense to set the MTU on a bond connection.

On RHEL 8.2, the NM initrd generator couldn't set the MTU for bonds/teams/bridges due to a bug. That was fixed in commit [2] and it should work in RHEL 8.3 (NM 1.26).

[1] https://access.redhat.com/solutions/64136
[2] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/79f70bf5d62213e7e6ce2c5e15fdf6981dc19ef0
(In reply to Beniamino Galvani from comment #7)
> the bond driver should propagate the MTU from the bond interface to the
> ports [1]:
>
> So, it makes sense to set the MTU on a bond connection.
>
> On RHEL 8.2, the NM initrd generator couldn't set the MTU for
> bonds/teams/bridges due to a bug.
>
> That was fixed in commit [2] and it should work in RHEL 8.3 (NM 1.26).

Thanks for the correction and additional information, Beniamino!

I'll pull this back to RHCOS so we can track the inclusion of NM 1.26 in our builds. It is included as part of OCP/RHCOS 4.7, but it doesn't seem likely to be included in OCP 4.6.z, as we are using RHEL 8.2 EUS as the base content set in RHCOS 4.6.

@Mario can you try configuring the MTU on the bond with RHCOS 4.7?
Configuring the MTU on the bond with RHCOS 4.7 worked with these kernel args:

"bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0:none:9000:"

The next step was trying to set up the MTU on a VLAN subinterface of the bond, which is the real use case; that failed, though. These are the kernel args used:

"bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0.19:none:9000: vlan=bond0.19:bond0"

We get a 1500 MTU configured on both the bond0 and bond0.19 interfaces. The system-connections look like this:

/etc/NetworkManager/system-connections/bond0.nmconnection

[connection]
id=bond0
uuid=78fd2db5-22e7-4c80-9993-62285f6c6a96
type=bond
interface-name=bond0
multi-connect=1
permissions=

[bond]
lacp_rate=0
miimon=100
mode=802.3ad

[ipv4]
dns-search=
method=disabled

[ipv6]
addr-gen-mode=eui64
dns-search=
method=disabled

[proxy]

-----------

/etc/NetworkManager/system-connections/bond0.19.nmconnection

[connection]
id=bond0.19
uuid=78fd2db5-22e7-4c80-9993-62285f6c2acf
type=vlan
interface-name=bond0.19
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=
mtu=9000

[vlan]
egress-priority-map=
flags=1
id=19
ingress-priority-map=
parent=bond0

[ipv4]
address1=192.168.125.10/24,192.168.125.1
dhcp-hostname=testvm.local
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=testvm.local
dns-search=
method=disabled

[proxy]

Versions tested:
- nmcli tool: 1.26.0-12.1.rhaos4.7.el8
- NetworkManager: 1.26.0-12.1.rhaos4.7.el8.x86_64
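Note that the bond0 keyfile above carries no mtu key at all, which matches the observed 1500 MTU. Once the generator handles the bond MTU correctly, one would expect it to land in the bond connection's [ethernet] section, the same place it appears in the slave and VLAN keyfiles above (NetworkManager stores a bond's MTU in the wired setting). A hypothetical fragment of what the generated bond0.nmconnection should then contain:

```ini
[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
lacp_rate=0
miimon=100
mode=802.3ad

[ethernet]
mtu=9000
```

This is a sketch for illustration, not the actual output of a fixed generator.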
(In reply to Mario Vázquez from comment #9)
> The next step was trying to setup MTU on a VLAN subinterface for the bond
> which is the real use case, that failed tho.
>
> These are the kernel args used:
> "bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100:9000
> ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0.19:none:
> 9000: vlan=bond0.19:bond0"
>
> We get 1500 MTU configured on bond0 and bond0.19 interfaces.

You are right, this command line doesn't work properly because the MTU specified in

  bond=<bondname>[:<bondslaves>:[:<options>[:<mtu>]]]

gets wrongly assigned to the bond ports instead of the bond itself. I opened a merge request [1] to fix this issue.

For now you can use the ip= argument to specify the MTU for the bond. Just add:

  ip=bond0:none:9000

to the command line.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/767
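Putting the workaround together, the full command line for the VLAN use case would look like the following sketch (assuming the same interface names and addressing as in the reproducer; the extra ip=bond0:none:9000 sets the bond MTU in place of the trailing bond= field, which is the part affected by the bug):

```
bond=bond0:ens3,ens4:mode=802.3ad,lacp_rate=0,miimon=100 ip=bond0:none:9000 ip=192.168.125.10::192.168.125.1:255.255.255.0:testvm.local:bond0.19:none:9000: vlan=bond0.19:bond0
```

This is untested here; adjust names and addresses for the actual environment.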
@bgalvani I can see the MR on GitLab has passed the required CI, but it requires a rebase. Once that MR is merged, when can we expect to get that extra fix into RHCOS?
The fix is now merged upstream.

> Once that MR is merged when can we expect to get that extra fix into RHCOS?

I think this is more a question for the RHCOS team.
(In reply to Beniamino Galvani from comment #12)
> > Once that MR is merged when can we expect to get that extra fix into RHCOS?
>
> I think this is more a question for the RHCOS team.

It'll depend on when the fix lands in RHEL. Once the fix is included in a build of NetworkManager in RHEL 8.3, we can include it as part of RHCOS 4.7. Though, since this is a boot-time problem, we would have to build a new set of boot images for RHCOS 4.7 and go through the process of releasing them. However, since there is a workaround available, we would prefer not to build new boot images, as that process is quite expensive.

Beniamino, do you know when the fix would land in a RHEL build?
At this time we need to ask exception+ to get the fix into RHEL 8.4. Since the fix seems important, I would do it.

For RHEL 8.3 we would need z-stream approval, but since there is a workaround, there is probably less justification for backporting the fix.

What do you think?
(In reply to Beniamino Galvani from comment #14)
> At this time we need to ask exception+ to get the fix into RHEL 8.4. Since
> the fix seems important, I would do it.
>
> For RHEL 8.3 we need z-stream approval but since there is a workaround
> probably there is less justification to have the fix backported.
>
> What do you think?

Agreed; I will file a BZ against NM for 8.4, but I think the backport to 8.3.z can be omitted.
RHEL BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1936610
Thanks for the bug report. I will collect test feedback and request an exception for it there.
(In reply to Micah Abbott from comment #17)
> RHEL BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1936610

Looks like that was fixed in NetworkManager-1.32.0-0.1.el8 and newer (not in RHCOS yet).
I'm not sure why I set this to CLOSED between comment #8 and comment #9, but I am going to change the state so it can correctly be attached to the OCP 4.8 errata.
(In reply to Dusty Mabe from comment #19)
> (In reply to Micah Abbott from comment #17)
> > RHEL BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1936610
>
> Looks like that was fixed in NetworkManager-1.32.0-0.1.el8 and newer (not in
> RHCOS yet).

Latest RHCOS 4.8 builds include `NetworkManager-1.30.0-8.el8_4` and the RHEL BZ#1936610 is marked as VERIFIED, so moving this to VERIFIED as well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438