Bug 2071985

Summary: Nmstate Verification when creating bond over eth interfaces

Product: Red Hat Enterprise Linux 8
Component: NetworkManager
Version: 8.4
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Regression, Triaged
Reporter: Radim Hrazdil <rhrazdil>
Assignee: Thomas Haller <thaller>
QA Contact: David Jaša <djasa>
CC: bgalvani, cstabler, ferferna, fge, jiji, jishi, lrintel, myakove, network-qe, phoracek, rkhan, rnetser, sfaye, sukulkar, thaller, till, vbenes, ysegev
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: NetworkManager-1.39.3-1.el8
Doc Type: No Doc Update
Type: Bug
Last Closed: 2022-11-08 10:07:32 UTC

Attachments:
- Reproducer script
- NetworkManager trace log
- rh2071985-repro.sh
- rh2071985-rhel-8.4.log
- rh2071985-rhel-8.5.log

Description Radim Hrazdil 2022-04-05 11:07:42 UTC
Description of problem:

Nmstate returns a verification error when creating a bond over two ethernet interfaces.


The current state of the two ethernet interfaces:
[cnv-qe-jenkins@n-yoss-411-ovn-nkjcx-executor ~]$ oc get nns c01-rn411-sdn-pvtbd-worker-0-htfjr -ojson | jq '.status.currentState.interfaces[] | select(."name" == "ens9")'
{
  "ipv4": {
    "address": [],
    "dhcp": false,
    "enabled": false
  },
  "ipv6": {
    "address": [],
    "autoconf": false,
    "dhcp": false,
    "enabled": false
  },
  "lldp": {
    "enabled": false
  },
  "mac-address": "FA:16:3E:CF:D0:93",
  "mtu": 1450,
  "name": "ens9",
  "state": "up",
  "type": "ethernet"
}
[cnv-qe-jenkins@n-yoss-411-ovn-nkjcx-executor ~]$ 
[cnv-qe-jenkins@n-yoss-411-ovn-nkjcx-executor ~]$ oc get nns c01-rn411-sdn-pvtbd-worker-0-htfjr -ojson | jq '.status.currentState.interfaces[] | select(."name" == "ens11")'
{
  "ipv4": {
    "address": [],
    "dhcp": false,
    "enabled": false
  },
  "ipv6": {
    "address": [],
    "autoconf": false,
    "dhcp": false,
    "enabled": false
  },
  "lldp": {
    "enabled": false
  },
  "mac-address": "FA:16:3E:40:50:86",
  "mtu": 1450,
  "name": "ens11",
  "state": "up",
  "type": "ethernet"
}


Applying the following desiredState:

interfaces:
- ipv4:
    auto-dns: true
    dhcp: false
    enabled: false
  ipv6:
    auto-dns: true
    autoconf: false
    dhcp: false
    enabled: false
  link-aggregation:
    mode: balance-xor
    options:
      miimon: '120'
    port:
    - ens11
    - ens9
  mtu: 1450
  name: mtx-bond1
  state: up
  type: bond
- mtu: 1400
  name: ens11
  state: up
  type: ethernet
- mtu: 1400
  name: ens9
  state: up
  type: ethernet


This results in the following difference being reported with a VerificationError:
      difference
      ==========
      --- desired
      +++ current
      @@ -2,4 +2,11 @@
       name: ens11
       type: ethernet
       state: up
      -mtu: 1400
      +ipv4:
      +  enabled: false
      +ipv6:
      +  enabled: false
      +lldp:
      +  enabled: false
      +mac-address: FA:16:3E:40:50:86
      +mtu: 1450
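
A quick way to confirm what the node actually ended up with is to re-run the current-state query from above (a sketch; node and interface names as in this report):

```
oc get nns c01-rn411-sdn-pvtbd-worker-0-htfjr -ojson \
  | jq '.status.currentState.interfaces[] | select(."name" == "ens11") | .mtu'
# prints 1450: the port MTU was forced back to the bond MTU, hence the mismatch
```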


Version-Release number of selected component (if applicable):
1.0.2

How reproducible:
100%


Comment 1 Yossi Segev 2022-04-05 11:22:50 UTC
This is the policy I tried to apply:


apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: matrix-bond1-nncp
spec:
  desiredState:
    interfaces:
    - ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      link-aggregation:
        mode: balance-xor
        options:
          miimon: '120'
        port:
        - ens11
        - ens9
      mtu: 1450
      name: mtx-bond1
      state: up
      type: bond
    - mtu: 1400
      name: ens11
      state: up
      type: ethernet
    - mtu: 1400
      name: ens9
      state: up
      type: ethernet
  nodeSelector:
    kubernetes.io/hostname: c01-rn411-sdn-pvtbd-worker-0-htfjr
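
For reference, such a policy is applied and its enactment inspected with commands along these lines (a sketch; the file name is an assumption):

```
oc apply -f matrix-bond1-nncp.yaml
oc get nncp matrix-bond1-nncp -o yaml   # overall policy status and conditions
oc get nnce                             # per-node enactments carry the VerificationError
```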

Comment 2 Gris Ge 2022-04-06 07:55:13 UTC
Hi Radim,

The bond ports (ens11 and ens9) are set to MTU 1400, while the bond itself uses MTU 1450. Is this a configuration error, or is it deliberate for a special use case?

Comment 3 Yossi Segev 2022-04-06 11:26:36 UTC
Hi Gris,

After checking this scenario, I found that setting the same MTU (1400) for the bond interface and its two ports completes successfully.
But the configuration described in this bug used to work until recently, and all our downstream bond tests passed with it, with the bond assigned an MTU 50 bytes higher than its ports.
What happened?

Thank you,
Yossi

Comment 4 Yossi Segev 2022-04-06 12:11:18 UTC
And another clarification: to rule out the possibility that the 1450 MTU was too high, I tried the same policy, this time setting the bond's MTU to 1400 and the ports' MTU to 1350, and the configuration still failed.

Comment 5 Radim Hrazdil 2022-04-07 07:30:15 UTC
Hi Gris, I don't know if there is an actual use-case behind setting the MTU higher on the bond than on the ports,
but it doesn't seem like a conflicting configuration.

@Yossi was there ever any particular reason to add 50B to the bond?


If you decide to force the MTU to be the same across the bond and the bonded interfaces, it would be nice if there were an error message saying that there is a discrepancy in the MTU setting.

Comment 6 Fernando F. Mancera 2022-04-07 07:35:31 UTC
(In reply to Yossi Segev from comment #3)
> Hi Gris,
> 
> After checking this scenario, I found that setting the same MTU (1400) for
> the bond interface and its 2 ports ends successfully.
> But the configuration described in this bug used to work until recently, and
> all our downstream bond tests passed with this configuration, where the bond
> is assigned +50 MTU more than its ports.
> What happened?
> 
> Thank you,
> Yossi

Hello!

We did not introduce changes in Nmstate 1.0.2, so it doesn't seem to be an Nmstate fault. I must say that Nmstate is not reporting this properly, IMO. Do you know if NetworkManager was updated in your environment? Thank you!

Comment 7 Yossi Segev 2022-04-07 09:32:44 UTC
> I must say that Nmstate is not reporting this properly IMO.

I totally agree, and there's already https://bugzilla.redhat.com/show_bug.cgi?id=2044150 (which evolved from https://bugzilla.redhat.com/show_bug.cgi?id=1812559 I opened) to handle it.


> Do you know if NetworkManager was updated in your environment?

I don't know if it was updated, but I can say that the current NetworkManager on my cluster node is
sh-4.4# NetworkManager --version
1.32.10-4.el8

@Radim:
> @Yossi was there ever any particular reason to add 50B to the bond?

Actually, what we do is subtract 50 from the ports rather than add 50 to the bond.
I tried to recall and also consulted with colleagues, but unfortunately we can't remember why we had to assign different MTUs to the bond and to its ports.
What we do know is that we had to do it, because our tests failed otherwise. And keep in mind that the tests passed until recently, with different MTUs on the bond and on the ports.

Comment 8 Gris Ge 2022-04-13 13:35:40 UTC
Confirmed. The same YAML works well with NetworkManager-1.30.0-10.el8_4.x86_64 (RHEL 8.4.0), but fails since NetworkManager-1.32.10-4.el8 (RHEL 8.5.0).

I will create a simpler reproducer and collect NM logs for the NetworkManager team to investigate this regression.

As for the MTU difference between the port and the bond: as long as the kernel supports it and it used to work, we should keep supporting it.

Comment 9 Gris Ge 2022-04-13 13:53:42 UTC
Created attachment 1872179 [details]
Reproducer script

This is a simplified reproducer.
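
The attachment itself is not inlined here, but a host-level reproducer independent of Kubernetes can be sketched with nmstatectl directly (an illustration, not the attached script; the spare interface name eth1 is an assumption):

```
#!/bin/bash
# Hedged sketch: reproduce the bond/port MTU mismatch with plain nmstate.
# Assumes a spare ethernet interface named eth1.
cat > /tmp/bond-mtu.yml <<'EOF'
interfaces:
- name: bond99
  type: bond
  state: up
  mtu: 1450
  link-aggregation:
    mode: balance-xor
    options:
      miimon: '120'
    port:
    - eth1
- name: eth1
  type: ethernet
  state: up
  mtu: 1400
EOF
nmstatectl apply /tmp/bond-mtu.yml   # fails verification on NM 1.32.x, works on 1.30
```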

Comment 10 Gris Ge 2022-04-13 13:54:13 UTC
Created attachment 1872181 [details]
NetworkManager trace log

Comment 11 Gris Ge 2022-04-13 13:55:26 UTC
Setting the regression keyword, as this used to work well in RHEL 8.4.

Changing component to NetworkManager for follow-up investigation.

Comment 12 Gris Ge 2022-04-13 13:57:56 UTC
Hi Petr,

Which RHEL releases is your team expecting the fix to be backported to?

Comment 13 Petr Horáček 2022-04-14 09:23:48 UTC
I'm not convinced this should be urgent, unless there is a clear reason why a customer would set different MTUs. IIUIC, such a config does not have a networking justification, does it? I'd suggest we lower it to high at most. Please protest if I'm missing something.

For the same reason, I think it is OK to get a fix for this into 8.6 batch 1. With that it would reach OCP 4.11.z.

Comment 14 Thomas Haller 2022-04-21 16:17:06 UTC
Created attachment 1874176 [details]
rh2071985-repro.sh

Very similar to attachment 1872179 [details]; minor modifications were made, and this is what was used for the following logs.

Comment 15 Thomas Haller 2022-04-21 16:18:17 UTC
Created attachment 1874178 [details]
rh2071985-rhel-8.4.log

Log file from running rh2071985-repro.sh on rhel-8.4.

NetworkManager 1:1.30.0-13.el8_4
kernel 4.18.0-305.25.1.el8_4

Comment 16 Thomas Haller 2022-04-21 16:19:22 UTC
Created attachment 1874179 [details]
rh2071985-rhel-8.5.log

Log file from running rh2071985-repro.sh on rhel-8.5.

NetworkManager 1:1.32.10-5.el8_5
kernel 4.18.0-348.23.1.el8_5

Comment 17 Thomas Haller 2022-04-21 16:23:24 UTC
(In reply to Gris Ge from comment #10)
> Created attachment 1872181 [details]
> NetworkManager trace log


The kernel changes the MTU when enslaving an interface to a bond.


You can also see that with this shell script:
```
#!/bin/bash

set -x
set -e

# start from a fresh network namespace
ip netns del t 2>/dev/null || :
ip netns add t

# a veth pair (v/w) and a bond (b) to enslave into
ip -netns t link add name v type veth peer name w
ip -netns t link add name b type bond
ip -netns t link set dev w up
ip -netns t link set dev b up

# the bond gets MTU 1450, the prospective port gets 1400
ip -netns t link set dev b mtu 1450
ip -netns t link set dev v mtu 1400
ip -netns t link set dev w mtu 1400

ip -netns t link

# enslaving v: the kernel forces v's MTU up to the bond's 1450
ip -netns t link set dev v master b

ip -netns t link
```
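
The second listing is expected to show the kernel's override (a hedged reading; exact iproute2 formatting varies):

```
ip -netns t -o link show dev v | grep -o 'mtu [0-9]*'   # "mtu 1450": forced up to the bond's MTU
ip -netns t -o link show dev w | grep -o 'mtu [0-9]*'   # "mtu 1400": the unenslaved peer is untouched
```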

This happens on all kernels (rhel-8.4 (4.18.0-305.25.1.el8_4.x86_64), rhel-8.5 (4.18.0-348.23.1.el8_5.x86_64), and Fedora 35).


The difference between rhel-8.4 and rhel-8.5 is that on rhel-8.4, NM changes the MTU *after* enslaving the port. Still investigating why.



As to whether this configuration is useful: it seems questionable to me. What happens if the bond interface wants to send a packet of 1450 bytes? Should it be truncated by the port?

Comment 18 Thomas Haller 2022-04-22 13:53:22 UTC
This changed in 1.32.0, with [1]. Now we set the MTU before the "ip-config" state.

Interestingly, we enslave ports during the "ip-config" state ([2]), so now we first set the MTU, before enslaving.
This results in the kernel winning when it changes the MTU while attaching the port.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/e67ddd826fae1a348a7a253e9c37744c05645112
[2] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/e67ddd826fae1a348a7a253e9c37744c05645112/src/core/devices/nm-device.c#L6364


A simple fix would be to change the MTU again after enslaving.
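
In terms of the reproducer above, the manual equivalent of that fix would be (a sketch):

```
ip -netns t link set dev v master b   # kernel bumps v's MTU to the bond's 1450 here
ip -netns t link set dev v mtu 1400   # re-applying after enslaving restores the desired value
```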

A follow-up rework should ensure that we don't enslave the port in the "ip-config" state; it is the wrong state.



The larger question, however, is whether it's correct to fight with the kernel over setting the MTU.
It's not clear that setting a smaller MTU on the bond port results in a working configuration (see [3], [4]).

[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/bonding/bond_main.c?h=v5.17#n3603
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/bonding/bond_main.c?h=v5.17#n4372


@Yossi, could you clarify again why this is a useful setup? Why is this done, and does this even work in a correct bond configuration? Did you actually test that it works, or did you merely test that such a configuration could be created? You already touched on that topic, but the answer is not clear to me. Could you provide more information about this use case? Thank you.

Comment 19 Yossi Segev 2022-04-24 09:30:38 UTC
@Meni
It seems to me that the fix Thomas is suggesting in comment #18 ("change the MTU again after enslaving") is redundant, as it actually means "fighting with the kernel", as Thomas put it.
I think that we should change our tests and keep the same MTU for the bond interface and its ports.
But we should keep in mind that if the future rework Thomas is suggesting ("don't enslave the port in state "ip-config", it is the wrong state") is implemented, we will probably be forced to change our tests again.
What do you think?

Comment 20 Thomas Haller 2022-04-26 11:16:18 UTC
With https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1199, NetworkManager is going to set the MTU after attaching the port.

This may not result in a working configuration. But it seems better to honor the user's configuration than to silently not apply it. After all, the user must configure a sensible MTU, and if they fail to do so, they have problems anyway.


This should restore the rhel-8.4 (1.30) behavior.
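
Once a build with that change is installed, the outcome can be checked along these lines (a sketch; the fixed version is taken from the Fixed In Version field above):

```
rpm -q NetworkManager                          # expect NetworkManager-1.39.3-1.el8 or later
ip -d link show ens11 | grep -o 'mtu [0-9]*'   # expect "mtu 1400" to survive enslavement
```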

Comment 26 errata-xmlrpc 2022-11-08 10:07:32 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7680

Comment 27 Red Hat Bugzilla 2023-09-18 04:34:45 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 120 days.