Bug 2079808

Summary: [RFE] [NMSTATE] IPv4 hash-based multipath routing is not working
Product: Red Hat Enterprise Linux 8 Reporter: Quique Llorente <ellorent>
Component: nmstate Assignee: Fernando F. Mancera <ferferna>
Status: CLOSED NEXTRELEASE QA Contact: Mingyu Shi <mshi>
Severity: medium Docs Contact: Jaroslav Klech <jklech>
Priority: high    
Version: 8.4 CC: andbartl, bnemec, ferferna, fge, jiji, jishi, jklech, network-qe, sfaye, thaller, till
Target Milestone: rc Keywords: FutureFeature, Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 2079796
: 2162401 (view as bug list) Environment:
Last Closed: 2023-02-20 12:36:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2081302    
Bug Blocks: 2079796, 2162401    

Description Quique Llorente 2022-04-28 09:55:04 UTC
+++ This bug was initially created as a clone of Bug #2079796 +++

Description of problem:

My customer is having the following issue:

They are trying to get IPv4 hash-based multipath routing to work with nmstate but it looks like it's not possible yet. 
I can create a manual multipath route on the worker node with:

[core@worker-0 ~]# sudo ip route add 2.2.2.2/32 proto static scope global  nexthop via 10.123.0.56 dev eno2.100 weight 1  nexthop via 10.123.0.184 dev eno1.100 weight 1

#Route is set:
[core@worker-0 ~]# ip r
<snip>
2.2.2.2 proto static 
	nexthop via 10.123.0.56 dev eno2.100 weight 1 
	nexthop via 10.123.0.184 dev eno1.100 weight 1 
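
For readers unfamiliar with the term, here is a minimal sketch of what "hash-based" selection means for the two next hops in the manual route above. It is an illustration only: `pick_nexthop` is a hypothetical helper, and CRC32 over the source and destination address is an assumed stand-in for the kernel's own flow hash, which works differently.

```python
import zlib

# Next hops from the manual route above: (gateway, device, weight).
NEXTHOPS = [
    ("10.123.0.56", "eno2.100", 1),
    ("10.123.0.184", "eno1.100", 1),
]


def pick_nexthop(src_ip, dst_ip, nexthops=NEXTHOPS):
    """Map a flow to one next hop by hashing its addresses (illustrative only)."""
    total = sum(weight for _, _, weight in nexthops)
    if total <= 0:
        raise ValueError("no usable next hops")
    # Assumption: CRC32 stands in for the kernel's flow hash.
    bucket = zlib.crc32(f"{src_ip}>{dst_ip}".encode()) % total
    # Walk the weighted list until the bucket falls inside a next hop's share.
    for gateway, device, weight in nexthops:
        if bucket < weight:
            return gateway, device
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")


if __name__ == "__main__":
    for src in ("10.123.0.57", "10.123.0.185", "192.0.2.10"):
        print(src, "->", pick_nexthop(src, "2.2.2.2"))
```

The point of hashing is that packets of one flow always take the same next hop, while different flows spread across both links roughly in proportion to the weights.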


Now trying to do the same with nmstate:
---
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond-vlan-worker-0
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0
  desiredState:
    interfaces:
    - name: eno1.100
      type: vlan
      state: up
      mtu: 1500
      vlan:
        base-iface: eno1
        id: 100
      ipv4:
        address:
        - ip: 10.123.0.185
          prefix-length: 31
        dhcp: false
        enabled: true
    - name: eno2.100
      type: vlan
      state: up
      mtu: 1500
      vlan:
        base-iface: eno2
        id: 100
      ipv4:
        address:
        - ip: 10.123.0.57
          prefix-length: 31
        dhcp: false
        enabled: true
    routes:
      config:
      - destination: 2.2.2.2/32
        metric: 451
        next-hop-address: 10.123.0.184
        next-hop-interface: eno1.100
        table-id: 254
      - destination: 2.2.2.2/32
        metric: 451
        next-hop-address: 10.123.0.56
        next-hop-interface: eno2.100
        table-id: 254
        
# But it fails to create a multipath route and now it's only routing to eno1.100:        
```
[core@worker-0 /]# ip r
<snip>
2.2.2.2 via 10.123.0.184 dev eno1.100 proto static metric 451 
2.2.2.2 via 10.123.0.56 dev eno2.100 proto static metric 451 
```
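
To make the expected result concrete, the sketch below groups single-hop route entries that share destination, table and metric, and prints the equivalent multipath `ip route` command. This is not how nmstate or NetworkManager handle routes; `multipath_commands` is a hypothetical helper, and the weight of 1 per next hop is an assumption, since the policy above has no weight field.

```python
def multipath_commands(routes):
    """Return one `ip route add` command per group of routes that differ only by next hop."""
    groups = {}
    for route in routes:
        key = (route["destination"], route.get("table-id", 254), route.get("metric"))
        groups.setdefault(key, []).append(route)

    commands = []
    for (destination, table, metric), hops in groups.items():
        if len(hops) < 2:
            continue  # a single next hop is an ordinary route, not multipath
        parts = [f"ip route add {destination} proto static table {table}"]
        if metric is not None:
            parts.append(f"metric {metric}")
        for hop in hops:
            # Assumption: equal weight for every next hop.
            parts.append(
                f"nexthop via {hop['next-hop-address']} "
                f"dev {hop['next-hop-interface']} weight 1"
            )
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    # The two entries under routes.config in the policy above.
    config = [
        {"destination": "2.2.2.2/32", "metric": 451, "table-id": 254,
         "next-hop-address": "10.123.0.184", "next-hop-interface": "eno1.100"},
        {"destination": "2.2.2.2/32", "metric": 451, "table-id": 254,
         "next-hop-address": "10.123.0.56", "next-hop-interface": "eno2.100"},
    ]
    for command in multipath_commands(config):
        print(command)
```

With the two entries from the policy, this yields a single route with two `nexthop` clauses, which is the shape created by the manual command, instead of the two independent routes shown in the output above.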

Am I doing something wrong, or is this not implemented? I see some references in the nmstate code, but I am not sure whether we're running the version that contains this fix: https://github.com/nmstate/nmstate/commit/0eb2c3378f7df6586b0cadb4df734c95e47761bf

The image the customer is using is:

image: registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel8@sha256:2f410c46c75ee4c35027ab76b1d8378aa1edf8a9ceea6eef784cc7fbae81a4aa which does contain the above fix



Version-Release number of selected component (if applicable):

OCP 4.10



Comment 2 Thomas Haller 2022-04-28 11:15:32 UTC
NetworkManager currently does not support ECMP/multipath routes for IPv4. Consequently, neither does nmstate.
That would be interesting to add.

Btw, for IPv6, NetworkManager supports it... kinda. That is because the kernel will just merge identical routes that only differ by their next hop. NetworkManager treats all routes as single-hop routes, but the kernel merges them and you effectively get ECMP. What is not supported for IPv6 is specifying the weight for the next hop.

RFEs welcome.

Comment 4 Thomas Haller 2022-05-03 10:48:24 UTC
(In reply to Andy Bartlett from comment #3)

This RFE is for nmstate. I cloned this bug for NM as bug 2081302.

Comment 5 Andy Bartlett 2022-05-03 16:06:35 UTC
Thanks Thomas, much appreciated.

Regards,

Andy

Comment 10 sushil kulkarni 2022-11-14 13:55:00 UTC
Removing from the 8.8 RPL based on the votes in the Devel dashboard.

Comment 14 Gris Ge 2023-01-05 09:36:52 UTC
Hi Ben,

Can we move this feature to RHEL 9.2?

Comment 15 Gris Ge 2023-01-05 09:37:29 UTC
Never mind, I found a customer request for this.

Comment 17 Andy Bartlett 2023-01-05 10:29:05 UTC
@fge I have asked the customer about this. My contact is currently out of the office until next week, but I left a message on Slack for him. As soon as I have an answer, I will let you know.

Comment 19 Gris Ge 2023-01-08 13:49:01 UTC
RHEL 9 (nmstate 2.x) patch set posted upstream: https://github.com/nmstate/nmstate/pull/2172

Comment 26 Thomas Haller 2023-01-23 08:52:35 UTC
This bug is reported against rhel-8, and NetworkManager in rhel-8 will not be rebased again to upstream `main` (it will stick to version 1.40, but will still be rebased to minor 1.40 releases).

rhel-8 will get the feature if-and-only-if the patches get backported to nm-1-40.

Whether rhel-8.8 gets the feature depends on whether the backport happens very soon.

rhel-8.7 and rhel-9.1 are both using NetworkManager 1.40.0. They only get the features if we do z-stream updates. Backporting to upstream nm-1-40 is a requirement for that.

rhel-8.6 and rhel-9.0 are both using NetworkManager 1.36.0. They only get the features if we do z-stream updates. This requires backporting the change to upstream nm-1-40, nm-1-38 and nm-1-36 branches.

The feature is merged upstream in 1.41+, but at this moment, it still has issues and is not yet fully stable. This needs to be resolved soon, before rhel-9.2 release.

The feature brings very little new public API, which -- from that point of view -- makes it not a major problem to backport. But it will be a huge patchset. It would be better to first stabilize it fully on main (which would mean giving it more time on main and only putting it in rhel-9.2 for now).

Upstream 1.40 and rhel-8.7/rhel-9.2 are fairly similar to current main. Backporting the feature will be relatively easy. It's still gonna be a huge patchset. Backporting further (to older upstream branches and rhel) makes it increasingly cumbersome and risky.


We strongly avoid (bad) behavioral changes in NetworkManager. Due to that, we almost always *can* backport patches (to Z-stream). The ECMP feature introduces new (good) behavior, which probably won't affect users badly (most users should not notice unless they enable the feature). In my opinion, backporting to nm-1-40 makes sense, because that way it can reach rhel-8 at all. But it would be better if it only reached rhel-8.9 rather than being rushed into rhel-8.8. Backporting to rhel-8.8 is more dangerous due to the shorter time in testing. Backporting to rhel-8.7/rhel-9.1 and older is even more dangerous, more effort, and makes it harder to ensure our testing covers all cases.


TL;DR: In my opinion, there should be *very* strong arguments for getting this before rhel-9.2 and rhel-8.9. But it would be doable...