Bug 2048988 - NNCP deployment fails on applying ipv6 routes
Summary: NNCP deployment fails on applying ipv6 routes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: nmstate
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: rc
Target Release: 8.6
Assignee: Gris Ge
QA Contact: Mingyu Shi
URL:
Whiteboard:
Depends On:
Blocks: 2054053 2054054
Reported: 2022-02-01 09:57 UTC by Adi Zavalkovsky
Modified: 2022-05-10 13:55 UTC (History)

Fixed In Version: nmstate-1.2.1-1.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2053027 2054053 2054054 (view as bug list)
Environment:
Last Closed: 2022-05-10 13:34:48 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
nns before bridge deployment (16.96 KB, text/plain), 2022-02-01 09:57 UTC, Adi Zavalkovsky
step1.yml (763 bytes, text/plain), 2022-02-08 12:38 UTC, Gris Ge
step2.yml (888 bytes, text/plain), 2022-02-08 12:39 UTC, Gris Ge
Reproducer script (1.10 KB, application/x-shellscript), 2022-02-14 05:20 UTC, Gris Ge
NetworkManager trace log (378.66 KB, text/plain), 2022-02-14 06:13 UTC, Gris Ge


Links
System ID Private Priority Status Summary Last Updated
Github nmstate nmstate pull 1800 0 None open python route: Add support of multipath route 2022-02-08 08:10:54 UTC
Red Hat Issue Tracker RHELPLAN-110502 0 None None None 2022-02-01 10:03:58 UTC
Red Hat Product Errata RHEA-2022:1772 0 None None None 2022-05-10 13:35:11 UTC

Internal Links: 2053027

Description Adi Zavalkovsky 2022-02-01 09:57:02 UTC
Created attachment 1858288 [details]
nns before bridge deployment

Description of problem:
When trying to deploy a bridge behind a primary interface on a BM node, it fails during route setting, which is odd since it doesn't fail on another node in the same cluster.


Version-Release number of selected component (if applicable):
nmcli tool, version 1.30.0-13.el8_4


How reproducible:
NNCE -
desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eno1
          vlan: {}
      ipv4:
        address:
        - ip: 10.9.96.50
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        address:
        - ip: 2620:52:0:960:e643:4bff:fe57:4a90
          prefix-length: 64
        - ip: fe80::e643:4bff:fe57:4a90
          prefix-length: 64
        autoconf: false
        dhcp: false
        enabled: true
      name: capture-br1
      state: up
      type: linux-bridge
    routes:
      config:
      - destination: 2620:52:0:960::/64
        metric: 104
        next-hop-address: '::'
        next-hop-interface: capture-br1
        table-id: 254
      - destination: fe80::/64
        metric: 104
        next-hop-address: '::'
        next-hop-interface: capture-br1
        table-id: 254
      - destination: ::/0
        metric: 104
        next-hop-address: fe80::c242:d000:645f:92a0
        next-hop-interface: capture-br1
        table-id: 254
      - destination: 0.0.0.0/0
        metric: 104
        next-hop-address: 10.9.96.254
        next-hop-interface: capture-br1
        table-id: 254
      - destination: 10.9.96.0/24
        metric: 104
        next-hop-address: 0.0.0.0
        next-hop-interface: capture-br1
        table-id: 254
      - destination: 10.9.96.0/24
        metric: 104
        next-hop-address: 0.0.0.0
        next-hop-interface: capture-br1
        table-id: 254



Actual results:
      error reconciling NodeNetworkConfigurationPolicy at desired state apply: ,
      failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1'
      Unhandled AF_SPEC_BRIDGE_INFO
        0 [2, 0]
      Unhandled AF_SPEC_BRIDGE_INFO
        1 [1, 0]
      Unhandled AF_SPEC_BRIDGE_INFO
        0 [2, 0]
      Unhandled AF_SPEC_BRIDGE_INFO
        1 [1, 0]
      Unhandled AF_SPEC_BRIDGE_INFO
        0 [2, 0]
 .........
 .........
      Unhandled AF_SPEC_BRIDGE_INFO
        1 [1, 0]
      Unhandled AF_SPEC_BRIDGE_INFO
        0 [2, 0]
      Unhandled AF_SPEC_BRIDGE_INFO
        1 [1, 0]
      libnmstate.error.NmstateVerificationError:
      desired
      =======
      ---
      routes:
        config:
        - destination: 0.0.0.0/0
          metric: 104
          next-hop-address: 10.9.96.254
          next-hop-interface: capture-br1
          table-id: 254
        - destination: 10.9.96.0/24
          metric: 104
          next-hop-address: 0.0.0.0
          next-hop-interface: capture-br1
          table-id: 254
        - destination: 2620:52:0:960::/64
          metric: 104
          next-hop-address: '::'
          next-hop-interface: capture-br1
          table-id: 254
        - destination: ::/0
          metric: 104
          next-hop-address: fe80::c242:d000:645f:92a0
          next-hop-interface: capture-br1
          table-id: 254
        - destination: fe80::/64
          metric: 104
          next-hop-address: '::'
          next-hop-interface: capture-br1
          table-id: 254
      current
      =======
      ---
      routes:
        config:
        - destination: 0.0.0.0/0
          metric: 104
          next-hop-address: 10.9.96.254
          next-hop-interface: capture-br1
          table-id: 254
        - destination: 10.9.96.0/24
          metric: 104
          next-hop-address: 0.0.0.0
          next-hop-interface: capture-br1
          table-id: 254
        - destination: 2620:52:0:960::/64
          metric: 104
          next-hop-address: '::'
          next-hop-interface: capture-br1
          table-id: 254
        - destination: fe80::/64
          metric: 104
          next-hop-address: '::'
          next-hop-interface: capture-br1
          table-id: 254
      difference
      ==========
      --- desired
      +++ current
      @@ -16,11 +16,6 @@
           next-hop-address: '::'
           next-hop-interface: capture-br1
           table-id: 254
      -  - destination: ::/0
      -    metric: 104
      -    next-hop-address: fe80::c242:d000:645f:92a0
      -    next-hop-interface: capture-br1
      -    table-id: 254
         - destination: fe80::/64
           metric: 104
           next-hop-address: '::'



Expected results:
NNCP applied successfully, like in


Additional info:
Attaching NetworkManager logs for successful and unsuccessful nodes, nns before deployment.
cnv-qe-10 - successful node
cnv-qe-11 - failing node
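
The verification error in the actual results boils down to a set difference between the desired and the current routes: the IPv6 default route is requested but never shows up. A minimal sketch of that comparison follows. It is not nmstate's verification code; the helper names `route_key`, `missing_routes`, and `route` are hypothetical, and only the route values are taken from the report.

```python
def route_key(route):
    """Build a hashable, order-independent key for one route dict."""
    return tuple(sorted(route.items()))


def missing_routes(desired, current):
    """Return the desired routes that are absent from the current state."""
    current_keys = {route_key(r) for r in current}
    return [r for r in desired if route_key(r) not in current_keys]


def route(destination, next_hop_address):
    """Hypothetical helper: all routes in the report share these fields."""
    return {
        "destination": destination,
        "metric": 104,
        "next-hop-address": next_hop_address,
        "next-hop-interface": "capture-br1",
        "table-id": 254,
    }


desired = [
    route("0.0.0.0/0", "10.9.96.254"),
    route("10.9.96.0/24", "0.0.0.0"),
    route("2620:52:0:960::/64", "::"),
    route("::/0", "fe80::c242:d000:645f:92a0"),
    route("fe80::/64", "::"),
]
# The current state reported by the failing node lacks the IPv6 default route.
current = [r for r in desired if r["destination"] != "::/0"]

missing = missing_routes(desired, current)
for r in missing:
    print("missing:", r["destination"], "via", r["next-hop-address"])
```

With the values from the report, the only missing entry is the `::/0` route, which matches the `difference` section of the error.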

Comment 4 Adi Zavalkovsky 2022-02-03 10:47:09 UTC
Update -
On a new run, the node which previously had the changes applied successfully is also failing.
It seems that deploying bridges behind a node's primary iface is failing on BM nodes when applying IPv6 routes.

Comment 5 Petr Horáček 2022-02-03 13:32:19 UTC
Scrubbing:

We can drop IPv6 from our current test plans. It is not important for the use case of the epic. Could you try it again without the IPv6 portion?

Then we should follow up with development to see whether the issue described here should be solved in nmpolicy or nmstate.

Comment 6 Gris Ge 2022-02-07 07:22:32 UTC
Hi Adi,

I failed to reproduce this in my VM and the log is not showing the root cause. Could you ping me when you have time so that I can do a live debug?

Thank you!

Comment 7 Adi Zavalkovsky 2022-02-07 09:46:12 UTC
Sure, @fge, taking this privately

Comment 8 Gris Ge 2022-02-08 02:00:50 UTC
After applying the desired state via nmstatectl with the `--no-commit --no-verify` arguments, I got:

```
default proto static metric 102 pref medium
	nexthop via fe80::c242:d000:645f:92a0 dev eno1 weight 1
	nexthop via fe80::c242:d000:645f:92a0 dev capture-br1 weight 1
[root@cnv-qe-10 /]# ip addr show eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master capture-br1 state UP group default qlen 1000
    link/ether e4:43:4b:57:47:50 brd ff:ff:ff:ff:ff:ff
```

So the root causes are:

 1. NetworkManager did not remove the route entry of eno1 when attaching eno1 to the bridge.
 2. Nmstate ignores multipath routes.

Let me try to work around this in nmstate before asking NetworkManager to fix it.
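
To make the first root cause easier to spot, the route output above can be scanned for multipath entries whose next hops still point at the old port. The following is only a sketch: it assumes the text comes from a command such as `ip -6 route`, the function name `parse_multipath_routes` is hypothetical, and the regular expression covers just the `nexthop via ... dev ... weight ...` form shown in this comment.

```python
import re

# Route output quoted in this comment (continuation lines are tab-indented).
IP_ROUTE_OUTPUT = """\
default proto static metric 102 pref medium
\tnexthop via fe80::c242:d000:645f:92a0 dev eno1 weight 1
\tnexthop via fe80::c242:d000:645f:92a0 dev capture-br1 weight 1
"""

NEXTHOP_RE = re.compile(
    r"^\s+nexthop via (?P<via>\S+) dev (?P<dev>\S+) weight (?P<weight>\d+)"
)


def parse_multipath_routes(text):
    """Map each destination to the list of next hops listed under it."""
    routes = {}
    destination = None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = NEXTHOP_RE.match(line)
        if match and destination is not None:
            routes.setdefault(destination, []).append(
                {
                    "via": match.group("via"),
                    "dev": match.group("dev"),
                    "weight": int(match.group("weight")),
                }
            )
        elif not line[0].isspace():
            # A non-indented line starts a new route; its first token
            # is the destination ("default" or a prefix).
            destination = line.split()[0]
    return routes


multipath = parse_multipath_routes(IP_ROUTE_OUTPUT)
# Next hops that still use a device other than the bridge are leftovers.
stale = [hop["dev"] for hop in multipath["default"] if hop["dev"] != "capture-br1"]
print(stale)
```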

Comment 9 Gris Ge 2022-02-08 08:08:35 UTC
Problem reproduced on a ppc64le server with an i40e NIC, using the YAML files below.

On the same server, the problem does not occur with a veth NIC. In a VM, an e100e NIC has no such problem either.
I will leave the remaining root-cause debugging to the NetworkManager team.

Nmstate workaround fixes:

 * RHEL 8.6: https://github.com/nmstate/nmstate/pull/1800
 * RHEL 8.5: https://github.com/nmstate/nmstate/pull/1802
 * RHEL 8.4: https://github.com/nmstate/nmstate/pull/1801

first.yml

```
---
routes:
  config:
  - destination: ::/0
    metric: 102
    next-hop-address: fe80::c242:d000:645f:92a0
    next-hop-interface: enP2p1s0f1
    table-id: 254
interfaces:
- name: enP2p1s0f1
  type: ethernet
  state: up
  ipv4:
    enabled: true
    address:
    - ip: 10.9.96.49
      prefix-length: 24
    dhcp: false
  ipv6:
    enabled: true
    address:
    - ip: 2620:52:0:960:e643:4bff:fe57:4750
      prefix-length: 64
    - ip: fe80::e643:4bff:fe57:4750
      prefix-length: 64
    autoconf: false
    dhcp: false
```


second.yml

```
---
routes:
  config:
  - destination: ::/0
    metric: 102
    next-hop-address: fe80::c242:d000:645f:92a0
    next-hop-interface: capture-br1
    table-id: 254
interfaces:
- name: capture-br1
  type: linux-bridge
  state: up
  bridge:
    options:
      stp:
        enabled: false
    port:
    - name: enP2p1s0f1
      vlan: {}
  ipv4:
    enabled: true
    address:
    - ip: 10.9.96.49
      prefix-length: 24
    dhcp: false
  ipv6:
    enabled: true
    address:
    - ip: 2620:52:0:960:e643:4bff:fe57:4750
      prefix-length: 64
    - ip: fe80::e643:4bff:fe57:4750
      prefix-length: 64
    autoconf: false
    dhcp: false
```

Comment 10 Gris Ge 2022-02-08 08:42:22 UTC
The reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=2048988#c9 is incorrect.

Still digging.

Comment 12 Gris Ge 2022-02-08 12:38:17 UTC
Created attachment 1859781 [details]
step1.yml

Comment 13 Gris Ge 2022-02-08 12:39:08 UTC
Created attachment 1859782 [details]
step2.yml

Comment 14 Gris Ge 2022-02-08 12:40:57 UTC
To reproduce this problem in a VM:

1. Use e1000e as the driver of the VM NIC. Assume the interface is named enp9s0.
2. Download the two YAML files above.
3. dnf remove NetworkManager-config-server -y
4. systemctl restart NetworkManager
5. nmstatectl apply step1.yml
6. nmstatectl apply step2.yml

Comment 15 Gris Ge 2022-02-08 12:47:34 UTC
Hi Petr,

The root cause of this issue is that CNV does not install `NetworkManager-config-server` on the host. Placing this package in the container does not help.

Please contact the responsible team to include that rpm in the host environment where the NM daemon runs. This rpm only contains a config file for NetworkManager.
If including an extra rpm is too much for the host environment, placing the file below in /etc/NetworkManager/conf.d also helps:

[main]
no-auto-default=*
ignore-carrier=*


All nmstate CI assumes that NetworkManager-config-server is installed. Without it, NetworkManager automatically creates a profile for each newly discovered interface,
which has proved to cause many problems in RHV.

For the current bug, nmstate can work around it by supporting multipath routes. But including the above rpm can solve other potential problems.
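
For hosts where the rpm cannot be installed, the drop-in file described above could be generated with a short script. This is a sketch only: the function name `render_dropin` and the file name `99-no-auto-default.conf` are hypothetical, and the two keys are exactly the ones listed in this comment.

```python
import configparser
import io


def render_dropin():
    """Render the [main] section with the two settings from this comment."""
    parser = configparser.ConfigParser()
    parser["main"] = {
        "no-auto-default": "*",
        "ignore-carrier": "*",
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the snippet above.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


dropin_text = render_dropin()
print(dropin_text)

# Hypothetical target path; writing it requires root, so it is left commented out:
# with open("/etc/NetworkManager/conf.d/99-no-auto-default.conf", "w") as f:
#     f.write(dropin_text)
```

After placing the file, NetworkManager has to be restarted (for example with `systemctl restart NetworkManager`, as in the reproducer steps) for the setting to apply.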

Comment 16 Petr Horáček 2022-02-08 12:57:53 UTC
Gris, thanks a lot for digging into this. This is very helpful.

@ellorent would you please follow-up with a bug on RHCOS, asking for the config-server RPM, explaining the motivation that Gris described above?

Comment 17 Gris Ge 2022-02-08 13:53:37 UTC
The original problem could be solved by multiple approaches:
 A. nmstate supports multipath routes.
 B. NetworkManager, in its default installation, should remove the multipath route on
    an interface that is attached to a bridge.
 C. RHCOS includes the `NetworkManager-config-server` rpm in the host environment.

We will use this bug to track the effort of A): nmstate support for multipath routes in
RHEL 8.7. (RHEL 8.6 is in a late phase, and there is no valid use case requiring
multipath route support.)

CNV team will work with RHCOS off-thread on option C) for RHEL 8.4. If the nmstate
workaround is still required, please leave a comment requesting zstream review.

Acceptance criteria of this bug:
 * Given a freshly installed RHEL 8 system with multipath routes configured by:
    sudo ip route add 198.51.100.0/24 proto static scope global \
        nexthop via 192.0.2.254 dev eth1 weight 1 onlink \
        nexthop via 192.0.2.253 dev eth1 weight 256 onlink
    sudo ip -6 route add 2001:db8:e::/64 proto static scope global \
        nexthop via 2001:db8:f::254 dev eth1 weight 1 onlink \
        nexthop via 2001:db8:f::253 dev eth1 weight 256 onlink
 * When the user installs nmstate and invokes `nmstatectl show`.
 * Then nmstate should convert these multipath routes into several normal
   route entries.
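
The conversion named in the acceptance criteria can be illustrated as follows. This is a sketch under assumptions, not the code from the nmstate pull request: the input dict layout and the function name `expand_multipath` are hypothetical, the next-hop weights are simply dropped, and the table ID defaults to 254 (the main table) when none is given.

```python
def expand_multipath(route):
    """Turn one multipath route into one normal route entry per next hop."""
    entries = []
    for hop in route["nexthops"]:
        entries.append(
            {
                "destination": route["destination"],
                "next-hop-address": hop["via"],
                "next-hop-interface": hop["dev"],
                # Assumption: fall back to the main routing table.
                "table-id": route.get("table-id", 254),
            }
        )
    return entries


# Hypothetical representation of the IPv4 route from the acceptance criteria.
multipath_v4 = {
    "destination": "198.51.100.0/24",
    "nexthops": [
        {"via": "192.0.2.254", "dev": "eth1", "weight": 1},
        {"via": "192.0.2.253", "dev": "eth1", "weight": 256},
    ],
}

normal_routes = expand_multipath(multipath_v4)
for entry in normal_routes:
    print(entry["destination"], "via", entry["next-hop-address"])
```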

Comment 18 Quique Llorente 2022-02-08 14:58:05 UTC
(In reply to Petr Horáček from comment #16)
> Gris, thanks a lot for digging into this. This is very helpful.
> 
> @ellorent would you please follow-up with a bug on RHCOS, asking
> for the config-server RPM, explaining the motivation that Gris described
> above?

let's see how it rolls https://github.com/openshift/os/pull/705

Comment 19 Quique Llorente 2022-02-08 15:19:46 UTC
Looks like we have to ask for it first at fcos, the process is different now https://github.com/coreos/fedora-coreos-tracker/issues/1094.

Comment 20 Quique Llorente 2022-02-08 15:30:25 UTC
Looks like we are going to do this with Ignition/MachineConfig https://github.com/openshift/os/pull/705#issuecomment-1032714634

Comment 21 Quique Llorente 2022-02-08 15:50:41 UTC
Let's try to add MachineConfig to the kubernetes-nmstate operator https://github.com/coreos/fedora-coreos-tracker/issues/1094#issuecomment-1032758020

Comment 24 Quique Llorente 2022-02-09 12:21:21 UTC
Looks like multipath is going to land at nmstate soon, https://github.com/nmstate/nmstate/pull/1800 maybe we have to wait for it instead of configuring NetworkManager

What do you think @gris

Comment 25 Thomas Haller 2022-02-09 15:52:25 UTC
I agree with Gris on the issue. The NetworkManager part is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1837254

Yes, you can (and maybe should) configure `[main].no-auto-default=*` (NetworkManager-config-server package). That might avoid the conflict in this case, as NetworkManager then possibly does not activate the ethernet device (which ends up getting a conflicting IPv6 route). But it's not a fix.

Comment 26 Quique Llorente 2022-02-10 09:19:55 UTC
@fge we need a backport of fixes at https://errata.devel.redhat.com/advisory/86674 for 8.4.0.z

Comment 28 Petr Horáček 2022-02-10 10:21:25 UTC
Since we already have https://bugzilla.redhat.com/show_bug.cgi?id=1837254 tracking this issue, would it make sense to re-purpose this BZ for CNV? We are not done with the investigation yet, but currently my understanding is the following:

* This is not a regression. The issue is present in production today. We have hit it only now because our existing CI was not testing a bridge created on the default NIC
* This is not related to nmpolicy

I would suggest we move this to CNV, target it to 4.11 and document as a known issue.

What do you think @rnetser

Comment 29 Gris Ge 2022-02-10 13:15:25 UTC
Hi Petr,

Nmstate needs this bug for the 8.4.0 zstream approval review and the 8.6 effort.
Could you clone this bug to CNV?

Comment 30 Gris Ge 2022-02-10 13:30:26 UTC
Hi Mingyu,

For zstream review, please pre-test these scratch builds:

 * RHEL 8.4: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42949638
 * RHEL 8.5: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42949845

Comment 32 Mingyu Shi 2022-02-10 14:06:58 UTC
Sorry for cancelling the needinfo request for rnester by mistake

@rnester Please see #comment28

Comment 33 Petr Horáček 2022-02-10 14:45:52 UTC
Thanks Mingyu. We met with Ruth and discussed this BZ offline. We are now trying to come up with a workaround that would allow us to continue with our feature at least on a tech preview level.

Comment 34 Mingyu Shi 2022-02-11 15:44:31 UTC
(In reply to Gris Ge from comment #30)
> Hi Mingyu,
> 
> For zstream review, please do pre-test for these scratch build:
> 
>  * RHEL 8.4:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42949638
>  * RHEL 8.5:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42949845

These two builds work well.

But currently I fail to reproduce the problem with
(RHEL 8.4)
nmstate-1.0.2-16.el8_4.noarch
nispor-1.0.1-5.el8_4.x86_64
NetworkManager-1.30.0-13.el8_4.x86_64
or
(RHEL 8.5)
nmstate-1.1.0-5.el8_5.noarch
nispor-1.1.1-2.el8_5.x86_64
NetworkManager-1.32.10-4.el8.x86_64

Though I followed the same steps and used the same type of NIC as in https://bugzilla.redhat.com/show_bug.cgi?id=2048988#c31

Comment 35 Gris Ge 2022-02-14 05:20:59 UTC
Created attachment 1860913 [details]
Reproducer script

New reproducer script. It does not matter whether you have `NetworkManager-config-server` installed or not.

Comment 38 Gris Ge 2022-02-14 06:13:37 UTC
Created attachment 1860916 [details]
NetworkManager trace log

Trace log of NetworkManager in case NM dev would like to investigate more.

Comment 40 Gris Ge 2022-02-14 10:05:59 UTC
Hi Petr,

Could your team test the official 8.4.0.z build, nmstate-1.0.2-17.el8_4?

Thank you!

Comment 43 Thomas Haller 2022-02-14 19:07:04 UTC
(In reply to Gris Ge from comment #38)
> Created attachment 1860916 [details]
> NetworkManager trace log
> 
> Trace log of NetworkManager in case NM dev would like to investigate more.

yes, the problem is bug 1837254.

Thanks for the log.

Comment 44 Petr Horáček 2022-02-15 16:59:56 UTC
Hey Gris, we will soon be checking nmstate-1.0.2-18.el8.noarch.rpm; I suppose that would be enough? Thanks a lot for helping with this.

Comment 45 Mingyu Shi 2022-02-24 08:41:43 UTC
Verified with versions:
nmstate-1.2.1-1.el8.x86_64
nispor-1.2.3-1.el8.x86_64
NetworkManager-1.36.0-0.8.el8.x86_64

Comment 47 errata-xmlrpc 2022-05-10 13:34:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1772

