Bug 1879458 - Bridge creation fails on CNV due to def route lost
Summary: Bridge creation fails on CNV due to def route lost
Keywords:
Status: CLOSED DUPLICATE of bug 1901859
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.6
Assignee: Petr Horáček
QA Contact: Meni Yakove
URL:
Whiteboard:
Duplicates: 1833358 1879459
Depends On:
Blocks:
Reported: 2020-09-16 10:45 UTC by Giuseppe Cofano
Modified: 2025-12-26 12:20 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-11 13:20:23 UTC
Target Upstream Version:
Embargoed:
jhou: needinfo-


Attachments
using nmtui config static IP for CoreOS installing (236.10 KB, image/png)
2020-12-23 15:45 UTC, kevin
using nmtui config static IP for CoreOS installing (159.82 KB, image/png)
2020-12-23 15:46 UTC, kevin
using nmtui config static IP for CoreOS installing (77.29 KB, image/jpeg)
2020-12-23 15:46 UTC, kevin
the output of nmcli & ip link (184.42 KB, image/jpeg)
2020-12-26 10:36 UTC, kevin

Description Giuseppe Cofano 2020-09-16 10:45:03 UTC
Description of problem:

When trying to create a bridge for VMs on OCP 4.5 as detailed here (https://docs.openshift.com/container-platform/4.5/virt/node_network/virt-updating-node-network-config.html#virt-creating-interface-on-nodes_virt-updating-node-network-config), the NNCP creation fails with the following error: "rolling back desired state configuration: failed runnig probes after network changes: failed to retrieve default gw at runProbes: timed out waiting for the condition".
Digging into the code (https://github.com/nmstate/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L98), I've seen that the failed check is the one on the default route, i.e. this one: Get("routes.running.#(destination==\"0.0.0.0/0\").next-hop-address").String(). Both nmstate and ip route show that the default route on the node is lost when the NNCP is applied, which is why the probe fails. The node loses connectivity because of that until the rollback is applied.
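
For readers who want to see what that probe query amounts to, here is a minimal Python sketch of the same lookup. It is not the kubernetes-nmstate code (that is Go). It only assumes the reported state is a dict with a routes.running list, as the query path suggests. The function name default_gateway and the two sample states are made up for illustration; the gateway address and interface are the ones from this report.

```python
from typing import Optional

IPV4_DEFAULT_DESTINATION = "0.0.0.0/0"


def default_gateway(state: dict) -> Optional[str]:
    """Return the next-hop address of the first running IPv4 default route.

    Mirrors the idea of the query
    routes.running.#(destination=="0.0.0.0/0").next-hop-address
    and returns None when no such route (or no next hop) is present.
    """
    for route in state.get("routes", {}).get("running", []):
        if route.get("destination") == IPV4_DEFAULT_DESTINATION:
            return route.get("next-hop-address") or None
    return None


# Hypothetical state before the policy is applied: the default route exists.
state_before = {
    "routes": {
        "running": [
            {
                "destination": "0.0.0.0/0",
                "next-hop-address": "10.77.3.193",
                "next-hop-interface": "bond0",
            }
        ]
    }
}

# Hypothetical state after the policy is applied: the default route is gone.
state_after = {"routes": {"running": []}}

print(default_gateway(state_before))  # 10.77.3.193
print(default_gateway(state_after))   # None
```

With a state like state_after the lookup keeps returning nothing, which would match the "failed to retrieve default gw" timeout in the error message.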


Version-Release number of selected component (if applicable):
CNV 2.4

How reproducible:
Follow docs above with the following bridge definition:

apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eno1np0
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0.ocp4rm.poste.exp
  desiredState:
    interfaces:
      - name: br1
        description: Linux bridge with eno1np0 as a port
        type: linux-bridge
        state: up
        ipv4:
          dhcp: true
          enabled: true
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: eno1np0


Steps to Reproduce:
1.
2.
3.

Actual results:
Bridge not created, node lost connectivity

Expected results:
Bridge created

Additional info:
The node networking is configured via kernel params at boot; it's a UPI cluster on bare metal.
The issue is solved with the following workaround, i.e. adding the default route definition explicitly to the NNCP object:

    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 10.77.3.193
        next-hop-interface: bond0
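
As a rough illustration of where that block sits, the Python sketch below adds the same route under routes.config of a desiredState dict, next to interfaces. It assumes the policy has been loaded into a plain dict. The helper name with_default_route is hypothetical, and the interface entry is trimmed to a few fields from the bridge definition above.

```python
import copy


def with_default_route(desired_state: dict, next_hop_address: str,
                       next_hop_interface: str) -> dict:
    """Return a copy of desired_state with an explicit IPv4 default route.

    The input dict is left untouched so the original policy can still be
    compared against the patched one.
    """
    patched = copy.deepcopy(desired_state)
    route_config = patched.setdefault("routes", {}).setdefault("config", [])
    route_config.append(
        {
            "destination": "0.0.0.0/0",
            "next-hop-address": next_hop_address,
            "next-hop-interface": next_hop_interface,
        }
    )
    return patched


# Trimmed-down desiredState from the policy in this report.
desired_state = {
    "interfaces": [
        {"name": "br1", "type": "linux-bridge", "state": "up"},
    ]
}

# Values taken from the workaround snippet above.
patched = with_default_route(desired_state, "10.77.3.193", "bond0")
print(patched["routes"]["config"])
```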

Comment 1 Sudha Ponnaganti 2020-09-18 01:17:19 UTC
*** Bug 1879459 has been marked as a duplicate of this bug. ***

Comment 2 Sudha Ponnaganti 2020-09-18 01:18:29 UTC
*** Bug 1879441 has been marked as a duplicate of this bug. ***

Comment 3 Quique Llorente 2020-12-21 07:01:26 UTC
@jhou Can you specify the kernel params passed to configure networking and also attach the NodeNetworkState from before applying the NNCP?

Comment 5 kevin 2020-12-23 15:45:00 UTC
Created attachment 1741582 [details]
using nmtui config static IP for CoreOS installing

Comment 6 kevin 2020-12-23 15:46:03 UTC
Created attachment 1741583 [details]
using nmtui config static IP for CoreOS installing

Comment 7 kevin 2020-12-23 15:46:27 UTC
Created attachment 1741584 [details]
using nmtui config static IP for CoreOS installing

Comment 8 Quique Llorente 2020-12-24 07:46:00 UTC
The PR that introduces the new --copy-network option to coreos-installer, https://github.com/coreos/coreos-installer/pull/212, contains the info about what the option does.

Comment 9 Gris Ge 2020-12-25 12:09:48 UTC
The suspicious part of this `default_connection.nmconnection`: it does not define the interface name or MAC address.
With that, if the OS only has `default_connection.nmconnection` on bootup, all its NICs will use the same IP address/DNS/routes.

To continue debugging, please provide the output of `ip link && nmcli`.

Comment 10 kevin 2020-12-26 10:36:38 UTC
Created attachment 1742030 [details]
the output of nmcli & ip link

Comment 11 kevin 2020-12-26 10:37:48 UTC
I have attached the output of nmcli & ip link.

Comment 12 Gris Ge 2020-12-27 00:45:41 UTC
Hi Kevin,

From the screenshot, there is an `ens192` NIC available. But the bug report creates a bridge using `eno1np0`, which does not seem to exist.
The bug reporter also mentioned `bond0`.

Could you enlighten me on the following?

1. What's the initial network state after OS boot up?
  * List of NICs.
  * Which NIC is holding the IP/gateway/DNS using this `default_connection.nmconnection`?

2. What's the desired state you want to achieve through CNV? And what's the spec passed to CNV?

Comment 13 Gris Ge 2020-12-28 06:48:18 UTC
Hi Quique Llorente,

Even though I cannot reproduce the original problem, this `multi-connect=3`[1] has triggered a lot of problems in my VM.

Before I continue my fixes in nmstate for this `multi-connect`, please kindly collect answers to:

 * Why does CNV try to set an NM profile without an interface name or MAC address defined?

 * Is this `multi-connect=3` intentional or a mistake?

 * What's the CNV use case for this `multi-connect=3`, if it is intentional?



[1]: It means NM can use a single profile to activate multiple interfaces. When that profile has no interface name or MAC address defined, NM will apply the same configuration to all interfaces it finds.
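
A quick way to spot such a profile is to parse the keyfile and check for the three settings discussed here. The Python sketch below does that under some assumptions: the keyfile contents are invented samples (the real `default_connection.nmconnection` is only visible in the attached screenshots), the function name applies_to_any_interface is hypothetical, and only the `[connection]` and `[ethernet]` sections are inspected.

```python
import configparser

# Made-up sample: multi-connect=3 with no interface name or MAC address.
SAMPLE_UNPINNED = """\
[connection]
id=default_connection
type=ethernet
multi-connect=3
"""

# Made-up sample: the same profile pinned to one NIC by name.
SAMPLE_PINNED = """\
[connection]
id=default_connection
type=ethernet
interface-name=ens192
"""


def applies_to_any_interface(keyfile_text: str) -> bool:
    """Return True if the profile allows multiple activations and is not
    bound to a specific interface name or MAC address."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(keyfile_text)
    interface_name = parser.get("connection", "interface-name", fallback="")
    mac_address = parser.get("ethernet", "mac-address", fallback="")
    multi_connect = parser.get("connection", "multi-connect", fallback="0")
    return multi_connect == "3" and not interface_name and not mac_address


print(applies_to_any_interface(SAMPLE_UNPINNED))  # True
print(applies_to_any_interface(SAMPLE_PINNED))    # False
```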

Comment 14 Petr Horáček 2021-01-06 10:54:42 UTC
Sadly, I have to postpone this to 2.7. It is not a blocker / there is a workaround.

Comment 15 Petr Horáček 2021-01-06 11:05:56 UTC
*** Bug 1833358 has been marked as a duplicate of this bug. ***

Comment 16 Petr Horáček 2021-01-25 10:15:07 UTC
Moving to NEW. A potential fix has been merged in upstream nmstate. Once it is available in RHEL, we can try to verify that this issue is indeed gone.

Comment 19 Petr Horáček 2021-02-11 13:20:23 UTC
I'm merging this BZ with #1901859. Both seem to be facing the same issue. We should continue the discussion there.

*** This bug has been marked as a duplicate of bug 1901859 ***

Comment 20 Red Hat Bugzilla 2023-09-18 00:22:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

