Description of problem:
Trying to create a bridge for VMs on OCP 4.5 as detailed here (https://docs.openshift.com/container-platform/4.5/virt/node_network/virt-updating-node-network-config.html#virt-creating-interface-on-nodes_virt-updating-node-network-config), the NNCP creation fails with the following error:

"rolling back desired state configuration: failed runnig probes after network changes: failed to retrieve default gw at runProbes: timed out waiting for the condition"

Digging into the code (https://github.com/nmstate/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L98), I've seen that the failed check is the one on the default route, this one:

Get("routes.running.#(destination==\"0.0.0.0/0\").next-hop-address").String()

Both nmstate and `ip route` show that the default route on the node is lost when the NNCP is applied, which is why the probe fails. The node loses connectivity because of that until the rollback is applied.

Version-Release number of selected component (if applicable):
CNV 2.4

How reproducible:
Follow the docs above with the following bridge definition:

apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eno1np0
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0.ocp4rm.poste.exp
  desiredState:
    interfaces:
      - name: br1
        description: Linux bridge with eno1np0 as a port
        type: linux-bridge
        state: up
        ipv4:
          dhcp: true
          enabled: true
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: eno1np0

Steps to Reproduce:
1.
2.
3.

Actual results:
Bridge not created, node lost connectivity

Expected results:
Bridge created

Additional info:
The node networking is configured via kernel params at boot; it's a UPI cluster on bare metal. The issue is solved with the following workaround, i.e. defining the default route explicitly in the NNCP object:

routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 10.77.3.193
      next-hop-interface: bond0
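To make the failing check concrete, here is a minimal Python sketch of what the default-gateway probe described above amounts to. The real probe is Go code in kubernetes-nmstate that uses the quoted path expression; the function name `default_gateway` and the two sample states below are assumptions made up for illustration, with the gateway address taken from the workaround in this report.

```python
from typing import Optional

DEFAULT_DESTINATION = "0.0.0.0/0"


def default_gateway(state: dict) -> Optional[str]:
    """Return the next-hop address of the first running IPv4 default route.

    Mirrors the lookup routes.running.#(destination=="0.0.0.0/0").next-hop-address
    on an nmstate-style state document. Returns None when no such route exists.
    """
    running = state.get("routes", {}).get("running", [])
    for route in running:
        if route.get("destination") == DEFAULT_DESTINATION:
            return route.get("next-hop-address") or None
    return None


# Hypothetical state before the policy is applied: a default route is present.
before = {
    "routes": {
        "running": [
            {
                "destination": "0.0.0.0/0",
                "next-hop-address": "10.77.3.193",
                "next-hop-interface": "bond0",
            }
        ]
    }
}

# Hypothetical state after the policy is applied: the default route is gone,
# so the lookup yields nothing and a probe built on it would time out.
after = {"routes": {"running": []}}

print(default_gateway(before))
print(default_gateway(after))
```

With a state like `after`, the lookup never returns an address, which is consistent with the "failed to retrieve default gw" timeout and the subsequent rollback.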
*** Bug 1879459 has been marked as a duplicate of this bug. ***
*** Bug 1879441 has been marked as a duplicate of this bug. ***
@jhou Can you specify the kernel params passed to configure networking and also attach the NodeNetworkState from before applying the NNCP?
Created attachment 1741582 [details] using nmtui config static IP for CoreOS installing
Created attachment 1741583 [details] using nmtui config static IP for CoreOS installing
Created attachment 1741584 [details] using nmtui config static IP for CoreOS installing
The PR that introduces the new --copy-network option to coreos-installer is https://github.com/coreos/coreos-installer/pull/212; it contains the details of what the option does.
The suspicious part of this `default_connection.nmconnection` is that it does not define an interface name or MAC address. As a result, if the OS only has `default_connection.nmconnection` at boot, all of its NICs will use the same IP address/DNS/routes. To continue debugging, please provide the output of `ip link && nmcli`.
Created attachment 1742030 [details] the output of nmcli & ip link
I have attached the output of nmcli & ip link.
Hi Kevin,

From the screenshot, an `ens192` NIC is available, but the bug report creates a bridge using `eno1np0`, which does not seem to exist. The bug reporter also mentioned `bond0`. Could you clarify the following?

1. What is the initial network state after the OS boots?
   * The list of NICs.
   * Which NIC holds the IP/gateway/DNS from this `default_connection.nmconnection`?
2. What desired state do you want to achieve through CNV, and what spec is passed to CNV?
Hi Quique Llorente,

Although I cannot reproduce the original problem, this `multi-connect=3`[1] has triggered a lot of problems in my VM. Before I continue my fixes in nmstate for `multi-connect`, please collect answers to the following:

* Why does CNV try to set an NM profile without an interface name or MAC address defined?
* Is this `multi-connect=3` intentional or a mistake?
* If it is intentional, what is the CNV use case for `multi-connect=3`?

[1]: It means NM can use a single profile to activate multiple interfaces. When that profile has no interface name or MAC address defined, NM applies the same configuration to every interface it finds.
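For anyone checking their own nodes, the following Python sketch flags a NetworkManager keyfile that is not pinned to an interface and reads its `multi-connect` value. It relies only on the keyfile being INI-like, with `interface-name` and `multi-connect` in the `[connection]` section and `mac-address` in the `[ethernet]` (or legacy `[802-3-ethernet]`) section. The two sample profiles are hypothetical and are not the contents of the attached `default_connection.nmconnection`; the addresses come from the documentation range 192.0.2.0/24.

```python
import configparser

# Hypothetical profile with no interface name or MAC address and multi-connect=3.
UNPINNED_PROFILE = """
[connection]
id=default_connection
type=ethernet
multi-connect=3

[ipv4]
method=manual
address1=192.0.2.10/24,192.0.2.1
dns=192.0.2.53;
"""

# Hypothetical profile bound to a single interface.
PINNED_PROFILE = """
[connection]
id=ens192
type=ethernet
interface-name=ens192

[ipv4]
method=auto
"""


def _parse(keyfile_text: str) -> configparser.ConfigParser:
    # Interpolation is disabled so that '%' in values is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(keyfile_text)
    return parser


def matches_any_interface(keyfile_text: str) -> bool:
    """True when the profile names neither an interface nor a MAC address."""
    parser = _parse(keyfile_text)
    has_name = parser.has_option("connection", "interface-name")
    has_mac = any(
        parser.has_option(section, "mac-address")
        for section in ("ethernet", "802-3-ethernet")
    )
    return not (has_name or has_mac)


def multi_connect(keyfile_text: str) -> int:
    """Return connection.multi-connect, or 0 when the key is not set."""
    return _parse(keyfile_text).getint("connection", "multi-connect", fallback=0)


for label, text in (("unpinned", UNPINNED_PROFILE), ("pinned", PINNED_PROFILE)):
    print(label, matches_any_interface(text), multi_connect(text))
```

A profile for which `matches_any_interface` is true and `multi_connect` is 3 is the combination described in the footnote above: one profile that NM may activate on every interface it finds.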
Sadly, I have to postpone this to 2.7. It is not a blocker / there is a workaround.
*** Bug 1833358 has been marked as a duplicate of this bug. ***
Moving to NEW. A potential fix has been merged on nmstate U/S. Once it is available in RHEL, we could try to verify that this issue is indeed gone.
I'm merging this BZ with #1901859. Both seem to be facing the same issue. We should continue the discussion there. *** This bug has been marked as a duplicate of bug 1901859 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days