Bug 1879459

Summary: Bridge creation fails on CNV because the default route is lost
Product: OpenShift Container Platform
Reporter: Giuseppe Cofano <gcofano>
Component: Unknown
Assignee: Sudha Ponnaganti <sponnaga>
Status: CLOSED DUPLICATE
QA Contact: Jianwei Hou <jhou>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.5
CC: aos-bugs, eparis, jokerman
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-18 01:17:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Giuseppe Cofano 2020-09-16 10:45:16 UTC
Description of problem:

When I try to create a bridge for VMs on OCP 4.5 as detailed here (https://docs.openshift.com/container-platform/4.5/virt/node_network/virt-updating-node-network-config.html#virt-creating-interface-on-nodes_virt-updating-node-network-config), the NNCP creation fails with the following error: "rolling back desired state configuration: failed runnig probes after network changes: failed to retrieve default gw at runProbes: timed out waiting for the condition".
Digging into the code (https://github.com/nmstate/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L98), I saw that the failing check is the one on the default route: Get("routes.running.#(destination==\"0.0.0.0/0\").next-hop-address").String(). Both nmstate and "ip route" show that the default route on the node is lost when the NNCP is applied, which is why the probe fails. As a result, the node loses connectivity until the rollback is applied.
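
To illustrate why the probe fails, here is a minimal Python sketch of the same lookup. It is not the kubernetes-nmstate implementation (that is Go and uses the query quoted above). The function name default_gateway and the two sample states are hypothetical; the state layout is assumed to mirror the routes.running path from the query, and the gateway address and interface are taken from the workaround in this report.

```python
DEFAULT_IPV4_DESTINATION = "0.0.0.0/0"


def default_gateway(state: dict) -> str:
    """Return the next-hop address of the first running IPv4 default route.

    Returns an empty string when no default route is present, which is the
    situation the probe reports as "failed to retrieve default gw".
    """
    running = (state.get("routes") or {}).get("running") or []
    for route in running:
        if route.get("destination") == DEFAULT_IPV4_DESTINATION:
            return route.get("next-hop-address", "")
    return ""


# Hypothetical node state before the NNCP is applied: default route present.
state_before = {
    "routes": {
        "running": [
            {
                "destination": "0.0.0.0/0",
                "next-hop-address": "10.77.3.193",
                "next-hop-interface": "bond0",
            }
        ]
    }
}

# Hypothetical node state after the NNCP is applied: default route lost.
state_after = {"routes": {"running": []}}

print(repr(default_gateway(state_before)))
print(repr(default_gateway(state_after)))
```

With the default route gone, the lookup yields an empty value on every retry, so the probe eventually times out and the configuration is rolled back.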


Version-Release number of selected component (if applicable):
CNV 2.4

How reproducible:
Follow docs above with the following bridge definition:

apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eno1np0
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0.ocp4rm.poste.exp
  desiredState:
    interfaces:
      - name: br1
        description: Linux bridge with eno1np0 as a port
        type: linux-bridge
        state: up
        ipv4:
          dhcp: true
          enabled: true
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: eno1np0


Actual results:
Bridge not created, node lost connectivity

Expected results:
Bridge created

Additional info:
The node networking is configured via kernel parameters at boot; it is a UPI cluster on bare metal.
The issue is resolved by the following workaround, i.e. explicitly adding the definition of the default route to the NNCP object:

    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 10.77.3.193
        next-hop-interface: bond0
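
As a rough pre-check before applying a policy, one could verify that its desiredState carries an explicit default route like the one above. The following Python sketch assumes the desiredState has already been loaded into a dict and that the routes stanza sits directly under desiredState, next to interfaces, as its indentation suggests. The helper name has_explicit_default_route is hypothetical and not part of nmstate or kubernetes-nmstate.

```python
def has_explicit_default_route(desired_state: dict) -> bool:
    """Return True if desired_state configures an IPv4 default route."""
    configured = (desired_state.get("routes") or {}).get("config") or []
    return any(route.get("destination") == "0.0.0.0/0" for route in configured)


# desiredState from the original policy: only the bridge, no routes.
without_route = {
    "interfaces": [{"name": "br1", "type": "linux-bridge", "state": "up"}],
}

# desiredState with the workaround from this report added.
with_route = {
    "interfaces": [{"name": "br1", "type": "linux-bridge", "state": "up"}],
    "routes": {
        "config": [
            {
                "destination": "0.0.0.0/0",
                "next-hop-address": "10.77.3.193",
                "next-hop-interface": "bond0",
            }
        ]
    },
}

print(has_explicit_default_route(without_route))
print(has_explicit_default_route(with_route))
```

This only checks that a default route is declared; it does not confirm that the next hop is reachable once the policy is applied.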

Comment 1 Sudha Ponnaganti 2020-09-18 01:17:19 UTC

*** This bug has been marked as a duplicate of bug 1879458 ***