Description of problem:
Applying an nmpolicy capture teardown nncp may lead to a race condition in which the first node's nnce fails to apply.

Version-Release number of selected component (if applicable):
CNV 4.10.0
nmstate-handler v4.10.0-48

How reproducible:
Depends on how fast the nnce is applied.

Steps to Reproduce:
1. Create the capture nncp (with a node selector; tested with 2 nodes)
2. Wait for it to become Available
3. Delete the nncp (wait for it to be deleted)
4. Create the teardown nncp

Actual results:
Sometimes the first node's nnce fails on:

  message: |
    failure generating desiredState and capturedStates failed to generate state, err failed to resolve capture expression, err resolve error resolve error step 'interfaces' from path '[interfaces]' not found at map state 'map[]'
    | capture.capture-br1-routes | routes.running.next-hop-interface := capture.capture-br1.interfaces.0.bridge.port.0.name

If we wait before creating the teardown nncp, everything works as expected.

Expected results:
The nncp should be applied successfully.
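The report only shows the resolver's error text, but the failure mode is easy to illustrate: a dotted capture path such as capture.capture-br1.interfaces.0.bridge.port.0.name is walked step by step through the captured state, and the very first step fails when the race left the referenced capture resolved against an empty map. A minimal Python sketch (illustrative only, not the nmpolicy implementation):

```python
def resolve_path(state, path):
    """Walk a dotted path through nested dicts/lists, mimicking the
    "step ... not found at map state" error seen in the nnce."""
    current = state
    for step in path.split("."):
        if isinstance(current, list):
            # Numeric steps index into list states (e.g. interfaces.0)
            current = current[int(step)]
        elif isinstance(current, dict) and step in current:
            current = current[step]
        else:
            raise LookupError(
                f"step '{step}' from path '[{step}]' "
                f"not found at map state '{current}'"
            )
    return current

# A capture that resolved in time (interface/NIC names are made up):
captured = {"interfaces": [{"name": "capture-br1",
                            "bridge": {"port": [{"name": "ens3"}]}}]}
resolve_path(captured, "interfaces.0.bridge.port.0.name")  # → 'ens3'

# In the racy case the capture resolved to an empty map, so the first
# step already raises, just like the reported failure:
# resolve_path({}, "interfaces.0.bridge.port.0.name") → LookupError
```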
Additional info:

$ oc get nnce
NAME                                                      STATUS
c01-rn-410-21-sktq8-worker-0-dkddz.capture-br1-teardown   Failing
c01-rn-410-21-sktq8-worker-0-llxx8.capture-br1-teardown   Available

$ oc get nncp
NAME                   STATUS
capture-br1-teardown   Degraded

================ Capture nncp ====================
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: capture-br1-deployment
spec:
  capture:
    default-gw: routes.running.destination=="0.0.0.0/0"
    default-gw-routes-takeover: capture.primary-nic-routes | routes.running.next-hop-interface := "capture-br1"
    primary-nic: interfaces.name==capture.default-gw.routes.running.0.next-hop-interface
    primary-nic-routes: routes.running.next-hop-interface==capture.primary-nic.interfaces.0.name
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: '{{ capture.primary-nic.interfaces.0.name }}'
      ipv4: '{{ capture.primary-nic.interfaces.0.ipv4 }}'
      ipv6: '{{ capture.primary-nic.interfaces.0.ipv6 }}'
      name: capture-br1
      state: up
      type: linux-bridge
    routes:
      config: '{{ capture.default-gw-routes-takeover.routes.running }}'
  nodeSelector:
    capture: allow

=============== Teardown nncp =====================
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: capture-br1-teardown
spec:
  capture:
    capture-br1: interfaces.name == "capture-br1"
    capture-br1-routes: routes.running.next-hop-interface == "capture-br1"
    capture-br1-routes-takeover: capture.capture-br1-routes | routes.running.next-hop-interface := capture.capture-br1.interfaces.0.bridge.port.0.name
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port: []
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: capture-br1
      state: absent
      type: linux-bridge
    - ipv4: '{{ capture.capture-br1.interfaces.0.ipv4 }}'
      ipv6: '{{ capture.capture-br1.interfaces.0.ipv6 }}'
      name: '{{ capture.capture-br1.interfaces.0.bridge.port.0.name }}'
      state: up
      type: ethernet
    routes:
      config: '{{ capture.capture-br1-routes-takeover.routes.running }}'
  nodeSelector:
    capture: allow

=============== failed nnce =====================
$ oc get nnce c01-rn-410-21-sktq8-worker-0-dkddz.capture-br1-teardown -oyaml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationEnactment
metadata:
  creationTimestamp: "2022-02-23T17:39:06Z"
  generation: 1
  labels:
    app.kubernetes.io/component: network
    app.kubernetes.io/managed-by: cnao-operator
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.10.0
    nmstate.io/node: c01-rn-410-21-sktq8-worker-0-dkddz
    nmstate.io/policy: capture-br1-teardown
  name: c01-rn-410-21-sktq8-worker-0-dkddz.capture-br1-teardown
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: c01-rn-410-21-sktq8-worker-0-dkddz
    uid: f5a9ead1-7732-491a-a552-f96f1851127b
  resourceVersion: "6124938"
  uid: 7f7ce686-03b1-4b8c-8a66-d2d430571821
status:
  conditions:
  - lastHearbeatTime: "2022-02-23T17:39:06Z"
    lastTransitionTime: "2022-02-23T17:39:06Z"
    message: |
      failure generating desiredState and capturedStates failed to generate state, err failed to resolve capture expression, err resolve error resolve error step 'interfaces' from path '[interfaces]' not found at map state 'map[]'
      | capture.capture-br1-routes | routes.running.next-hop-interface := capture.capture-br1.interfaces.0.bridge.port.0.name
      | ...............................................................^
    messageEncoded: H4sIAAAAAAAA/6ROu24CMRDs+Yrp3OQs0iLlK1IiIi147rDEra31XkRxHx8hcIiUdHHj3XntjJIvixETlSaedUJiy8b07uKEaMJJqi8daTvcPEzw0l1Eu1EvoNlP2tjK5ZM9ALxWY2u56EPaBTQrv9fmrAhZnTbKiS1gtDKjip8R9k/8EKDFMZZFE8QxS70XQpil7g9hs/YK8fEPR3sdrCzOhhX3IdqimnWKyqsP51KH7xPYvf2VEJ8d4jYeLaeJsRbzuI0qMzcr4v/ex1cAAAD//4X6e16gAQAA
    reason: FailedToConfigure
    status: "True"
    type: Failing
  - lastHearbeatTime: "2022-02-23T17:39:06Z"
    lastTransitionTime: "2022-02-23T17:39:06Z"
    reason: FailedToConfigure
    status: "False"
    type: Available
  - lastHearbeatTime: "2022-02-23T17:39:06Z"
    lastTransitionTime: "2022-02-23T17:39:06Z"
    reason: FailedToConfigure
    status: "False"
    type: Progressing
  - lastHearbeatTime: "2022-02-23T17:39:06Z"
    lastTransitionTime: "2022-02-23T17:39:06Z"
    reason: FailedToConfigure
    status: "False"
    type: Pending
  - lastHearbeatTime: "2022-02-23T17:39:06Z"
    lastTransitionTime: "2022-02-23T17:39:06Z"
    reason: SuccessfullyConfigured
    status: "False"
    type: Aborted
  desiredStateMetaInfo: {}
  policyGeneration: 1

=============== nncp =====================
$ oc get nncp -oyaml
apiVersion: v1
items:
- apiVersion: nmstate.io/v1
  kind: NodeNetworkConfigurationPolicy
  metadata:
    annotations:
      nmstate.io/webhook-mutating-timestamp: "1645637946522944281"
    creationTimestamp: "2022-02-23T17:39:06Z"
    generation: 1
    name: capture-br1-teardown
    resourceVersion: "6125381"
    uid: 11628dd4-68fe-4df3-aebd-da833df76650
  spec:
    capture:
      capture-br1: interfaces.name == "capture-br1"
      capture-br1-routes: routes.running.next-hop-interface == "capture-br1"
      capture-br1-routes-takeover: capture.capture-br1-routes | routes.running.next-hop-interface := capture.capture-br1.interfaces.0.bridge.port.0.name
    desiredState:
      interfaces:
      - bridge:
          options:
            stp:
              enabled: false
          port: []
        ipv4:
          auto-dns: true
          dhcp: false
          enabled: false
        ipv6:
          auto-dns: true
          autoconf: false
          dhcp: false
          enabled: false
        name: capture-br1
        state: absent
        type: linux-bridge
      - ipv4: '{{ capture.capture-br1.interfaces.0.ipv4 }}'
        ipv6: '{{ capture.capture-br1.interfaces.0.ipv6 }}'
        name: '{{ capture.capture-br1.interfaces.0.bridge.port.0.name }}'
        state: up
        type: ethernet
      routes:
        config: '{{ capture.capture-br1-routes-takeover.routes.running }}'
    nodeSelector:
      capture: allow
  status:
    conditions:
    - lastHearbeatTime: "2022-02-23T17:39:18Z"
      lastTransitionTime: "2022-02-23T17:39:06Z"
      reason: FailedToConfigure
      status: "False"
      type: Available
    - lastHearbeatTime: "2022-02-23T17:39:18Z"
      lastTransitionTime: "2022-02-23T17:39:06Z"
      message: 1/2 nodes failed to configure
      reason: FailedToConfigure
      status: "True"
      type: Degraded
    lastUnavailableNodeCountUpdate: "2022-02-23T17:39:18Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
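For context on what the failing expression is supposed to do: the teardown pipeline capture.capture-br1-routes | routes.running.next-hop-interface := capture.capture-br1.interfaces.0.bridge.port.0.name takes every running route whose next hop is the bridge and rewrites its next-hop-interface back to the NIC that was enslaved to the bridge. A rough Python sketch of that replace step (function and NIC names are illustrative, not the nmpolicy code):

```python
def retarget_routes(routes_state, new_nic):
    """Return a copy of routes.running with every route's
    next-hop-interface replaced by new_nic (the := operation)."""
    return {
        "running": [
            {**route, "next-hop-interface": new_nic}
            for route in routes_state["running"]
        ]
    }

# Routes that the capture-br1-routes capture would have matched:
captured_routes = {"running": [
    {"destination": "0.0.0.0/0", "next-hop-interface": "capture-br1"},
]}
retarget_routes(captured_routes, "ens3")
# → {'running': [{'destination': '0.0.0.0/0', 'next-hop-interface': 'ens3'}]}
```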
We have to calculate the captures bypassing NNS and calling `nmstatectl show` directly, so we get accurate data: https://github.com/nmstate/kubernetes-nmstate/blob/09067b48a7814f384b2c20fa7a62ada3f5cd3ccf/controllers/handler/nodenetworkconfigurationpolicy_controller.go#L170 I will prepare a fix ASAP.
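The direction of the fix, querying the live node state instead of trusting a possibly stale cached NodeNetworkState report, could be sketched as follows. This is a hypothetical Python wrapper, not the actual Go change in the handler, and the `--json` output flag of `nmstatectl show` is assumed to be available:

```python
import json
import subprocess

def fetch_current_state(runner=subprocess.run):
    """Query the node's live network state by shelling out to
    `nmstatectl show`, bypassing any cached state report.

    `runner` is injectable so the sketch can be exercised without
    nmstatectl installed (hypothetical parameter)."""
    result = runner(
        ["nmstatectl", "show", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Because the state is read at resolve time rather than from a cache, the teardown captures see the bridge that the previous nncp actually left on the node, which is what closes the race window.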
The upstream fix: https://github.com/nmstate/kubernetes-nmstate/pull/998
Verified with kubernetes-nmstate-handler v4.10.1-2 on OCP 4.10.6. Deployed the attached capture+teardown nncps as described, and the condition did not occur. nmpolicy now gets the configuration using nmstatectl.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.1 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4668