Description of problem:

The nncp status is flipping between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes without any change in the host labels or in the nncp configuration. The reconciliation is very frequent, as shown below:

~~~
2021-12-14T14:48:42.275132493Z {"level":"info","ts":1639493322.275068,"logger":"enactmentconditions","msg":"Reset","enactment":"node-2.node-1"}

$ omg logs nmstate-handler-nkgwj | grep Reset | grep 2021-12-14T14 | wc -l
113
~~~

Taking one event as an example:

~~~
2021-12-07T18:11:28.818263447Z {"level":"info","ts":1638900688.8182015,"logger":"enactmentconditions","msg":"Reset","enactment":"node-2.node-2"}
2021-12-07T18:11:29.824451211Z {"level":"info","ts":1638900689.824379,"logger":"enactmentstatus","msg":"enactment updated at the node: true","enactment":"node-2.node-2"}
2021-12-07T18:11:29.833217028Z {"level":"info","ts":1638900689.8332036,"logger":"enactmentconditions","msg":"NotifyMatching","enactment":"node-2.node-2"}
2021-12-07T18:11:30.859917713Z {"level":"info","ts":1638900690.859867,"logger":"enactmentconditions","msg":"NotifyProgressing","enactment":"node-2.node-2"}
2021-12-07T18:11:42.771612096Z {"level":"info","ts":1638900702.771587,"logger":"enactmentconditions","msg":"NotifySuccess","enactment":"node-2.node-2"}
2021-12-07T18:11:43.839706501Z {"level":"info","ts":1638900703.839669,"logger":"policyconditions","msg":"enactments count: {failed: {true: 0, false: 3, unknown: 0}, progressing: {true: 0, false: 3, unknown: 0}, available: {true: 1, false: 2, unknown: 0}, matching: {true: 1, false: 2, unknown: 0}, aborted: {true: 0, false: 3, unknown: 0}}","policy":"node-2"}
2021-12-07T18:11:43.839719857Z {"level":"info","ts":1638900703.8397038,"logger":"policyconditions","msg":"SetPolicySuccess"}
~~~

There were no manual changes to the nncp object or to the host labels that could trigger the change. We also watched the node manifests for changes (get nodes -o yaml -w); the labels remained unchanged, and the only changes were the heartbeat time in the status and annotation fields.

Version-Release number of selected component (if applicable):
v4.8.3

How reproducible:
Observed in a customer environment. The issue persists even after a full reboot of the nodes.

Steps to Reproduce:
Unknown

Actual results:
The nncp keeps changing its status between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes.

Expected results:
The nncp should not enter a reconciliation loop if there are no changes in the watched objects.

Additional info:
After some debugging, I suspect there may be a bug in the Kubernetes controller-runtime. I've opened an issue on the controller-runtime GitHub repository: https://github.com/kubernetes-sigs/controller-runtime/issues/1764. Since the reconcile trigger seems to bypass our filters, I don't see a way to work around this issue on our side, as we can't tell what the origin of the reconcile request is.
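For context, the handler is built in Go on top of controller-runtime and is expected to drop node updates that only touch the heartbeat. The sketch below is not the actual kubernetes-nmstate code; it is a minimal, hypothetical controller-runtime update predicate (the name nodeLabelsChanged and the WithEventFilter wiring are illustrative assumptions) showing the kind of filter the stray reconcile requests appear to bypass. With such a predicate in place, status/annotation-only node updates should never reach Reconcile.

~~~
package controllers

import (
	"reflect"

	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// nodeLabelsChanged is a hypothetical update filter: requeue only when the
// watched object's labels actually differ between the old and new versions,
// so status/annotation-only updates (e.g. kubelet heartbeats) are dropped.
var nodeLabelsChanged = predicate.Funcs{
	CreateFunc: func(event.CreateEvent) bool { return true },
	DeleteFunc: func(event.DeleteEvent) bool { return false },
	UpdateFunc: func(e event.UpdateEvent) bool {
		return !reflect.DeepEqual(e.ObjectOld.GetLabels(), e.ObjectNew.GetLabels())
	},
	GenericFunc: func(event.GenericEvent) bool { return false },
}

// The filter would be attached when the controller is built, for example:
//
//	ctrl.NewControllerManagedBy(mgr).
//		For(&nmstatev1.NodeNetworkConfigurationPolicy{}).
//		WithEventFilter(nodeLabelsChanged).
//		Complete(r)
~~~

If reconcile requests keep arriving even with a filter like this in place, they must originate below the predicate layer, which matches what we observe and why the linked controller-runtime issue was opened.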
*** Bug 2037240 has been marked as a duplicate of this bug. ***
Failed QE with nmstate-handler version v4.10.0-45.
~~~
$ oc get nnce -w
NAME                                                                  STATUS
c01-rn-410-7-wnjbz-master-0.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-master-1.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-master-2.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-dsjq2.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-jp8t7.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Progressing
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
~~~
My bad, I moved this to ON_QA prematurely. The patch did not make it from M/S to D/S due to a CI failure. It should be resolved now; I will move this back to ON_QA once the new build appears in the errata.
Verified with nmstate-handler version v4.10.0-47 using the following policy:

~~~
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: <node name>
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens9
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: br1test
      state: up
      type: linux-bridge
  nodeSelector:
    kubernetes.io/hostname: <node name>
~~~
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0947