Bug 2033252

Summary: nncp changing its status between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes
Product: Container Native Virtualization (CNV)
Component: Networking
Version: 4.8.3
Hardware: All
OS: Linux
Severity: high
Priority: high
Status: CLOSED ERRATA
Reporter: nijin ashok <nashok>
Assignee: Radim Hrazdil <rhrazdil>
QA Contact: Meni Yakove <myakove>
CC: cnv-qe-bugs, phoracek, rhrazdil, rnetser, sgott, shaselde
Target Release: 4.10.0
Fixed In Version: kubernetes-nmstate-handler v4.10.0-47
Last Closed: 2022-03-16 16:05:38 UTC
Type: Bug
Bug Blocks: 2042847

Description nijin ashok 2021-12-16 11:08:52 UTC
Description of problem:

The nncp status is changing between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes, without any change in the host labels or in the nncp configuration. The reconciliation is very frequent, as shown below.

~~~
2021-12-14T14:48:42.275132493Z {"level":"info","ts":1639493322.275068,"logger":"enactmentconditions","msg":"Reset","enactment":"node-2.node-1"}

omg logs nmstate-handler-nkgwj | grep Reset | grep 2021-12-14T14 | wc -l
113
~~~

Taking one event as an example:

~~~
2021-12-07T18:11:28.818263447Z {"level":"info","ts":1638900688.8182015,"logger":"enactmentconditions","msg":"Reset","enactment":"node-2.node-2"}

2021-12-07T18:11:29.824451211Z {"level":"info","ts":1638900689.824379,"logger":"enactmentstatus","msg":"enactment updated at the node: true","enactment":"node-2.node-2"}

2021-12-07T18:11:29.833217028Z {"level":"info","ts":1638900689.8332036,"logger":"enactmentconditions","msg":"NotifyMatching","enactment":"node-2.node-2"}

2021-12-07T18:11:30.859917713Z {"level":"info","ts":1638900690.859867,"logger":"enactmentconditions","msg":"NotifyProgressing","enactment":"node-2.node-2"}
2021-12-07T18:11:42.771612096Z {"level":"info","ts":1638900702.771587,"logger":"enactmentconditions","msg":"NotifySuccess","enactment":"node-2.node-2"}

2021-12-07T18:11:43.839706501Z {"level":"info","ts":1638900703.839669,"logger":"policyconditions","msg":"enactments count: {failed: {true: 0, false: 3, unknown: 0}, progressing: {true: 0, false: 3, unknown: 0}, available: {true: 1, false: 2, unknown: 0}, matching: {true: 1, false: 2, unknown: 0}, aborted: {true: 0, false: 3, unknown: 0}}","policy":"node-2"}
2021-12-07T18:11:43.839719857Z {"level":"info","ts":1638900703.8397038,"logger":"policyconditions","msg":"SetPolicySuccess"}
~~~

There were no manual changes in the nncp object or the host labels that could have triggered the change.

We also watched the node manifests for changes (oc get nodes -o yaml -w); the labels remained unchanged, and the only changes were the heartbeat timestamps in the status and annotation fields.
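
For illustration, the kind of update filter that should drop such heartbeat-only node updates looks roughly like the sketch below. This is a hypothetical controller-runtime predicate written for this report; the package and variable names are made up and it is not the actual kubernetes-nmstate code.

~~~
package filters

import (
	"reflect"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// nodeLabelsChanged is an illustrative update filter: it lets a node update
// through only when the labels actually differ, so heartbeat- and
// annotation-only updates like the ones observed above are dropped.
var nodeLabelsChanged = predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		oldNode, okOld := e.ObjectOld.(*corev1.Node)
		newNode, okNew := e.ObjectNew.(*corev1.Node)
		if !okOld || !okNew {
			return false
		}
		return !reflect.DeepEqual(oldNode.GetLabels(), newNode.GetLabels())
	},
}
~~~

controller-runtime also ships a built-in predicate.LabelChangedPredicate with equivalent behaviour, so label-only filtering is expected to be in place.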


Version-Release number of selected component (if applicable):

v4.8.3

How reproducible:

Observed in a customer environment. The issue persists even after a full reboot of the nodes.

Steps to Reproduce:

Unknown

Actual results:

nncp keeps changing its status between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes

Expected results:

nncp should not go into a reconciliation loop when there are no changes in the watched objects.

Additional info:

Comment 8 Radim Hrazdil 2022-01-04 13:22:02 UTC
After some debugging, I suspect that there may be a bug in k8s controller-runtime. I've opened an issue on controller-runtime GH: https://github.com/kubernetes-sigs/controller-runtime/issues/1764
Since the reconcile trigger seems to bypass our filters, I don't see a way to work around this issue on our side, as we can't tell what the origin of the reconcile request is.
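
For context, a minimal, hypothetical sketch of how such filters are wired in controller-runtime is shown below (NodeReconciler and the chosen predicates are illustrative assumptions, not the actual kubernetes-nmstate controller). Filters run before a request is enqueued, and the request itself carries only a namespace/name, which is why the origin of a reconcile cannot be recovered inside Reconcile.

~~~
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// NodeReconciler is a placeholder for the real nmstate-handler controller.
type NodeReconciler struct {
	client.Client
}

// Reconcile receives only a namespace/name pair; whether the request came
// from a label change, a heartbeat update, a cache resync, or a requeue is
// not visible at this point.
func (r *NodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

// SetupWithManager attaches update filters to the watch. Events that change
// neither labels nor spec generation should never be enqueued, so a
// reconcile that still fires without such a change points at the event
// source rather than at the filters.
func (r *NodeReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Node{}).
		WithEventFilter(predicate.Or(
			predicate.LabelChangedPredicate{},
			predicate.GenerationChangedPredicate{},
		)).
		Complete(r)
}
~~~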

Comment 9 Ben Nemec 2022-01-12 16:38:24 UTC
*** Bug 2037240 has been marked as a duplicate of this bug. ***

Comment 14 Ruth Netser 2022-02-08 09:54:05 UTC
Failed QE with nmstate-handler version v4.10.0-45.

Comment 15 Ruth Netser 2022-02-08 09:54:46 UTC
$ oc get nnce -w
NAME                                                                  STATUS
c01-rn-410-7-wnjbz-master-0.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-master-1.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-master-2.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-dsjq2.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-jp8t7.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Progressing
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Available

Comment 16 Petr Horáček 2022-02-08 10:02:16 UTC
My bad, I moved this to ON_QA prematurely. The patch did not get from M/S to D/S due to a CI failure. It should be resolved now; I will move this back to ON_QA once the new build appears in the errata.

Comment 18 Ruth Netser 2022-02-10 16:39:34 UTC
Verified with nmstate-handler version v4.10.0-47, using the following policy:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: <node name>
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens9
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: br1test
      state: up
      type: linux-bridge
  nodeSelector:
    kubernetes.io/hostname: <node name>

Comment 22 errata-xmlrpc 2022-03-16 16:05:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947