Bug 2033252 - nncp changing its status between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes
Summary: nncp changing its status between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.8.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Radim Hrazdil
QA Contact: Meni Yakove
URL:
Whiteboard:
Duplicates: 2037240
Depends On:
Blocks: 2042847
 
Reported: 2021-12-16 11:08 UTC by nijin ashok
Modified: 2025-04-04 13:53 UTC
CC List: 6 users

Fixed In Version: kubernetes-nmstate-handler v4.10.0-47
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2042847
Environment:
Last Closed: 2022-03-16 16:05:38 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub nmstate/kubernetes-nmstate pull 957 (Merged): Fix periodic re-reconciles of NNCPs with name equal to node. Last updated 2022-01-19 17:13:34 UTC
- Red Hat Product Errata RHSA-2022:0947 (Closed): RHEL EUS Errata Documentation. Last updated 2022-06-16 06:26:55 UTC

Description nijin ashok 2021-12-16 11:08:52 UTC
Description of problem:

The nncp status is changing between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes without any change in the host labels or in the nncp configuration. The reconciliation is very frequent, as shown below.

~~~
2021-12-14T14:48:42.275132493Z {"level":"info","ts":1639493322.275068,"logger":"enactmentconditions","msg":"Reset","enactment":"node-2.node-1"}

omg logs nmstate-handler-nkgwj | grep Reset | grep 2021-12-14T14 | wc -l
113
~~~

Taking one event as an example:

~~~
2021-12-07T18:11:28.818263447Z {"level":"info","ts":1638900688.8182015,"logger":"enactmentconditions","msg":"Reset","enactment":"node-2.node-2"}

2021-12-07T18:11:29.824451211Z {"level":"info","ts":1638900689.824379,"logger":"enactmentstatus","msg":"enactment updated at the node: true","enactment":"node-2.node-2"}

2021-12-07T18:11:29.833217028Z {"level":"info","ts":1638900689.8332036,"logger":"enactmentconditions","msg":"NotifyMatching","enactment":"node-2.node-2"}

2021-12-07T18:11:30.859917713Z {"level":"info","ts":1638900690.859867,"logger":"enactmentconditions","msg":"NotifyProgressing","enactment":"node-2.node-2"}
2021-12-07T18:11:42.771612096Z {"level":"info","ts":1638900702.771587,"logger":"enactmentconditions","msg":"NotifySuccess","enactment":"node-2.node-2"}

2021-12-07T18:11:43.839706501Z {"level":"info","ts":1638900703.839669,"logger":"policyconditions","msg":"enactments count: {failed: {true: 0, false: 3, unknown: 0}, progressing: {true: 0, false: 3, unknown: 0}, available: {true: 1, false: 2, unknown: 0}, matching: {true: 1, false: 2, unknown: 0}, aborted: {true: 0, false: 3, unknown: 0}}","policy":"node-2"}
2021-12-07T18:11:43.839719857Z {"level":"info","ts":1638900703.8397038,"logger":"policyconditions","msg":"SetPolicySuccess"}
~~~

There were no manual changes to the nncp object or the host labels that could trigger the change.

Also, we watched the node manifests for changes (get nodes -o yaml -w); the labels remained unchanged, and the only changes were the heartbeat time in the status and annotation fields.
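
For reference, a minimal client-go sketch of such a label-only watch (illustrative only, not the exact tooling used here; the kubeconfig path and variable names are assumptions) could look like this:

~~~
// Illustrative sketch: watch Nodes and print a line only when a node's labels
// actually change, so heartbeat/annotation-only updates can be ruled out.
// Assumes kubeconfig authentication and the k8s.io/client-go library.
package main

import (
	"context"
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	w, err := client.CoreV1().Nodes().Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	seen := map[string]map[string]string{}
	for ev := range w.ResultChan() {
		node, ok := ev.Object.(*corev1.Node)
		if !ok {
			continue
		}
		if prev, known := seen[node.Name]; known && reflect.DeepEqual(prev, node.Labels) {
			continue // only status heartbeat or annotations changed; labels are identical
		}
		seen[node.Name] = node.Labels
		fmt.Printf("%s %s labels: %v\n", ev.Type, node.Name, node.Labels)
	}
}
~~~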


Version-Release number of selected component (if applicable):

v4.8.3

How reproducible:

Observed in a customer environment. The issue persists even after a full reboot of the nodes.

Steps to Reproduce:

Unknown

Actual results:

nncp changing its status between "ConfigurationProgressing" and "SuccessfullyConfigured" every few minutes

Expected results:

The nncp should not enter a reconciliation loop if there are no changes to the watched objects.

Additional info:

Comment 8 Radim Hrazdil 2022-01-04 13:22:02 UTC
After some debugging, I suspect that there may be a bug in the k8s controller-runtime. I've opened an issue on the controller-runtime GitHub: https://github.com/kubernetes-sigs/controller-runtime/issues/1764
Since the reconcile trigger seems to bypass our filters, I don't see a way to work around this issue on our side, as we can't tell what the origin of the reconcile request is.
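
For context, a minimal sketch of the kind of controller-runtime update predicate such filtering relies on (illustrative only, not the actual kubernetes-nmstate code; the package and variable names are assumptions):

~~~
// Illustrative sketch, not the kubernetes-nmstate filter itself: a
// controller-runtime predicate that only lets Node updates through when the
// labels differ, so heartbeat/annotation-only updates never reach Reconcile.
package filters

import (
	"reflect"

	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// LabelsChanged filters update events; create/delete/generic events still pass
// because predicate.Funcs treats unset callbacks as "allow".
var LabelsChanged = predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		return !reflect.DeepEqual(e.ObjectOld.GetLabels(), e.ObjectNew.GetLabels())
	},
}
~~~

A predicate like this would normally be attached with builder.WithPredicates when the Node watch is set up; the point above is that the periodic reconcile requests appear to arrive even though no event should pass such a filter, which is what the linked controller-runtime issue tracks.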

Comment 9 Ben Nemec 2022-01-12 16:38:24 UTC
*** Bug 2037240 has been marked as a duplicate of this bug. ***

Comment 14 Ruth Netser 2022-02-08 09:54:05 UTC
Failed QE with nmstate-handler version v4.10.0-45.

Comment 15 Ruth Netser 2022-02-08 09:54:46 UTC
$ oc get nnce -w
NAME                                                                  STATUS
c01-rn-410-7-wnjbz-master-0.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-master-1.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-master-2.c01-rn-410-7-wnjbz-worker-0-5ctss         Available
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-dsjq2.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-jp8t7.c01-rn-410-7-wnjbz-worker-0-5ctss   Available
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Progressing
c01-rn-410-7-wnjbz-worker-0-5ctss.c01-rn-410-7-wnjbz-worker-0-5ctss   Available

Comment 16 Petr Horáček 2022-02-08 10:02:16 UTC
My bad, I moved it ON_QA prematurely. The patch did not make it from M/S to D/S due to a CI failure. It should be resolved now. I will move this back ON_QA once the new build appears in the errata.

Comment 18 Ruth Netser 2022-02-10 16:39:34 UTC
Verified with nmstate-handler version v4.10.0-47 using the following policy:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: <node name>
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens9
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: br1test
      state: up
      type: linux-bridge
  nodeSelector:
    kubernetes.io/hostname: <node name>

Comment 22 errata-xmlrpc 2022-03-16 16:05:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

