Bug 2022742 - NNCP creation fails when node of a cluster is unavailable
Summary: NNCP creation fails when node of a cluster is unavailable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.1
Assignee: Radim Hrazdil
QA Contact: Adi Zavalkovsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-12 13:43 UTC by Geetika Kapoor
Modified: 2022-05-18 20:28 UTC
CC List: 3 users

Fixed In Version: kubernetes-nmstate-handler v4.10.1-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-18 20:26:54 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github nmstate kubernetes-nmstate pull 981 0 None open Don't involve notReady nodes in NNCP status calculation 2022-02-03 07:41:15 UTC
Github nmstate kubernetes-nmstate pull 985 0 None open [release-0.64] Don't involve notReady nodes in status calculation 2022-02-08 09:29:02 UTC
Red Hat Product Errata RHSA-2022:4668 0 None None None 2022-05-18 20:28:24 UTC

Description Geetika Kapoor 2021-11-12 13:43:07 UTC
Description of problem:

NNCP status is not calculated and remains unknown when one of the nodes of a cluster is down or restarting. This is a very realistic use case, since all nodes of a cluster cannot be up at all times, and if an NNCP is applied during that window it does not get configured successfully. There are two cases:

1. The node never comes back up
2. The node comes back up

Case 1: if the node never comes back up, oc get nncp shows no status for the policy; the status is never calculated.

$ oc get nncp
NAME                  STATUS
nncp-maxunavailable

Case 2: if the node does come back up, its enactment goes directly to "Aborted" status and the NNCP status moves to "Degraded".


Version-Release number of selected component (if applicable):

4.9

How reproducible:

always 

Steps to Reproduce:
1. Bring a node down (in a large cluster not all nodes will be up at all times, and a node may also go down due to unrelated issues, so this is a very realistic use case); see the example commands after this list.
2. Now configure a policy on the cluster.
3. All available nodes will show an enactment status of Aborted, Failing, or Available, but if you check the NNCP status it is still empty:
$ oc get nncp
NAME                  STATUS
nncp-maxunavailable
4. Now bring that node back up. Its enactment goes directly to "Aborted" and the NNCP status moves to "Degraded".
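
For reference, one way to walk through these steps from the CLI (a rough sketch; the node name and manifest filename are examples rather than values from this report, and stopping kubelet is only one way to drive a node NotReady):

# Take one worker node down, e.g. by stopping its kubelet (shutting the node down out of band works too)
$ oc debug node/worker-1 -- chroot /host systemctl stop kubelet
$ oc get nodes                           # wait until the node reports NotReady

# Apply a policy, then check the per-node enactments and the aggregated policy status
$ oc apply -f nncp-maxunavailable.yaml   # example filename for the policy manifest
$ oc get nnce
$ oc get nncp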

Actual results:

NNCP status is not calculated.

Expected results:

The status should be calculated based on the currently available nodes, not on the total number of nodes.
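
As an illustration of that expectation (illustrative output only, assuming the configuration succeeds on the remaining Ready nodes; comment 3 below shows this behavior once the fix is in place):

$ oc get nncp
NAME                  STATUS
nncp-maxunavailable   Available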

Additional info:

Concerns:
If the policy is going to go straight to "Degraded" without performing any operation on the down node, there is no point in waiting to display the status.
A user may have no intention of ever bringing that node back up, and in that case the NNCP never reaches Available status even though the other nodes have the policy configured correctly.

Comment 2 Petr Horáček 2022-02-03 12:20:40 UTC
Blockers only: Moving to 4.10.1

Comment 3 Adi Zavalkovsky 2022-04-04 12:18:31 UTC
Verified.
OCP Version 4.10.6
kubernetes-nmstate-handler v4.10.1-2

Applied the following NNCP:
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br-worker2
spec:
  nodeSelector:
    bz: 'yes'
  desiredState:
    interfaces:
      - name: br1
        type: linux-bridge
        state: up
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens4

On two nodes labeled bz: 'yes' -
n-adiz-410-8d2hr-worker-0-8th5h   NotReady   worker   7d2h   v1.23.5+b0357ed
n-adiz-410-8d2hr-worker-0-ff9wp   Ready      worker   7d2h   v1.23.5+b0357ed

[cnv-qe-jenkins@n-adiz-410-8d2hr-executor ~]$ oc get nnce
NAME                                         STATUS
n-adiz-410-8d2hr-worker-0-ff9wp.br-worker2   Available
[cnv-qe-jenkins@n-adiz-410-8d2hr-executor ~]$ oc get nncp 
NAME         STATUS
br-worker2   Available
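
For completeness, the setup behind this verification would have looked roughly like the following (the bz label and node names come from the output above; the manifest filename is an assumption):

$ oc label node n-adiz-410-8d2hr-worker-0-8th5h bz=yes
$ oc label node n-adiz-410-8d2hr-worker-0-ff9wp bz=yes
$ oc apply -f br-worker2.yaml            # example filename for the NNCP shown above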

Comment 9 errata-xmlrpc 2022-05-18 20:26:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.1 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4668

