Bug 2022742 - NNCP creation fails when node of a cluster is unavailable
Summary: NNCP creation fails when node of a cluster is unavailable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.1
Assignee: Radim Hrazdil
QA Contact: Adi Zavalkovsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-12 13:43 UTC by Geetika Kapoor
Modified: 2022-05-18 20:28 UTC
CC List: 3 users

Fixed In Version: kubernetes-nmstate-handler v4.10.1-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-18 20:26:54 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github nmstate kubernetes-nmstate pull 981 0 None open Don't involve notReady nodes in NNCP status calculation 2022-02-03 07:41:15 UTC
Github nmstate kubernetes-nmstate pull 985 0 None open [release-0.64] Don't involve notReady nodes in status calculation 2022-02-08 09:29:02 UTC
Red Hat Product Errata RHSA-2022:4668 0 None None None 2022-05-18 20:28:24 UTC

Description Geetika Kapoor 2021-11-12 13:43:07 UTC
Description of problem:

NNCP status is not calculated and remains unknown when one of the nodes of a cluster is down or restarting. This is a very realistic use case, since all nodes of a cluster cannot be up at all times, and if an NNCP is applied during that window it does not get configured successfully. There are two cases:

1. The node never comes back up
2. The node comes back up

Case 1: if the node never comes back up, oc get nncp shows no status for the policy; the status is never calculated.

$ oc get nncp
NAME                  STATUS
nncp-maxunavailable

Case 2: if the node does come back up, its enactment goes directly to "Aborted" status and the NNCP status moves to "Degraded".


Version-Release number of selected component (if applicable):

4.9

How reproducible:

always 

Steps to Reproduce:
1. Bring a node down (in a large cluster not all nodes will be up at all times, and a node may also go down due to unrelated issues, so this is a very realistic use case); see the example commands after this list.
2. Now configure a policy on the cluster.
3. All available nodes will show an enactment status of Aborted, Failing, or Available, but if you check the NNCP status it is still empty:
$ oc get nncp
NAME                  STATUS
nncp-maxunavailable
4. Now bring that node back up. Its enactment goes directly to "Aborted" and the NNCP status moves to "Degraded".
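
For reference, one way to walk through these steps from the CLI (a rough sketch; the node name and manifest filename are examples rather than values from this report, and stopping kubelet is only one way to drive a node NotReady):

# Take one worker node down, e.g. by stopping its kubelet (shutting the node down out of band works too)
$ oc debug node/worker-1 -- chroot /host systemctl stop kubelet
$ oc get nodes                           # wait until the node reports NotReady

# Apply a policy, then check the per-node enactments and the aggregated policy status
$ oc apply -f nncp-maxunavailable.yaml   # example filename for the policy manifest
$ oc get nnce
$ oc get nncp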

Actual results:

NNCP status is not calculated.

Expected results:

The status should be calculated based on the currently available nodes, not on the total number of nodes.
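
As an illustration of that expectation (illustrative output only, assuming the configuration succeeds on the remaining Ready nodes; comment 3 below shows this behavior once the fix is in place):

$ oc get nncp
NAME                  STATUS
nncp-maxunavailable   Available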

Additional info:

Concerns:
If the policy is going to go straight to "Degraded" without performing any operation on the down node, there is no point in waiting to display the status.
A user may have no intention of ever bringing that node back up, and in that case the NNCP never reaches Available status even though the other nodes have the policy configured correctly.

Comment 2 Petr Horáček 2022-02-03 12:20:40 UTC
Blockers only: Moving to 4.10.1

Comment 3 Adi Zavalkovsky 2022-04-04 12:18:31 UTC
Verified.
OCP Version 4.10.6
kubernetes-nmstate-handler v4.10.1-2

Applied the following NNCP:
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br-worker2
spec:
  nodeSelector:
    bz: 'yes'
  desiredState:
    interfaces:
      - name: br1
        type: linux-bridge
        state: up
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens4

On two nodes labeled bz: 'yes' -
n-adiz-410-8d2hr-worker-0-8th5h   NotReady   worker   7d2h   v1.23.5+b0357ed
n-adiz-410-8d2hr-worker-0-ff9wp   Ready      worker   7d2h   v1.23.5+b0357ed

[cnv-qe-jenkins@n-adiz-410-8d2hr-executor ~]$ oc get nnce
NAME                                         STATUS
n-adiz-410-8d2hr-worker-0-ff9wp.br-worker2   Available
[cnv-qe-jenkins@n-adiz-410-8d2hr-executor ~]$ oc get nncp 
NAME         STATUS
br-worker2   Available
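
For completeness, the setup behind this verification would have looked roughly like the following (the bz label and node names come from the output above; the manifest filename is an assumption):

$ oc label node n-adiz-410-8d2hr-worker-0-8th5h bz=yes
$ oc label node n-adiz-410-8d2hr-worker-0-ff9wp bz=yes
$ oc apply -f br-worker2.yaml            # example filename for the NNCP shown above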

Comment 9 errata-xmlrpc 2022-05-18 20:26:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.1 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4668

