Description of problem:

Installing the CNV operator in a cluster with an existing NMState operator prevents NNCPs from being applied.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.19    True        False         2m6s    Cluster version is 4.8.19

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.3   OpenShift Virtualization   4.8.3     kubevirt-hyperconverged-operator.v4.8.2   Succeeded

How reproducible:

Steps to Reproduce:

1. Install the nmstate-operator in "openshift-nmstate" and create an instance:

$ oc get csv -n openshift-nmstate
NAME                                             DISPLAY                       VERSION              REPLACES   PHASE
kubernetes-nmstate-operator.4.8.0-202111191337   Kubernetes NMState Operator   4.8.0-202111191337              Installing

$ oc get clusterrolebindings nmstate-handler -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-12-09T14:05:58Z"
  name: nmstate-handler
  ownerReferences:
  - apiVersion: nmstate.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: NMState
    name: nmstate
    uid: 5ecfa4b1-db8c-4960-88ff-32c365d48d60
  resourceVersion: "40296"
  uid: 8a2d642b-7fc4-4cb8-85c6-d072f331f7c1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nmstate-handler
subjects:
- kind: ServiceAccount
  name: nmstate-handler
  namespace: openshift-nmstate   <<-------- initially "openshift-nmstate"

2. Create an NNCP for a dummy interface:

$ cat <<EOF|oc apply -f -
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dummy-if
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: dummy
      type: dummy
      state: up
      ipv4:
        address:
        - ip: 10.244.0.1
          prefix-length: 24
        dhcp: false
        enabled: true
EOF
nodenetworkconfigurationpolicy.nmstate.io/dummy-if created

$ oc get nncp
NAME       STATUS
dummy-if   SuccessfullyConfigured

$ oc debug node/<node-name> -- ip link show dummy
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
12: dummy: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 12:8f:7c:34:b8:11 brd ff:ff:ff:ff:ff:ff

$ oc get ds -n openshift-nmstate
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
nmstate-handler   6         6         6       6            6           beta.kubernetes.io/arch=amd64   5m

3. Install CNV:

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.3   OpenShift Virtualization   4.8.3     kubevirt-hyperconverged-operator.v4.8.2   Installing

$ cat <<EOF|oc apply -f -
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dummy-if-2
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: dummy2
      type: dummy
      state: up
      ipv4:
        address:
        - ip: 10.244.0.1
          prefix-length: 24
        dhcp: false
        enabled: true
EOF
nodenetworkconfigurationpolicy.nmstate.io/dummy-if-2 created

$ oc get nncp
NAME         STATUS
dummy-if     SuccessfullyConfigured
dummy-if-2

$ oc debug node/<node-name> -- ip link show dummy2
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
Device "dummy2" does not exist.
$ oc logs ds/nmstate-handler -n openshift-cnv
Found 6 pods, using pod/nmstate-handler-4tjq9
{"level":"info","ts":1639059222.5338857,"logger":"setup","msg":"Try to take exclusive lock on file: /var/k8s_nmstate/handler_lock"}

$ oc logs ds/nmstate-handler -n openshift-nmstate|tail -n5
Found 6 pods, using pod/nmstate-handler-7gh4s
E1209 14:22:16.739419 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1beta1.NodeNetworkConfigurationEnactment: failed to list *v1beta1.NodeNetworkConfigurationEnactment: nodenetworkconfigurationenactments.nmstate.io is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodenetworkconfigurationenactments" in API group "nmstate.io" at the cluster scope
E1209 14:22:23.039872 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1beta1.NodeNetworkConfigurationPolicy: failed to list *v1beta1.NodeNetworkConfigurationPolicy: nodenetworkconfigurationpolicies.nmstate.io is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodenetworkconfigurationpolicies" in API group "nmstate.io" at the cluster scope
E1209 14:22:29.549222 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodes" in API group "" at the cluster scope
E1209 14:22:29.712126 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "pods" in API group "" at the cluster scope
E1209 14:22:32.602496 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1beta1.NodeNetworkState: failed to list *v1beta1.NodeNetworkState: nodenetworkstates.nmstate.io is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodenetworkstates" in API group "nmstate.io" at the cluster scope

$ oc get ds -n openshift-cnv
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
bridge-marker                  6         6         6       6            6           beta.kubernetes.io/arch=amd64   3m44s
kube-cni-linux-bridge-plugin   6         6         6       6            6           beta.kubernetes.io/arch=amd64   3m45s
nmstate-handler                6         6         0       6            0           beta.kubernetes.io/arch=amd64   3m43s   <<---- 0/6 available
virt-handler                   3         3         3       3            3           kubernetes.io/os=linux          2m11s

$ oc get clusterrolebindings nmstate-handler -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-12-09T14:05:58Z"
  labels:
    networkaddonsoperator.network.kubevirt.io/version: sha256_4e63115a111ed2e3ffd772c242e2432c92ccc46cf7182cf2efb804ba
  name: nmstate-handler
  ownerReferences:
  - apiVersion: networkaddonsoperator.network.kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkAddonsConfig
    name: cluster
    uid: ae8552ab-39f9-475b-9434-8342706efb9f
  resourceVersion: "45582"
  uid: 8a2d642b-7fc4-4cb8-85c6-d072f331f7c1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nmstate-handler
subjects:
- kind: ServiceAccount
  name: nmstate-handler
  namespace: openshift-cnv   <<--------- changed to "openshift-cnv"
4. [workaround] Delete the nmstate instance:

$ oc delete nmstate nmstate
nmstate.nmstate.io "nmstate" deleted

$ oc get ds -n openshift-cnv
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
bridge-marker                  6         6         6       6            6           beta.kubernetes.io/arch=amd64   16m
kube-cni-linux-bridge-plugin   6         6         6       6            6           beta.kubernetes.io/arch=amd64   16m
nmstate-handler                6         6         6       6            6           beta.kubernetes.io/arch=amd64   16m   <<--- 6/6 available
virt-handler                   3         3         3       3            3           kubernetes.io/os=linux          14m

$ oc debug node/<node-name> -- ip link show dummy2
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
24: dummy2: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 86:bb:41:34:a9:ec brd ff:ff:ff:ff:ff:ff

$ oc logs ds/nmstate-handler -n openshift-cnv|tail -5
Found 6 pods, using pod/nmstate-handler-4tjq9
{"level":"info","ts":1639060184.391553,"logger":"policyconditions","msg":"enactments count: {failed: {true: 0, false: 6, unknown: 0}, progressing: {true: 0, false: 6, unknown: 0}, available: {true: 3, false: 3, unknown: 0}, matching: {true: 3, false: 3, unknown: 0}, aborted: {true: 0, false: 6, unknown: 0}}","policy":"dummy-if"}
{"level":"info","ts":1639060184.391586,"logger":"policyconditions","msg":"SetPolicySuccess"}
{"level":"info","ts":1639060184.916024,"logger":"client","msg":"Skipping NodeNetworkState update, node network configuration not changed"}
{"level":"info","ts":1639060224.3191195,"logger":"controllers.Node","msg":"Network configuration changed, updating NodeNetworkState"}
{"level":"info","ts":1639060224.3192284,"logger":"client","msg":"Skipping NodeNetworkState update, node network configuration not changed"}

Actual results:

The NNCP is not applied after installing the CNV operator.

Expected results:

Either the standalone or the CNV-based nmstate should handle NNCPs, or the user should be warned about having both active at the same time.

Additional info:
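Both handlers compete for the same cluster-scoped resources: the nmstate-handler ClusterRoleBinding shown above keeps its uid but its subject namespace flips to whichever operator reconciled it last, and each handler tries to take an exclusive lock on /var/k8s_nmstate/handler_lock (see the log above). A quick diagnostic sketch for checking both, using the resource names from this reproducer (adjust namespaces as needed):

# Which namespace does the shared ClusterRoleBinding currently grant access to?
$ oc get clusterrolebinding nmstate-handler -o jsonpath='{.subjects[0].namespace}{"\n"}'

# Are there competing handler DaemonSets in more than one namespace?
$ oc get ds -A | grep nmstate-handler

# Which handler is still waiting on the exclusive lock file?
$ oc logs ds/nmstate-handler -n openshift-cnv | grep handler_lock
$ oc logs ds/nmstate-handler -n openshift-nmstate | grep handler_lock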
The same behavior on 4.10 nightly. Deleting the nmstate-handler pods in the openshift-nmstate namespace seems sufficient to ensure the nmstate-handler in the openshift-cnv namespace acquires the handler lock:

$ oc delete pods -n openshift-nmstate -l app=kubernetes-nmstate -l component=kubernetes-nmstate-handler

$ oc get ds/nmstate-handler -n openshift-nmstate
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
nmstate-handler   6         6         0       6            0           beta.kubernetes.io/arch=amd64   25m     <<---

$ oc get ds/nmstate-handler -n openshift-cnv
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
nmstate-handler   6         6         6       6            6           beta.kubernetes.io/arch=amd64   9m38s   <<---

$ oc logs ds/nmstate-handler -n openshift-nmstate
Found 6 pods, using pod/nmstate-handler-5h5q5
{"level":"info","ts":1639125319.3930314,"logger":"setup","msg":"Try to take exclusive lock on file: /var/k8s_nmstate/handler_lock"}

$ oc get nncp
NAME         STATUS
dummy-if     Available
dummy-if-2   Available
(In reply to Bram Verschueren from comment #1)
> The same behavior on 4.10 nightly.

Which versions of the standalone NMState and CNV have you tested? In 4.10, CNV's nmstate should automatically back off and not even run if the standalone NMState is installed. @ellorent do you think that this logic is easily backportable to 4.9 or even 4.8?

Until then, I would recommend the customer to uninstall the standalone NMState.
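If uninstalling the standalone NMState is the chosen route, a minimal sketch could look like the following. The NMState CR and CSV names are the ones from the reproducer above; the Subscription name is an assumption, so verify the actual names with the oc get command first.

# Remove the NMState instance (tears down the standalone handler DaemonSet)
$ oc delete nmstate nmstate

# Verify the actual Subscription and CSV names before deleting them
$ oc get subscription,csv -n openshift-nmstate

# OLM uninstall: delete the Subscription, then its CSV (names assumed / taken from the reproducer)
$ oc delete subscription kubernetes-nmstate-operator -n openshift-nmstate
$ oc delete csv kubernetes-nmstate-operator.4.8.0-202111191337 -n openshift-nmstate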
(In reply to Dan Kenigsberg from comment #2)
> (In reply to Bram Verschueren from comment #1)
> > The same behavior on 4.10 nightly.
>
> Which versions of the standalone NMState and CNV have you tested?
> In 4.10, CNV's nmstate should automatically back off and not even run if the
> standalone NMState is installed.
> @ellorent do you think that this logic is easily backportable to
> 4.9 or even 4.8?
>
> Until then, I would recommend the customer to uninstall the standalone
> NMState.

Integration with the kubernetes-nmstate-operator in CNAO should be present in 4.9; I can try to backport it to 4.8.
(In reply to Quique Llorente from comment #3)
> Integration with the kubernetes-nmstate-operator in CNAO should be present
> in 4.9; I can try to backport it to 4.8.

Please check how difficult/risky that would be. But let us first ask @bverschu whether upgrading to 4.9 is an option for the customer.
Anyhow, I have created the PR to backport to 4.8: https://github.com/kubevirt/cluster-network-addons-operator/pull/1122. It should be safe, since the nmstate operator code from 4.8 hasn't changed.
(In reply to Dan Kenigsberg from comment #2)
> (In reply to Bram Verschueren from comment #1)
> > The same behavior on 4.10 nightly.
>
> Which versions of the standalone NMState and CNV have you tested?
> In 4.10, CNV's nmstate should automatically back off and not even run if the
> standalone NMState is installed.
> @ellorent do you think that this logic is easily backportable to
> 4.9 or even 4.8?
>
> Until then, I would recommend the customer to uninstall the standalone
> NMState.

The versions in the reproducer were kubevirt-hyperconverged-operator.v4.8.3 and the standalone nmstate kubernetes-nmstate-operator.4.8.0-202111191337.

Deleting the kubernetes-nmstate-handler pods in the openshift-nmstate namespace allowed the nmstate-handler in the openshift-cnv namespace to acquire the lock and continue operations; afterwards, I deleted the standalone nmstate operator. This is now documented in https://access.redhat.com/solutions/6574891.
(In reply to Quique Llorente from comment #5)
> Anyhow, I have created the PR to backport to 4.8:
> https://github.com/kubevirt/cluster-network-addons-operator/pull/1122.
> It should be safe, since the nmstate operator code from 4.8 hasn't changed.

Shouldn't this bug move to CNAO, be set to POST, and be given a 4.8.z target version?
Still waiting for 4.8.5 errata to get created.
Tested on a 4.8 cluster:

[cnv-qe-jenkins@n-awax-48-48wnq-executor ~]$ oc version
Client Version: 4.10.0-202201310820.p0.g7c299f1.assembly.stream-7c299f1
Server Version: 4.8.21
Kubernetes Version: v1.21.5+6a39d04

[cnv-qe-jenkins@n-awax-48-48wnq-executor ~]$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.4   OpenShift Virtualization   4.8.4     kubevirt-hyperconverged-operator.v4.8.3   InstallReady

The bug still exists. Moving the status back to POST.
u/s fix https://github.com/kubevirt/cluster-network-addons-operator/pull/1177
Looks like the u/s integration test was passing because the label was not missing from the deployment there.
Looks like it's a bug in the OpenShift operator; it is missing the "app" label that is present in the u/s repo: https://bugzilla.redhat.com/show_bug.cgi?id=2049142
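A quick way to check whether the label in question is present on the standalone operator's resources (a sketch; the exact resource and label selector that CNAO checks are described in the linked PR and BZ):

# Show all labels on the standalone operator's deployments and pods in openshift-nmstate;
# the "app" label mentioned above should appear here once the fix is in place.
$ oc get deployments -n openshift-nmstate --show-labels
$ oc get pods -n openshift-nmstate --show-labels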
Quique, could you check with the operator team whether they will backport their fix all the way to 4.8? If they don't do that, we cannot resolve this BZ; do I understand it correctly?
@cstabler, can we backport the fix to 4.8?
@ellorent sure. I planned it anyhow. But let https://bugzilla.redhat.com/show_bug.cgi?id=2049142 get verified first.
https://bugzilla.redhat.com/show_bug.cgi?id=2054283#c2