Description of problem:

Installing the CNV operator in a cluster with an existing NMState operator prevents NNCPs from being applied.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.19    True        False         2m6s    Cluster version is 4.8.19

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.3   OpenShift Virtualization   4.8.3     kubevirt-hyperconverged-operator.v4.8.2   Succeeded

How reproducible:

Steps to Reproduce:

1. Install the nmstate-operator in "openshift-nmstate" and create an instance:

$ oc get csv -n openshift-nmstate
NAME                                             DISPLAY                       VERSION              REPLACES   PHASE
kubernetes-nmstate-operator.4.8.0-202111191337   Kubernetes NMState Operator   4.8.0-202111191337              Installing

$ oc get clusterrolebindings nmstate-handler -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-12-09T14:05:58Z"
  name: nmstate-handler
  ownerReferences:
  - apiVersion: nmstate.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: NMState
    name: nmstate
    uid: 5ecfa4b1-db8c-4960-88ff-32c365d48d60
  resourceVersion: "40296"
  uid: 8a2d642b-7fc4-4cb8-85c6-d072f331f7c1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nmstate-handler
subjects:
- kind: ServiceAccount
  name: nmstate-handler
  namespace: openshift-nmstate   <<-------- initially "openshift-nmstate"

2. Create an NNCP for a dummy interface:

$ cat <<EOF|oc apply -f -
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dummy-if
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: dummy
      type: dummy
      state: up
      ipv4:
        address:
        - ip: 10.244.0.1
          prefix-length: 24
        dhcp: false
        enabled: true
EOF
nodenetworkconfigurationpolicy.nmstate.io/dummy-if created

$ oc get nncp
NAME       STATUS
dummy-if   SuccessfullyConfigured

$ oc debug node/<node-name> -- ip link show dummy
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
12: dummy: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 12:8f:7c:34:b8:11 brd ff:ff:ff:ff:ff:ff

$ oc get ds -n openshift-nmstate
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
nmstate-handler   6         6         6       6            6           beta.kubernetes.io/arch=amd64   5m

3. Install CNV:

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.3   OpenShift Virtualization   4.8.3     kubevirt-hyperconverged-operator.v4.8.2   Installing

$ cat <<EOF|oc apply -f -
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dummy-if-2
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: dummy2
      type: dummy
      state: up
      ipv4:
        address:
        - ip: 10.244.0.1
          prefix-length: 24
        dhcp: false
        enabled: true
EOF
nodenetworkconfigurationpolicy.nmstate.io/dummy-if-2 created

$ oc get nncp
NAME         STATUS
dummy-if     SuccessfullyConfigured
dummy-if-2

$ oc debug node/<node-name> -- ip link show dummy2
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
Device "dummy2" does not exist.
$ oc logs ds/nmstate-handler -n openshift-cnv
Found 6 pods, using pod/nmstate-handler-4tjq9
{"level":"info","ts":1639059222.5338857,"logger":"setup","msg":"Try to take exclusive lock on file: /var/k8s_nmstate/handler_lock"}

$ oc logs ds/nmstate-handler -n openshift-nmstate|tail -n5
Found 6 pods, using pod/nmstate-handler-7gh4s
E1209 14:22:16.739419 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1beta1.NodeNetworkConfigurationEnactment: failed to list *v1beta1.NodeNetworkConfigurationEnactment: nodenetworkconfigurationenactments.nmstate.io is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodenetworkconfigurationenactments" in API group "nmstate.io" at the cluster scope
E1209 14:22:23.039872 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1beta1.NodeNetworkConfigurationPolicy: failed to list *v1beta1.NodeNetworkConfigurationPolicy: nodenetworkconfigurationpolicies.nmstate.io is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodenetworkconfigurationpolicies" in API group "nmstate.io" at the cluster scope
E1209 14:22:29.549222 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodes" in API group "" at the cluster scope
E1209 14:22:29.712126 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "pods" in API group "" at the cluster scope
E1209 14:22:32.602496 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:229: Failed to watch *v1beta1.NodeNetworkState: failed to list *v1beta1.NodeNetworkState: nodenetworkstates.nmstate.io is forbidden: User "system:serviceaccount:openshift-nmstate:nmstate-handler" cannot list resource "nodenetworkstates" in API group "nmstate.io" at the cluster scope

$ oc get ds -n openshift-cnv
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
bridge-marker                  6         6         6       6            6           beta.kubernetes.io/arch=amd64   3m44s
kube-cni-linux-bridge-plugin   6         6         6       6            6           beta.kubernetes.io/arch=amd64   3m45s
nmstate-handler                6         6         0       6            0           beta.kubernetes.io/arch=amd64   3m43s   <<---- 0/6 available
virt-handler                   3         3         3       3            3           kubernetes.io/os=linux          2m11s

$ oc get clusterrolebindings nmstate-handler -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-12-09T14:05:58Z"
  labels:
    networkaddonsoperator.network.kubevirt.io/version: sha256_4e63115a111ed2e3ffd772c242e2432c92ccc46cf7182cf2efb804ba
  name: nmstate-handler
  ownerReferences:
  - apiVersion: networkaddonsoperator.network.kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkAddonsConfig
    name: cluster
    uid: ae8552ab-39f9-475b-9434-8342706efb9f
  resourceVersion: "45582"
  uid: 8a2d642b-7fc4-4cb8-85c6-d072f331f7c1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nmstate-handler
subjects:
- kind: ServiceAccount
  name: nmstate-handler
  namespace: openshift-cnv   <<--------- changed to "openshift-cnv"
4. [workaround] Delete the nmstate instance:

$ oc delete nmstate nmstate
nmstate.nmstate.io "nmstate" deleted

$ oc get ds -n openshift-cnv
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
bridge-marker                  6         6         6       6            6           beta.kubernetes.io/arch=amd64   16m
kube-cni-linux-bridge-plugin   6         6         6       6            6           beta.kubernetes.io/arch=amd64   16m
nmstate-handler                6         6         6       6            6           beta.kubernetes.io/arch=amd64   16m   <<--- 6/6 available
virt-handler                   3         3         3       3            3           kubernetes.io/os=linux          14m

$ oc debug node/<node-name> -- ip link show dummy2
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
24: dummy2: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 86:bb:41:34:a9:ec brd ff:ff:ff:ff:ff:ff

$ oc logs ds/nmstate-handler -n openshift-cnv|tail -5
Found 6 pods, using pod/nmstate-handler-4tjq9
{"level":"info","ts":1639060184.391553,"logger":"policyconditions","msg":"enactments count: {failed: {true: 0, false: 6, unknown: 0}, progressing: {true: 0, false: 6, unknown: 0}, available: {true: 3, false: 3, unknown: 0}, matching: {true: 3, false: 3, unknown: 0}, aborted: {true: 0, false: 6, unknown: 0}}","policy":"dummy-if"}
{"level":"info","ts":1639060184.391586,"logger":"policyconditions","msg":"SetPolicySuccess"}
{"level":"info","ts":1639060184.916024,"logger":"client","msg":"Skipping NodeNetworkState update, node network configuration not changed"}
{"level":"info","ts":1639060224.3191195,"logger":"controllers.Node","msg":"Network configuration changed, updating NodeNetworkState"}
{"level":"info","ts":1639060224.3192284,"logger":"client","msg":"Skipping NodeNetworkState update, node network configuration not changed"}

Actual results:

The NNCP is not applied after installing the CNV operator.

Expected results:

Either the standalone or the CNV-based nmstate should handle NNCPs, or the user should be warned about having both active at the same time.

Additional info:
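Both handlers compete for the same cluster-scoped resources: the nmstate-handler ClusterRoleBinding shown above keeps its uid but its subject namespace flips to whichever operator reconciled it last, and each handler tries to take an exclusive lock on /var/k8s_nmstate/handler_lock (see the log above). A quick diagnostic sketch for checking both, using the resource names from this reproducer (adjust namespaces as needed):

# Which namespace does the shared ClusterRoleBinding currently grant access to?
$ oc get clusterrolebinding nmstate-handler -o jsonpath='{.subjects[0].namespace}{"\n"}'

# Are there competing handler DaemonSets in more than one namespace?
$ oc get ds -A | grep nmstate-handler

# Which handler is still waiting on the exclusive lock file?
$ oc logs ds/nmstate-handler -n openshift-cnv | grep handler_lock
$ oc logs ds/nmstate-handler -n openshift-nmstate | grep handler_lock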
The same behavior on 4.10 nightly. Deleting the nmstate-handler pods in the openshift-nmstate namespace seems sufficient to ensure the nmstate-handler in the openshift-cnv namespace acquires the handler lock:

$ oc delete pods -n openshift-nmstate -l app=kubernetes-nmstate -l component=kubernetes-nmstate-handler

$ oc get ds/nmstate-handler -n openshift-nmstate
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
nmstate-handler   6         6         0       6            0           beta.kubernetes.io/arch=amd64   25m     <<---

$ oc get ds/nmstate-handler -n openshift-cnv
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
nmstate-handler   6         6         6       6            6           beta.kubernetes.io/arch=amd64   9m38s   <<---

$ oc logs ds/nmstate-handler -n openshift-nmstate
Found 6 pods, using pod/nmstate-handler-5h5q5
{"level":"info","ts":1639125319.3930314,"logger":"setup","msg":"Try to take exclusive lock on file: /var/k8s_nmstate/handler_lock"}

$ oc get nncp
NAME         STATUS
dummy-if     Available
dummy-if-2   Available
(In reply to Bram Verschueren from comment #1)
> The same behavior on 4.10 nightly.

Which versions of the standalone NMState and CNV have you tested? In 4.10, CNV's nmstate should automatically back off and not even run if the standalone NMState is installed. @ellorent do you think that this logic is easily backportable to 4.9 or even 4.8?

Until then, I would recommend the customer to uninstall the standalone NMState.
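If uninstalling the standalone NMState is the chosen route, a minimal sketch could look like the following. The NMState CR and CSV names are the ones from the reproducer above; the Subscription name is an assumption, so verify the actual names with the oc get command first.

# Remove the NMState instance (tears down the standalone handler DaemonSet)
$ oc delete nmstate nmstate

# Verify the actual Subscription and CSV names before deleting them
$ oc get subscription,csv -n openshift-nmstate

# OLM uninstall: delete the Subscription, then its CSV (names assumed / taken from the reproducer)
$ oc delete subscription kubernetes-nmstate-operator -n openshift-nmstate
$ oc delete csv kubernetes-nmstate-operator.4.8.0-202111191337 -n openshift-nmstate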
(In reply to Dan Kenigsberg from comment #2)
> (In reply to Bram Verschueren from comment #1)
> > The same behavior on 4.10 nightly.
>
> Which versions of the standalone NMState and CNV have you tested?
> In 4.10, CNV's nmstate should automatically back off and not even run if the
> standalone NMState is installed.
> @ellorent do you think that this logic is easily backportable to
> 4.9 or even 4.8?
>
> Until then, I would recommend the customer to uninstall the standalone
> NMState.

Integration with the kubernetes-nmstate-operator in CNAO should be present in 4.9; I can try to backport it to 4.8.
(In reply to Quique Llorente from comment #3)
> Integration with the kubernetes-nmstate-operator in CNAO should be present
> in 4.9; I can try to backport it to 4.8.

Please check how difficult/risky that would be. But let us first ask @bverschu whether upgrading to 4.9 is an option for the customer.
Anyhow, I have created the PR to backport to 4.8: https://github.com/kubevirt/cluster-network-addons-operator/pull/1122. It should be safe, since the nmstate operator code from 4.8 hasn't changed.
(In reply to Dan Kenigsberg from comment #2)
> (In reply to Bram Verschueren from comment #1)
> > The same behavior on 4.10 nightly.
>
> Which versions of the standalone NMState and CNV have you tested?
> In 4.10, CNV's nmstate should automatically back off and not even run if the
> standalone NMState is installed.
> @ellorent do you think that this logic is easily backportable to
> 4.9 or even 4.8?
>
> Until then, I would recommend the customer to uninstall the standalone
> NMState.

The versions in the reproducer were kubevirt-hyperconverged-operator.v4.8.3 and the standalone nmstate kubernetes-nmstate-operator.4.8.0-202111191337.

Deleting the kubernetes-nmstate-handler pods in the openshift-nmstate namespace allowed the nmstate-handler in the openshift-cnv namespace to acquire the lock and continue operations; afterwards, I deleted the standalone nmstate operator. This is now documented in https://access.redhat.com/solutions/6574891.
(In reply to Quique Llorente from comment #5)
> Anyhow, I have created the PR to backport to 4.8:
> https://github.com/kubevirt/cluster-network-addons-operator/pull/1122.
> It should be safe, since the nmstate operator code from 4.8 hasn't changed.

Shouldn't this bug move to CNAO, be set to POST, and be given a 4.8.z target version?
Still waiting for 4.8.5 errata to get created.
Tested on a 4.8 cluster:

[cnv-qe-jenkins@n-awax-48-48wnq-executor ~]$ oc version
Client Version: 4.10.0-202201310820.p0.g7c299f1.assembly.stream-7c299f1
Server Version: 4.8.21
Kubernetes Version: v1.21.5+6a39d04

[cnv-qe-jenkins@n-awax-48-48wnq-executor ~]$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.4   OpenShift Virtualization   4.8.4     kubevirt-hyperconverged-operator.v4.8.3   InstallReady

The bug still exists. Moving the status back to POST.
u/s fix https://github.com/kubevirt/cluster-network-addons-operator/pull/1177
Looks like the u/s integration test was passing because the label was not missing from the deployment there.
Looks like it's a bug in the OpenShift operator; it is missing the "app" label that is present in the u/s repo: https://bugzilla.redhat.com/show_bug.cgi?id=2049142
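A quick way to check whether the label in question is present on the standalone operator's resources (a sketch; the exact resource and label selector that CNAO checks are described in the linked PR and BZ):

# Show all labels on the standalone operator's deployments and pods in openshift-nmstate;
# the "app" label mentioned above should appear here once the fix is in place.
$ oc get deployments -n openshift-nmstate --show-labels
$ oc get pods -n openshift-nmstate --show-labels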
Quique, could you check with the operator team whether they will backport their fix all the way to 4.8? If they don't do that, we cannot resolve this BZ; do I understand it correctly?
@cstabler, can we backport the fix to 4.8?
@ellorent sure. I planned it anyhow. But let https://bugzilla.redhat.com/show_bug.cgi?id=2049142 get verified first.
https://bugzilla.redhat.com/show_bug.cgi?id=2054283#c2