Description of problem:
Upgraded CNV from 4.9.0 (production image) to 4.9.1 (staging image); the post-upgrade test failed because the hostpath-provisioner pods were not restarted.

Version-Release number of selected component (if applicable):
CNV 4.9.1

How reproducible:
Seen twice.

Steps to Reproduce:
1. Deploy a cluster with the 4.9.0 production image
2. Attempt a CNV upgrade using the staging image of 4.9.1

Actual results:
===========
[cnv-qe-jenkins@c01-dbn-49p-qqwj4-executor ~]$ kubectl get pods -n openshift-cnv
NAME                                                  READY   STATUS    RESTARTS   AGE
bridge-marker-7mhrv                                   1/1     Running   0          17m
bridge-marker-fddlf                                   1/1     Running   0          17m
bridge-marker-kwr22                                   1/1     Running   0          17m
bridge-marker-vgnh8                                   1/1     Running   0          17m
bridge-marker-wk5c6                                   1/1     Running   0          17m
bridge-marker-x78qv                                   1/1     Running   0          17m
cdi-apiserver-696d66459b-5k4j5                        1/1     Running   0          18m
cdi-deployment-57fc5585cc-hmdkn                       1/1     Running   0          18m
cdi-operator-54455f5769-ck5rx                         1/1     Running   0          18m
cdi-uploadproxy-85495cd56d-wmv8z                      1/1     Running   0          18m
cluster-network-addons-operator-5d6766cbc7-d5j79      1/1     Running   0          18m
hco-operator-769b886f48-zgqdm                         1/1     Running   0          18m
hco-webhook-6fb4895fcc-79x8l                          1/1     Running   0          11m
hostpath-provisioner-l779s                            1/1     Running   0          24h
hostpath-provisioner-ldgq7                            1/1     Running   0          24h
hostpath-provisioner-operator-79956c7f74-49gcm        1/1     Running   0          18m
hostpath-provisioner-xxw4x                            1/1     Running   0          24h
hyperconverged-cluster-cli-download-ddc88c975-sdjz9   1/1     Running   0          18m
kube-cni-linux-bridge-plugin-bgjkf                    1/1     Running   0          15m
kube-cni-linux-bridge-plugin-bxbzc                    1/1     Running   0          14m
kube-cni-linux-bridge-plugin-k6xnh                    1/1     Running   0          13m
kube-cni-linux-bridge-plugin-rwksv                    1/1     Running   0          15m
kube-cni-linux-bridge-plugin-rxgx4                    1/1     Running   0          16m
kube-cni-linux-bridge-plugin-tbjcz                    1/1     Running   0          17m
kubemacpool-cert-manager-7b7bcfc9db-hgk8r             1/1     Running   0          17m
kubemacpool-mac-controller-manager-88b9c5b99-8wgfz    1/1     Running   0          17m
nmstate-cert-manager-5445f89dc-79qn8                  1/1     Running   0          17m
nmstate-handler-6qqkb                                 1/1     Running   0          17m
nmstate-handler-jq59x                                 1/1     Running   0          16m
nmstate-handler-kmrb6                                 1/1     Running   0          17m
nmstate-handler-qm4c9                                 1/1     Running   0          16m
nmstate-handler-qzbgj                                 1/1     Running   0          16m
nmstate-handler-rlgvj                                 1/1     Running   0          17m
nmstate-webhook-7db7676bff-l9x25                      1/1     Running   0          17m
nmstate-webhook-7db7676bff-xqdsn                      1/1     Running   0          17m
node-maintenance-operator-697fcf5fc5-xl56c            1/1     Running   0          11m
ssp-operator-588cd89ff4-vnplj                         1/1     Running   0          11m
virt-api-65f7fb9d9f-ns4vb                             1/1     Running   0          12m
virt-api-65f7fb9d9f-qsz7s                             1/1     Running   0          12m
virt-controller-6db5bcbdcd-7fztq                      1/1     Running   0          13m
virt-controller-6db5bcbdcd-lfhmr                      1/1     Running   0          13m
virt-handler-mdkwg                                    1/1     Running   0          16m
virt-handler-v79nl                                    1/1     Running   0          15m
virt-handler-zk2dv                                    1/1     Running   0          14m
virt-operator-b494cbf78-crzff                         1/1     Running   0          17m
virt-operator-b494cbf78-sgf6c                         1/1     Running   0          18m
virt-template-validator-5dc6645f69-5n6pt              1/1     Running   0          17m
virt-template-validator-5dc6645f69-stpgv              1/1     Running   0          18m
vm-import-controller-5557d57bb7-4m56z                 1/1     Running   0          18m
vm-import-operator-8567d595fd-kvkgn                   1/1     Running   0          18m
[cnv-qe-jenkins@c01-dbn-49p-qqwj4-executor ~]$
===============

Expected results:
All pods should have been restarted.

Additional info:
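Note that the three hostpath-provisioner daemonset pods still show AGE 24h while every other pod in the namespace was recreated within the preceding ~18 minutes. A quick way to spot such leftovers (a sketch; --sort-by is standard kubectl) is to order the pods by creation time so the stale ones group together at the top:

$ kubectl get pods -n openshift-cnv --sort-by=.metadata.creationTimestamp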
Associated logs from the pods:
==================================
[cnv-qe-jenkins@c01-dbnp-490-hl62r-executor ~]$ kubectl logs hostpath-provisioner-gb5b4 -n openshift-cnv
I1115 20:06:22.461624 1 hostpath-provisioner.go:82] initiating kubevirt/hostpath-provisioner on node: c01-dbnp-490-hl62r-worker-0-9sp8h
I1115 20:06:22.461690 1 hostpath-provisioner.go:277] creating provisioner controller with name: kubevirt.io/hostpath-provisioner
I1115 20:06:22.462010 1 controller.go:772] Starting provisioner controller kubevirt.io/hostpath-provisioner_hostpath-provisioner-gb5b4_8a7d3118-350f-418f-8b81-b54e9b7b0c3f!
I1115 20:06:22.562204 1 controller.go:821] Started provisioner controller kubevirt.io/hostpath-provisioner_hostpath-provisioner-gb5b4_8a7d3118-350f-418f-8b81-b54e9b7b0c3f!
I1115 20:09:50.406102 1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I1115 20:09:50.406131 1 hostpath-provisioner.go:118] missing volume.kubernetes.io/selected-node annotation, skipping operations for pvc
I1115 20:09:50.411439 1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I1115 20:09:50.411455 1 hostpath-provisioner.go:118] missing volume.kubernetes.io/selected-node annotation, skipping operations for pvc
I1115 20:09:50.420885 1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I1115 20:09:50.420902 1 hostpath-provisioner.go:118] missing volume.kubernetes.io/selected-node annotation, skipping operations for pvc
[cnv-qe-jenkins@c01-dbnp-490-hl62r-executor ~]$
===================================
Log from the HPP operator pod: http://pastebin.test.redhat.com/1008652
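For reference, the linked operator log can also be pulled straight from the cluster; a minimal sketch using the operator deployment name implied by the pod listing above:

$ oc logs -n openshift-cnv deployment/hostpath-provisioner-operator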
In the HPP operator logs we see:

{"level":"error","ts":1637081879.5886474,"logger":"controller_hostpathprovisioner","msg":"Unable to create DaemonSet","Request.Namespace":"","Request.Name":"hostpath-provisioner","error":"DaemonSet.apps \"hostpath-provisioner\" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{\"app.kubernetes.io/component\":\"storage\", \"app.kubernetes.io/managed-by\":\"hostpath-provisioner-operator\", \"app.kubernetes.io/part-of\":\"hyperconverged-cluster\", \"app.kubernetes.io/version\":\"v4.9.1\", \"k8s-app\":\"hostpath-provisioner\"}: `selector` does not match template `labels`","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error
	/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:128
kubevirt.io/hostpath-provisioner-operator/pkg/controller/hostpathprovisioner.(*ReconcileHostPathProvisioner).reconcileUpdate
	/remote-source/app/pkg/controller/hostpathprovisioner/controller.go:342
kubevirt.io/hostpath-provisioner-operator/pkg/controller/hostpathprovisioner.(*ReconcileHostPathProvisioner).Reconcile

while the existing object from CNV 4.9.0 still carries:

$ oc get daemonsets -n openshift-cnv hostpath-provisioner -o json | jq ".metadata.labels"
{
  "app.kubernetes.io/component": "storage",
  "app.kubernetes.io/managed-by": "hostpath-provisioner-operator",
  "app.kubernetes.io/part-of": "hyperconverged-cluster",
  "app.kubernetes.io/version": "v4.9.0",
  "k8s-app": "hostpath-provisioner"
}

So the issue is that the operator tries to update the hostpath-provisioner DaemonSet with a pod template carrying a label ("app.kubernetes.io/version") that doesn't match the existing object ("v4.9.0" vs "v4.9.1"). Because a DaemonSet's spec.selector is immutable, the versioned label pinned by the old selector can never match the new template, so the API server rejects the update with `selector` does not match template `labels` and the DaemonSet pods are never rolled.
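To see the mismatch directly on an affected cluster, the DaemonSet's (immutable) selector can be compared with the pod template labels; a minimal sketch in the same oc/jq style as above:

$ oc get daemonsets -n openshift-cnv hostpath-provisioner -o json | jq ".spec.selector.matchLabels"
$ oc get daemonsets -n openshift-cnv hostpath-provisioner -o json | jq ".spec.template.metadata.labels"

Any key/value required by the selector but missing from (or different in) the template labels triggers exactly the apps/v1 validation error above. Version-specific labels are harmless on object metadata, but baking them into a selector means every version bump fails until the DaemonSet is deleted and recreated.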
Verified for CNV v4.9.2-22, HCO v4.9.2-6, hostpath-provisioner-operator v4.9.2-2.

Upgraded CNV v4.8.4 -> 4.9.0 -> 4.9.1 -> 4.9.2-22:

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.2   OpenShift Virtualization   4.9.2     kubevirt-hyperconverged-operator.v4.9.1   Succeeded

$ oc get ip -n openshift-cnv
NAME            CSV                                       APPROVAL   APPROVED
install-6f2x9   kubevirt-hyperconverged-operator.v4.9.1   Manual     true
install-pcjnw   kubevirt-hyperconverged-operator.v4.9.0   Manual     true
install-ws4fk   kubevirt-hyperconverged-operator.v4.8.4   Manual     true
install-x4h9g   kubevirt-hyperconverged-operator.v4.9.2   Manual     true

hostpath-provisioner pods are restarted:

$ oc get pods -n openshift-cnv | grep hostpath
hostpath-provisioner-8qm4q                       1/1     Running   0          17h
hostpath-provisioner-nskhg                       1/1     Running   0          17h
hostpath-provisioner-operator-85984bf8f4-2qwj5   1/1     Running   0          17h
hostpath-provisioner-wxd2k                       1/1     Running   0          17h

Upgraded CNV v4.9.0 -> 4.9.1 -> 4.9.2-22:

[cnv-qe-jenkins@c01-jen49-up2-vkns4-executor ~]$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.2   OpenShift Virtualization   4.9.2     kubevirt-hyperconverged-operator.v4.9.1   Succeeded

[cnv-qe-jenkins@c01-jen49-up2-vkns4-executor ~]$ oc get ip -n openshift-cnv
NAME            CSV                                       APPROVAL   APPROVED
install-2h54g   kubevirt-hyperconverged-operator.v4.9.0   Manual     true
install-7cmmn   kubevirt-hyperconverged-operator.v4.9.1   Manual     true
install-csmg7   kubevirt-hyperconverged-operator.v4.9.2   Manual     true

[cnv-qe-jenkins@c01-jen49-up2-vkns4-executor ~]$ oc get pods -n openshift-cnv | grep hostpath
hostpath-provisioner-8hxxc                       1/1     Running   0          162m
hostpath-provisioner-knwfv                       1/1     Running   0          162m
hostpath-provisioner-operator-85984bf8f4-c5xnk   1/1     Running   0          163m
hostpath-provisioner-tlwt2                       1/1     Running   0          162m

Upgraded CNV 4.9.1 -> 4.9.2-22:

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.2   OpenShift Virtualization   4.9.2     kubevirt-hyperconverged-operator.v4.9.1   Succeeded

$ oc get ip -n openshift-cnv
NAME            CSV                                       APPROVAL   APPROVED
install-5g5x8   kubevirt-hyperconverged-operator.v4.9.2   Manual     true
install-75zzm   kubevirt-hyperconverged-operator.v4.9.1   Manual     true

$ oc get pods -n openshift-cnv | grep hostpath
hostpath-provisioner-5xn4c                       1/1     Running   0          10m
hostpath-provisioner-brpd6                       1/1     Running   0          10m
hostpath-provisioner-kvdp6                       1/1     Running   0          10m
hostpath-provisioner-operator-85984bf8f4-qpgh9   1/1     Running   0          10m
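Beyond eyeballing the AGE column, pod start times can be compared against the upgrade time; a sketch, assuming the provisioner pods carry the k8s-app=hostpath-provisioner label seen in the operator error above:

$ oc get pods -n openshift-cnv -l k8s-app=hostpath-provisioner \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.startTime}{"\n"}{end}'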
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.2 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0191