Bug 2022895

Summary: Post upgrade (4.9.0 -> 4.9.1) hostpath provisioner pods are not restarted
Product: Container Native Virtualization (CNV)
Component: Storage
Version: 4.9.1
Target Release: 4.9.2
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Debarati Basu-Nag <dbasunag>
Assignee: Alex Kalenyuk <akalenyu>
QA Contact: Jenia Peimer <jpeimer>
CC: cnv-qe-bugs, ibesso, ngavrilo, stirabos, yadu, ycui
Hardware: Unspecified
OS: Unspecified
Fixed In Version: hostpath-provisioner-rhel8-operator v4.9.2-2, CNV v4.9.2-4
Type: Bug
Last Closed: 2022-01-19 17:49:52 UTC

Description Debarati Basu-Nag 2021-11-12 21:53:56 UTC
Description of problem: After upgrading CNV from 4.9.0 (production image) to 4.9.1 (staging image), the post-upgrade test failed because the hostpath-provisioner pods were not restarted.


Version-Release number of selected component (if applicable):
CNV-4.9.1

How reproducible:
Observed twice so far.

Steps to Reproduce:
1. Deploy a cluster with 4.9.0 production image
2. Attempt a CNV upgrade using staging image of 4.9.1

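With manual InstallPlan approval, as in the verification runs in comment 3, kicking off the upgrade amounts to approving the pending InstallPlan. A sketch, with <install-plan-name> as a placeholder:

$ oc get installplan -n openshift-cnv
$ oc patch installplan <install-plan-name> -n openshift-cnv --type merge -p '{"spec":{"approved":true}}'
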
Actual results:
===========
[cnv-qe-jenkins@c01-dbn-49p-qqwj4-executor ~]$ kubectl get pods -n openshift-cnv
NAME                                                  READY   STATUS    RESTARTS   AGE
bridge-marker-7mhrv                                   1/1     Running   0          17m
bridge-marker-fddlf                                   1/1     Running   0          17m
bridge-marker-kwr22                                   1/1     Running   0          17m
bridge-marker-vgnh8                                   1/1     Running   0          17m
bridge-marker-wk5c6                                   1/1     Running   0          17m
bridge-marker-x78qv                                   1/1     Running   0          17m
cdi-apiserver-696d66459b-5k4j5                        1/1     Running   0          18m
cdi-deployment-57fc5585cc-hmdkn                       1/1     Running   0          18m
cdi-operator-54455f5769-ck5rx                         1/1     Running   0          18m
cdi-uploadproxy-85495cd56d-wmv8z                      1/1     Running   0          18m
cluster-network-addons-operator-5d6766cbc7-d5j79      1/1     Running   0          18m
hco-operator-769b886f48-zgqdm                         1/1     Running   0          18m
hco-webhook-6fb4895fcc-79x8l                          1/1     Running   0          11m
hostpath-provisioner-l779s                            1/1     Running   0          24h
hostpath-provisioner-ldgq7                            1/1     Running   0          24h
hostpath-provisioner-operator-79956c7f74-49gcm        1/1     Running   0          18m
hostpath-provisioner-xxw4x                            1/1     Running   0          24h
hyperconverged-cluster-cli-download-ddc88c975-sdjz9   1/1     Running   0          18m
kube-cni-linux-bridge-plugin-bgjkf                    1/1     Running   0          15m
kube-cni-linux-bridge-plugin-bxbzc                    1/1     Running   0          14m
kube-cni-linux-bridge-plugin-k6xnh                    1/1     Running   0          13m
kube-cni-linux-bridge-plugin-rwksv                    1/1     Running   0          15m
kube-cni-linux-bridge-plugin-rxgx4                    1/1     Running   0          16m
kube-cni-linux-bridge-plugin-tbjcz                    1/1     Running   0          17m
kubemacpool-cert-manager-7b7bcfc9db-hgk8r             1/1     Running   0          17m
kubemacpool-mac-controller-manager-88b9c5b99-8wgfz    1/1     Running   0          17m
nmstate-cert-manager-5445f89dc-79qn8                  1/1     Running   0          17m
nmstate-handler-6qqkb                                 1/1     Running   0          17m
nmstate-handler-jq59x                                 1/1     Running   0          16m
nmstate-handler-kmrb6                                 1/1     Running   0          17m
nmstate-handler-qm4c9                                 1/1     Running   0          16m
nmstate-handler-qzbgj                                 1/1     Running   0          16m
nmstate-handler-rlgvj                                 1/1     Running   0          17m
nmstate-webhook-7db7676bff-l9x25                      1/1     Running   0          17m
nmstate-webhook-7db7676bff-xqdsn                      1/1     Running   0          17m
node-maintenance-operator-697fcf5fc5-xl56c            1/1     Running   0          11m
ssp-operator-588cd89ff4-vnplj                         1/1     Running   0          11m
virt-api-65f7fb9d9f-ns4vb                             1/1     Running   0          12m
virt-api-65f7fb9d9f-qsz7s                             1/1     Running   0          12m
virt-controller-6db5bcbdcd-7fztq                      1/1     Running   0          13m
virt-controller-6db5bcbdcd-lfhmr                      1/1     Running   0          13m
virt-handler-mdkwg                                    1/1     Running   0          16m
virt-handler-v79nl                                    1/1     Running   0          15m
virt-handler-zk2dv                                    1/1     Running   0          14m
virt-operator-b494cbf78-crzff                         1/1     Running   0          17m
virt-operator-b494cbf78-sgf6c                         1/1     Running   0          18m
virt-template-validator-5dc6645f69-5n6pt              1/1     Running   0          17m
virt-template-validator-5dc6645f69-stpgv              1/1     Running   0          18m
vm-import-controller-5557d57bb7-4m56z                 1/1     Running   0          18m
vm-import-operator-8567d595fd-kvkgn                   1/1     Running   0          18m
[cnv-qe-jenkins@c01-dbn-49p-qqwj4-executor ~]$ 

===============

Expected results:
All pods should have been restarted with the new 4.9.1 images; as the listing above shows, the hostpath-provisioner pods still have an AGE of 24h while every other pod is 11-18 minutes old.
Additional info:
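A quick way to confirm whether the DaemonSet rolled its pods after the upgrade is to compare the desired vs. updated counts in its status (a sketch; the fields are standard apps/v1 DaemonSet status fields):

$ oc get daemonset hostpath-provisioner -n openshift-cnv \
    -o jsonpath='{.status.updatedNumberScheduled}/{.status.desiredNumberScheduled} pods updated{"\n"}'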

Comment 1 Debarati Basu-Nag 2021-11-16 17:41:33 UTC
Associated logs from the pods: 
==================================
[cnv-qe-jenkins@c01-dbnp-490-hl62r-executor ~]$ kubectl logs hostpath-provisioner-gb5b4 -n openshift-cnv
I1115 20:06:22.461624       1 hostpath-provisioner.go:82] initiating kubevirt/hostpath-provisioner on node: c01-dbnp-490-hl62r-worker-0-9sp8h
I1115 20:06:22.461690       1 hostpath-provisioner.go:277] creating provisioner controller with name: kubevirt.io/hostpath-provisioner
I1115 20:06:22.462010       1 controller.go:772] Starting provisioner controller kubevirt.io/hostpath-provisioner_hostpath-provisioner-gb5b4_8a7d3118-350f-418f-8b81-b54e9b7b0c3f!
I1115 20:06:22.562204       1 controller.go:821] Started provisioner controller kubevirt.io/hostpath-provisioner_hostpath-provisioner-gb5b4_8a7d3118-350f-418f-8b81-b54e9b7b0c3f!
I1115 20:09:50.406102       1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I1115 20:09:50.406131       1 hostpath-provisioner.go:118] missing volume.kubernetes.io/selected-node annotation, skipping operations for pvc
I1115 20:09:50.411439       1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I1115 20:09:50.411455       1 hostpath-provisioner.go:118] missing volume.kubernetes.io/selected-node annotation, skipping operations for pvc
I1115 20:09:50.420885       1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I1115 20:09:50.420902       1 hostpath-provisioner.go:118] missing volume.kubernetes.io/selected-node annotation, skipping operations for pvc
[cnv-qe-jenkins@c01-dbnp-490-hl62r-executor ~]$
===================================

Log from HPP operator pod: http://pastebin.test.redhat.com/1008652
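
Side note: the "missing volume.kubernetes.io/selected-node annotation" messages above are expected with WaitForFirstConsumer binding, since the provisioner waits until the scheduler annotates the PVC with a selected node. The binding mode can be checked with, for example (assuming the storage class is named hostpath-provisioner):

$ oc get storageclass hostpath-provisioner -o jsonpath='{.volumeBindingMode}{"\n"}'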

Comment 2 Simone Tiraboschi 2021-11-16 21:11:55 UTC
On HPP operator logs we see:

{"level":"error","ts":1637081879.5886474,"logger":"controller_hostpathprovisioner","msg":"Unable to create DaemonSet","Request.Namespace":"","Request.Name":"hostpath-provisioner","error":"DaemonSet.apps "hostpath-provisioner" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app.kubernetes.io/component":"storage", "app.kubernetes.io/managed-by":"hostpath-provisioner-operator", "app.kubernetes.io/part-of":"hyperconverged-cluster", "app.kubernetes.io/version":"v4.9.1", "k8s-app":"hostpath-provisioner"}: `selector` does not match template `labels`","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error
	/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:128
kubevirt.io/hostpath-provisioner-operator/pkg/controller/hostpathprovisioner.(*ReconcileHostPathProvisioner).reconcileUpdate
	/remote-source/app/pkg/controller/hostpathprovisioner/controller.go:342
kubevirt.io/hostpath-provisioner-operator/pkg/controller/hostpathprovisioner.(*ReconcileHostPathProvisioner).Reconcile

while the pre-existing DaemonSet from CNV 4.9.0 still carries the old labels:

$ oc get daemonsets -n openshift-cnv hostpath-provisioner -o json | jq ".metadata.labels"
{
  "app.kubernetes.io/component": "storage",
  "app.kubernetes.io/managed-by": "hostpath-provisioner-operator",
  "app.kubernetes.io/part-of": "hyperconverged-cluster",
  "app.kubernetes.io/version": "v4.9.0",
  "k8s-app": "hostpath-provisioner"
}


So the issue is that the operator tries to update the hostpath-provisioner DaemonSet with a pod template label (app.kubernetes.io/version: v4.9.1) that no longer satisfies the existing selector, which evidently still pins v4.9.0. Since a DaemonSet's spec.selector is immutable, the API server rejects the update with `selector` does not match template `labels`, and the pods are never rolled.
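
For reference, the mismatch can be confirmed by comparing the DaemonSet's (immutable) selector against its pod template labels, e.g.:

$ oc get daemonset hostpath-provisioner -n openshift-cnv -o json \
    | jq '{selector: .spec.selector.matchLabels, templateLabels: .spec.template.metadata.labels}'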

Comment 3 Jenia Peimer 2022-01-05 14:06:37 UTC
Verified with CNV v4.9.2-22, HCO v4.9.2-6, hostpath-provisioner-operator v4.9.2-2.


Upgraded CNV v4.8.4 -> 4.9.0 -> 4.9.1 -> 4.9.2-22

$ oc get csv -n openshift-cnv 
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.2   OpenShift Virtualization   4.9.2     kubevirt-hyperconverged-operator.v4.9.1   Succeeded

$ oc get ip -n openshift-cnv 
NAME            CSV                                       APPROVAL   APPROVED
install-6f2x9   kubevirt-hyperconverged-operator.v4.9.1   Manual     true
install-pcjnw   kubevirt-hyperconverged-operator.v4.9.0   Manual     true
install-ws4fk   kubevirt-hyperconverged-operator.v4.8.4   Manual     true
install-x4h9g   kubevirt-hyperconverged-operator.v4.9.2   Manual     true

hostpath-provisioner pods are restarted:

$ oc get pods -n openshift-cnv | grep hostpath
hostpath-provisioner-8qm4q                             1/1     Running   0          17h
hostpath-provisioner-nskhg                             1/1     Running   0          17h
hostpath-provisioner-operator-85984bf8f4-2qwj5         1/1     Running   0          17h
hostpath-provisioner-wxd2k                             1/1     Running   0          17h



Upgraded CNV v4.9.0 -> 4.9.1 -> 4.9.2-22

[cnv-qe-jenkins@c01-jen49-up2-vkns4-executor ~]$ oc get csv -n openshift-cnv 
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.2   OpenShift Virtualization   4.9.2     kubevirt-hyperconverged-operator.v4.9.1   Succeeded

[cnv-qe-jenkins@c01-jen49-up2-vkns4-executor ~]$ oc get ip -n openshift-cnv 
NAME            CSV                                       APPROVAL   APPROVED
install-2h54g   kubevirt-hyperconverged-operator.v4.9.0   Manual     true
install-7cmmn   kubevirt-hyperconverged-operator.v4.9.1   Manual     true
install-csmg7   kubevirt-hyperconverged-operator.v4.9.2   Manual     true

[cnv-qe-jenkins@c01-jen49-up2-vkns4-executor ~]$ oc get pods -n openshift-cnv | grep hostpath
hostpath-provisioner-8hxxc                             1/1     Running   0          162m
hostpath-provisioner-knwfv                             1/1     Running   0          162m
hostpath-provisioner-operator-85984bf8f4-c5xnk         1/1     Running   0          163m
hostpath-provisioner-tlwt2                             1/1     Running   0          162m



Upgraded CNV 4.9.1 -> 4.9.2-22

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.2   OpenShift Virtualization   4.9.2     kubevirt-hyperconverged-operator.v4.9.1   Succeeded

$ oc get ip -n openshift-cnv 
NAME            CSV                                       APPROVAL   APPROVED
install-5g5x8   kubevirt-hyperconverged-operator.v4.9.2   Manual     true
install-75zzm   kubevirt-hyperconverged-operator.v4.9.1   Manual     true

$ oc get pods -n openshift-cnv | grep hostpath
hostpath-provisioner-5xn4c                             1/1     Running   0          10m
hostpath-provisioner-brpd6                             1/1     Running   0          10m
hostpath-provisioner-kvdp6                             1/1     Running   0          10m
hostpath-provisioner-operator-85984bf8f4-qpgh9         1/1     Running   0          10m
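
To double-check that the restarted pods are really running the new build, the image of each provisioner pod can be listed (a sketch, assuming the k8s-app=hostpath-provisioner label shown in comment 2):

$ oc get pods -n openshift-cnv -l k8s-app=hostpath-provisioner \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'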

Comment 9 errata-xmlrpc 2022-01-19 17:49:52 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.2 Images security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0191