Bug 1782979 - virt-operator incorrectly assumes previous version will have same pod name
Summary: virt-operator incorrectly assumes previous version will have same pod name
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Importance: high/high
Target Milestone: ---
Target Release: 2.2.0
Assignee: sgott
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-12 18:23 UTC by Irina Gulina
Modified: 2020-01-30 16:27 UTC
CC List: 10 users

Fixed In Version: hco-bundle-registry-container-v2.2.0-225
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-30 16:27:36 UTC
Target Upstream Version:
Embargoed:


Attachments
kubevirt cr yml (2.90 KB, text/plain), 2019-12-14 20:22 UTC, Irina Gulina


Links
Github kubevirt/kubevirt pull 2945 (closed): virt-operator should honor image prefix from previous version. Last updated 2020-12-21 05:20:20 UTC.
Red Hat Product Errata RHEA-2020:0307. Last updated 2020-01-30 16:27:52 UTC.

Description Irina Gulina 2019-12-12 18:23:55 UTC
Description of problem:
During an upgrade from 2.1 (production) to 2.2, two temporary kubevirt pods are created. One of them pulls the 2.1.0-17 image and reports an ErrImagePull error. The upgrade never completes.
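For reference, one quick way to spot the failing pod (a sketch; it assumes the CNV namespace is openshift-cnv, as used in later comments):

# list pods stuck on image pull errors
$ oc get pods -n openshift-cnv | grep ErrImagePull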

Version-Release number of selected component (if applicable):
libvirt 4.3.0-0.nightly-2019-12-12-021332
CNV 2.1 in production
CNV 2.2 build with hco 2.2.0-9 and virt 2.2.0-10


How reproducible:
always


Steps to Reproduce:
Steps will be added in the next comment.

Actual results:
see logs attached

Expected results:
Successful upgrade.

Comment 1 sgott 2019-12-12 18:33:32 UTC
"kubevirt pod" in this context is somewhat ambiguous. Could you please clarify the name of the pod in question?

Comment 5 Irina Gulina 2019-12-12 18:49:37 UTC
It is a temporary pod started by virt-operator to gather information about what to install for the desired KubeVirt version. In this particular case it is named:
>> kubevirt-5dd54d2a01b19df6e3fd87db25884add904d08df-jobg9m46xtmzn
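For reference, one way to list these ephemeral pods (a sketch; it assumes the openshift-cnv namespace, and the hash and job suffix in the name differ per cluster):

# the temporary pods are named kubevirt-<hash>-job<suffix>
$ oc get pods -n openshift-cnv | grep -- '-job'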

Comment 9 Marc Sluiter 2019-12-13 08:58:37 UTC
> a temporary pod started by virt-operator for gathering information about what to install for the desired KubeVirt version

And for the old, already installed version, when updating.
The registry for the operator image of the old version comes from "observedKubeVirtRegistry" in the KubeVirt CR status, which was set during deployment of the old version.
Can you check what is set there, whether it is correct, and whether the image is still available?
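For example, one way to read that field (a sketch; it assumes a single KubeVirt CR in the openshift-cnv namespace):

# print the registry recorded for the previously deployed version
$ oc get kubevirt -n openshift-cnv -o jsonpath='{.items[0].status.observedKubeVirtRegistry}{"\n"}'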

Comment 10 Irina Gulina 2019-12-14 20:22:40 UTC
Created attachment 1645215 [details]
kubevirt cr yml

See CR attached

Note that all other images have been downloaded and the corresponding pods have been replaced:
>> oc status -n openshift-cnv | awk -F "/" '/rh-osbs/ {print $NF}' | sort -u 
container-native-virtualization-bridge-marker:v2.2.0-2
container-native-virtualization-cluster-network-addons-operator:v2.2.0-6
container-native-virtualization-cnv-containernetworking-plugins:v2.2.0-2
container-native-virtualization-hostpath-provisioner-rhel8-operator:v2.2.0-7
container-native-virtualization-hyperconverged-cluster-operator:v2.2.0-9
container-native-virtualization-kubemacpool:v2.2.0-3
container-native-virtualization-kubernetes-nmstate-handler-rhel8:v2.2.0-12
container-native-virtualization-kubevirt-cpu-node-labeller:v2.2.0-2
container-native-virtualization-kubevirt-ssp-operator:v2.2.0-14
container-native-virtualization-kubevirt-template-validator:v2.2.0-4
container-native-virtualization-node-maintenance-operator:v2.2.0-2
container-native-virtualization-ovs-cni-marker:v2.2.0-3
container-native-virtualization-virt-cdi-apiserver:v2.2.0-3
container-native-virtualization-virt-cdi-controller:v2.2.0-3
container-native-virtualization-virt-cdi-operator:v2.2.0-3
container-native-virtualization-virt-cdi-uploadproxy:v2.2.0-3
container-native-virtualization-virt-operator:v2.2.0-10


It's just that the kubevirt temp pod is hanging.

Comment 11 Irina Gulina 2019-12-14 20:44:14 UTC
Oops, I see now the CSV is attached twice instead of the reproduction steps. Here they are:


1. In the UI, install CNV 2.1 from production as our 2.1 install doc says.
2. In a shell, fetch the 2.2 content:
export CONTENT_ONLY=true CNV_VERSION=2.2.0 && curl -k <marketplace qe script from 2.2 branch> | <quay credentials> bash -x
It should report "Content Successfully Created".
3. Edit the Subscription in the UI or CLI:
spec:
  channel: "2.2"
  installPlanApproval: Automatic
  name: kubevirt-hyperconverged
  source: hco-catalogsource-config

4. Set approved to true for the InstallPlan (see the sketch after these steps).

CNV will then start the upgrade and hang on the kubevirt temp pod.
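As a sketch of step 4 (the InstallPlan name is a placeholder; look up the real name first):

# find the pending InstallPlan, then approve it
$ oc get installplan -n openshift-cnv
$ oc patch installplan <installplan-name> -n openshift-cnv --type merge -p '{"spec":{"approved":true}}'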

Comment 12 sgott 2019-12-16 16:47:49 UTC
Setting needinfo on Irina per Comment #9.

Comment 17 sgott 2019-12-20 21:46:13 UTC
PR for a fix is posted here: https://github.com/kubevirt/kubevirt/pull/2945

Keep in mind, this PR only provides the tooling for a fix. You will still need to apply a workaround:

The affected version of KubeVirt predates storing imagePrefix in the deployment config. Since this PR references that value, you will need to add it manually to your existing KubeVirt CR. You will need to change:

observedDeploymentConfig: '{"id":"bddfa980b7ed8ea0ccb89d1bde2a1009df6947ba","namespace":"openshift-cnv","registry":"registry.redhat.io/container-native-virtualization","kubeVirtVersion":"v2.1.0-17","additionalProperties":{"ImagePullPolicy":"","MonitorAccount":"","MonitorNamespace":""}}'

to include a new key-value pair:

"imagePrefix": ""

Thus, your new observedDeploymentConfig could look something like this:

'{"id":"bddfa980b7ed8ea0ccb89d1bde2a1009df6947ba","namespace":"openshift-cnv","registry":"registry.redhat.io/container-native-virtualization","kubeVirtVersion":"v2.1.0-17","imagePrefix":"","additionalProperties":{"ImagePullPolicy":"","MonitorAccount":"","MonitorNamespace":""}}'

Keep in mind that "id" has meaning, so you probably shouldn't blindly copy/paste this blob.
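One way to produce the updated blob is to merge the new key into the existing JSON, for example with jq (a sketch; jq is just one option, and the echoed value below is the example blob from this comment, not the one from your cluster):

# merge "imagePrefix": "" into observedDeploymentConfig; substitute your own CR's value
$ echo '{"id":"bddfa980b7ed8ea0ccb89d1bde2a1009df6947ba","namespace":"openshift-cnv","registry":"registry.redhat.io/container-native-virtualization","kubeVirtVersion":"v2.1.0-17","additionalProperties":{"ImagePullPolicy":"","MonitorAccount":"","MonitorNamespace":""}}' | jq -c '. + {"imagePrefix": ""}'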

Comment 21 zhe peng 2020-01-16 11:16:34 UTC
Verified with build:
Client Version: 4.3.0-0.nightly-2020-01-16-031402
Server Version: 4.3.0-0.nightly-2020-01-16-031402
Kubernetes Version: v1.16.2

Steps:
Same as comment 11.
Start the upgrade and check the status:
$ oc get csv
NAME                                      DISPLAY                                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.1.0   Container-native virtualization Operator   2.1.0                                               Replacing
kubevirt-hyperconverged-operator.v2.2.0   Container-native virtualization            2.2.0     kubevirt-hyperconverged-operator.v2.1.0   Installing

Checked the pods; no kubevirt temp pod hanging.
The upgrade finished without error.
$ oc get csv
NAME                                      DISPLAY                           VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.2.0   Container-native virtualization   2.2.0     kubevirt-hyperconverged-operator.v2.1.0   Succeeded

$ oc describe deployment virt-api
Labels:                 app.kubernetes.io/managed-by=kubevirt-operator
                        kubevirt.io=virt-api
Annotations:            deployment.kubernetes.io/revision: 2
                        kubevirt.io/install-strategy-identifier: a61bc7e6341aa0d7660bb8bad3ac17a4d0e57fbb
                        kubevirt.io/install-strategy-registry: registry-proxy.engineering.redhat.com/rh-osbs
                        kubevirt.io/install-strategy-version: v2.2.0-13

Moving to VERIFIED.

Comment 23 errata-xmlrpc 2020-01-30 16:27:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

