Description of problem:
During a 2.1 (production) to 2.2 upgrade, two temporary kubevirt pods are created. One of them pulls the 2.1.0-17 image and reports an ErrImagePull error. The upgrade never completes.

Version-Release number of selected component (if applicable):
OCP 4.3.0-0.nightly-2019-12-12-021332
CNV 2.1 in production
CNV 2.2 build with hco 2.2.0-9 and virt 2.2.0-10

How reproducible:
Always

Steps to Reproduce:
Steps will be added in the next comment.

Actual results:
See the attached logs.

Expected results:
Successful upgrade.
"kubevirt pod" in this context is somewhat ambiguous. Could you please clarify the name of the pod in question?
A temporary pod started by virt-operator to gather information about what to install for the desired KubeVirt version. In this particular case it is named:
>> kubevirt-5dd54d2a01b19df6e3fd87db25884add904d08df-jobg9m46xtmzn
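For reference, a quick way to see which image that temp pod is pulling and why the pull fails (a sketch; the openshift-cnv namespace is assumed, and the pod name is the one quoted above):

$ oc get pod -n openshift-cnv kubevirt-5dd54d2a01b19df6e3fd87db25884add904d08df-jobg9m46xtmzn -o jsonpath='{.spec.containers[*].image}{"\n"}'
$ oc describe pod -n openshift-cnv kubevirt-5dd54d2a01b19df6e3fd87db25884add904d08df-jobg9m46xtmzn

The Events section at the end of the describe output should show the ErrImagePull reason.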
> a temporary pod started by virt-operator for gathering information about what to install for the desired KubeVirt version

And one for the old, already installed version, when updating. The registry for the operator image of the old version comes from "observedKubeVirtRegistry" in the KubeVirt CR status, which was set during deployment of the old version. Can you check what is set there, whether it is correct, and whether the image is still available?
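A minimal way to check both (a sketch; assumes the KubeVirt CR lives in openshift-cnv, and the virt-operator image name/tag below is inferred from this report rather than verified):

$ oc get kubevirt -n openshift-cnv -o jsonpath='{.items[0].status.observedKubeVirtRegistry}{"\n"}'
# then try pulling the old operator image from the registry printed above, e.g.:
$ podman pull registry.redhat.io/container-native-virtualization/virt-operator:v2.1.0-17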
Created attachment 1645215 [details]
kubevirt cr yml

See the CR attached. Mind that all other images have been downloaded and the corresponding pods replaced:

>> oc status -n openshift-cnv | awk -F "/" '/rh-osbs/ {print $NF}' | sort -u
container-native-virtualization-bridge-marker:v2.2.0-2
container-native-virtualization-cluster-network-addons-operator:v2.2.0-6
container-native-virtualization-cnv-containernetworking-plugins:v2.2.0-2
container-native-virtualization-hostpath-provisioner-rhel8-operator:v2.2.0-7
container-native-virtualization-hyperconverged-cluster-operator:v2.2.0-9
container-native-virtualization-kubemacpool:v2.2.0-3
container-native-virtualization-kubernetes-nmstate-handler-rhel8:v2.2.0-12
container-native-virtualization-kubevirt-cpu-node-labeller:v2.2.0-2
container-native-virtualization-kubevirt-ssp-operator:v2.2.0-14
container-native-virtualization-kubevirt-template-validator:v2.2.0-4
container-native-virtualization-node-maintenance-operator:v2.2.0-2
container-native-virtualization-ovs-cni-marker:v2.2.0-3
container-native-virtualization-virt-cdi-apiserver:v2.2.0-3
container-native-virtualization-virt-cdi-controller:v2.2.0-3
container-native-virtualization-virt-cdi-operator:v2.2.0-3
container-native-virtualization-virt-cdi-uploadproxy:v2.2.0-3
container-native-virtualization-virt-operator:v2.2.0-10

It's only the kubevirt temp pod that is hanging.
Oops, I see now the CSV is attached twice instead of the reproduction steps. Here they are:

1. In the UI, install CNV 2.1 from production as our 2.1 install doc says.

2. In a shell, fetch the 2.2 content:
   export CONTENT_ONLY=true CNV_VERSION=2.2.0 && curl -k <marketplace qe script from 2.2 branch> | <quay credentials> bash -x
   It should report "Content Successfully Created".

3. Edit the subscription in the UI or via CLI:
   spec:
     channel: "2.2"
     installPlanApproval: Automatic
     name: kubevirt-hyperconverged
     source: hco-catalogsource-config

4. Set approved to true on the InstallPlan (see the sketch after this list).

CNV will then start the upgrade and hang on the kubevirt temp pod.
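For step 4, a minimal sketch of approving the plan from the CLI (the plan name placeholder is illustrative; look it up with the first command):

$ oc get installplan -n openshift-cnv
$ oc patch installplan <plan-name> -n openshift-cnv --type merge -p '{"spec":{"approved":true}}'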
Setting needinfo on irina per Comment #9
PR for a fix is posted here: https://github.com/kubevirt/kubevirt/pull/2945

Keep in mind, this PR only provides the tooling for a fix. You will still need to apply a workaround: the affected version of KubeVirt predates the point at which we started storing imagePrefix in the deployment config. Since this PR references that field, you will need to manually add it to your existing KubeVirt CR.

You will need to change:

observedDeploymentConfig: '{"id":"bddfa980b7ed8ea0ccb89d1bde2a1009df6947ba","namespace":"openshift-cnv","registry":"registry.redhat.io/container-native-virtualization","kubeVirtVersion":"v2.1.0-17","additionalProperties":{"ImagePullPolicy":"","MonitorAccount":"","MonitorNamespace":""}}'

to include a new key-value pair:

"imagePrefix": ""

so your new observedDeploymentConfig could look something like this:

observedDeploymentConfig: '{"id":"bddfa980b7ed8ea0ccb89d1bde2a1009df6947ba","namespace":"openshift-cnv","registry":"registry.redhat.io/container-native-virtualization","kubeVirtVersion":"v2.1.0-17","imagePrefix":"","additionalProperties":{"ImagePullPolicy":"","MonitorAccount":"","MonitorNamespace":""}}'

Keep in mind that "id" has meaning, so you probably shouldn't blindly copy/paste this blob. A sketch of making this edit from the CLI follows below.
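A non-interactive sketch of the same edit (assumes jq is available and that the KubeVirt CR is named kubevirt-kubevirt-hyperconverged in openshift-cnv, the usual HCO-created name; since observedDeploymentConfig lives under .status, this only works if the CRD does not enforce the status subresource — otherwise make the same change with oc edit):

# Parse the observedDeploymentConfig JSON string, merge in imagePrefix, re-serialize, and replace the object
$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o json \
    | jq '.status.observedDeploymentConfig |= (fromjson | . + {imagePrefix: ""} | tojson)' \
    | oc replace -f -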
Verified with build:
Client Version: 4.3.0-0.nightly-2020-01-16-031402
Server Version: 4.3.0-0.nightly-2020-01-16-031402
Kubernetes Version: v1.16.2

Steps: same as comment 11.

Started the upgrade and checked the status:

$ oc get csv
NAME                                      DISPLAY                                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.1.0   Container-native virtualization Operator   2.1.0                                               Replacing
kubevirt-hyperconverged-operator.v2.2.0   Container-native virtualization            2.2.0     kubevirt-hyperconverged-operator.v2.1.0   Installing

Checked the pods: no kubevirt temp pod hanging. The upgrade finished without error.

$ oc get csv
NAME                                      DISPLAY                           VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.2.0   Container-native virtualization   2.2.0     kubevirt-hyperconverged-operator.v2.1.0   Succeeded

$ oc describe deployment virt-api
Labels:       app.kubernetes.io/managed-by=kubevirt-operator
              kubevirt.io=virt-api
Annotations:  deployment.kubernetes.io/revision: 2
              kubevirt.io/install-strategy-identifier: a61bc7e6341aa0d7660bb8bad3ac17a4d0e57fbb
              kubevirt.io/install-strategy-registry: registry-proxy.engineering.redhat.com/rh-osbs
              kubevirt.io/install-strategy-version: v2.2.0-13

Moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307