Bug 1969756 - Windows VMs fail to start on air-gapped environments
Summary: Windows VMs fail to start on air-gapped environments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 2.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.6.6
Assignee: Simone Tiraboschi
QA Contact: Guohua Ouyang
URL:
Whiteboard:
Depends On: 1942839 1969754
Blocks:
Reported: 2021-06-09 08:13 UTC by Guohua Ouyang
Modified: 2021-08-10 17:34 UTC

Fixed In Version: hco-bundle-registry-container-v2.6.6-30
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1942839
Environment:
Last Closed: 2021-08-10 17:33:37 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt hyperconverged-cluster-operator pull 1385 | 0 | None | closed | Expose an enviromental value to virtio win image (#1222) | 2021-06-12 08:07:44 UTC
Red Hat Product Errata | RHSA-2021:3119 | 0 | None | None | None | 2021-08-10 17:34:27 UTC

Description Guohua Ouyang 2021-06-09 08:13:47 UTC
Clone the bug to 2.6.4 as it's requested for OCP 4.7.z

+++ This bug was initially created as a clone of Bug #1942839 +++

Description of problem:

When creating a Windows VM in a disconnected environment, besides the CDI importer and the operating system disk being pulled, a virtio-win containerDisk is also used. This container image is referenced by tag ('registry.redhat.io/container-native-virtualization/virtio-win'), but it is not part of the operator's mirroring phase and its relatedImages.

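In a disconnected environment, OpenShift relies on ImageContentSourcePolicy to redirect pulls to the mirror registry, and that redirection applies only to images referenced by digest; images referenced by tag are not redirected. This causes the virt-launcher to fail to pull the image and the VM to be stuck on starting.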
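
The tag-versus-digest distinction described above can be checked mechanically. What follows is a minimal Python sketch and not part of the reported fix: the helper names `parse_image_ref` and `mirrorable_via_icsp` are hypothetical, and the rule they encode (only digest references are served through ImageContentSourcePolicy mirrors) is taken from this description.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into repository, tag and digest parts."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    # A colon after the last slash separates the tag; an earlier colon
    # belongs to a registry port such as "my_registry:5000".
    last_colon = ref.rfind(":")
    if last_colon > ref.rfind("/"):
        ref, tag = ref[:last_colon], ref[last_colon + 1:]
    return ImageRef(ref, tag, digest)


def mirrorable_via_icsp(ref: str) -> bool:
    """Assumption from this report: only digest references are mirrored."""
    return parse_image_ref(ref).digest is not None
```

A reference such as `registry.redhat.io/container-native-virtualization/virtio-win:v2.6.0` parses with a tag and no digest, so under this assumption it would not be served from the mirror.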

Version-Release number of selected component (if applicable):

OCP4.6.8 
Openshift Virtualization 2.5

How reproducible:

Try to create a Windows VM (that requires the default virtio-win drivers) in a disconnected (air-gapped) environment. 
 
Steps to Reproduce:
1. Create a Windows machine from a boot source 
2. Wait until the VM starts 
3. Verify that the VM is stuck in the starting phase and that the virt-launcher pod is in ImagePullBackOff

Actual results:

VM is stuck on starting, virt-launcher fails to pull the virtio-win containerDisk image

Expected results:

Windows VM is successfully created 

Additional info:

This can be worked around by editing the VM manifest and pointing the referenced containerDisk at the internal registry, so that the pull no longer depends on the ImageContentSourcePolicy.
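
As an illustration of the manifest edit described above, the sketch below rewrites containerDisk images in a VirtualMachine manifest that has been loaded as a Python dict. It assumes the usual KubeVirt layout (`spec.template.spec.volumes[].containerDisk.image`); the function name, the volume names, and the `my_registry` host are hypothetical.

```python
import copy


def repoint_container_disk(vm: dict, repository: str, new_image: str) -> dict:
    """Return a copy of `vm` in which every containerDisk volume that uses
    `repository` (bare, or with any tag or digest) points at `new_image`."""
    patched = copy.deepcopy(vm)
    volumes = (
        patched.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("volumes", [])
    )
    for volume in volumes:
        disk = volume.get("containerDisk")
        if not disk:
            continue  # not a containerDisk volume, leave it alone
        image = disk.get("image", "")
        if image == repository or image.startswith(
            (repository + ":", repository + "@")
        ):
            disk["image"] = new_image
    return patched


# Hypothetical manifest fragment, reduced to the fields used here.
vm = {
    "spec": {"template": {"spec": {"volumes": [
        {"name": "rootdisk", "dataVolume": {"name": "win10-rootdisk"}},
        {"name": "windows-guest-tools", "containerDisk": {
            "image": "registry.redhat.io/container-native-virtualization/virtio-win"}},
    ]}}}
}
patched = repoint_container_disk(
    vm,
    "registry.redhat.io/container-native-virtualization/virtio-win",
    "my_registry/container-native-virtualization/virtio-win",
)
```

The function returns a modified copy rather than editing in place, so the original manifest stays available for comparison.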

The solution here is to reference this image by digest instead of by tag (v2.6, for example) and to include it in the Operator's relatedImages section so that it can be successfully mirrored with OPM.

--- Additional comment from Fabian Deutsch on 2021-03-30 17:53:54 CST ---

I took the liberty to increase the priority to high as this is related to a customer case.

@yzamir can this be targeted to one of the upcoming releases?

--- Additional comment from Dan Kenigsberg on 2021-03-30 19:05:40 CST ---

> This container image is ... not part of the mirroring phase of the operator and all the relatedImages

Should we not fix this in HCO, @stirabos ?

--- Additional comment from Shon Paz on 2021-03-30 19:14:23 CST ---

@Dan it could be that this image exists but not at the expected path. For example, if the original path is "registry.redhat.io/container-native-virtualization/virtio-win:v2.6.0", I would expect that after the mirror finishes we'll have it at "my_registry/container-native-virtualization/virtio-win:v2.6.0"

--- Additional comment from Yaacov Zamir on 2021-03-30 19:45:28 CST ---

Hi, this image is hard coded in the UI
https://github.com/openshift/console/blob/5db89ab1457d6374bc4d80419613eeb93c219231/frontend/packages/kubevirt-plugin/src/constants/vm/constants.ts#L71

AFAIK this is the only hardcoded image we use. It is added to a VM when the VM is labeled as running a Windows OS.

What is the best way to make the UI friendlier to disconnected environments?
Will using "@sha256:digest number" solve the problem?
If it does solve the issue, where do we get the correct digest number?

Can HCO make it available inside a disconnected environment without the need to hardcode a digest number?

--- Additional comment from Shon Paz on 2021-03-30 20:02:07 CST ---

After a short briefing with @Yaacov:

I see this issue is divided into two parts: 

* Changing the console to point to the right virtio-win image, as presented here: 

https://github.com/openshift/console/blob/5db89ab1457d6374bc4d80419613eeb93c219231/frontend/packages/kubevirt-plugin/src/constants/vm/constants.ts#L72

To fix this we could change the following two lines: 

export const WINTOOLS_CONTAINER_VERSION = '011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2';
export const WINTOOLS_CONTAINER_NAME_DOWNSTREAM = `registry.redhat.io/container-native-virtualization/virtio-win@sha256:${WINTOOLS_CONTAINER_VERSION}`;

This means that the UI will edit the VM's yaml with the right image, but this also means that this will have to change constantly as the virtio-win image versions are moving forward, as every image has a unique digest. 

* Documenting this image in the relatedImages so that it'll be mirrored by OPM as part of the operator's images, with the proper path: 
  - The original image should be referenced as: "registry.redhat.io/container-native-virtualization/virtio-win@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"
  - It should be mirrored with the Skopeo tool to the following path: "my_registry/container-native-virtualization/virtio-win@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"
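
The path mapping in the two bullets above can be sketched in Python as follows. This is only an illustration: `mirror_reference` and `skopeo_copy_command` are hypothetical helper names, `my_registry` is the placeholder host used in this comment, and the command is built as an argument list but not executed.

```python
from typing import List


def mirror_reference(ref: str, source_registry: str, mirror_registry: str) -> str:
    """Swap the registry host of `ref`, keeping repository path and digest."""
    prefix = source_registry.rstrip("/") + "/"
    if not ref.startswith(prefix):
        raise ValueError(f"{ref!r} is not hosted on {source_registry!r}")
    return mirror_registry.rstrip("/") + "/" + ref[len(prefix):]


def skopeo_copy_command(source_ref: str, target_ref: str) -> List[str]:
    """Build (without running) a `skopeo copy` between two registries."""
    return ["skopeo", "copy", f"docker://{source_ref}", f"docker://{target_ref}"]


source = (
    "registry.redhat.io/container-native-virtualization/virtio-win"
    "@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"
)
target = mirror_reference(source, "registry.redhat.io", "my_registry")
command = skopeo_copy_command(source, target)
```

Keeping the repository path and digest unchanged is what lets the mirrored copy satisfy the same digest reference that the cluster requests.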


To verify the image's digest you could use the following steps:

* Step 1: 
skopeo login registry.redhat.io
Login Succeeded!

* Step 2: skopeo inspect docker://registry.redhat.io/container-native-virtualization/virtio-win:v2.6.0 | jq '.Digest'
"sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"

* Step 3:
podman pull --authfile /root/ocp4-disconnected/pull-secret.json registry.redhat.io/container-native-virtualization/virtio-win@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2
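
For scripting Step 2 without jq, a rough Python equivalent is sketched below. It assumes only what the step already shows, namely that the `skopeo inspect` JSON output has a top-level `Digest` field; the function names are hypothetical and the sample JSON is hand-written and trimmed to that one field.

```python
import json
import re

# A sha256 digest is the prefix plus 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def digest_from_inspect(inspect_output: str) -> str:
    """Extract and sanity-check the Digest field of `skopeo inspect` JSON."""
    digest = json.loads(inspect_output).get("Digest", "")
    if not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"unexpected digest value: {digest!r}")
    return digest


def pinned_reference(repository: str, inspect_output: str) -> str:
    """Combine a repository with the inspected digest into a pinned ref."""
    return f"{repository}@{digest_from_inspect(inspect_output)}"


# Hand-written sample standing in for real `skopeo inspect` output.
sample = json.dumps({
    "Digest": "sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"
})
pinned = pinned_reference(
    "registry.redhat.io/container-native-virtualization/virtio-win", sample
)
```

The resulting string has the same shape as the reference pulled in Step 3.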

--- Additional comment from Yaacov Zamir on 2021-03-30 20:26:34 CST ---

Thanks,

Based on comment#5 we will fix the UI by changing from using v2.6.0 to digest,

Fabian, is this the correct fix here ?

--- Additional comment from Fabian Deutsch on 2021-03-30 21:16:07 CST ---

Yaacov, the digest will change with every update of CNV

@stirabos can the UI somehow read the relevant digest from some object?

--- Additional comment from Simone Tiraboschi on 2021-03-30 21:58:20 CST ---

(In reply to Dan Kenigsberg from comment #2)
> > This container image is ... not part of the mirroring phase of the operator and all the relatedImages
> 
> Should we not fix this in HCO, @stirabos ?

That container is already in the relatedImages list for:

- 2.4
http://pkgs.devel.redhat.com/cgit/containers/hco-bundle-registry/tree/kubevirt-hyperconverged/2.4.7/kubevirt-hyperconverged-operator.clusterserviceversion.yaml?h=cnv-2.4-rhel-8#n2012

- 2.5
http://pkgs.devel.redhat.com/cgit/containers/hco-bundle-registry/tree/kubevirt-hyperconverged/2.5.6/kubevirt-hyperconverged-operator.clusterserviceversion.yaml?h=cnv-2.5-rhel-8#n2268

- 2.6
http://pkgs.devel.redhat.com/cgit/containers/hco-bundle-registry/tree/kubevirt-hyperconverged/2.6.1/kubevirt-hyperconverged-operator.clusterserviceversion.yaml?h=cnv-2.6-rhel-8&id=d6f4d7197d26a44bd112b32b4ef27efb0f834fbf#n2413

and so on.

so it should already have been correctly mirrored on disconnected environments for at least 3 minor releases.

This looks to me like just an issue with the UI, which should read the right value from a config map or something like that instead of using a floating tag.

--- Additional comment from Yaacov Zamir on 2021-03-31 13:01:42 CST ---

Update:
After talking to @stirabos the fix will be:

a - HCO will publish the virtio-win container image hash in a config map so that the UI can easily consume it.
b - UI will consume the virtio-win container image hash published by HCO and fall back to ":latest" in edge cases where KubeVirt is installed without HCO and the hash is unavailable.

How to verify fix:
a - set the image+digest in the config map.
b - create a windows VM
c - check that the virtio-win image added by the UI is the one defined in the config map

--- Additional comment from Ying Cui on 2021-03-31 15:56:53 CST ---

The current PR still needs some adjustments. Moving back to assigned.

--- Additional comment from Ying Cui on 2021-03-31 16:01:11 CST ---



--- Additional comment from Simone Tiraboschi on 2021-03-31 18:28:54 CST ---

(In reply to Yaacov Zamir from comment #9)
> Update:
> After talking to @stirabos the fix will be:
> 
> a - HCO will publish the virtio-win container image hash in a config map so
> that the UI can easily consume it.
> b - UI will consume the virtio-win container image hash published by HCO and
> fall back to ":latest" in edge cases where KubeVirt is installed without HCO
> and the hash is unavailable.

The upstream PR is ready for review:
https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1222

HCO will add (and reconcile on upgrades) a new value for "virtio-win-image" key on the config map already consumed by the UI component.
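
The consume-or-fall-back behaviour planned for the UI can be sketched as below. This is not the console's actual code, which is TypeScript; it is a Python illustration with a hypothetical function name. The "virtio-win-image" key comes from this comment, while the concrete fallback reference is an assumption, since the thread only says the fallback is ":latest".

```python
from typing import Mapping, Optional

VIRTIO_WIN_KEY = "virtio-win-image"


def resolve_virtio_win_image(
    config_map_data: Optional[Mapping[str, str]], fallback: str
) -> str:
    """Prefer the image published by HCO in the config map; use `fallback`
    when the config map or the key is missing or the value is empty."""
    value = (config_map_data or {}).get(VIRTIO_WIN_KEY, "").strip()
    return value or fallback


# Assumed fallback reference; the thread only specifies the ":latest" tag.
FALLBACK = "registry.redhat.io/container-native-virtualization/virtio-win:latest"

published = resolve_virtio_win_image(
    {VIRTIO_WIN_KEY: "registry.redhat.io/container-native-virtualization/virtio-win"
                     "@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"},
    FALLBACK,
)
without_hco = resolve_virtio_win_image(None, FALLBACK)
```

Because HCO reconciles the value on upgrades, a consumer written this way would pick up each release's digest without hardcoding it.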

--- Additional comment from OpenShift Automated Release Tooling on 2021-04-13 06:27:18 CST ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from Guohua Ouyang on 2021-04-20 09:18:08 CST ---

I tried the steps below to verify the bug but without success; I am not sure whether I did it right:

1. set 'virtio-win-image' in configMap v2v-vmware
$ oc get cm -n openshift-cnv v2v-vmware -o yaml                                                       
apiVersion: v1
data:
  kubevirt-vmware-image: registry.redhat.io/container-native-virtualization/kubevirt-vmware@sha256:657078b4bd260e86e1c44e3cbbf809ea5daf448b91e1425b4858da540e810921
  kubevirt-vmware-image-pull-policy: IfNotPresent
  v2v-conversion-image: registry.redhat.io/container-native-virtualization/kubevirt-v2v-conversion@sha256:467d3240b68bbccf525bd64771af210f2f3b54de22518d694fe45da33d2f01b9
  virtio-win-image: quay.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
kind: ConfigMap

2. create a Windows VM and check the virtio-container-disk image in the virt-launcher pod; the image is always pulled from docker.io.
$ oc get pod virt-launcher-win10-grotesque-swordfish-wp65j -o yaml | grep virtio-container-disk        
    image: kubevirt/virtio-container-disk
    image: kubevirt/virtio-container-disk
    image: docker.io/kubevirt/virtio-container-disk:latest
    imageID: docker.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
    image: docker.io/kubevirt/virtio-container-disk:latest
    imageID: docker.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
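
The comparison made in step 2 could be scripted roughly as follows. This is a sketch with hypothetical function names; it does plain line matching on `image:` fields, as the grep above does, and does not parse the YAML.

```python
from typing import List


def image_fields(pod_yaml_text: str) -> List[str]:
    """Collect the values of `image:` fields, ignoring `imageID:` fields."""
    images = []
    for line in pod_yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):  # list-item form: "- image: ..."
            stripped = stripped[2:].strip()
        if stripped.startswith("image:"):
            images.append(stripped[len("image:"):].strip())
    return images


def uses_expected_image(pod_yaml_text: str, expected: str) -> bool:
    """True if any container in the pod runs exactly the expected image."""
    return expected in image_fields(pod_yaml_text)


# The lines printed by the grep in step 2.
grep_output = """\
    image: kubevirt/virtio-container-disk
    image: kubevirt/virtio-container-disk
    image: docker.io/kubevirt/virtio-container-disk:latest
    imageID: docker.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
    image: docker.io/kubevirt/virtio-container-disk:latest
    imageID: docker.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
"""
expected = (
    "quay.io/kubevirt/virtio-container-disk"
    "@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a"
)
matches = uses_expected_image(grep_output, expected)
```

For the output above, no `image:` value equals the config map entry, which matches the failed verification reported here.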

--- Additional comment from OpenShift Automated Release Tooling on 2021-04-21 16:40:38 CST ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from Guohua Ouyang on 2021-04-22 16:51:59 CST ---

This bug cannot be verified with OKD; I will verify it with the latest downstream (d/s) build.

--- Additional comment from Guohua Ouyang on 2021-05-12 16:21:49 CST ---

Blocked by 1958811

--- Additional comment from Luke Stanton on 2021-06-09 04:14:03 CST ---

I apologize if I've overlooked something, but I was wondering if there is a known workaround for this issue. IBM/Boeing case 02948081 is running into this and not able to move forward in their environment.

--- Additional comment from Yaacov Zamir on 2021-06-09 13:14:31 CST ---

> wondering if there is a known workaround for this issue.

a - this fix is for 4.8; do we need it backported to 4.7 and 4.6?

b - a workaround is to add a custom disk instead of using the provided registry.redhat.io/container-native-virtualization/virtio-win

    b.1 - get the registry.redhat.io/container-native-virtualization/virtio-win image and copy it into the air-gapped environment
    b.2 - when creating a VM with the create VM wizard, go to the storage tab
    b.3 - edit the guest agent disk, and change the disk image to the one you copied into the cluster

c - you can create a template that has the disk image you imported in step b already set up as a disk in the template.

--- Additional comment from Kobig on 2021-06-09 15:03:34 CST ---

Hi @Yaacov Zamir,

I think we should backport it to at least 4.7, because we have a lot of customers that won't upgrade to or install 4.8 on day one,
so this can affect their installation process and break the idea of an OOTB experience and the docs that we are sending to customers.

--- Additional comment from Yaacov Zamir on 2021-06-09 15:12:21 CST ---

hi all,

We want to backport this to 4.7 see comment#20

Our UI fix depends on the fix in hyperconverged-cluster-operator.

Simone hi,
Can you backport https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1222 to 4.7 ?
We will backport the UI fix after the HCO fix is backported.

Matan hi,
Can you follow up with Simone to make sure we do the UI fix ASAP after the HCO fix is backported?

cc:// Tal

Comment 1 Inbar Rose 2021-07-19 09:08:17 UTC
You verified this in 4.8; can you verify this in 2.6.6 now that it has been fixed?
thanks

Comment 2 Guohua Ouyang 2021-07-19 10:09:15 UTC
(In reply to Inbar Rose from comment #1)
> You verified this in 4.8; can you verify this in 2.6.6 now that it has been
> fixed?
> thanks

Sure, I will verify it once I get a CNV 2.6.6 cluster.

Comment 3 Guohua Ouyang 2021-07-20 07:32:20 UTC
Verified on a 2.6.6 cluster; the virtio-win image is in the configMap.
$ oc get csv -n openshift-cnv                                                                         
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.6.6   OpenShift Virtualization   2.6.6     kubevirt-hyperconverged-operator.v2.6.5   Succeeded

$ oc get cm v2v-vmware -o yaml -n openshift-cnv                                                      
apiVersion: v1
data:
  kubevirt-vmware-image: registry.redhat.io/container-native-virtualization/kubevirt-vmware@sha256:274378367d00f5e2753961511bb98a245f83d24002d5fbffb9904a928ac23603
  kubevirt-vmware-image-pull-policy: IfNotPresent
  v2v-conversion-image: registry.redhat.io/container-native-virtualization/kubevirt-v2v-conversion@sha256:7f33fcb590c4b6464f6a7a05945dc9d3b742da60238763b1605194009166531c
  virtio-win-image: registry.redhat.io/container-native-virtualization/virtio-win@sha256:65349602afbdec3eb4310d061082ccfdf687f1333da47d11e545f1ac3607b00f
kind: ConfigMap

Comment 8 errata-xmlrpc 2021-08-10 17:33:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.6 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3119

