Bug 1942839 - Windows VMs fail to start on air-gapped environments [NEEDINFO]
Summary: Windows VMs fail to start on air-gapped environments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Console Kubevirt Plugin
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Matan Schatzman
QA Contact: Guohua Ouyang
URL:
Whiteboard:
Duplicates: 1944273
Depends On:
Blocks: 1969754 1969756
 
Reported: 2021-03-25 07:41 UTC by Shon Paz
Modified: 2021-09-15 07:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1969754 1969756
Environment:
Last Closed: 2021-07-27 22:55:38 UTC
Target Upstream Version:
lstanton: needinfo? (mschatzm)
yzamir: needinfo? (tnisan)
yzamir: needinfo? (mschatzm)




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt hyperconverged-cluster-operator pull 1222 | 0 | None | closed | Expose an environmental value to virtio win image | 2021-05-11 11:23:18 UTC
Github | openshift console pull 8514 | 0 | None | closed | Bug 1942839: Use digest for virtio-win container image | 2021-05-11 11:23:18 UTC
Github | openshift console pull 8549 | 0 | None | closed | Bug 1942839: Image is now pulled from config map data | 2021-05-11 11:23:19 UTC
Github | openshift console pull 8719 | 0 | None | closed | Bug 1942839: Fix async return | 2021-05-11 11:23:16 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:56:10 UTC

Description Shon Paz 2021-03-25 07:41:51 UTC
Description of problem:

When creating a Windows VM in a disconnected environment, a virtio-win containerDisk is used in addition to the CDI importer and the operating system disk that are pulled. This container image is referenced by tag ('registry.redhat.io/container-native-virtualization/virtio-win'), but it is not part of the operator's mirroring phase and is not listed in the relatedImages.

In a disconnected environment OpenShift uses ImageContentSourcePolicy and by default won't allow pulling any images referenced by tag. This causes the virt-launcher to fail and the VM to be stuck in the starting phase.
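
To make the tag versus digest distinction concrete, here is a minimal TypeScript sketch that classifies an image reference. The function name `classifyImageRef` and the assumption that digests are always `sha256` with 64 hex characters are illustrative only; this is not code from the console or from OpenShift.

```typescript
// Hypothetical helper: classify a container image reference.
type ImageRefKind = 'digest' | 'tag' | 'untagged';

function classifyImageRef(ref: string): ImageRefKind {
  // Assumption: a digest reference ends with "@sha256:" plus 64 hex characters.
  if (/@sha256:[0-9a-f]{64}$/.test(ref)) {
    return 'digest';
  }
  // Only look after the last "/" so a registry port (host:5000) is not
  // mistaken for a tag.
  const lastSegment = ref.substring(ref.lastIndexOf('/') + 1);
  return lastSegment.indexOf(':') !== -1 ? 'tag' : 'untagged';
}
```

With this sketch, a reference such as `registry.redhat.io/container-native-virtualization/virtio-win:v2.6.0` is classified as `tag`, which is the kind of reference this report says fails in a disconnected cluster.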

Version-Release number of selected component (if applicable):

OCP4.6.8 
Openshift Virtualization 2.5

How reproducible:

Try to create a Windows VM (that requires the default virtio-win drivers) in a disconnected (air-gapped) environment. 
 
Steps to Reproduce:
1. Create a Windows machine from a boot source 
2. Wait until the VM starts 
3. Verify that the VM is stuck in the starting phase and that the virt-launcher is in ImagePullBackOff

Actual results:

VM is stuck on starting, virt-launcher fails to pull the virtio-win containerDisk image

Expected results:

Windows VM is successfully created 

Additional info:

This can be worked around by editing the VM manifest and pointing the referenced containerDisk to the internal registry, so that it passes the ImageContentSourcePolicy.

The solution here is to reference this image by digest instead of by tag (v2.6, for example) and to include it in the Operator's relatedImages section so that it can be successfully mirrored with OPM.

Comment 2 Dan Kenigsberg 2021-03-30 11:05:40 UTC
> This container image is ... not part of the mirroring phase of the operator and all the relatedImages

Should we not fix this in HCO, @stirabos@redhat.com ?

Comment 3 Shon Paz 2021-03-30 11:14:23 UTC
@Dan it could be that this image exists but not at the expected path. For example, if the original path is "registry.redhat.io/container-native-virtualization/virtio-win:v2.6.0", I would expect that after the mirror finishes we'll have it at "my_registry/container-native-virtualization/virtio-win:v2.6.0".

Comment 4 Yaacov Zamir 2021-03-30 11:45:28 UTC
Hi, this image is hard coded in the UI
https://github.com/openshift/console/blob/5db89ab1457d6374bc4d80419613eeb93c219231/frontend/packages/kubevirt-plugin/src/constants/vm/constants.ts#L71

AFAIK this is the only hardcoded image we use. It is added to a VM when the VM is labeled as running a Windows OS.

What is the best way to make the UI friendlier to disconnected environments?
Will using "@sha256:<digest>" solve the problem?
If it does solve the issue, where do we get the correct digest?

Can HCO make it available inside a disconnected environment without the need to hardcode a digest?

Comment 5 Shon Paz 2021-03-30 12:02:07 UTC
After a short sync with @Yaacov:

I see this issue is divided into two parts: 

* Changing the console to point to the right virtio-win image, as presented here: 

https://github.com/openshift/console/blob/5db89ab1457d6374bc4d80419613eeb93c219231/frontend/packages/kubevirt-plugin/src/constants/vm/constants.ts#L72

To fix this we could change the following two lines: 

export const WINTOOLS_CONTAINER_VERSION = '011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2';
export const WINTOOLS_CONTAINER_NAME_DOWNSTREAM = `registry.redhat.io/container-native-virtualization/virtio-win@sha256:${WINTOOLS_CONTAINER_VERSION}`;

This means that the UI will write the right image into the VM's YAML, but it also means that these constants will have to be updated whenever a new virtio-win image version is released, since every image has a unique digest.

* Documenting this image in the relatedImages so that it'll be mirrored by OPM as part of the operator's images, with the proper path:
  - The original image should be presented as: "registry.redhat.io/container-native-virtualization/virtio-win@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"
  - It should be mirrored using the Skopeo tool to the following path: "my_registry/container-native-virtualization/virtio-win@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"
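
The path mapping described in the two sub-items above can be sketched as a small TypeScript function. `toMirrorRef` is a hypothetical name used for illustration, and "my_registry" is the placeholder mirror host from this comment; the sketch only swaps the registry host and leaves the repository path and digest untouched.

```typescript
// Hypothetical helper: rewrite the registry host of an image reference to a
// mirror registry, keeping the repository path and the tag or digest as-is.
function toMirrorRef(sourceRef: string, mirrorRegistry: string): string {
  const firstSlash = sourceRef.indexOf('/');
  if (firstSlash === -1) {
    // Assumption: the source reference always has the form "<registry>/<repository>".
    throw new Error(`expected "<registry>/<repository>", got "${sourceRef}"`);
  }
  return `${mirrorRegistry}${sourceRef.substring(firstSlash)}`;
}
```

Because the digest is part of the suffix that is copied unchanged, the mirrored reference identifies the same image content as the original one.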


To verify the image's digest you could use the following steps:

* Step 1: Log in to the registry:
skopeo login registry.redhat.io
Login Succeeded!

* Step 2: Inspect the tagged image and read its digest:
skopeo inspect docker://registry.redhat.io/container-native-virtualization/virtio-win:v2.6.0 | jq '.Digest'
"sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2"

* Step 3: Pull the image by digest:
podman pull --authfile /root/ocp4-disconnected/pull-secret.json registry.redhat.io/container-native-virtualization/virtio-win@sha256:011060472f068e42e2c0c0b3451a99b5607dd037ba70945004f98b2de74b89a2
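
For anyone scripting Step 2 instead of using jq, the following is a rough TypeScript sketch that reads the `Digest` field from the JSON text printed by `skopeo inspect` and builds the digest-pinned reference used in Step 3. The function names are hypothetical, and the sketch assumes the JSON has already been captured as a string.

```typescript
// Hypothetical helper: extract the "Digest" field from `skopeo inspect` JSON
// output, the same field that `jq '.Digest'` selects.
function digestFromInspectOutput(inspectJson: string): string {
  const parsed: unknown = JSON.parse(inspectJson);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('inspect output is not a JSON object');
  }
  const digest = (parsed as { Digest?: unknown }).Digest;
  // Assumption: only sha256 digests are expected here.
  if (typeof digest !== 'string' || !/^sha256:[0-9a-f]{64}$/.test(digest)) {
    throw new Error('no valid sha256 Digest field found');
  }
  return digest;
}

// Hypothetical helper: build "<repository>@<digest>" for pulling by digest.
function pinToDigest(repository: string, digest: string): string {
  return `${repository}@${digest}`;
}
```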

Comment 6 Yaacov Zamir 2021-03-30 12:26:34 UTC
Thanks,

Based on comment #5, we will fix the UI by switching from the v2.6.0 tag to a digest.

Fabian, is this the correct fix here ?

Comment 7 Fabian Deutsch 2021-03-30 13:16:07 UTC
Yaacov, the digest will change with every update of CNV

@stirabos@redhat.com can the UI somehow read the relevant digest from some object?

Comment 9 Yaacov Zamir 2021-03-31 05:01:42 UTC
Update:
After talking to @stirabos@redhat.com the fix will be:

a - HCO will publish the virtio-win container image hash in a config map, so the UI can easily consume it.
b - The UI will consume the virtio-win container image hash published by HCO and fall back to ":latest" in edge cases where kubevirt is installed without HCO and the hash is unavailable.

How to verify fix:
a - set the image+digest in the config map.
b - create a windows VM
c - check that the virtio-win image added by the UI is the one defined in the config map

Comment 10 Ying Cui 2021-03-31 07:56:53 UTC
The current PR still needs some adjustments. Moving back to assigned.

Comment 11 Ying Cui 2021-03-31 08:01:11 UTC
*** Bug 1944273 has been marked as a duplicate of this bug. ***

Comment 12 Simone Tiraboschi 2021-03-31 10:28:54 UTC
(In reply to Yaacov Zamir from comment #9)
> Update:
> After talking to @stirabos@redhat.com the fix will be:
> 
> a - HCO will publish the virtio-win container image hash in a config map,
> so the UI can easily consume it.
> b - The UI will consume the virtio-win container image hash published by HCO
> and fall back to ":latest" in edge cases where kubevirt is installed without
> HCO and the hash is unavailable.

The upstream PR is ready for review:
https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1222

HCO will add (and reconcile on upgrades) a new value for "virtio-win-image" key on the config map already consumed by the UI component.
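
The consume-with-fallback behaviour agreed in comment #9 could look roughly like the TypeScript sketch below. It is not the code from the console PRs: the function name and the fallback image constant are assumptions made for illustration, and only the "virtio-win-image" key comes from this comment.

```typescript
// Key that HCO sets on the config map, as named in this comment.
const VIRTIO_WIN_IMAGE_KEY = 'virtio-win-image';
// Hypothetical fallback for installs without HCO; the real default may differ.
const FALLBACK_VIRTIO_WIN_IMAGE = 'kubevirt/virtio-container-disk:latest';

// Hypothetical helper: pick the image from config map data, or fall back when
// the config map or the key is missing or empty.
function resolveVirtioWinImage(
  configMapData: { [key: string]: string } | undefined,
  fallback: string = FALLBACK_VIRTIO_WIN_IMAGE,
): string {
  const value = configMapData ? configMapData[VIRTIO_WIN_IMAGE_KEY] : undefined;
  if (typeof value === 'string' && value.trim() !== '') {
    return value.trim();
  }
  return fallback;
}
```

Keeping the fallback as a parameter makes the edge case (kubevirt installed without HCO) easy to exercise in a unit test without a cluster.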

Comment 14 Guohua Ouyang 2021-04-20 01:18:08 UTC
I tried the steps below to verify the bug, but without success. I am not sure whether I did it right:

1. set 'virtio-win-image' in configMap v2v-vmware
$ oc get cm -n openshift-cnv v2v-vmware -o yaml                                                       
apiVersion: v1
data:
  kubevirt-vmware-image: registry.redhat.io/container-native-virtualization/kubevirt-vmware@sha256:657078b4bd260e86e1c44e3cbbf809ea5daf448b91e1425b4858da540e810921
  kubevirt-vmware-image-pull-policy: IfNotPresent
  v2v-conversion-image: registry.redhat.io/container-native-virtualization/kubevirt-v2v-conversion@sha256:467d3240b68bbccf525bd64771af210f2f3b54de22518d694fe45da33d2f01b9
  virtio-win-image: quay.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
kind: ConfigMap

2. Create a Windows VM and check the virtio-container-disk image in the virt-launcher pod; the image is always pulled from docker.io.
$ oc get pod virt-launcher-win10-grotesque-swordfish-wp65j -o yaml | grep virtio-container-disk        
    image: kubevirt/virtio-container-disk
    image: kubevirt/virtio-container-disk
    image: docker.io/kubevirt/virtio-container-disk:latest
    imageID: docker.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
    image: docker.io/kubevirt/virtio-container-disk:latest
    imageID: docker.io/kubevirt/virtio-container-disk@sha256:373fae39339fb0b77a101c825b66b5ceadd79397119e46d0c540df26f74fc06a
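
One way to read the output above is to compare the config map value with the pod's imageID piece by piece. The TypeScript sketch below is a hypothetical helper, not part of any test suite mentioned in this bug; it splits each reference at "@" and reports whether the digest and the repository match.

```typescript
// Hypothetical helper: split "<name>@<digest>" into its two parts.
function splitDigest(ref: string): { name: string; digest: string | null } {
  const at = ref.indexOf('@');
  return at === -1
    ? { name: ref, digest: null }
    : { name: ref.substring(0, at), digest: ref.substring(at + 1) };
}

// Hypothetical helper: compare the expected image (config map value) with the
// image actually reported by the pod (imageID).
function compareImageRefs(
  expectedRef: string,
  actualRef: string,
): { sameDigest: boolean; sameRepository: boolean } {
  const expected = splitDigest(expectedRef);
  const actual = splitDigest(actualRef);
  return {
    // Two references without a digest are never treated as the same digest.
    sameDigest: expected.digest !== null && expected.digest === actual.digest,
    sameRepository: expected.name === actual.name,
  };
}
```

Applied to the values pasted in this comment, the digests are identical while the registries differ (quay.io in the config map, docker.io in the pod), which matches the observation that the image is not taken from the config map.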

Comment 16 Guohua Ouyang 2021-04-22 08:51:59 UTC
This bug cannot be verified with OKD; I will verify it with the latest d/s build.

Comment 17 Guohua Ouyang 2021-05-12 08:21:49 UTC
Blocked by bug 1958811

Comment 21 Yaacov Zamir 2021-06-09 07:12:21 UTC
hi all,

We want to backport this to 4.7; see comment #20.

Our UI fix depends on the fix in hyperconverged-cluster-operator.

Simone hi,
Can you backport https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1222 to 4.7 ?
We will backport the UI fix after the HCO fix is backported.

Matan hi,
Can you follow up with Simone to make sure we do the UI fix ASAP after the HCO fix is backported?

cc:// Tal

Comment 22 Guohua Ouyang 2021-06-09 08:15:21 UTC
Cloned a bug to OCP 4.7.z: https://bugzilla.redhat.com/show_bug.cgi?id=1969754
Cloned a bug to CNV 2.6.4: https://bugzilla.redhat.com/show_bug.cgi?id=1969756

Comment 24 Guohua Ouyang 2021-06-21 06:30:21 UTC
Verified on 4.8.0-fc.9.

Comment 27 errata-xmlrpc 2021-07-27 22:55:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 28 Simone Tiraboschi 2021-08-12 15:48:29 UTC
(In reply to Yaacov Zamir from comment #21)
> Simone hi,
> Can you backport
> https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1222 to 4.7
> ?

Done, as per https://bugzilla.redhat.com/show_bug.cgi?id=1969756

Comment 29 Dan Kenigsberg 2021-09-14 13:55:01 UTC
@yzamir@redhat.com was this backported to kubevirt plugin in 4.7.z?

Comment 30 Yaacov Zamir 2021-09-15 07:02:21 UTC
Not yet. We have a 4.7 bug to track it: https://bugzilla.redhat.com/show_bug.cgi?id=1969754

