Bug 2010742 - [CNV-4.9] VMI is in LiveMigrate loop when Upgrading Cluster from 2.6.7/4.7.32 to OCP 4.8.13
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jed Lejosne
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On: 2008511 2013494
Blocks:
Reported: 2021-10-05 12:39 UTC by Kedar Bidarkar
Modified: 2021-11-02 16:01 UTC (History)

Fixed In Version: virt-operator-container-v4.9.0-58 hco-bundle-registry-container-v4.9.0-246
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2008511
Environment:
Last Closed: 2021-11-02 16:01:09 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 6517 0 None open [release-0.41] migration: generate empty isos on target for cloud-inits, configmaps, secrets, ... 2021-10-06 15:04:23 UTC
Red Hat Product Errata RHSA-2021:4104 0 None None None 2021-11-02 16:01:30 UTC

Comment 2 Sarah Bennert 2021-10-05 16:43:02 UTC
vm-c3 is a containerdisk VM

image: kubevirt/cirros-container-disk-demo:latest

Comment 3 Fabian Deutsch 2021-10-06 07:28:30 UTC
I think the problem could be obvious here: ":latest" images are likely to differ between the time they are pulled on the source and the time, somewhat later, they are pulled on the destination.

The question is how to deal with this: should we prevent live migration if :latest images are used? Or expand :latest into a shasum? Probably the first (prevention of live migration). If a user needs live migration with a container disk, then the user can easily use a non-latest tag, or even a shasum. The question is whether KubeVirt should enforce shasums to be really certain.
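
For illustration only, here is a minimal Python sketch of the kind of check discussed above: deciding whether a container disk image reference is pinned by digest or relies on a tag such as :latest that can change between pulls. The function names `split_image_ref` and `is_pinned_by_digest` are hypothetical and are not KubeVirt code; the parsing is simplified and assumes a reference of the form `[registry[:port]/]name[:tag][@digest]`.

```python
def split_image_ref(image):
    """Split an image reference into (name, tag, digest).

    Hypothetical helper, simplified parsing: a tag is only recognised
    when the ':' comes after the last '/', so a registry port such as
    'registry:5000/...' is not mistaken for a tag.
    """
    name, _, digest = image.partition("@")
    tag = None
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    return name, tag, digest or None


def is_pinned_by_digest(image):
    """Return True when the reference carries a digest (e.g. sha256:...)."""
    _, _, digest = split_image_ref(image)
    return digest is not None


if __name__ == "__main__":
    for ref in (
        "kubevirt/cirros-container-disk-demo:latest",
        "registry:5000/kubevirt/cirros-container-disk-demo:devel",
    ):
        print(ref, "pinned:", is_pinned_by_digest(ref))
```

A reference that is not pinned may resolve to different content on the source and on the migration target, which is the situation described in this comment.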

Comment 4 Roman Mohr 2021-10-06 07:40:45 UTC
Fabian, that is a good observation and we should fix that. One option would be to not allow migrations, like you said, but we can actually get the shasum pretty easily, so I would go directly to falling back to the shasum for the migration target pod. Here is the info from the pod status once it is started:

```
  - containerID: cri-o://51feeec858a19f95172fe1c8767dfae72a63c1b45b9b542f90b3fd7b2cc16458
    image: registry:5000/kubevirt/cirros-container-disk-demo:devel
    imageID: registry:5000/kubevirt/cirros-container-disk-demo@sha256:90e064fca2f47eabce210d218a45ba48cc7105b027d3f39761f242506cad15d6
    lastState: {}
    name: volumecontainerdisk
```

You can see what the user requested: "registry:5000/kubevirt/cirros-container-disk-demo:devel" and also what exact shasum we got: "registry:5000/kubevirt/cirros-container-disk-demo@sha256:90e064fca2f47eabce210d218a45ba48cc7105b027d3f39761f242506cad15d6".
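
As a rough sketch of the lookup described above (not the actual KubeVirt implementation), the following Python reads the resolved `imageID` for a named container out of a pod status that has already been parsed into a dict. The function name `pinned_image_for` is hypothetical. Stripping a `scheme://` prefix is an assumption added because some container runtimes report `imageID` with one; the status shown in this comment has none.

```python
def pinned_image_for(pod_status, container_name):
    """Return the digest-pinned image of a container, or None.

    pod_status is the 'status' section of a Pod as a plain dict.
    Hypothetical helper for illustration only.
    """
    for cs in pod_status.get("containerStatuses", []):
        if cs.get("name") != container_name:
            continue
        # Assumption: drop a runtime prefix such as 'docker-pullable://' if present.
        image_id = cs.get("imageID", "").split("://", 1)[-1]
        # Only trust the value if it really carries a sha256 digest.
        return image_id if "@sha256:" in image_id else None
    return None


if __name__ == "__main__":
    status = {
        "containerStatuses": [
            {
                "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",
                "imageID": "registry:5000/kubevirt/cirros-container-disk-demo@sha256:90e064fca2f47eabce210d218a45ba48cc7105b027d3f39761f242506cad15d6",
                "name": "volumecontainerdisk",
            }
        ]
    }
    print(pinned_image_for(status, "volumecontainerdisk"))
```

The returned reference could then be used for the migration target pod in place of the tag the user requested, so that source and target run the same image content.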

Comment 5 Roman Mohr 2021-10-06 07:54:24 UTC
Jed, since this is about container disks, I will take it and fix it in parallel with your other fixes.

Comment 6 Israel Pinto 2021-10-06 08:12:16 UTC
Roman,

I think we have 2 issues here:
1. The container disk VM: since the image is :latest and can change, as Fabian mentioned, it should be a different bug.
2. The problem described in this bug is that both containerDisk and DV based VMIs fail to migrate, since we generate empty isos on target for cloud-inits, configmaps, secrets, ... for older KubeVirt versions. The VMs were created with CNV 2.5.

Comment 7 Roman Mohr 2021-10-06 08:21:47 UTC
(In reply to Israel Pinto from comment #6)
> Roman,
> 
> I think we have 2 issues here:

Definitely.

> 1. The container disk VM: since the image is latest and can be changed like
> Fabian mention it should be a different bug
> 2. The problem describe in this bug is with both containerDisk and DV based
> VMI's fail to migrate since we generate empty isos on target for
> cloud-inits, configmaps, secrets, ... for older KubeVirt versions, The VMs
> were created with CNV 2.5.

If this is really just a clone of https://bugzilla.redhat.com/show_bug.cgi?id=2008511 for the 4.9 release, then yes, we need a different bug.
Is it just a clone for another release?

Comment 8 Israel Pinto 2021-10-06 08:35:27 UTC
Yes, it is a clone for 4.9. I will file a new bug for the container disk.

Comment 9 Israel Pinto 2021-10-06 09:18:50 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2011207

Comment 12 Kedar Bidarkar 2021-10-17 09:22:01 UTC
[kbidarka@localhost upgrade_test_490]$ virtctl console vm1-ocs-rhel84
Successfully connected to vm1-ocs-rhel84 console. The escape sequence is ^]

[cloud-user@vm1-ocs-rhel84 ~]$ [kbidarka@localhost upgrade_test_490]$ 
[kbidarka@localhost upgrade_test_490]$ virtctl console vm2-ocs-rhel84

Successfully connected to vm2-ocs-rhel84 console. The escape sequence is ^]

[cloud-user@vm2-ocs-rhel84 ~]$ [kbidarka@localhost upgrade_test_490]$ 
[kbidarka@localhost upgrade_test_490]$ virtctl console vm3-ocs-rhel84
Successfully connected to vm3-ocs-rhel84 console. The escape sequence is ^]

[cloud-user@vm3-ocs-rhel84 ~]$ 

[kbidarka@localhost upgrade_test_490]$ oc get csv -n openshift-cnv 
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.0   OpenShift Virtualization   4.9.0     kubevirt-hyperconverged-operator.v4.8.2   Succeeded



Virt-launcher Pods got upgraded successfully from "/virt-launcher/images/v4.8.2-5" to "virt-launcher/images/v4.9.0-58"

Comment 15 errata-xmlrpc 2021-11-02 16:01:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104

