Bug 2013494 - [CNV-2.6.8] VMI is in LiveMigrate loop when Upgrading Cluster from 2.6.7/4.7.32 to OCP 4.8.13
Summary: [CNV-2.6.8] VMI is in LiveMigrate loop when Upgrading Cluster from 2.6.7/4.7....
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.6.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 2.6.8
Assignee: Jed Lejosne
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On: 2008511
Blocks: 2010742
 
Reported: 2021-10-13 04:13 UTC by Kedar Bidarkar
Modified: 2021-11-17 18:40 UTC (History)

Fixed In Version: virt-operator-container-v2.6.8-5 hco-bundle-registry-container-v2.6.8-22
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2008511
Environment:
Last Closed: 2021-11-17 18:40:02 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 6591 | 0 | None | open | [release-0.36] migration: generate empty isos on target for cloud-inits, configmaps, secrets, ... | 2021-10-13 13:45:24 UTC
Github | kubevirt kubevirt pull 6697 | 0 | None | open | [release-0.36] virt-handler: use /proc instead of virt-chroot to stat virt-launcher files | 2021-10-28 20:31:34 UTC
Red Hat Product Errata | RHSA-2021:4725 | 0 | None | None | None | 2021-11-17 18:40:41 UTC

Comment 1 Kedar Bidarkar 2021-10-13 04:28:18 UTC
During the OCP upgrade from 4.7.33 to 4.8.14 (CNV 4.8.z is not involved yet)
-----------------------------------------------------------------------------

Source VMI Pod Version: container-native-virtualization/virt-launcher/images/v2.5.8-3 (yes, virt-launcher was still using 2.5.8-3 while on CNV 2.6.7/4.7.33)
Target VMI Pod Version: container-native-virtualization/virt-operator/images/v2.6.7-8

---

NOTE: I paid close attention to the VMI pod versions during this upgrade.
The issue below is seen when the VMI pod live-migrates/upgrades from 2.5.8-3 to 2.6.7-8 (during the OCP 4.8.14 upgrade).


{"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"vm3-ocs-rhel84","namespace":"default","pos":"manager.go:565","reason":"virError(Code=9, Domain=10, Message='operation failed: migration of disk vdb failed: Source and target image have different sizes')","timestamp":"2021-10-12T19:36:46.340948Z","uid":"7c294dfc-89d6-4b1a-a60d-70e390efa0da"}
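
A minimal sketch for spotting this failure when scanning virt-launcher logs, assuming the logs are available as JSON lines shaped like the one above. The helper name `find_size_mismatch` and the regular expression are illustrative assumptions, not part of KubeVirt.

```python
import json
import re

# Matches the libvirt error text seen in the log line above.
# The pattern is an assumption based on that single sample message.
DISK_SIZE_MISMATCH = re.compile(
    r"migration of disk (?P<disk>\S+) failed: "
    r"Source and target image have different sizes"
)


def find_size_mismatch(log_line):
    """Return the disk name if the line reports the size-mismatch
    migration failure, otherwise None. Hypothetical helper."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(entry, dict):
        return None
    if entry.get("msg") != "Live migration failed.":
        return None
    match = DISK_SIZE_MISMATCH.search(entry.get("reason", ""))
    return match.group("disk") if match else None
```

Feeding the log line above to `find_size_mismatch` would yield the affected disk name (`vdb`); unrelated or non-JSON lines yield `None`.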

Comment 4 Jed Lejosne 2021-10-28 20:45:42 UTC
This new failure is caused by virt-chroot not working properly in CNV 2.6.
More specifically, the --user option fails for every user, even root, saying the user doesn't exist.
I have not figured out why that happens.

However, after talking to Roman, we figured out that virt-chroot was not needed in the codepath involved in the issue, and in fact just made the code unnecessarily complicated.
So I pushed a fix to KubeVirt main and backported it to release-0.36 (linked above), which should fix the issue (by not using virt-chroot anymore).

It is worth noting that another (unrelated) function uses `virt-chroot --user`, GetImageInfo(), and in that case the use of virt-chroot makes sense.
I assume that function does not work in CNV 2.6 either, but I'm not sure what the impact of it is.
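
The linked backport is titled "use /proc instead of virt-chroot to stat virt-launcher files". As a rough illustration of that general idea only (this is not the KubeVirt code, which is written in Go), the sketch below stats a file through `/proc/<pid>/root`, which on Linux exposes the root filesystem as seen by that process. The function name `stat_in_process_root` and its parameters are hypothetical.

```python
import os


def stat_in_process_root(pid, path_in_container):
    """Stat a file as seen from the root filesystem of process `pid`.

    Hypothetical helper: Linux only, and the caller needs permission
    to access /proc/<pid>/root for the target process.
    """
    # /proc/<pid>/root gives a view of that process's root directory.
    proc_root = f"/proc/{pid}/root"
    # Strip the leading slash so os.path.join does not discard proc_root.
    full_path = os.path.join(proc_root, path_in_container.lstrip("/"))
    return os.stat(full_path)
```

No chroot and no user switch are involved here: the lookup is an ordinary `stat` on a path under `/proc`, which is why a broken `--user` option would not affect it.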

Comment 5 Roman Mohr 2021-10-29 12:43:04 UTC
(In reply to Jed Lejosne from comment #4)
> This new failure is caused by virt-chroot not working properly in CNV 2.6.
> More specifically, the --user option which fails on every user, even root,
> saying the user doesn't exist.
> I have not figured out why that happens.
> 
> However, after talking to Roman, we figured out that virt-chroot was not
> needed in the codepath involved in the issue, and in fact just made the code
> unnecessarily complicated.
> So I pushed a fix to KubeVirt main and backported it to release-0.36 (linked
> above), which should fix the issue (by not using virt-chroot anymore).
> 
> It is worth noting that another (unrelated) function uses `virt-chroot
> --user`, GetImageInfo(), and in that case the use of virt-chroot makes sense.
> I assume that function does not work in CNV 2.6 either, but I'm not sure
> what the impact of it is.

This is not an issue. This is only called at startup, where the new launcher image
is already in use (there is a small race window where new handlers can get old launcher pods, and that should normally be compatible too, but it is not worth fixing here and it would only be a transient error).

Comment 8 zhe peng 2021-11-03 09:50:44 UTC
Verified with build: v2.6.8-22

Summary:
Started with a VM on 2.5.8 (CNV 2.5.8, OCP 4.6).
Upgraded OCP to 4.7, applied the 2.6.8 ICSP immediately, and started the CNV upgrade with the following scenarios:

Scenario 1: 
Upgrade CNV From: 2.5.8 To: 2.6.4 
Migrate VM in CNV 2.6.4 LiveMigration - PASSED
	Virt-Launcher version - 2.6.4/2.6.3-2
Continue the upgrade to CNV 2.6.8 
Virt-Launcher version 2.6.4/2.6.3-2 to 2.6.8-5 - PASSED

Scenario 2: 
Upgrade CNV From: 2.5.8 To: 2.6.5 
Migrate VM in CNV 2.6.5 LiveMigration - PASSED
	Virt-Launcher version - 2.6.5-2
Continue the upgrade to CNV 2.6.8 
Virt-Launcher version 2.6.5-2 to 2.6.8-5 - PASSED

Scenario 3: 
Upgrade CNV From: 2.5.8 To: 2.6.6 
Migrate VM in CNV 2.6.6 LiveMigration - PASSED
	Virt-Launcher version - 2.6.6-7
Continue the upgrade to CNV 2.6.8
Virt-Launcher version 2.6.6-7 to 2.6.8-5 - PASSED

Scenario 4: 
Upgrade CNV From: 2.5.8 To: 2.6.7 
Migrate VM in CNV 2.6.7 LiveMigration  - FAILED  (https://bugzilla.redhat.com/show_bug.cgi?id=2019705)
	Virt-Launcher version 2.5.8 to 2.6.7
	Source Virt-Launcher Pod 2.5.8 continues to be in Running state.
	Target Virt-Launcher Pod 2.6.7 enters Completed state.
	VMIM Object shows Status: FAILED
Continue the upgrade to CNV 2.6.8
Virt-Launcher version 2.5.8 to 2.6.8-5  - PASSED


Scenario 5: 
Upgrade CNV From: 2.5.8 To: 2.6.8 
This was tested as part of Scenario 4.
As seen above,
the virt-launcher upgrade from version 2.5.8 to 2.6.8-5 - PASSED

Moving this to verified.

Comment 9 Kedar Bidarkar 2021-11-08 10:38:04 UTC
LiveMigration of VMI with the following scenario:  PASSED

Source Virt-Launcher Pod: v2.6.7
Target Virt-Launcher Pod: v2.6.8-5

Comment 15 errata-xmlrpc 2021-11-17 18:40:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.8 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4725

