Bug 1969894 - [Regression][VMIO][Warm] The third precopy does not end in warm migration
Summary: [Regression][VMIO][Warm] The third precopy does not end in warm migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: V2V
Version: 2.6.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Sam Lucidi
QA Contact: Maayan Hadasi
URL:
Whiteboard:
Depends On:
Blocks: 1975727
Reported: 2021-06-09 12:28 UTC by Maayan Hadasi
Modified: 2021-07-27 14:33 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1975727
Environment:
Last Closed: 2021-07-27 14:32:39 UTC
Target Upstream Version:
Embargoed:


Attachments
screenshot (72.00 KB, image/png), 2021-06-09 12:28 UTC, Maayan Hadasi
importer log (13.62 KB, text/plain), 2021-06-10 07:35 UTC, Maayan Hadasi
vmware_vm_directory_after_CBT_restart (281.78 KB, image/png), 2021-06-10 07:37 UTC, Maayan Hadasi
vm_selection_page (118.85 KB, image/png), 2021-06-10 07:38 UTC, Maayan Hadasi
vmware_vm_directory_once_the_bug_occurred (355.07 KB, image/png), 2021-06-10 07:38 UTC, Maayan Hadasi
2_copies_migration_plan_screenshot (55.53 KB, image/png), 2021-06-10 07:39 UTC, Maayan Hadasi


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt vm-import-operator pull 494 | 0 | None | open | Don't clean up snapshots while a warm import is in progress | 2021-06-14 17:04:46 UTC
Red Hat Product Errata | RHSA-2021:2920 | 0 | None | None | None | 2021-07-27 14:33:33 UTC

Description Maayan Hadasi 2021-06-09 12:28:19 UTC
Created attachment 1789556
screenshot

Description of problem:
When running a warm migration, the 3rd precopy seems to run forever
(see attached screenshot)

- The importer pod status is CrashLoopBackOff
- After running Cutover, the migration plan fails with DataVolumeCreationFailed
Message:
Error while importing disk image: d8ea4a7e-0233-4635-90bb-18188bdb0c43-2001. Unable to connect to vddk data source: ServerFaultCode: Error caused by file /vmfs/volumes/5bfe7583-fe7f9906-f066-b8ca3a638940/mguetta-2disks-for-warm/mguetta-2disks-for-warm_1-000002.vmdk (pod CrashLoopBackoff restart exceeded)
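
For triage, the CrashLoopBackOff state reported above can be read from the pod's JSON (for example, the output of `oc get pod <name> -o json`). A minimal Python sketch, assuming the standard Kubernetes pod status layout (`status.containerStatuses[].state.waiting.reason`); the pod document and the container name `importer` below are hypothetical and trimmed for illustration:

```python
def crash_looping_containers(pod: dict) -> list:
    """Return the names of containers currently waiting in CrashLoopBackOff."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    names = []
    for container_status in statuses:
        # "waiting" is only present while the container is not running.
        waiting = container_status.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            names.append(container_status["name"])
    return names


# Hypothetical pod document (not taken from the attached logs).
example_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "importer",
                "restartCount": 5,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            }
        ]
    }
}

print(crash_looping_containers(example_pod))
```

A container that is running, or a pod with no container statuses yet, yields an empty list.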


Versions:
Installed from stage on June 09 2021, matching mtv-operator-bundle-container-2.0.0-23, on bare metal
CNV 2.6.3-25
OCP 4.7.0


How reproducible:
100%


Steps to Reproduce:
1. Have a running source VM
2. Warm migrate this VM
3. Wait for the 3rd precopy to execute


Actual results:
The third precopy does not end


Additional info:
- This issue was found using an NFS storage class
- The issue is not reproducible when running a warm migration with a powered-off VM

Comment 1 Fabien Dupont 2021-06-09 14:54:03 UTC
How full is the datastore? Is there any error message on the VMware side?
I've found articles [1][2] that say it might be caused by invalid CBT. Could you please reset CBT for this VM? See https://kb.vmware.com/s/article/2139574.

[1] https://www.veritas.com/support/en_US/article.100027732
[2] https://support.assurestor.com/support/solutions/articles/16000065189-getbackupsessionfiles-failed-error-querychangeddiskareas-soap-1-1-fault-serverfaultcode-no-subc

Comment 2 Maayan Hadasi 2021-06-10 07:35:41 UTC
Created attachment 1789740
importer log

Comment 3 Maayan Hadasi 2021-06-10 07:37:10 UTC
Created attachment 1789749
vmware_vm_directory_after_CBT_restart

Comment 4 Maayan Hadasi 2021-06-10 07:38:26 UTC
Created attachment 1789755
vm_selection_page

Comment 5 Maayan Hadasi 2021-06-10 07:38:56 UTC
Created attachment 1789763
vmware_vm_directory_once_the_bug_occurred

Comment 6 Maayan Hadasi 2021-06-10 07:39:22 UTC
Created attachment 1789764
2_copies_migration_plan_screenshot

Comment 7 Maayan Hadasi 2021-06-10 07:41:54 UTC
Hi @fdupont

I followed the instructions in https://kb.vmware.com/s/article/2139574 to reset CBT.
This is the first time the issue was reproduced in the 2nd precopy rather than the 3rd.

Please see the attachments (all from the same run):
- importer_log
- vmware_vm_directory_after_CBT_restart
- vm_selection_page
- vmware_vm_directory_once_the_bug_occurred
- 2_copies_migration_plan_screenshot

Comment 8 Maayan Hadasi 2021-06-10 11:38:14 UTC
More information:

The issue was reproduced in a warm VMIO migration triggered through the API.
In addition, we noticed that once the 3rd snapshot is taken, the 1st snapshot is removed from the VMware VM.
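
The observation above (a snapshot disappearing between precopies) can be checked by comparing two listings of the VM's snapshots taken at different points in the migration. A minimal Python sketch; the snapshot names are hypothetical, and how the listings are obtained from vSphere is left out:

```python
def removed_snapshots(before, after):
    """Return snapshots present in the earlier listing but missing from the later one."""
    remaining = set(after)
    return [name for name in before if name not in remaining]


# Hypothetical snapshot names, for illustration only.
after_second_precopy = ["precopy-1", "precopy-2"]
after_third_precopy = ["precopy-2", "precopy-3"]

print(removed_snapshots(after_second_precopy, after_third_precopy))
```

An empty result means no snapshot was removed between the two listings.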

Comment 10 Amos Mastbaum 2021-07-06 05:13:17 UTC
@fdupont
In cnv48-451 it works as before:
the snapshots are deleted, and as long as the maximum number of VMware CBT snapshots is not reached, the migration is successful.
Please confirm this is OK before we verify.
Thanks.

Comment 11 Maayan Hadasi 2021-07-06 09:27:51 UTC
Verified as fixed. No CBT snapshot is deleted during the precopy stage of a warm migration plan.


Versions:
CNV 4.8.0-451 iib 86746
MTV 2.1.0-21 iib 88402
OCP 4.8.0-rc.1
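
The verified behavior matches the title of the linked pull request ("Don't clean up snapshots while a warm import is in progress"). The idea can be illustrated with a toy Python model. This is only a sketch of the guard, not the operator's actual code; `WarmImport`, `finalized`, and `snapshots_safe_to_delete` are hypothetical names introduced here:

```python
from dataclasses import dataclass, field


@dataclass
class WarmImport:
    """Hypothetical model of a warm import; not a real operator type."""
    snapshots: list = field(default_factory=list)
    finalized: bool = False  # assumed to become True once cutover has completed


def snapshots_safe_to_delete(warm_import: WarmImport) -> list:
    """Return snapshots that may be cleaned up; none while precopies are still running."""
    if not warm_import.finalized:
        # Later precopies may still depend on earlier snapshots, so keep them all.
        return []
    return list(warm_import.snapshots)


in_progress = WarmImport(snapshots=["precopy-1", "precopy-2", "precopy-3"])
done = WarmImport(snapshots=["precopy-1", "precopy-2", "precopy-3"], finalized=True)

print(snapshots_safe_to_delete(in_progress))
print(snapshots_safe_to_delete(done))
```

In this model, cleanup is deferred entirely until the import is finalized, which is one simple way to guarantee that no snapshot is removed during the precopy stage.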

Comment 14 errata-xmlrpc 2021-07-27 14:32:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920

