Description of problem:
For a VM that already has 28 snapshots (the actual maximum number of snapshots for a VM seen on VMware 6.5), a warm migration plan remains in the state "Running - preparing for precopies" for around an hour, and then fails with an error that does not indicate the cause: "Error while attempting warm import: warm import retry limit reached"

Version-Release number of selected component (if applicable):
mtv-operator-bundle-container-2.0.0-4

Expected results:
1. While the warm migration is running, the user should be notified what the plan is waiting for (CBT enablement on VMs).
2. When the timeout elapses, there should be a clear error message saying that the timeout has passed and the CBT snapshot could not be created.

Additional info:
This VMware article gives a limit of 32 snapshots and says that snapshots should not be more than 2 or 3 days old. https://kb.vmware.com/s/article/1025279
@istein would you mind opening a doc bug for MTV 2.0.0 to document these limits?
There are two things in the BZ description.
1. "notify the user what the plan is pending" - The current behavior of the migration plan is to mark the virtual machine as failed if CBT is not enabled. So I don't see how a migration could start with CBT disabled and leave us needing to notify the user. This would only be the case when creating a VirtualMachineImport CR manually.
2. "the timeout elapses" - Which timeout are you talking about? The one to wait for CBT to be enabled? If yes, we don't expect it to happen when using MTV.
Regarding the limits, the VMware documentation says "Maximum of 32 snapshots are supported in a chain". The important part is "in a chain": the snapshots created for warm migration are in their own chain, hence they will not conflict with snapshots created by the user or by other applications. That said, we still want to be good citizens, and https://github.com/kubevirt/vm-import-operator/pull/488 will ensure we only have 2 snapshots in the chain created by MTV. And unless the user sets the precopy interval to more than 3 days, those snapshots will never be more than 3 days old.
Thanks for your reply, Fabien.
1. There is a mistake in the bug description. The plan is waiting for the successful creation of a CBT snapshot, not for CBT enablement. The snapshot creation failed because there were already 28 existing "regular" snapshots. The "snapshot cleanup" PR doesn't address this problem. The request here is: add a message explaining that the plan is retrying to create a CBT snapshot.
2. The timeout is that of the plan waiting for the successful creation of a CBT snapshot (which cannot succeed due to the 28 existing snapshots). Currently the plan fails with the error: "Error while attempting warm import: warm import retry limit reached". The request here is: have a clearer error, something like "Error while attempting warm import: warm import CBT snapshot retry limit reached".
We have a cloned doc bug to cover this in the MTV 2.0 documentation. However, I do not know how to describe the difference between the formal VMware doc, which says the maximum number of allowed snapshots is 32, and what we actually see, where more than 28 is not allowed. wdyt we should do about it?
This would be easier to address in MTV 2.2.0, in the single VM import logic, instead of in VMIO. Moving this BZ to 2.2.0, while keeping the doc BZ for 2.1.0.
As of 2.2.0, it should be reflected in the migration status if an error occurs while attempting to create a snapshot.
Please verify with mtv-operator-bundle-2.0.0-57 / iib:126435, or later.
Verified with MTV 2.2.0-87, OCP 4.9.7, CNV 4.9.1-23.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (MTV 2.2.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:5066