Description of problem: When implementing multi-stage VDDK import, a list of checkpoints was added to the DataVolume spec.source. The current implementation expects each checkpoint to be a snapshot Managed Object Reference (MORef). This works well, but the number of snapshots grows with every iteration until it reaches the maximum number of snapshots. The limit is low, so for long warm migrations the interval between snapshots has to be increased, which in turn leads to bigger deltas to copy over. The idea is to delete snapshots when they are no longer needed, keeping at most 2 snapshots and allowing any value for the snapshot interval. To allow that, VMIO or MTV can get the Change ID associated with the snapshot and then delete the snapshot. The Change ID would then be passed to CDI as a checkpoint. The logic in the VDDK importer would need to check the string format to identify whether the checkpoints are Snapshot MORefs or Change IDs, and adjust its behavior accordingly.
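A minimal sketch of the string-format check described above, written in Python for illustration only. The two patterns are assumptions inferred from the example values quoted in this report ("snapshot-8314" and "52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107"); the function name `classify_checkpoint` is hypothetical and this is not the actual CDI importer code.

```python
import re

# Assumed formats, inferred from the examples in this report (not from CDI source):
#   snapshot MORef: "snapshot-8314"
#   change ID:      "52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107"
SNAPSHOT_MOREF = re.compile(r"snapshot-\d+")
CHANGE_ID = re.compile(
    r"(?:[0-9a-f]{2} ){7}[0-9a-f]{2}-(?:[0-9a-f]{2} ){7}[0-9a-f]{2}/\d+",
    re.IGNORECASE,
)


def classify_checkpoint(value: str) -> str:
    """Return 'empty', 'snapshot', 'change_id' or 'unknown' for a checkpoint string."""
    if value == "":
        # The first checkpoint has no previous state to diff against.
        return "empty"
    if SNAPSHOT_MOREF.fullmatch(value):
        return "snapshot"
    if CHANGE_ID.fullmatch(value):
        return "change_id"
    return "unknown"


if __name__ == "__main__":
    for sample in ("", "snapshot-8314",
                   "52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107"):
        print(repr(sample), "->", classify_checkpoint(sample))
```

An importer following this idea would query changed blocks directly when it sees a change ID, and would look up the snapshot first when it sees a MORef.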
Please verify with hco-bundle-registry-container-4.9.1-15 / iib:132476, or later.
It's missing a backport PR and is not available in 4.9.1 yet. Moving back to MODIFIED.
Please verify with hco-bundle-registry-container-v4.9.1-21 / iib:132758, or later.
Fabien, how should this bug be verified, please?
To verify, you need to pass Change IDs as checkpoints to the DataVolume, instead of Snapshot MORefs. @marnold, could you please explain how to get the Change IDs and update the DataVolume?
I don't know if QE normally tests CDI directly like that. It seems like it should also be valid to run a warm migration from MTV and watch the DataVolume to make sure it is using change IDs instead of snapshot IDs. The list is in spec.checkpoints, and the field to watch is "previous". I think it will only show change IDs starting with the second iteration. (I can also supply a way to directly test CDI if this isn't good enough). To get the change ID without building any extra tools, the easiest way I have found so far is to use the VMware object browser. The path looks like: "content" -> "rootFolder" -> select data center under "childEntity" -> "vmFolder" -> target VM under "childEntity" -> "snapshot" -> "rootSnapshotList" -> target "snapshot" link -> "config" -> "hardware" -> find correct "VirtualDisk" object under "device" -> "backing". The change ID should now be under the "changeId" field, and it should look something like "52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107".
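For anyone who would rather script the lookup than click through the object browser, the last part of that path (snapshot -> "config" -> "hardware" -> "device" -> "backing" -> "changeId") can be walked in a few lines. This is only a sketch: it assumes the snapshot object exposes attributes with the same names as the browser path, which is an assumption about whatever client library is used, and it leaves out connecting to vCenter and locating the snapshot. The function name `disk_change_ids` is hypothetical, and the stand-in object below exists purely to make the example runnable.

```python
from types import SimpleNamespace


def disk_change_ids(snapshot):
    """Map device key -> changeId for every device whose backing carries a changeId.

    Assumes `snapshot` mirrors the object browser path:
    snapshot.config.hardware.device[*].backing.changeId
    """
    result = {}
    for device in snapshot.config.hardware.device:
        backing = getattr(device, "backing", None)
        # Devices without a backing, or with a backing that has no changeId,
        # are skipped; only disk backings are expected to have one.
        change_id = getattr(backing, "changeId", None)
        if change_id:
            result[device.key] = change_id
    return result


# Stand-in for a real snapshot object, for illustration only.
fake_snapshot = SimpleNamespace(config=SimpleNamespace(hardware=SimpleNamespace(device=[
    SimpleNamespace(key=1000),  # e.g. a controller: no backing attribute
    SimpleNamespace(key=2000, backing=SimpleNamespace(
        changeId="52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107")),
])))

if __name__ == "__main__":
    print(disk_change_ids(fake_snapshot))
```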
I should clarify my test suggestion a bit, because it's not super obvious how change IDs are supposed to work. I was thinking this bug could be conveniently verified with an MTV warm migration as long as MTV is supplying change IDs, and warm migrations are running successfully. At a minimum, the MTV build must have the fix for 1942651. Otherwise, the only way to test it is to do a manual warm migration with CDI CRDs, and that's far less convenient. The checkpoints list in the spec is an input to CDI. What you should see is that the list of checkpoints grows as MTV initiates pre-copies, and there should be change IDs in the 'previous' field only starting with the second iteration. The first one is always blank, and 'current' can still contain a snapshot ID. If the copies are successful, then CDI is successfully finding and getting deltas, and that verifies the fix. The only other way is to create a DataVolume CRD set up for warm migration, manually apply new checkpoints with change IDs, and manually delete snapshots from VMware as you go. This is very tedious.
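The pattern to look for in the checkpoints list can be written down as a small check. This is a sketch, not a QE tool: it assumes the checkpoints have already been read from the DataVolume into a list of dicts with "current" and "previous" keys, and that change IDs look like the example given earlier in this thread. The function name `check_checkpoints` and the regular expression are assumptions made for illustration.

```python
import re

# Assumed change ID shape, based on the example in this thread:
# "52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107"
CHANGE_ID = re.compile(
    r"(?:[0-9a-f]{2} ){7}[0-9a-f]{2}-(?:[0-9a-f]{2} ){7}[0-9a-f]{2}/\d+",
    re.IGNORECASE,
)


def check_checkpoints(checkpoints):
    """Return a list of problems; an empty list means the expected pattern holds.

    Expected pattern: the first 'previous' is blank, and every later
    'previous' is a change ID. 'current' is not checked because it may
    still contain a snapshot ID.
    """
    problems = []
    for index, checkpoint in enumerate(checkpoints):
        previous = checkpoint.get("previous", "")
        if index == 0:
            if previous != "":
                problems.append(f"checkpoint 0: 'previous' should be blank, got {previous!r}")
        elif not CHANGE_ID.fullmatch(previous):
            problems.append(f"checkpoint {index}: 'previous' is not a change ID: {previous!r}")
    return problems


if __name__ == "__main__":
    sample = [
        {"current": "snapshot-8314", "previous": ""},
        {"current": "snapshot-8315",
         "previous": "52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/169"},
    ]
    print(check_checkpoints(sample) or "checkpoints look as expected")
```

A clean result only shows that MTV is supplying change IDs; the copies themselves still need to succeed for the fix to count as verified.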
spec:
  checkpoints:
  - current: snapshot-8314
    previous: ''
  - current: snapshot-8315
    previous: 52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/169

According to https://rhev-node-05.rdu2.scalelab.redhat.com/mob/?moid=snapshot%2d8313&doPath=config%2ehardware%2edevice%5b2001%5d%2ebacking, 52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/169 is the change ID of snapshot-8313. Is that expected? @marnold cc @istein
It could be fine, depending on whether or not the disk actually changed between snapshot-8313 and snapshot-8314. Change IDs are independent of snapshot IDs, and I am pretty sure multiple snapshots can have the same change ID. It seems to say "52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/168" when I click on it now, though.
Moving the bug to VERIFIED based on input from Matthew and Fabien Dupont that the result reflects the expected behavior.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 4.9.1 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5091