Bug 2000298 - [RFE] Add support for VMware Change ID in spec.source.checkpoints
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.8.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.9.1
Assignee: Matthew Arnold
QA Contact: Ilanit Stein
URL:
Whiteboard:
Depends On:
Blocks: 1942651
 
Reported: 2021-09-01 19:37 UTC by Fabien Dupont
Modified: 2021-12-13 19:59 UTC

Fixed In Version: v4.9.1-21
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-13 19:59:01 UTC
Target Upstream Version:
Embargoed:




Links:
- Github kubevirt containerized-data-importer pull 1933 (Merged): VDDK: accept snapshot change IDs in previous checkpoint fields (last updated 2021-11-08 19:25:23 UTC)
- Github kubevirt containerized-data-importer pull 2013 (Merged): [release-v1.38] VDDK: accept snapshot change IDs in previous checkpoint fields (last updated 2021-11-08 19:25:26 UTC)
- Red Hat Product Errata RHBA-2021:5091 (last updated 2021-12-13 19:59:17 UTC)

Description Fabien Dupont 2021-09-01 19:37:26 UTC
Description of problem:

When multi-stage VDDK import was implemented, a list of checkpoints was added to the DataVolume spec.source. The current implementation expects each checkpoint to be a snapshot Managed Object Reference (MORef).

This works well, but the number of snapshots grows steadily until it reaches the maximum allowed by VMware. That limit is low, so for long warm migrations the interval between snapshots must be increased, which in turn produces bigger deltas to copy over.

The idea is to delete snapshots once they are no longer needed, keeping at most 2 snapshots while allowing any snapshot interval. To enable that, VMIO or MTV can retrieve the Change ID associated with a snapshot and then delete the snapshot. The Change ID would then be passed to CDI as a checkpoint.

The logic in the VDDK importer would need to check the string format to determine whether the checkpoints are snapshot MORefs or Change IDs, and adjust its behavior accordingly.
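As a rough sketch of that format detection: the real importer is Go code inside CDI, and the patterns below are inferred only from the ID strings quoted in this bug, so treat them as illustrative rather than CDI's actual logic.

```python
import re

# Inferred formats (illustrative only, not CDI's actual detection code):
#   snapshot MORef: "snapshot-8314"
#   change ID: 16 hex byte pairs separated by spaces (with one dash in the
#   middle), then "/<sequence>", e.g.
#   "52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107"
MOREF_RE = re.compile(r"^snapshot-\d+$")
CHANGE_ID_RE = re.compile(r"^(?:[0-9a-f]{2}[ -]){15}[0-9a-f]{2}/\d+$")

def checkpoint_kind(checkpoint: str) -> str:
    """Classify a checkpoint string so the importer can pick a code path."""
    if MOREF_RE.match(checkpoint):
        return "moref"
    if CHANGE_ID_RE.match(checkpoint.lower()):
        return "changeid"
    return "unknown"
```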

Comment 1 Fabien Dupont 2021-11-08 13:18:14 UTC
Please verify with hco-bundle-registry-container-4.9.1-15 / iib:132476, or later.

Comment 2 Fabien Dupont 2021-11-08 15:10:29 UTC
The fix is missing a backport PR and is not available in 4.9.1 yet. Moving back to MODIFIED.

Comment 3 Fabien Dupont 2021-11-09 12:56:16 UTC
Please verify with hco-bundle-registry-container-v4.9.1-21 / iib:132758, or later.

Comment 4 Ilanit Stein 2021-11-15 10:30:34 UTC
Fabien,

How should this bug be verified, please?

Comment 5 Fabien Dupont 2021-11-16 09:01:04 UTC
To verify, you need to pass Change IDs as checkpoints to the DataVolume, instead of Snapshot MORef.
@marnold, could you please explain how to get the Change IDs and update the DataVolume?

Comment 6 Matthew Arnold 2021-11-17 16:42:41 UTC
I don't know if QE normally tests CDI directly like that. It seems like it should also be valid to run a warm migration from MTV and watch the DataVolume to make sure it is using change IDs instead of snapshot IDs. The list is in spec.checkpoints, and the field to watch is "previous". I think it will only show change IDs starting with the second iteration. (I can also supply a way to directly test CDI if this isn't good enough).

To get the change ID without building any extra tools, the easiest way I have found so far is to use the VMware object browser. The path looks like: "content" -> "rootFolder" -> select data center under "childEntity" -> "vmFolder" ->  target VM under "childEntity" -> "snapshot" -> "rootSnapshotList" -> target "snapshot" link -> "config" -> "hardware" -> find correct "VirtualDisk" object under "device" -> "backing". The change ID should now be under the "changeId" field, and it should look something like "52 eb 6a a1 07 46 85 f7-36 a9 46 0e b2 a6 20 27/107".

Comment 8 Matthew Arnold 2021-11-18 14:46:23 UTC
I should clarify my test suggestion a bit, because it's not super obvious how change IDs are supposed to work. I was thinking this bug could be conveniently verified with an MTV warm migration as long as MTV is supplying change IDs, and warm migrations are running successfully. At a minimum, the MTV build must have the fix for 1942651. Otherwise, the only way to test it is to do a manual warm migration with CDI CRDs, and that's far less convenient.

The checkpoints list in the spec is an input to CDI. What you should see is that the list of checkpoints grows as MTV initiates pre-copies, and there should be change IDs in the 'previous' field only starting with the second iteration. The first one is always blank, and 'current' can still contain a snapshot ID. If the copies are successful, then CDI is successfully finding and getting deltas, and that verifies the fix.

The only other way is to create a DataVolume CRD set up for warm migration, manually apply new checkpoints with change IDs, and manually delete snapshots from VMware as you go. This is very tedious.
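The expected shape of that checkpoints list can be sketched in plain Python; this only illustrates the growth pattern described above (append_checkpoint is a made-up helper, not MTV code, and the example values are the ones quoted in this bug):

```python
def append_checkpoint(checkpoints, current, previous=""):
    """Add one pre-copy iteration. The first entry has an empty 'previous';
    from the second iteration on, 'previous' carries the change ID of the
    snapshot deleted after the prior copy. Keys match spec.checkpoints."""
    checkpoints.append({"current": current, "previous": previous})
    return checkpoints

checkpoints = []
append_checkpoint(checkpoints, "snapshot-8314")  # first pre-copy: blank previous
append_checkpoint(checkpoints, "snapshot-8315",
                  previous="52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/169")
```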

Comment 9 Amos Mastbaum 2021-11-21 16:03:50 UTC
spec:
  checkpoints:
    - current: snapshot-8314
      previous: ''
    - current: snapshot-8315
      previous: 52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/169


According to https://rhev-node-05.rdu2.scalelab.redhat.com/mob/?moid=snapshot%2d8313&doPath=config%2ehardware%2edevice%5b2001%5d%2ebacking

52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/169 is snapshot-8313

Is that expected?

@marnold 

cc @istein

Comment 10 Matthew Arnold 2021-11-22 02:26:18 UTC
It could be fine, depending on whether or not the disk actually changed between snapshot-8313 and snapshot-8314. Change IDs are independent of snapshot IDs, and I am pretty sure multiple snapshots can have the same change ID. It seems to say "52 4c bd 71 27 c9 0d 5c-84 52 48 a6 a9 99 ee d4/168" when I click on it now, though.

Comment 11 Ilanit Stein 2021-11-22 15:36:32 UTC
Moving the bug to VERIFIED based on Matthew's and Fabien Dupont's input that the result reflects the expected behavior.

Comment 17 errata-xmlrpc 2021-12-13 19:59:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.9.1 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5091

