Created attachment 1746393
vm xml and engine logs

Description of problem:
The VM is left in an invalid state for backup.

StartVmBackupVDS failed: Checkpoint Error: {'parent_checkpoint_id': None, 'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd', 'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb', 'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}

Version-Release number of selected component (if applicable):
ovirt-4.4.3
rhv-4.4.3
vdsm-4.40.35.1-1.el8ev.x86_64
ovirt-imageio-daemon-2.1.1-1.el8ev.x86_64
libvirt-daemon-6.6.0-7.module+el8.3.0+8424+5ea525c5.x86_64
qemu-kvm-5.1.0-14.module+el8.3.0+8438+644aff69.x86_64

How reproducible:
1. Create a VM with a RAW disk (leave "Enable incremental backup" unchecked).
2. Do a full backup.
3. Take a snapshot.
4. Go to the disk properties and check "Enable incremental backup".
5. Do a full backup.
6. Do a series of incremental backups.
7. Delete the snapshot.
8. Try to start a full backup.

The VM backup is created successfully, but when you try to start an image transfer you get:

<fault>
<detail>[Cannot transfer Virtual Disk. The specified VM backup does not exist.]</detail>
<reason>Operation Failed</reason>
</fault>
(In reply to Yury.Panchenko from comment #0)
> delete snapshot

Deleting a snapshot does not update the checkpoints in the engine DB. This needs to be fixed.

> Try to start full backup.
> VM backup successfully created but

If the backup was successful, where do you get the error?

StartVmBackupVDS failed: Checkpoint Error: {
'parent_checkpoint_id': None,
'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd',
'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb',
'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}

Is this only an internal log entry, while the backup itself succeeds?

> when you try to start imager transfer you will get
> <fault>
> <detail>[Cannot transfer Virtual Disk. The specified VM backup does not
> exist.]</detail>
> <reason>Operation Failed</reason>
> </fault>

It looks like error handling in the engine is incorrect in this case.

Regardless of backup error handling, when a user deletes a snapshot, there should be a warning that deleting the snapshot will disable incremental backup, and the user must confirm the operation before the snapshot is deleted.
> Deleting a snapshot does not update the checkpoints in engine db.
> This need to be fixed.

Deleting a snapshot doesn't interact with the checkpoints in the engine DB.

> > > Try to start full backup.
> > > VM backup successfully created but
> >
> > If backup was successful, where do you get the error?
> >
> > StartVmBackupVDS failed: Checkpoint Error: {
> > 'parent_checkpoint_id': None,
> > 'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd',
> > 'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb',
> > 'reason': 'Parent checkpoint ID does not match the actual leaf
> > checkpoint'}
> >
> > Is this internal log, but backup succeeds?

According to the log, the creation of the backup failed:

2021-01-11 17:07:42,554+01 ERROR [org.ovirt.engine.core.bll.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-13) [30e41885-e538-45e9-9756-eb22f9a2711f] Failed to execute VM backup operation 'StartVmBackup': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to StartVmBackupVDS, error = Checkpoint Error: {'parent_checkpoint_id': None, 'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd', 'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb', 'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}, code = 1610 (Failed with error unexpected and code 16)

Here is what is happening:

The VM contains only one disk, in RAW format.

A snapshot is created and the disk format changes to QCOW2 -> the disk can now participate in a backup.
A full backup is taken for the VM, followed by a few incremental backups (list of the defined checkpoints from the VDSM log):

2021-01-11 17:09:19,317+0100 INFO (jsonrpc/3) [api.virt] FINISH list_checkpoints return={'result': ['e9169bd9-7224-42dc-bf7b-b10b2b9ed91b', '33742109-cad2-47db-a9ca-0314eaa8fdf8', '0f02c1b2-dde0-47f2-8b85-5a7b82addafd'], 'status': {'code': 0, 'message': 'Done'}} from=::ffff:172.25.16.27,41302, flow_id=30e41885-e538-45e9-9756-eb22f9a2711f, vmId=24174ed8-40f8-41de-b2a6-4d852c2de4cb (api:54)

The snapshot is removed from the VM -> the disk format is now back to RAW (a warning appears in the UI).

A full backup is now taken for the VM -> the backup contains only RAW disks, so the engine doesn't create a checkpoint for that backup:

2021-01-11 17:07:41,643+01 INFO [org.ovirt.engine.core.bll.StartVmBackupCommand] (default task-29) [30e41885-e538-45e9-9756-eb22f9a2711f] Skip checkpoint creation for VM '24174ed8-40f8-41de-b2a6-4d852c2de4cb'

So the request sent to the host to create the backup doesn't contain a parent checkpoint ID (no checkpoint), which the host needs in order to validate the checkpoint chain. The host still has checkpoints defined from the earlier backups, so the comparison with the leaf checkpoint fails.

The solution will be to skip this check if there is no checkpoint creation in this backup (no parent ID).

> > > when you try to start imager transfer you will get
> > > <fault>
> > > <detail>[Cannot transfer Virtual Disk. The specified VM backup does not
> > > exist.]</detail>
> > > <reason>Operation Failed</reason>
> > > </fault>
> >
> > Looks like error handling in engine is incorrect in this case.

I think that you tried to download the backup even though it failed, so you try to remove a backup that doesn't exist.
Your application should identify that the backup wasn't created properly and avoid trying to download it.

> > Regardless of backup error handling, when a user delete a snapshot, there
> > should be a warning that deleting the snapshot will disable incremental
> > backup, and the user must confirm the operation to delete the snapshot.
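The fix proposed in the comment above (skip the chain check when the backup creates no checkpoint) could look roughly like the following on the host side. This is only a minimal sketch and not the actual VDSM code: `CheckpointError`, `validate_parent_checkpoint` and the `to_checkpoint_id` parameter are hypothetical names chosen for illustration, and the leaf checkpoint is assumed to be the last entry of the defined checkpoints list. Only the keys of the error dictionary are taken from the log in this report.

```python
class CheckpointError(Exception):
    """Hypothetical stand-in for the host-side checkpoint error."""


def validate_parent_checkpoint(vm_id, parent_checkpoint_id,
                               to_checkpoint_id, defined_checkpoints):
    """Check that the requested parent matches the actual leaf checkpoint.

    to_checkpoint_id is assumed to be None when the engine skipped
    checkpoint creation (for example, the backup contains only RAW disks).
    """
    # No checkpoint is created by this backup, so nothing is appended
    # to the chain and there is nothing to validate.
    if to_checkpoint_id is None:
        return

    # Assumption: checkpoints are ordered oldest to newest.
    leaf = defined_checkpoints[-1] if defined_checkpoints else None

    if parent_checkpoint_id != leaf:
        raise CheckpointError({
            "parent_checkpoint_id": parent_checkpoint_id,
            "leaf_checkpoint_id": leaf,
            "vm_id": vm_id,
            "reason": "Parent checkpoint ID does not match the actual "
                      "leaf checkpoint",
        })
```

With the checkpoints from the log above, a request without a parent and without a new checkpoint passes, while a request that creates a checkpoint with a stale parent is still rejected.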
Hello, Nir and Eyal.

> If backup was successful, where do you get the error?

No, I got those errors after the backup creation failed.

> Your application should identify that the backup wasn't created properly and avoid trying downloading it.

For the test, I created a backup manually via REST.
The backup was created without problems, but the image transfer failed with 404.
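One way a client could follow the advice quoted above is to confirm that the backup still exists and has become ready before it starts an image transfer. The sketch below is an illustration under stated assumptions, not part of any oVirt client: `wait_for_backup_ready` and `BackupFailed` are hypothetical names, `get_phase` is a caller-supplied function that returns the current backup phase as a string, or `None` when the engine no longer knows the backup, and the string "ready" is assumed to mark a backup that can be transferred.

```python
import time


class BackupFailed(Exception):
    """Hypothetical error: the backup disappeared before becoming ready."""


def wait_for_backup_ready(get_phase, timeout=300.0, poll_interval=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll get_phase() until the backup is ready.

    get_phase is assumed to return a phase string, or None when the
    backup does not exist (for example, because its creation failed).
    """
    deadline = clock() + timeout
    while True:
        phase = get_phase()
        if phase is None:
            # Starting an image transfer now would fail, so stop here.
            raise BackupFailed("backup does not exist; creation probably failed")
        if phase == "ready":
            return
        if clock() >= deadline:
            raise TimeoutError("backup did not become ready in time")
        sleep(poll_interval)
```

A caller would wrap its own REST or SDK lookup in `get_phase` and start the image transfer only after `wait_for_backup_ready` returns.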
Please provide steps for verifying this.
(In reply to Ilan Zuckerman from comment #4)
> Please provide step for verifying this.

Steps to reproduce -

1. Create a VM that contains only 1 disk in RAW format.
2. Create a snapshot for the VM -> the disk format changes to QCOW2 and the disk can now participate in an incremental backup.
3. Create a full backup and an incremental backup for the VM.
4. Remove the VM snapshot -> the disk format changes back to RAW and the disk can no longer participate in an incremental backup.
5. Create a full backup for the VM.

Expected result -

A full backup should be taken in step 5 without creating a checkpoint.
Update: Will be fixed in RHV 4.4.5
Verified on rhv-4.4.5-4 according to the exact same steps.

> Steps to reproduce -
>
> 1. Create a VM that contains only 1 disk in RAW format.
> 2. Create a snapshot for the VM -> disk format changed to QCOW2 and the disk
> can participate in an incremental backup now.
> 3. Create full + incremental backup for the VM.
> 4. Remove the VM snapshot -> disk format changed back to RAW and the disk
> can't participate in an incremental backup now.
> 5. Create a full backup for the VM.
>
> Expected result -
>
> A full backup should be taken after step 5 without a checkpoint creation.
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.