Bug 1915025 - Unable to backup VM with raw disk after snapshot deletion
Summary: Unable to backup VM with raw disk after snapshot deletion
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backup-Restore.VMs
Version: 4.4.3.12
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.5
Target Release: ---
Assignee: Eyal Shenitzky
QA Contact: Ilan Zuckerman
Docs Contact: bugs@ovirt.org
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-11 19:01 UTC by Yury.Panchenko
Modified: 2021-11-04 19:28 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-18 15:14:49 UTC
oVirt Team: Storage
Embargoed:


Attachments
vm xml and engine logs (3.86 MB, application/zip)
2021-01-11 19:01 UTC, Yury.Panchenko


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 112937 | 0 | master | MERGED | backup.py: skip parent validation in case of missing parent_checkpoint_id | 2021-02-09 11:52:54 UTC

Internal Links: 1952916

Description Yury.Panchenko 2021-01-11 19:01:04 UTC
Created attachment 1746393
vm xml and engine logs

Description of problem:
The VM is left in an invalid state for backup.

StartVmBackupVDS failed: Checkpoint Error: {'parent_checkpoint_id': None, 'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd', 'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb', 'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}

Version-Release number of selected component (if applicable):
ovirt-4.4.3
rhv-4.4.3
vdsm-4.40.35.1-1.el8ev.x86_64
ovirt-imageio-daemon-2.1.1-1.el8ev.x86_64
libvirt-daemon-6.6.0-7.module+el8.3.0+8424+5ea525c5.x86_64
qemu-kvm-5.1.0-14.module+el8.3.0+8438+644aff69.x86_64

How reproducible:
Create VM with RAW disk (do not check "Enable incremental backup")
Do full backup
Take snapshot
Go to disk properties and check "Enable incremental backup"
Do full backup
Do series of incremental backups

delete snapshot
Try to start full backup.
VM backup successfully created but
when you try to start image transfer you will get
<fault>
  <detail>[Cannot transfer Virtual Disk. The specified VM backup does not exist.]</detail>
  <reason>Operation Failed</reason>
</fault>

Comment 1 Nir Soffer 2021-01-11 20:34:21 UTC
(In reply to Yury.Panchenko from comment #0)
> delete snapshot

Deleting a snapshot does not update the checkpoints in engine db.
This needs to be fixed.

> Try to start full backup.
> VM backup successfully created but

If backup was successful, where do you get the error?

    StartVmBackupVDS failed: Checkpoint Error: {
    'parent_checkpoint_id': None,
    'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd',
    'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb',
    'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}

Is this an internal log, while the backup succeeds?

> when you try to start image transfer you will get
> <fault>
>   <detail>[Cannot transfer Virtual Disk. The specified VM backup does not
> exist.]</detail>
>   <reason>Operation Failed</reason>
> </fault>

Looks like error handling in engine is incorrect in this case.

Regardless of backup error handling, when a user deletes a snapshot, there
should be a warning that deleting the snapshot will disable incremental
backup, and the user must confirm the operation to delete the snapshot.

Comment 2 Eyal Shenitzky 2021-01-12 10:00:18 UTC
> Deleting a snapshot does not update the checkpoints in engine db.
> This needs to be fixed.

Deleting a snapshot doesn't interact with the checkpoints in the engine DB.

> 
> > Try to start full backup.
> > VM backup successfully created but
> 
> If backup was successful, where do you get the error?
> 
>     StartVmBackupVDS failed: Checkpoint Error: {
>     'parent_checkpoint_id': None,
>     'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd',
>     'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb',
>     'reason': 'Parent checkpoint ID does not match the actual leaf
> checkpoint'}
> 
> Is this an internal log, while the backup succeeds?


According to the log, the creation of the backup failed - 
2021-01-11 17:07:42,554+01 ERROR [org.ovirt.engine.core.bll.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-13) [30e41885-e538-45e9-9756-eb22f9a2711f] Failed to execute VM backup operation 'StartVmBackup': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to StartVmBackupVDS, error = Checkpoint Error: {'parent_checkpoint_id': None, 'leaf_checkpoint_id': '0f02c1b2-dde0-47f2-8b85-5a7b82addafd', 'vm_id': '24174ed8-40f8-41de-b2a6-4d852c2de4cb', 'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}, code = 1610 (Failed with error unexpected and code 16)

Here is what's happening - 

The VM contains only 1 disk in RAW format.
A snapshot was created and the disk format changed to QCOW2 -> the disk can now participate in a backup.

A full backup was taken for the VM, followed by a few incremental backups (list of the defined checkpoints from the VDSM log) - 
2021-01-11 17:09:19,317+0100 INFO  (jsonrpc/3) [api.virt] FINISH list_checkpoints return={'result': ['e9169bd9-7224-42dc-bf7b-b10b2b9ed91b', '33742109-cad2-47db-a9ca-0314eaa8fdf8', '0f02c1b2-dde0-47f2-8b85-5a7b82addafd'], 'status': {'code': 0, 'message': 'Done'}} from=::ffff:172.25.16.27,41302, flow_id=30e41885-e538-45e9-9756-eb22f9a2711f, vmId=24174ed8-40f8-41de-b2a6-4d852c2de4cb (api:54)

The snapshot was removed from the VM -> the disk format is now back to RAW (a warning appears in the UI)

A full backup is now taken for the VM -> Backup contains only RAW disks so the engine doesn't create a checkpoint for that backup - 
2021-01-11 17:07:41,643+01 INFO  [org.ovirt.engine.core.bll.StartVmBackupCommand] (default task-29) [30e41885-e538-45e9-9756-eb22f9a2711f] Skip checkpoint creation for VM '24174ed8-40f8-41de-b2a6-4d852c2de4cb'

So the request sent to the host to create the backup doesn't contain a parent checkpoint ID (no checkpoint), which the host needs in order to validate that the checkpoint chain is valid.

The solution will be to skip this check if there is no checkpoint creation in this backup (no parent ID).
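
The skip described above could look roughly like the following. This is only a minimal sketch, not the actual vdsm backup.py code: the names validate_parent_checkpoint and CheckpointError, and the representation of the checkpoint chain as a plain list ordered from oldest to newest, are assumptions made for illustration.

```python
class CheckpointError(Exception):
    """Hypothetical error type; carries a dict similar to the one in the log."""


def validate_parent_checkpoint(parent_checkpoint_id, defined_checkpoints):
    """Check that the requested parent is the current leaf checkpoint.

    defined_checkpoints is assumed to be ordered oldest to newest, like the
    list_checkpoints result quoted above.
    """
    if parent_checkpoint_id is None:
        # No checkpoint is created for this backup (for example, the VM has
        # only RAW disks), so there is no chain to validate.
        return

    leaf_checkpoint_id = defined_checkpoints[-1] if defined_checkpoints else None
    if parent_checkpoint_id != leaf_checkpoint_id:
        raise CheckpointError({
            "parent_checkpoint_id": parent_checkpoint_id,
            "leaf_checkpoint_id": leaf_checkpoint_id,
            "reason": "Parent checkpoint ID does not match the actual "
                      "leaf checkpoint",
        })
```

With the three checkpoints from the VDSM log defined on the host, a request without a parent checkpoint ID passes this check instead of failing against leaf 0f02c1b2-dde0-47f2-8b85-5a7b82addafd.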

> 
> > when you try to start imager transfer you will get
> > <fault>
> >   <detail>[Cannot transfer Virtual Disk. The specified VM backup does not
> > exist.]</detail>
> >   <reason>Operation Failed</reason>
> > </fault>
> 
> Looks like error handling in engine is incorrect in this case.

I think that you tried to download the backup even though it failed,
so you are trying to transfer a backup that doesn't exist.

Your application should identify that the backup wasn't created properly and avoid trying to download it.
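
On the application side, such a check could be sketched as below. This is an illustration under assumptions, not SDK code: get_phase is a hypothetical callable supplied by the application that returns the current backup phase as a string, or None when the engine no longer knows the backup; the phase names "ready" and "failed" are also assumptions.

```python
import time


class BackupFailed(Exception):
    """Hypothetical error raised when the backup cannot be downloaded."""


def wait_for_backup_ready(get_phase, timeout=300.0, interval=1.0,
                          sleep=time.sleep):
    """Poll until the backup is ready; raise instead of starting a transfer.

    get_phase is a hypothetical callable returning the backup phase as a
    string, or None if the backup does not exist anymore.
    """
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase()
        if phase == "ready":
            return phase
        if phase is None or phase == "failed":
            # The backup was not created properly, so do not start an
            # image transfer for it.
            raise BackupFailed("backup is not available: %r" % (phase,))
        if time.monotonic() >= deadline:
            raise TimeoutError("backup did not become ready in time")
        sleep(interval)
```

The application would call this after requesting the backup and start the image transfer only if it returns without raising.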

> 
> Regardless of backup error handling, when a user deletes a snapshot, there
> should be a warning that deleting the snapshot will disable incremental
> backup, and the user must confirm the operation to delete the snapshot.

Comment 3 Yury.Panchenko 2021-01-12 14:31:51 UTC
Hello, Nir and Eyal.
>If backup was successful, where do you get the error?
No, I got those errors after the backup creation failed.

>Your application should identify that the backup wasn't created properly and avoid trying downloading it.
As a test, I created a backup manually via REST. The backup was created without problems, but the image transfer failed with 404.

Comment 4 Ilan Zuckerman 2021-01-25 15:29:51 UTC
Please provide steps for verifying this.

Comment 5 Eyal Shenitzky 2021-01-26 12:12:02 UTC
(In reply to Ilan Zuckerman from comment #4)
> Please provide steps for verifying this.

Steps to reproduce - 

1. Create a VM that contains only 1 disk in RAW format.
2. Create a snapshot for the VM -> disk format changed to QCOW2 and the disk can participate in an incremental backup now.
3. Create full + incremental backup for the VM.
4. Remove the VM snapshot -> disk format changed back to RAW and the disk can't participate in an incremental backup now.
5. Create a full backup for the VM.

Expected result - 

A full backup should be taken after step 5 without a checkpoint creation.

Comment 6 Pavan Chavva 2021-02-01 12:43:48 UTC
Update: Will be fixed in RHV 4.4.5

Comment 7 Ilan Zuckerman 2021-02-10 06:48:29 UTC
Verified on rhv-4.4.5-4 according to the exact same steps.

> Steps to reproduce - 
> 
> 1. Create a VM that contains only 1 disk in RAW format.
> 2. Create a snapshot for the VM -> disk format changed to QCOW2 and the disk
> can participate in an incremental backup now.
> 3. Create full + incremental backup for the VM.
> 4. Remove the VM snapshot -> disk format changed back to RAW and the disk
> can't participate in an incremental backup now.
> 5. Create a full backup for the VM.
> 
> Expected result - 
> 
> A full backup should be taken after step 5 without a checkpoint creation.

Comment 8 Sandro Bonazzola 2021-03-18 15:14:49 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

