Description of problem:

When a backup is taken for a running VM, scratch disks are created for the disks that participate in the backup. If the VM is destroyed during the backup, then when the backup is finalized, the scratch disks fail to be torn down from the host and are not removed from the system.

Version-Release number of selected component (if applicable):
4.4.9

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk
2. Start the VM
3. Start a full backup for the VM
4. Stop the VM while the backup is still running (in READY phase) by using:
   POST ../ovirt-engine/api/vms/<vm-id>/stop
   <action>
     <force>true</force>
   </action>
5. Finalize the backup while the VM is down

Actual results:
The VM backup is finalized, but the created scratch disks remain; they are not torn down from the host.

Expected results:
The created scratch disks should be torn down from the host that runs the VM backup and removed from the system.

Additional info:
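For reference, the same reproduction flow can be scripted with the oVirt Python SDK (ovirt-engine-sdk4). This is a minimal sketch, not taken from this report: the engine URL, credentials, VM ID, and disk ID below are placeholders.

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholders: engine URL, credentials, and IDs must be adjusted.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # lab setup only; pass ca_file=... in real deployments
)

vm_id = '<vm-id>'
disk_id = '<disk-id>'
vm_service = connection.system_service().vms_service().vm_service(vm_id)
backups_service = vm_service.backups_service()

# Step 3: start a full backup (no from_checkpoint_id means a full backup).
backup = backups_service.add(
    types.Backup(disks=[types.Disk(id=disk_id)])
)
backup_service = backups_service.backup_service(backup.id)

# Poll until the backup reaches the READY phase.
while backup.phase != types.BackupPhase.READY:
    time.sleep(5)
    backup = backup_service.get()

# Step 4: force-stop the VM while the backup is READY.
vm_service.stop(force=True)

# Step 5: finalize the backup while the VM is down.
backup_service.finalize()

connection.close()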
The scratch disks are gone according to the flow, but the finalize backup operation fails with an exception; is that expected?

Versions:
ovirt-engine-4.4.10-0.17.el8ev
vdsm-4.40.100.1-1.el8ev

Steps to reproduce (as in the bug description):

1. Create a VM with a disk
2. Start the VM
3. Start a full backup for the VM

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backup href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a" id="f9f259bf-abb6-470f-b83d-4eaa1c2f677a">
  <actions>
    <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/finalize" rel="finalize"/>
  </actions>
  <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/disks" rel="disks"/>
  <creation_date>2021-11-24T10:35:45.527+02:00</creation_date>
  <modification_date>2021-11-24T10:35:45.717+02:00</modification_date>
  <phase>initializing</phase>
  <host href="/ovirt-engine/api/hosts/910c075a-c187-41b7-a986-d0200a15adf3" id="910c075a-c187-41b7-a986-d0200a15adf3"/>
  <vm href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19" id="31ece06a-a0d0-4f79-82e2-d9080a5bca19"/>
</backup>

4. Stop the VM while the backup is still running (in READY phase):

Getting the state of the backup (ready phase):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backup href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a" id="f9f259bf-abb6-470f-b83d-4eaa1c2f677a">
  <actions>
    <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/finalize" rel="finalize"/>
  </actions>
  <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/disks" rel="disks"/>
  <creation_date>2021-11-24T10:35:45.527+02:00</creation_date>
  <modification_date>2021-11-24T10:36:00.557+02:00</modification_date>
  <phase>ready</phase>
  <to_checkpoint_id>57cd10c6-ff9b-4042-b7e6-4d60fa4daacf</to_checkpoint_id>
  <host href="/ovirt-engine/api/hosts/910c075a-c187-41b7-a986-d0200a15adf3" id="910c075a-c187-41b7-a986-d0200a15adf3"/>
  <vm href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19" id="31ece06a-a0d0-4f79-82e2-d9080a5bca19"/>
</backup>

Stop the VM:

POST ../ovirt-engine/api/vms/<vm-id>/stop
<action>
  <force>true</force>
</action>

5. Finalize the backup while the VM is down

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
  <detail>[]</detail>
  <reason>Operation Failed</reason>
</fault>

Actual results:
The VM backup failed to finalize, but the created scratch disks were removed in the teardown process.
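With the SDK, the failing finalize in step 5 surfaces as an ovirtsdk4.Error. A hedged sketch of handling it, continuing from the connection and services in the sketch above:

# Once the VM is down, finalize fails with the 'Operation Failed' fault
# shown above, so a client can catch the error and inspect the backup
# phase afterwards.
try:
    backup_service.finalize()
except sdk.Error as e:
    print('Finalize failed, as in the fault above:', e)

backup = backup_service.get()
print('Backup phase after the finalize attempt:', backup.phase)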
logs:

2021-11-24 10:36:00,559+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-46) [0d8fbf8f-0bbb-4682-b4ac-a83644f665bf] Ready to start image transfers
2021-11-24 10:36:51,686+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-95) [] VM '31ece06a-a0d0-4f79-82e2-d9080a5bca19'(vm1) moved from 'Up' --> 'PoweringDown'
2021-11-24 10:36:52,082+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-19) [494056bb] VM '31ece06a-a0d0-4f79-82e2-d9080a5bca19'(vm1) moved from 'PoweringDown' --> 'Down'
2021-11-24 10:36:52,090+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-19) [494056bb] EVENT_ID: VM_DOWN(61), VM vm1 is down.
2021-11-24 10:37:57,003+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.StopVmBackupVDSCommand] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] FINISH, StopVmBackupVDSCommand, return: , log id: 446f97c5
2021-11-24 10:37:57,011+02 ERROR [org.ovirt.engine.core.bll.storage.backup.StopVmBackupCommand] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] Failed to execute VM.stopBackup operation: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to StopVmBackupVDS, error = Virtual machine does not exist: {'vmId': '31ece06a-a0d0-4f79-82e2-d9080a5bca19'}, code = 1 (Failed with error noVM and code 1)
2021-11-24 10:37:57,014+02 ERROR [org.ovirt.engine.core.bll.storage.backup.StopVmBackupCommand] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] Failed to stop VmBackup 'f9f259bf-abb6-470f-b83d-4eaa1c2f677a'
2021-11-24 10:37:57,029+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] EVENT_ID: VM_BACKUP_FAILED_TO_FINALIZE(10,795), Backup f9f259bf-abb6-470f-b83d-4eaa1c2f677a for VM vm1 failed to finalize (User: admin@internal-authz).
2021-11-24 10:37:57,042+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-12) [] Operation Failed: []
2021-11-24 10:38:03,796+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [0d8fbf8f-0bbb-4682-b4ac-a83644f665bf] Remove all the scratch disks that were created for backup 'f9f259bf-abb6-470f-b83d-4eaa1c2f677a'
2021-11-24 10:38:04,153+02 INFO  [org.ovirt.engine.core.bll.storage.backup.RemoveScratchDisksCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [0d8fbf8f-0bbb-4682-b4ac-a83644f665bf] Remove scratch disk 'VM vm1 backup f9f259bf-abb6-470f-b83d-4eaa1c2f677a scratch disk for vm1_Disk1'
2021-11-24 10:38:11,311+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-33496) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'RemoveDisk',
2021-11-24 10:38:11,314+02 INFO  [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engine-Thread-33496) [56980fb0] Ending command 'org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand' successfully.
2021-11-24 10:38:11,315+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-33496) [56980fb0] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'RemoveDisk' completed, handling the result.
2021-11-24 10:38:11,315+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-33496) [56980fb0] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'RemoveDisk' succeeded, clearing tasks.
This behavior is expected. Due to bug 2016913, the backup flow (live or cold) continues as it was originally started. That means that even if you started a live backup operation and the VM then powered off, the engine still tries to stop the backup as if it were a live backup (and vice versa). So in this case, you started a live backup flow -> powered off the VM -> stopped the backup -> the stop failed, because the engine tries to stop a live backup while the VM is down, so there is no VM to interact with on the VDSM side to stop the backup. This is expected. In this case, the engine performed the cleanup and marked the backup operation as failed.
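From a backup client's perspective, one way to anticipate this is to check the VM state before finalizing and treat the resulting fault as the expected cleanup path. A hedged sketch, assuming the same SDK connection and services as in the earlier sketches:

# If the VM powered off during a live backup, the engine (per bug
# 2016913) still stops the backup through the live path, so finalize is
# expected to fail even though the scratch disks are cleaned up and the
# backup is marked as failed.
vm = vm_service.get()
if vm.status != types.VmStatus.UP:
    print('VM is', vm.status, '- finalize will fail, but cleanup still runs')
try:
    backup_service.finalize()
except sdk.Error:
    pass  # expected when the VM went down mid-backup; the engine cleans up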
Thanks, @eshenitz. Moving this bug to Verified.
This bugzilla is included in the oVirt 4.4.10 release, published on January 18th, 2022. Since the problem described in this bug report should be resolved in the oVirt 4.4.10 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.