Bug 2013932 - [CBT] VM backup scratch disks remains if the VM destroyed during the backup
Summary: [CBT] VM backup scratch disks remains if the VM destroyed during the backup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.4.9
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ovirt-4.4.10
: 4.4.10.1
Assignee: Arik
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-10-14 07:21 UTC by Eyal Shenitzky
Modified: 2022-01-19 07:00 UTC (History)
3 users (show)

Fixed In Version: ovirt-engine-4.4.10.1
Clone Of:
Environment:
Last Closed: 2022-01-19 07:00:13 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43806 0 None None None 2021-10-14 07:22:09 UTC
oVirt gerrit 117143 0 master MERGED core: set correct host for removing a backup scratch disk 2021-10-14 19:43:10 UTC
oVirt gerrit 117172 0 ovirt-engine-4.4 MERGED core: set correct host for removing a backup scratch disk 2021-10-17 12:19:43 UTC

Description Eyal Shenitzky 2021-10-14 07:21:32 UTC
Description of problem:

When a VM backup is taken for a running VM, scratch disks are created for the disks that participate in the backup.

If the VM is destroyed during the backup when the backup is finalized the scratch disks failed to teardown from the host and removed from the system.

Version-Release number of selected component (if applicable):
4.4.9

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk
2. Start the VM
3. Start a full backup for the VM
4. Stop the VM while the backup is still running (in READY phase) by using - 

POST ../ovirt-engine/api/vms/<vm-id>/stop

<action>
    <force>true</force>
</action>

5. Finalize the backup while the VM is down

Actual results:
VM backup finalized but the created scratch disks remain and not teardown from the host.

Expected results:
Created scratch disks should be teardown from the host that runs the VM backup and removed from the system.

Additional info:

Comment 1 sshmulev 2021-11-24 09:28:21 UTC
The scratch disks are gone according to the flow but the finalize backup operation fails with an exception, is that expected?

Versions:
ovirt-engine-4.4.10-0.17.el8ev
vdsm-4.40.100.1-1.el8ev

Steps to reproduce:(as bug description)

1. Create a VM with a disk
2. Start the VM
3. Start a full backup for the VM
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backup href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a" id="f9f259bf-abb6-470f-b83d-4eaa1c2f677a">
    <actions>
        <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/finalize" rel="finalize"/>
    </actions>
    <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/disks" rel="disks"/>
    <creation_date>2021-11-24T10:35:45.527+02:00</creation_date>
    <modification_date>2021-11-24T10:35:45.717+02:00</modification_date>
    <phase>initializing</phase>
    <host href="/ovirt-engine/api/hosts/910c075a-c187-41b7-a986-d0200a15adf3" id="910c075a-c187-41b7-a986-d0200a15adf3"/>
    <vm href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19" id="31ece06a-a0d0-4f79-82e2-d9080a5bca19"/>
</backup>
4. Stop the VM while the backup is still running (in READY phase) by using - 
Getting the state of the backup (ready phase):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backup href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a" id="f9f259bf-abb6-470f-b83d-4eaa1c2f677a">
    <actions>
        <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/finalize" rel="finalize"/>
    </actions>
    <link href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19/backups/f9f259bf-abb6-470f-b83d-4eaa1c2f677a/disks" rel="disks"/>
    <creation_date>2021-11-24T10:35:45.527+02:00</creation_date>
    <modification_date>2021-11-24T10:36:00.557+02:00</modification_date>
    <phase>ready</phase>
    <to_checkpoint_id>57cd10c6-ff9b-4042-b7e6-4d60fa4daacf</to_checkpoint_id>
    <host href="/ovirt-engine/api/hosts/910c075a-c187-41b7-a986-d0200a15adf3" id="910c075a-c187-41b7-a986-d0200a15adf3"/>
    <vm href="/ovirt-engine/api/vms/31ece06a-a0d0-4f79-82e2-d9080a5bca19" id="31ece06a-a0d0-4f79-82e2-d9080a5bca19"/>
</backup>

Stop the VM:
POST ../ovirt-engine/api/vms/<vm-id>/stop

<action>
    <force>true</force>
</action>


5. Finalize the backup while the VM is down
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[]</detail>
    <reason>Operation Failed</reason>
</fault>

Actual results:
VM backup failed to finalize but the created scratch disks have been removed in the teardown process.

logs:
2021-11-24 10:36:00,559+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-46) [0d8fbf8f-0bbb-4682-b4ac-a83644f665bf] Ready to start image transfers

2021-11-24 10:36:51,686+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-95) [] VM '31ece06a-a0d0-4f79-82e2-d9080a5bca19'(vm1) moved from 'Up' --> 'PoweringDown'

2021-11-24 10:36:52,082+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-19) [494056bb] VM '31ece06a-a0d0-4f79-82e2-d9080a5bca19'(vm1) moved from 'PoweringDown' --> 'Down'
2021-11-24 10:36:52,090+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-19) [494056bb] EVENT_ID: VM_DOWN(61), VM vm1 is down. 

2021-11-24 10:37:57,003+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.StopVmBackupVDSCommand] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] FINISH, StopVmBackupVDSCommand, return: , log id: 446f97c5
2021-11-24 10:37:57,011+02 ERROR [org.ovirt.engine.core.bll.storage.backup.StopVmBackupCommand] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] Failed to execute VM.stopBackup operation: org.ovirt.engine.core.common.errors.Engine
Exception: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to StopVmBackupVDS, error = Virtual machine does not exist: {'vmId': '31ece06a-a0d0-4f79-82e2-d9080a5b
ca19'}, code = 1 (Failed with error noVM and code 1)


2021-11-24 10:37:57,014+02 ERROR [org.ovirt.engine.core.bll.storage.backup.StopVmBackupCommand] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] Failed to stop VmBackup 'f9f259bf-abb6-470f-b83d-4eaa1c2f677a'
2021-11-24 10:37:57,029+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [05741e5f-a41b-419e-9e0f-4134602142ac] EVENT_ID: VM_BACKUP_FAILED_TO_FINALIZE(10,795), Backup f9f259bf-abb6-470f-b83d-4eaa1c2f677a for VM vm1 failed to finalize (User: admin@internal-authz).
2021-11-24 10:37:57,042+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-12) [] Operation Failed: []


2021-11-24 10:38:03,796+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [0d8fbf8f-0bbb-4682-b4ac-a83644f665bf] Remove all the scratch disks that were created for backup 'f9f259bf-abb6-470f-b83d-4eaa1c2f677a'


2021-11-24 10:38:04,153+02 INFO  [org.ovirt.engine.core.bll.storage.backup.RemoveScratchDisksCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [0d8fbf8f-0bbb-4682-b4ac-a83644f665bf] Remove scratch disk 'VM vm1 backup f9f259bf-abb6-470f-b83d-4eaa1c2f677a scratch disk for vm1_Disk1'


2021-11-24 10:38:11,311+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-33496) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'RemoveDisk',
2021-11-24 10:38:11,314+02 INFO  [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engine-Thread-33496) [56980fb0] Ending command 'org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand' successfully.
2021-11-24 10:38:11,315+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-33496) [56980fb0] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'RemoveDisk' completed, handling the result.
2021-11-24 10:38:11,315+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-33496) [56980fb0] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'RemoveDisk' succeeded, clearing tasks.

Comment 3 Eyal Shenitzky 2021-11-25 07:34:51 UTC
This behavior is expected. 

Due to bug 2016913, the backup flow (live or cold) continues as it was originally started.
It means that even if you started a live backup operation and then the VM powered off, the engine still tries to stop the backup as it was live backup (and vice-versa).

So in this case, you started a live backup flow -> powered-off the VM -> stopped backup -> Failed because the engine tries to stop a live backup while the VM is down so there VM to interact with on VDSM side to stop the backup.

This is expected.

In this case, the engine performed cleanup and set the backup operation as a failure.

Comment 4 sshmulev 2021-11-25 07:41:13 UTC
Thanks, @eshenitz
Moving this bug to Verified.

Comment 6 Sandro Bonazzola 2022-01-19 07:00:13 UTC
This bugzilla is included in oVirt 4.4.10 release, published on January 18th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.4.10 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.