Bug 1826348

Summary: [Incremental backup] Full backup during live disk migration should not be allowed
Product: [oVirt] ovirt-engine
Reporter: Ilan Zuckerman <izuckerm>
Component: BLL.Storage
Assignee: Eyal Shenitzky <eshenitz>
Status: CLOSED CURRENTRELEASE
QA Contact: Ilan Zuckerman <izuckerm>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4.0
CC: aefrat, bugs, eshenitz, tnisan
Target Milestone: ovirt-4.4.1
Flags: pm-rhel: ovirt-4.4+
       aefrat: planning_ack?
       aefrat: devel_ack?
       aefrat: testing_ack+
Target Release: 4.4.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.1.5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-08 08:26:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1836627    
Bug Blocks:    
Attachments:
Description    Flags
engine log     none
vdsm log       none

Description Ilan Zuckerman 2020-04-21 13:30:43 UTC
Created attachment 1680557 [details]
engine log

Description of problem:

When a full backup is invoked for a disk that is currently being migrated to another storage domain (SD), the backup is started even though the disk is locked.
This causes the engine to throw an exception:

2020-04-21 16:03:05,859+03 ERROR [org.ovirt.engine.core.bll.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-4) [e9340d04-5937-4dc0-a6c9-e9ed9ab2d09c] Failed to execute VM backup operation 'StartVmBackup': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to StartVmBackupVDS, error = Backup Error: {'vm_id': 'fefd90c8-94a5-4c68-83ae-9c3462655ca9', 'backup': <vdsm.virt.backup.BackupConfig object at 0x7fb9b420d550>, 'reason': "Failed to find one of the backup disks: No such drive: '{'domainID': 'f3f88292-1287-4635-83ac-f8c2cf482a9e', 'imageID': 'c3d127e4-b6cf-4dda-a5fb-3064953e67a3', 'volumeID': '4cab4cf5-0cfc-4eac-8e15-42bbdb9a4e7c'}'"}, code = 1600 (Failed with error unexpected and code 16)


It also causes vdsm to log an error:

LookupError: No such drive: '{'domainID': 'f3f88292-1287-4635-83ac-f8c2cf482a9e', 'imageID': 'c3d127e4-b6cf-4dda-a5fb-3064953e67a3', 'volumeID': '4cab4cf5-0cfc-4eac-8e15-42bbdb9a4e7c'}'


Also, the API response reports phase "starting" instead of an error message saying that the disk is locked and cannot be backed up:

POST {{engine}}vms/{{myvm_id}}/backups

Body:

<backup>
    <disks>
        <disk id="{{qcow_disk_id}}" />
    </disks>
</backup>

Response:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backup href="/ovirt-engine/api/vms/38e0898b-b6a8-4238-b400-e17928f6a926/backups/8d8079fa-2d25-4faf-853e-856ba22f3889" id="8d8079fa-2d25-4faf-853e-856ba22f3889">
    <actions>
        <link href="/ovirt-engine/api/vms/38e0898b-b6a8-4238-b400-e17928f6a926/backups/8d8079fa-2d25-4faf-853e-856ba22f3889/finalize" rel="finalize"/>
    </actions>
    <link href="/ovirt-engine/api/vms/38e0898b-b6a8-4238-b400-e17928f6a926/backups/8d8079fa-2d25-4faf-853e-856ba22f3889/disks" rel="disks"/>
    <creation_date>2020-04-21T11:31:36.755+03:00</creation_date>
    <phase>starting</phase>
    <vm href="/ovirt-engine/api/vms/38e0898b-b6a8-4238-b400-e17928f6a926" id="38e0898b-b6a8-4238-b400-e17928f6a926"/>
</backup>
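
For reference, a minimal Python sketch of how a client could build the request body above and read the phase out of the response. It uses only the standard library and does not contact the engine. The function names build_backup_request and backup_phase are hypothetical and are not part of any oVirt SDK.

```python
import xml.etree.ElementTree as ET


def build_backup_request(disk_ids):
    """Build the XML body for POST .../vms/<vm_id>/backups (hypothetical helper)."""
    backup = ET.Element("backup")
    disks = ET.SubElement(backup, "disks")
    for disk_id in disk_ids:
        # One <disk id="..."/> element per disk to include in the backup.
        ET.SubElement(disks, "disk", id=disk_id)
    return ET.tostring(backup, encoding="unicode")


def backup_phase(response_xml):
    """Return the text of <phase> from a backup response, or None if absent."""
    root = ET.fromstring(response_xml)
    return root.findtext("phase")


if __name__ == "__main__":
    print(build_backup_request(["my-qcow-disk-id"]))
    print(backup_phase("<backup><phase>starting</phase></backup>"))
```

With the behavior reported here, backup_phase returns "starting" even though the disk is locked, so a client inspecting only the phase cannot tell that the backup is going to fail.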


Version-Release number of selected component (if applicable):
vdsm-4.40.13-1.el8ev.x86_64
ovirt-engine-4.4.0-0.33.master.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a blank VM
2. Create a qcow disk with incremental backup enabled on an iSCSI domain and attach it to the VM as the OS disk
3. Start the VM and wait until it is up
4. Migrate the disk to another iSCSI domain
5. As soon as the migration starts and the disk becomes locked, invoke a full backup for that disk through the API


Actual results:

The full backup request starts processing, causing exceptions in VDSM and the engine.
The API response should differ from the regular response, which indicates that the full backup has started.
For example: "the disk is locked and cannot be backed up currently"

Expected results:
The backup operations should be blocked during disk/VM migration (live or cold) and vice-versa.
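
A rough Python sketch of the kind of pre-check this implies: refuse to start a backup when any requested disk is locked. This is an illustration only, not the engine's actual implementation (ovirt-engine is written in Java). The Disk class, the status values and the validate_backup function are hypothetical. The error wording is taken from the verification comment further down in this report.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    alias: str
    status: str  # illustrative values: "ok" or "locked"


def validate_backup(disks):
    """Return None if the backup may start, otherwise an error message."""
    locked = [d.alias for d in disks if d.status == "locked"]
    if locked:
        return (
            "Cannot backup VM: The following disks are locked: "
            + ", ".join(locked)
            + ". Please try again in a few minutes."
        )
    return None


if __name__ == "__main__":
    print(validate_backup([Disk("26780_qcow_incr_enabled", "locked")]))
```

Running the check before any backup job is created means the caller gets the error in the API response itself, instead of a "starting" phase followed by a failure on the host.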

Additional info:
Attaching Engine log and relevant vdsm log.

Comment 1 Ilan Zuckerman 2020-04-21 13:31:35 UTC
Created attachment 1680558 [details]
vdsm log

Comment 2 Sandro Bonazzola 2020-06-19 09:45:51 UTC
This bug is in modified state and targeting 4.4.2. Can this be re-targeted to 4.4.1?

Comment 3 Ilan Zuckerman 2020-06-29 12:56:35 UTC
Verified on rhv-release-4.4.1-5-001.noarch

1. Create a blank VM
2. Create a qcow disk with incremental backup enabled on an iSCSI domain and attach it to the VM as the OS disk
3. Start the VM and wait until it is up
4. Migrate the disk to another iSCSI domain
5. As soon as the migration starts and the disk becomes locked, invoke a full backup for that disk through the API

Expected:
The backup operations should be blocked during disk/VM migration (live or cold) and vice-versa.

Actual:
Backup operation is blocked with the following error message from engine:

"Cannot backup VM: The following disks are locked: 26780_qcow_incr_enabled. Please try again in a few minutes."

Comment 4 Sandro Bonazzola 2020-07-08 08:26:06 UTC
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.