Bug 1420258

Summary: Live Storage Migration failed due to 'disk profile does not found'
Product: [oVirt] ovirt-engine
Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED CURRENTRELEASE
QA Contact: Eyal Shenitzky <eshenitz>
Severity: high
Docs Contact:
Priority: high
Version: 4.1.0
CC: akrejcir, bugs, dfediuck, eshenitz, mgoldboi, msivak, stirabos, tnisan
Target Milestone: ovirt-4.1.3
Keywords: Automation, Regression, Reopened
Target Release: ---
Flags: rule-engine: ovirt-4.1+, rule-engine: blocker+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a VM template has a disk copied to multiple storage domains, each copy has a different disk profile assigned. When a VM was then created from the template using the REST API, the wrong disk profile was used for some storage domains.
Consequence: The disks of the new VM were stored in the DB even if their disk profile did not belong to the disk's storage domain. This broke subsequent commands, which assume a correct disk profile assignment in the DB.
Fix: When creating a VM from the template, the disk profile from the correct storage domain is used.
Result: The data in the DB is correct and subsequent commands work.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-06 14:01:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                      Flags
engine and vdsm logs             none
engine logs                      none
vdsm logs                        none
PUT and POST REST_API commands   none

Description Eyal Shenitzky 2017-02-08 09:59:23 UTC
Created attachment 1248573 [details]
engine and vdsm logs

Description of problem:

Live storage migration of a VM's disks fails when the VM was created
through the REST-API as a thin copy from a template.

The operation failed with the following error:

        Status: 400
        Reason: Bad Request
        Detail: [Cannot move Virtual Disk. Cannot find a disk profile defined on storage domain ed51367e-1842-48ad-9eb9-bf6e4f3e6b4c.]

 

Version-Release number of selected component (if applicable):
Engine - 4.1.0.4-0.1.el7
VDSM - 4.19.4-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce USING REST-API:
1. Create a VM with disk
2. Create a template from the VM in step 1
3. Copy the template disk to another storage domain
4. Create a VM as thin copy from the template in step 2
5. Start the VM from step 4
6. Move the disk to the other storage domain that has the disk copy
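
For illustration only, the request in step 6 can be sketched as below. The exact requests sent by the automation are not quoted in this report, so the endpoint path and XML body here are assumptions modelled on the oVirt REST API v4 disk "move" action and should be checked against the API documentation. `RestRequest` and `build_move_disk_request` are hypothetical helper names. The body deliberately names only the target storage domain and carries no disk profile.

```python
from typing import NamedTuple
from xml.sax.saxutils import quoteattr


class RestRequest(NamedTuple):
    """Hypothetical container describing one REST call (nothing is sent)."""
    method: str
    path: str
    body: str


def build_move_disk_request(disk_id: str, target_domain_id: str) -> RestRequest:
    """Describe a disk move to another storage domain.

    Assumption: the move is a POST to the disk's "move" action with the
    target storage domain in the body; no disk profile is referenced.
    """
    path = f"/ovirt-engine/api/disks/{disk_id}/move"
    # quoteattr() escapes the value and wraps it in quotes for XML.
    body = f"<action><storage_domain id={quoteattr(target_domain_id)}/></action>"
    return RestRequest("POST", path, body)


if __name__ == "__main__":
    request = build_move_disk_request(
        "07b22a8b-e904-44bd-80e9-d15bca44a4f2",
        "ff212ce4-84ad-423f-a45a-b54e1ed40ef8",
    )
    print(request.method, request.path)
    print(request.body)
```

The IDs in the usage block are the disk and storage domain IDs that appear in the engine log of this report; any other values work the same way.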

Actual results:
Live storage migration failed

Expected results:
Live storage migration should succeed

Additional info:
The Regression keyword is set because of Bug 1361838.

Comment 1 Tal Nisan 2017-02-08 21:02:04 UTC
Martin, it seems like SLA's turf, can you please have a look?

Comment 2 Martin Sivák 2017-02-10 09:16:19 UTC
This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1361838, but I have no idea what the implications of live storage migration are here.

Comment 3 Martin Sivák 2017-02-10 09:22:28 UTC
The code here actually checks whether a disk profile is defined (diskImage.getDiskProfileId() == null && storageDomainId != null):
- When no profile is defined, we select a proper one automatically (thanks to the mentioned #1361838);
- But when a profile is set, we attempt to use it and fail with this error if it is not defined on the storage domain.
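
A minimal sketch of the two branches described above, written in Python rather than the engine's Java. All names here (`resolve_disk_profile`, `profiles_by_domain`, `DiskProfileNotFound`) are hypothetical, and the way the automatic choice is made (the smallest ID) is an arbitrary assumption; only the overall shape of the check is taken from this comment.

```python
from typing import Dict, Optional, Set


class DiskProfileNotFound(Exception):
    """Raised when the requested profile is not defined on the domain."""


def resolve_disk_profile(
    disk_profile_id: Optional[str],
    storage_domain_id: Optional[str],
    profiles_by_domain: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the disk profile to use on the given storage domain."""
    if storage_domain_id is None:
        # Assumption: without a domain there is nothing to validate against.
        return disk_profile_id

    domain_profiles = profiles_by_domain.get(storage_domain_id, set())

    # Branch 1: no profile requested, so pick one that belongs to the domain.
    if disk_profile_id is None:
        return min(domain_profiles) if domain_profiles else None

    # Branch 2: a profile was requested; it must be defined on the domain.
    if disk_profile_id not in domain_profiles:
        raise DiskProfileNotFound(
            "Cannot find a disk profile defined on storage domain "
            f"{storage_domain_id}."
        )
    return disk_profile_id
```

With profiles `{"sd-a": {"profile-a"}, "sd-b": {"profile-b"}}`, passing no profile for `sd-b` selects `profile-b`, while passing `profile-a` for `sd-b` raises the error, which is the failure mode reported here.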


Can you check whether your REST calls force the engine to try to use the profile from the origin for the destination storage domain? It might be enough to leave that field empty or something to let the engine solve it by itself.

Comment 4 Eyal Shenitzky 2017-02-12 07:19:26 UTC
The request does not specify a disk profile; it only asks to create a VM from the template.

Comment 5 Tal Nisan 2017-02-12 14:08:39 UTC
Moving to SLA for now as it seems like a disk profile issue

Comment 6 Martin Sivák 2017-02-13 09:00:31 UTC
Eyal: Every disk has a disk profile (empty one by default). Can you give us detailed reproduction steps? Like the actual REST / SDK calls you used?

Comment 7 Eyal Shenitzky 2017-02-23 07:37:34 UTC
I didn't manage to reproduce the bug on engine version 4.1.1-0.1.el7.
Closing this bug for now; I will re-open it if it occurs again.

Comment 8 Eyal Shenitzky 2017-03-07 13:06:53 UTC
Created attachment 1260797 [details]
engine logs

Comment 9 Eyal Shenitzky 2017-03-07 13:11:29 UTC
This bug reproduced again.

It does not seem to reproduce 100% of the time; the rate is closer to 70%.

I attached the logs of the run.

It reproduced only in our automation, which is based on REST-API commands.

Error in the engine log:
2017-03-06 17:08:45,765+02 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Running command: MoveDisksCommand internal: false. Entities affected :  ID: 07b22a8b-e904-44bd-80e9-d15bca44a4f2 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2017-03-06 17:08:45,843+02 INFO  [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Lock Acquired to object 'EngineLock:{exclusiveLocks='[07b22a8b-e904-44bd-80e9-d15bca44a4f2=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName disk_type_0617030011>]', sharedLocks='[fe170acc-408f-4add-bf1c-8fd7b8df2d85=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2017-03-06 17:08:45,979+02 WARN  [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Validation of action 'LiveMigrateVmDisks' failed for user admin@internal-authz. Reasons: VAR__ACTION__MOVE,VAR__TYPE__DISK,ACTION_TYPE_DISK_PROFILE_NOT_FOUND_FOR_STORAGE_DOMAIN,$storageDomainId ff212ce4-84ad-423f-a45a-b54e1ed40ef8
2017-03-06 17:08:45,981+02 INFO  [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Lock freed to object 'EngineLock:{exclusiveLocks='[07b22a8b-e904-44bd-80e9-d15bca44a4f2=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName disk_type_0617030011>]', sharedLocks='[fe170acc-408f-4add-bf1c-8fd7b8df2d85=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2017-03-06 17:08:45,985+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-10) [] Operation Failed: [Cannot move Virtual Disk. Cannot find a disk profile defined on storage domain ff212ce4-84ad-423f-a45a-b54e1ed40ef8.]
2017-03-06 17:08:47,965+02 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler6) [6d2bc85] Domain 'ff212ce4-84ad-423f-a45a-b54e1ed40ef8:iscsi_0' report isn't an actual report

Comment 10 Eyal Shenitzky 2017-03-07 13:12:08 UTC
Created attachment 1260798 [details]
vdsm logs

Comment 11 Martin Sivák 2017-03-07 15:37:18 UTC
Eyal, can you please post the detailed reproducer steps too? Ideally the actual REST requests you are doing.

Comment 12 Eyal Shenitzky 2017-03-12 05:16:10 UTC
The steps to reproduce the bug remain the same.
This bug occurred in our automation runs, so I can't tell which specific step or command causes the failure.
Also, it is only an assumption that the REST command is related, based on the previous bug on the same issue and on the fact that I was unable to reproduce it manually.
https://bugzilla.redhat.com/show_bug.cgi?id=1361838

Comment 13 Martin Sivák 2017-03-14 12:00:26 UTC
Eyal, I do not have access to your automation and I NEED to see the REST requests. Otherwise I can't reproduce it properly. This will happen when the step 6 request (PUSH to different storage domain?) contains a reference to the original IO profile.

Comment 14 Eyal Shenitzky 2017-03-15 08:06:12 UTC
Created attachment 1263204 [details]
PUT and POST REST_API commands

Comment 15 Eyal Shenitzky 2017-03-15 08:08:43 UTC
Martin, I attached a summary of the PUT and POST REST-API commands from our automation run of the test.
Please note that this specific test run passed, but the commands are the same.
Let me know if you need anything else.

Comment 16 Martin Sivák 2017-03-15 11:19:32 UTC
Targeting 4.1.2 for deeper analysis; we will decide when to fix this once we know what is wrong.

Comment 17 Red Hat Bugzilla Rules Engine 2017-03-15 11:19:38 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 18 Andrej Krejcir 2017-04-19 11:56:57 UTC
Debugging in the testing environment showed that the live disk migration is failing because of an invalid data row in the DB at the time of the migration.
The disk image being migrated has one storage domain ID and one disk profile ID assigned, but the disk profile does not belong to the storage domain, and that triggers the error.

This invalid data is written by a previous AddVmCommand, which calls CreateSnapshotFromTemplateCommand at [1] and this command writes the invalid data at [2].
The ImageStorageDomainMap written to the DB has a disk profile ID that does not belong to the storage domain.

[1] - org.ovirt.engine.core.bll.AddVmCommand # addVmImages
[2] - org.ovirt.engine.core.bll.storage.disk.image.BaseImagesCommand # addDiskImageToDb
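
The invalid row and the intended behaviour can be illustrated with a small Python sketch. This is not the engine's Java code: the `ImageStorageDomainMap` dataclass is a simplified stand-in for the engine class of the same name, and `find_invalid_rows` and `map_for_new_vm_disk` are hypothetical helpers. It assumes each template disk copy carries the profile of the storage domain it lives on.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class ImageStorageDomainMap:
    """Simplified stand-in: one image on one domain with one profile."""
    image_id: str
    storage_domain_id: str
    disk_profile_id: str


def find_invalid_rows(
    rows: Iterable[ImageStorageDomainMap],
    profiles_by_domain: Dict[str, Set[str]],
) -> List[ImageStorageDomainMap]:
    """Return rows whose profile is not defined on their storage domain."""
    return [
        row
        for row in rows
        if row.disk_profile_id
        not in profiles_by_domain.get(row.storage_domain_id, set())
    ]


def map_for_new_vm_disk(
    new_image_id: str,
    target_domain_id: str,
    template_copies: Iterable[ImageStorageDomainMap],
) -> ImageStorageDomainMap:
    """Build the new disk's row from the template copy on the target domain."""
    for copy in template_copies:
        if copy.storage_domain_id == target_domain_id:
            return ImageStorageDomainMap(
                new_image_id, target_domain_id, copy.disk_profile_id
            )
    raise LookupError(f"No template disk copy on domain {target_domain_id}")
```

A row such as `ImageStorageDomainMap("vm-img", "sd-b", "profile-a")` is flagged by `find_invalid_rows` when `profile-a` is only defined on `sd-a`, whereas a row built by `map_for_new_vm_disk` for `sd-b` takes the profile from the copy on `sd-b` and passes the check.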

Comment 19 Eyal Shenitzky 2017-06-04 11:47:31 UTC
 Verified with the following code:
---------------------------------------
VDSM -4.19.17-1.el7ev.x86_64
RHEVM -4.1.3.1-0.1.el7

Steps to reproduce:
------------------------
1. Create a VM with disk
2. Create a template from the VM in step 1
3. Copy the template disk to another storage domain
4. Create a VM as thin copy from the template in step 2
5. Start the VM from step 4
6. Move the disk to the other storage domain that has the disk copy

Moving to VERIFIED