Created attachment 1248573 [details]
engine and vdsm logs

Description of problem:
Live storage migration of a VM's disk fails when the VM was created via the REST API as a thin copy of a template. The operation fails with the following error:

Status: 400
Reason: Bad Request
Detail: [Cannot move Virtual Disk. Cannot find a disk profile defined on storage domain ed51367e-1842-48ad-9eb9-bf6e4f3e6b4c.]

Version-Release number of selected component (if applicable):
Engine - 4.1.0.4-0.1.el7
VDSM - 4.19.4-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce (using the REST API):
1. Create a VM with a disk.
2. Create a template from the VM in step 1.
3. Copy the template disk to another storage domain.
4. Create a VM as a thin copy of the template from step 2.
5. Start the VM from step 4.
6. Move the disk to the other storage domain, the one that holds the disk copy.

Actual results:
Live storage migration failed.

Expected results:
Live storage migration should succeed.

Additional info:
The Regression keyword was set because of bug 1361838.
Martin, this seems to be SLA's turf; can you please have a look?
This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1361838, but I have no idea what the implications of live storage migration are here.
The code here actually checks whether a disk profile is defined (diskImage.getDiskProfileId() == null && storageDomainId != null):
- When it is not, we select a proper one automatically (thanks to the previously mentioned bug 1361838);
- But when a profile is set, we attempt to use it and fail with this error if that is not possible.

Can you check whether your REST calls force the engine to try to use the profile from the origin storage domain on the destination storage domain? It might be enough to leave that field empty, or something similar, so that the engine resolves it by itself.
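To make the two branches described above concrete, here is a minimal Python sketch of that selection logic. It is not the engine's Java code: `DiskProfile`, `ValidationError`, `resolve_disk_profile` and the profile lookup table are hypothetical, only the error key comes from the engine log attached to this bug, and raising when the destination domain has no profile at all is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Error key as it appears in the engine log; all other names are hypothetical.
PROFILE_NOT_FOUND = "ACTION_TYPE_DISK_PROFILE_NOT_FOUND_FOR_STORAGE_DOMAIN"


@dataclass(frozen=True)
class DiskProfile:
    profile_id: str
    storage_domain_id: str  # the storage domain this profile is defined on


class ValidationError(Exception):
    pass


def resolve_disk_profile(
    disk_profile_id: Optional[str],
    storage_domain_id: Optional[str],
    profiles: Dict[str, DiskProfile],
) -> Optional[str]:
    """Return the disk profile ID to use on the destination storage domain."""
    if disk_profile_id is None and storage_domain_id is not None:
        # No profile set: pick one defined on the destination domain automatically.
        candidates = sorted(
            p.profile_id
            for p in profiles.values()
            if p.storage_domain_id == storage_domain_id
        )
        if not candidates:
            # Assumption: no profile on the destination domain is a validation failure.
            raise ValidationError(PROFILE_NOT_FOUND)
        return candidates[0]
    if disk_profile_id is not None:
        # Profile set: it is used as-is, so it must belong to the destination domain.
        profile = profiles.get(disk_profile_id)
        if profile is None or profile.storage_domain_id != storage_domain_id:
            raise ValidationError(PROFILE_NOT_FOUND)
    return disk_profile_id
```

With profiles `p-src` on `sd-src` and `p-dst` on `sd-dst`, `resolve_disk_profile(None, "sd-dst", ...)` picks `p-dst`, while `resolve_disk_profile("p-src", "sd-dst", ...)` raises, which mirrors the "profile is set but belongs to another domain" case discussed here.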
Our REST calls do not request any specific disk profile; they only create a VM from the template.
Moving to SLA for now, as it seems to be a disk profile issue.
Eyal: Every disk has a disk profile (an empty one by default). Can you give us detailed reproduction steps, for example the actual REST / SDK calls you used?
I didn't manage to reproduce the bug on engine version 4.1.1-0.1.el7. Closing this bug for now; I will re-open it if it occurs again.
Created attachment 1260797 [details] engine logs
This bug reproduced again. It does not seem to be 100% reproducible; it is more like 70%. I attached the logs of the run. It reproduced only in our automation, which is based on REST API commands.

Error in the engine log:
2017-03-06 17:08:45,765+02 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Running command: MoveDisksCommand internal: false. Entities affected : ID: 07b22a8b-e904-44bd-80e9-d15bca44a4f2 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2017-03-06 17:08:45,843+02 INFO [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Lock Acquired to object 'EngineLock:{exclusiveLocks='[07b22a8b-e904-44bd-80e9-d15bca44a4f2=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName disk_type_0617030011>]', sharedLocks='[fe170acc-408f-4add-bf1c-8fd7b8df2d85=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2017-03-06 17:08:45,979+02 WARN [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Validation of action 'LiveMigrateVmDisks' failed for user admin@internal-authz. Reasons: VAR__ACTION__MOVE,VAR__TYPE__DISK,ACTION_TYPE_DISK_PROFILE_NOT_FOUND_FOR_STORAGE_DOMAIN,$storageDomainId ff212ce4-84ad-423f-a45a-b54e1ed40ef8
2017-03-06 17:08:45,981+02 INFO [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Lock freed to object 'EngineLock:{exclusiveLocks='[07b22a8b-e904-44bd-80e9-d15bca44a4f2=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName disk_type_0617030011>]', sharedLocks='[fe170acc-408f-4add-bf1c-8fd7b8df2d85=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2017-03-06 17:08:45,985+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-10) [] Operation Failed: [Cannot move Virtual Disk. Cannot find a disk profile defined on storage domain ff212ce4-84ad-423f-a45a-b54e1ed40ef8.]
2017-03-06 17:08:47,965+02 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler6) [6d2bc85] Domain 'ff212ce4-84ad-423f-a45a-b54e1ed40ef8:iscsi_0' report isn't an actual report
Created attachment 1260798 [details] vdsm logs
Eyal, can you please post the detailed reproducer steps too? Ideally the actual REST requests you are doing.
The steps to reproduce the bug remain the same. This bug occurred during our automation runs, so I can't tell which specific step/command causes the failure. Also, the assumption that the REST commands are related is based only on the previous bug on the same issue and on the fact that I was unable to reproduce it manually: https://bugzilla.redhat.com/show_bug.cgi?id=1361838
Eyal, I do not have access to your automation, and I NEED to see the REST requests; otherwise I can't reproduce it properly. This error will happen when the step 6 request (PUSH to a different storage domain?) contains a reference to the original IO profile.
Created attachment 1263204 [details] PUT and POST REST_API commands
Martin, I attached a summary of the PUT and POST REST API commands from our automation run of the test. Please note that this specific run passed, but the commands are the same. Let me know if you need anything else.
Targeting 4.1.2 for deeper analysis; we will decide when to fix this once we know what is wrong.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Debugging in the testing environment showed that the live disk migration fails because of an invalid data row in the DB at the time of the migration. The disk image being migrated has one storage domain ID and one disk profile ID assigned, but the disk profile does not belong to that storage domain, and that triggers the error.

This invalid data is written by an earlier AddVmCommand, which calls CreateSnapshotFromTemplateCommand at [1]; that command writes the invalid data at [2]. The ImageStorageDomainMap written to the DB has a disk profile ID that does not belong to its storage domain.

[1] - org.ovirt.engine.core.bll.AddVmCommand # addVmImages
[2] - org.ovirt.engine.core.bll.storage.disk.image.BaseImagesCommand # addDiskImageToDb
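The invalid state described above (a mapping row whose disk profile is defined on a different storage domain) can be expressed as a simple consistency check. This is a minimal Python sketch, assuming the mapping rows and the profile-to-domain relation have already been loaded into memory; `ImageStorageDomainMapRow`, `find_invalid_rows` and the `profile_domain` dictionary are hypothetical stand-ins, not the engine's DAO layer or DB schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ImageStorageDomainMapRow:
    # Hypothetical in-memory stand-in for one image/storage-domain mapping row.
    image_id: str
    storage_domain_id: str
    disk_profile_id: Optional[str]


def find_invalid_rows(
    rows: Iterable[ImageStorageDomainMapRow],
    profile_domain: Dict[str, str],
) -> List[ImageStorageDomainMapRow]:
    """Return rows whose disk profile is not defined on the row's storage domain.

    profile_domain maps a disk profile ID to the storage domain it belongs to.
    Rows without a profile are skipped, since a profile can then be selected
    automatically.
    """
    invalid = []
    for row in rows:
        if row.disk_profile_id is None:
            continue
        # An unknown profile ID also counts as invalid (get() returns None).
        if profile_domain.get(row.disk_profile_id) != row.storage_domain_id:
            invalid.append(row)
    return invalid
```

For example, with `p-a` defined on `sd-a`, a row `("img-2", "sd-b", "p-a")` is reported, and that is the kind of row that makes the later move fail validation.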
Verified with the following code:
---------------------------------------
VDSM - 4.19.17-1.el7ev.x86_64
RHEVM - 4.1.3.1-0.1.el7

Steps to reproduce:
------------------------
1. Create a VM with a disk.
2. Create a template from the VM in step 1.
3. Copy the template disk to another storage domain.
4. Create a VM as a thin copy of the template from step 2.
5. Start the VM from step 4.
6. Move the disk to the other storage domain, the one that holds the disk copy.

Moving to VERIFIED