Bug 1420258
Summary: | Live Storage Migration failed due to 'disk profile does not found' | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Eyal Shenitzky <eshenitz>
Component: | BLL.Storage | Assignee: | Andrej Krejcir <akrejcir>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Eyal Shenitzky <eshenitz>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 4.1.0 | CC: | akrejcir, bugs, dfediuck, eshenitz, mgoldboi, msivak, stirabos, tnisan
Target Milestone: | ovirt-4.1.3 | Keywords: | Automation, Regression, Reopened
Target Release: | --- | Flags: | rule-engine: ovirt-4.1+, rule-engine: blocker+
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause:
When a VM template has a disk copied to multiple storage domains, each copy has a different disk profile assigned. When a VM was then created from the template using the REST API, the wrong disk profile was used for some storage domains.
Consequence:
The disks of the new VM were stored in the DB even if their disk profile did not belong to the disk's storage domain. This broke subsequent commands, which assume correct disk profile assignment in the DB.
Fix:
When creating a VM from the template, the engine uses the disk profile from the correct storage domain.
Result:
The data in the DB is correct and subsequent commands work.
|
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2017-07-06 14:01:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Eyal Shenitzky
2017-02-08 09:59:23 UTC
Martin, it seems like SLA's turf, can you please have a look?

This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1361838, but I have no idea what the implications of live storage migration are here. The code actually checks whether a disk profile is defined (diskImage.getDiskProfileId() == null && storageDomainId != null):
- When it is not, we select a proper one automatically (thanks to the mentioned #1361838);
- But when a profile is set, we attempt to use it and fail with this error if that is not possible.

Can you check whether your REST calls force the engine to try to use the profile from the origin for the destination storage domain? It might be enough to leave that field empty or something to let the engine solve it by itself.

There is no specific demand for a disk profile, only to create a VM from a template.

Moving to SLA for now as it seems like a disk profile issue.

Eyal: Every disk has a disk profile (an empty one by default). Can you give us detailed reproduction steps? Like the actual REST / SDK calls you used?

I didn't manage to reproduce the bug on engine version 4.1.1-0.1.el7. Closing this bug for now; re-open it if it occurs again.

Created attachment 1260797 [details]
engine logs
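
The profile check quoted earlier in this thread (diskImage.getDiskProfileId() == null && storageDomainId != null) can be sketched roughly as follows. This is a minimal illustration, not the ovirt-engine code: the class DiskProfileResolver, the method resolve, and the profilesByDomain lookup are hypothetical names, and returning the requested profile unchanged when no storage domain is given is an assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

/** Hypothetical helper for illustration only; not part of ovirt-engine. */
final class DiskProfileResolver {

    /**
     * Resolves the disk profile to use on the target storage domain.
     *
     * @param requestedProfileId profile carried by the disk image, or null if none is set
     * @param storageDomainId    target storage domain, or null
     * @param profilesByDomain   assumed lookup: storage domain ID -> profile IDs defined on it
     * @return the profile to use, or empty when validation should fail
     */
    static Optional<UUID> resolve(UUID requestedProfileId,
                                  UUID storageDomainId,
                                  Map<UUID, Set<UUID>> profilesByDomain) {
        if (storageDomainId == null) {
            // Assumption: without a target domain there is nothing to validate against.
            return Optional.ofNullable(requestedProfileId);
        }
        Set<UUID> available = profilesByDomain.getOrDefault(storageDomainId, Set.of());
        if (requestedProfileId == null) {
            // No profile set: pick one defined on the target domain automatically
            // (sorted only to make the choice deterministic in this sketch).
            return available.stream().sorted().findFirst();
        }
        // A profile is set explicitly: it must belong to the target domain,
        // otherwise the caller reports "disk profile not found for storage domain".
        return available.contains(requestedProfileId)
                ? Optional.of(requestedProfileId)
                : Optional.empty();
    }
}
```

Under these assumptions, a request that carries the source domain's profile to a different destination domain yields an empty result, which corresponds to the failure discussed here, while leaving the profile unset lets the destination's own profile be chosen.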
This bug reproduced again. It does not reproduce 100% of the time; the rate is roughly 70%. I attached the logs of the run. It reproduced only in our automation, which is based on REST API commands.

Error in the engine log:
2017-03-06 17:08:45,765+02 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Running command: MoveDisksCommand internal: false. Entities affected : ID: 07b22a8b-e904-44bd-80e9-d15bca44a4f2 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2017-03-06 17:08:45,843+02 INFO [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Lock Acquired to object 'EngineLock:{exclusiveLocks='[07b22a8b-e904-44bd-80e9-d15bca44a4f2=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName disk_type_0617030011>]', sharedLocks='[fe170acc-408f-4add-bf1c-8fd7b8df2d85=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2017-03-06 17:08:45,979+02 WARN [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Validation of action 'LiveMigrateVmDisks' failed for user admin@internal-authz. Reasons: VAR__ACTION__MOVE,VAR__TYPE__DISK,ACTION_TYPE_DISK_PROFILE_NOT_FOUND_FOR_STORAGE_DOMAIN,$storageDomainId ff212ce4-84ad-423f-a45a-b54e1ed40ef8
2017-03-06 17:08:45,981+02 INFO [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateVmDisksCommand] (default task-10) [disks_syncAction_ff4fadc6-be1b-4b58] Lock freed to object 'EngineLock:{exclusiveLocks='[07b22a8b-e904-44bd-80e9-d15bca44a4f2=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName disk_type_0617030011>]', sharedLocks='[fe170acc-408f-4add-bf1c-8fd7b8df2d85=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2017-03-06 17:08:45,985+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-10) [] Operation Failed: [Cannot move Virtual Disk. Cannot find a disk profile defined on storage domain ff212ce4-84ad-423f-a45a-b54e1ed40ef8.]
2017-03-06 17:08:47,965+02 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler6) [6d2bc85] Domain 'ff212ce4-84ad-423f-a45a-b54e1ed40ef8:iscsi_0' report isn't an actual report

Created attachment 1260798 [details]
vdsm logs
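
Because the failure only shows up intermittently in automation, it may help to pull the validation reasons out of the engine log mechanically. The sketch below is a hypothetical triage helper (the class ValidationFailureParser and its method are not part of oVirt); the regular expression is an assumption derived only from the single WARN line quoted in this report and may not cover other log layouts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log triage helper; the line format is inferred from the WARN line in this report. */
final class ValidationFailureParser {

    // Captures: (1) action name, (2) user, (3) comma-separated reasons up to end of line.
    private static final Pattern FAILURE = Pattern.compile(
            "Validation of action '([^']+)' failed for user (\\S+)\\. Reasons: (.*)$");

    /** Returns the validation reasons found in the line, or an empty list if the line does not match. */
    static List<String> reasons(String logLine) {
        Matcher m = FAILURE.matcher(logLine);
        if (!m.find()) {
            return List.of();
        }
        return Arrays.asList(m.group(3).trim().split(","));
    }

    /** True if the line reports the disk profile validation failure discussed in this bug. */
    static boolean isDiskProfileFailure(String logLine) {
        return reasons(logLine).contains("ACTION_TYPE_DISK_PROFILE_NOT_FOUND_FOR_STORAGE_DOMAIN");
    }
}
```

Running isDiskProfileFailure over each line of the attached engine log would flag the runs in which the move was rejected for this reason, as opposed to unrelated validation failures.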
Eyal, can you please post the detailed reproducer steps too? Ideally the actual REST requests you are doing.

The steps to reproduce the bug remain the same. This bug occurred in our automation runs, so I can't tell which specific step or command causes the failure. Also, it is only an assumption that the REST command is related to this, based on the previous bug on the same issue and on the fact that I was unable to reproduce it manually.
https://bugzilla.redhat.com/show_bug.cgi?id=1361838

Eyal, I do not have access to your automation and I NEED to see the REST requests. Otherwise I can't reproduce it properly. This will happen when the step 6 request (PUSH to different storage domain?) contains a reference to the original IO profile.

Created attachment 1263204 [details]
PUT and POST REST_API commands
Martin, I attached a summary of the PUT and POST REST API commands from our automation run of the test. Please note that this specific test run did pass, but the commands are the same. Let me know if you need anything else.

Targeting 4.1.2 for deeper analysis; we will decide when to fix this once we know what is wrong.

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Debugging in the testing environment showed that the live disk migration is failing because of an invalid data row in the DB at the time of the migration. The disk image being migrated has one storage domain ID and one disk profile ID assigned, but the disk profile does not belong to the storage domain, and that triggers the error.

This invalid data is written by a previous AddVmCommand, which calls CreateSnapshotFromTemplateCommand at [1], and this command writes the invalid data at [2]. The ImageStorageDomainMap written to the DB has a disk profile ID which does not belong to the storage domain.

[1] - org.ovirt.engine.core.bll.AddVmCommand # addVmImages
[2] - org.ovirt.engine.core.bll.storage.disk.image.BaseImagesCommand # addDiskImageToDb

Verified with the following code:
---------------------------------------
VDSM -4.19.17-1.el7ev.x86_64
RHEVM -4.1.3.1-0.1.el7

Steps to reproduce:
------------------------
1. Create a VM with disk
2. Create a template from the VM in step 1
3. Copy the template disk to another storage domain
4. Create a VM as thin copy from the template in step 2
5. Start the VM from step 4
6. Move the disk to the other storage domain that has the disk copy

Moving to VERIFIED
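
The invalid row described in the analysis (a disk profile ID that does not belong to the image's storage domain) can be detected with a simple consistency check. The following is only a sketch under stated assumptions: Mapping is a simplified, hypothetical stand-in for an ImageStorageDomainMap row, and profileOwner is an assumed lookup from disk profile ID to the storage domain the profile is defined on. It does not use the real ovirt-engine DAOs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical consistency check; the types here are simplified stand-ins, not engine classes. */
final class ImageProfileConsistency {

    /** Simplified stand-in for one image / storage domain / disk profile row. */
    static final class Mapping {
        final UUID imageId;
        final UUID storageDomainId;
        final UUID diskProfileId; // may be null when no profile is assigned

        Mapping(UUID imageId, UUID storageDomainId, UUID diskProfileId) {
            this.imageId = imageId;
            this.storageDomainId = storageDomainId;
            this.diskProfileId = diskProfileId;
        }
    }

    /**
     * Returns the rows whose disk profile is not defined on the row's storage domain.
     *
     * @param rows         rows to check
     * @param profileOwner assumed lookup: disk profile ID -> storage domain it is defined on
     */
    static List<Mapping> findInvalid(List<Mapping> rows, Map<UUID, UUID> profileOwner) {
        List<Mapping> invalid = new ArrayList<>();
        for (Mapping row : rows) {
            if (row.diskProfileId == null) {
                continue; // no profile assigned, nothing to cross-check
            }
            UUID owner = profileOwner.get(row.diskProfileId);
            // Unknown profiles and profiles owned by another domain are both inconsistent.
            if (owner == null || !owner.equals(row.storageDomainId)) {
                invalid.add(row);
            }
        }
        return invalid;
    }
}
```

A check along these lines, run after step 4 of the reproduction steps, would be expected to flag the new VM's disk before the move in step 6 fails, which narrows the fault to VM creation rather than to the migration itself.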