Bug 1282957 - LSM fails when one of the vm's disks is located on domain in maintenance
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assigned To: Fred Rolland
QA Contact: Raz Tamir
Keywords: Automation
Depends On:
Blocks:
Reported: 2015-11-17 17:01 EST by Raz Tamir
Modified: 2016-02-18 06:20 EST (History)
CC: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1287024 1287025
Environment:
Last Closed: 2016-02-18 06:20:06 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
tnisan: devel_ack+
rule-engine: testing_ack+


Attachments
engine and vdsm logs (29.47 KB, application/x-gzip)
2015-11-17 17:01 EST, Raz Tamir


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 49446 master MERGED engine: Storage validation in LSM flow Never
oVirt gerrit 49953 ovirt-engine-3.6 MERGED engine: Storage validation in LSM flow Never

Description Raz Tamir 2015-11-17 17:01:16 EST
Created attachment 1095712 [details]
engine and vdsm logs

Description of problem:
When trying to perform live storage migration (file to file) of one of the VM's disks while another disk is located on a storage domain in maintenance, creation of the auto-generated snapshot fails.

From engine.log:
2015-11-17 22:36:52,905 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-19) [disks_syncAction_fb0cbd8b-3c54-453c] Running command: LiveMigrateVmDisksCommand Task handler: LiveSnapshotTaskHandler internal: false. Entities affected :  ID: 1fb1c29b-eab3-4443-b94f-e3c9198d7c11 Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER
2015-11-17 22:36:53,005 WARN  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-19) [4eab3fc8] CanDoAction of action 'CreateAllSnapshotsFromVm' failed for user admin@internal. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Maintenance



Version-Release number of selected component (if applicable):
rhevm-3.6.0.3-0.1.el6.noarch
vdsm-4.17.10.1-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
Setup: 2 storage domains (NFS in my case), a VM with 1 bootable disk, and 2 more disks attached to the VM, each on a different storage domain: disk1 on nfs_sd1 and disk2 on nfs_sd2.
1. Deactivate disk2 and move nfs_sd2 to maintenance
2. Live migrate disk1

Actual results:
The task "Creating VM Snapshot Auto-generated for Live Storage Migration for VM live_storage_migration_nfs" appears in the Tasks tab, but the auto-generated snapshot fails to create (see engine.log above).


Expected results:
LSM should work


Additional info:
Comment 1 Idan Shaby 2015-11-22 06:32:08 EST
After taking a look at the code, this is what I've found out:

1. The CDA (CanDoAction) of CreateAllSnapshotsFromVmCommand fails on its last step, validateStorage().

2. Patch I9f42f387781425d16f53a0e8a34d859365808ec0 changed the set of disks to be validated from only those returned by getDisksListForChecks() to all of the disks (including the inactive one).

3. For some reason, the command receives the id of the disk that we want to move (the active one) as the id of a disk that should be ignored (getParameters().getDiskIdsToIgnoreInChecks()), which might break getDisksListForChecks().

4. Either way, the command does not end with an error message; the disk stays locked and leaves the environment broken. The only workaround I found was to remove the disk from the database and then from the storage manually.

It might be enough to simply replace the call to getSnappableVmDisks() (the first line of validateStorage()) with getDisksListForChecks(), but since the above-mentioned patch changed this behavior a few months ago, we should dig a bit deeper to see what's going on there.
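The analysis above boils down to validating the storage domains of only the disks actually involved in the operation, rather than all of the VM's disks. A minimal, self-contained sketch of that idea (all type and method names other than getDisksListForChecks() and validateStorage() are hypothetical simplifications, not the actual ovirt-engine code; the real fix is in the gerrit patches listed above):

```java
import java.util.*;
import java.util.stream.Collectors;

// Simplified model of the validation discussed above: a deactivated disk on a
// domain in maintenance must not block LSM of an unrelated, active disk.
public class LsmValidationSketch {
    enum DomainStatus { ACTIVE, MAINTENANCE }

    record Disk(String id, String domainId) {}

    // Mimics getDisksListForChecks(): drop disks whose ids are flagged to be
    // ignored (e.g. deactivated disks that take no part in the snapshot).
    static List<Disk> disksForChecks(List<Disk> allDisks, Set<String> idsToIgnore) {
        return allDisks.stream()
                .filter(d -> !idsToIgnore.contains(d.id()))
                .collect(Collectors.toList());
    }

    // Mimics validateStorage(): every domain hosting a checked disk must be ACTIVE.
    static boolean validateStorage(List<Disk> disks, Map<String, DomainStatus> domainStatus) {
        return disks.stream()
                .allMatch(d -> domainStatus.get(d.domainId()) == DomainStatus.ACTIVE);
    }

    public static void main(String[] args) {
        Map<String, DomainStatus> status = Map.of(
                "nfs_sd1", DomainStatus.ACTIVE,
                "nfs_sd2", DomainStatus.MAINTENANCE);
        List<Disk> allDisks = List.of(
                new Disk("disk1", "nfs_sd1"),   // disk being live-migrated
                new Disk("disk2", "nfs_sd2"));  // deactivated disk on the domain in maintenance

        // Validating all of the VM's disks (the buggy behavior) fails because of disk2:
        System.out.println(validateStorage(allDisks, status)); // false

        // Filtering out the deactivated disk first lets the LSM snapshot proceed:
        List<Disk> checked = disksForChecks(allDisks, Set.of("disk2"));
        System.out.println(validateStorage(checked, status)); // true
    }
}
```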
Comment 2 Sandro Bonazzola 2015-12-23 08:44:02 EST
oVirt 3.6.2 RC1 has been released for testing, moving to ON_QA
Comment 3 Raz Tamir 2015-12-29 11:38:33 EST
Verified on 
rhevm-3.6.2-0.1.el6.noarch
vdsm-4.17.15-0.el7ev.noarch
