Bug 1282957

Summary: LSM fails when one of the vm's disks is located on domain in maintenance
Product: [oVirt] ovirt-engine
Reporter: Raz Tamir <ratamir>
Component: BLL.Storage
Assignee: Fred Rolland <frolland>
Status: CLOSED CURRENTRELEASE
QA Contact: Raz Tamir <ratamir>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.6.0.2
CC: amureini, bugs, tnisan
Target Milestone: ovirt-3.6.2
Keywords: Automation
Target Release: 3.6.2
Flags: rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
tnisan: devel_ack+
rule-engine: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1287024 1287025 (view as bug list)
Environment:
Last Closed: 2016-02-18 11:20:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
engine and vdsm logs

Description Raz Tamir 2015-11-17 22:01:16 UTC
Created attachment 1095712 [details]
engine and vdsm logs

Description of problem:
When trying to perform live storage migration (file to file) of one of the VM's disks while another of its disks resides on a storage domain in maintenance, the Auto-generated snapshot fails to create.

From engine.log:
2015-11-17 22:36:52,905 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-19) [disks_syncAction_fb0cbd8b-3c54-453c] Running command: LiveMigrateVmDisksCommand Task handler: LiveSnapshotTaskHandler internal: false. Entities affected :  ID: 1fb1c29b-eab3-4443-b94f-e3c9198d7c11 Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER
2015-11-17 22:36:53,005 WARN  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-19) [4eab3fc8] CanDoAction of action 'CreateAllSnapshotsFromVm' failed for user admin@internal. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Maintenance



Version-Release number of selected component (if applicable):
rhevm-3.6.0.3-0.1.el6.noarch
vdsm-4.17.10.1-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
Setup: 2 storage domains (NFS in my case), a VM with 1 bootable disk, and 2 more disks attached to the VM, each on a different storage domain: disk1 on nfs_sd1 and disk2 on nfs_sd2.
1. Deactivate disk2 and move nfs_sd2 to maintenance
2. Live migrate disk1

Actual results:
In the Tasks tab, "Creating VM Snapshot Auto-generated for Live Storage Migration for VM live_storage_migration_nfs" appears, but the auto-generated snapshot fails to create.


Expected results:
LSM of disk1 should succeed; the deactivated disk on the maintenance domain should not block the migration.


Additional info:

Comment 1 Idan Shaby 2015-11-22 11:32:08 UTC
After taking a look at the code, this is what I've found:

1. The CDA of CreateAllSnapshotsFromVmCommand fails on its last step, validateStorage().

2. Patch I9f42f387781425d16f53a0e8a34d859365808ec0 changed the set of disks that get validated to all of the VM's disks (including the inactive one), instead of only those returned by getDisksListForChecks().

3. For some reason, the command receives the id of the disk that we want to move (the active one) as the id of a disk to ignore in checks (getParameters().getDiskIdsToIgnoreInChecks()), which might break getDisksListForChecks().

4. In any case, the command doesn't end with an error message; the disk stays locked and ruins your environment. The only workaround I found was to remove the disk from the database and then from the storage manually.

It might be enough to replace the call to getSnappableVmDisks() (the first line in validateStorage()) with a call to getDisksListForChecks(), but since the patch mentioned above changed this a few months ago, we should dig a bit deeper to see what's going on there.
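The intent of the proposed fix can be sketched with a simplified, self-contained model. Note this is a hypothetical illustration, not actual ovirt-engine code: the Disk and DomainStatus types and both helper methods below are stand-ins that only mirror the names from the comment above. The point it shows is that validating only the plugged disks means an unplugged disk sitting on a maintenance domain no longer fails the snapshot validation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of the validation flow discussed above.
public class LsmValidationSketch {

    enum DomainStatus { ACTIVE, MAINTENANCE }

    static class Disk {
        final String id;
        final boolean plugged;          // an unplugged (deactivated) disk
        final DomainStatus domainStatus;

        Disk(String id, boolean plugged, DomainStatus domainStatus) {
            this.id = id;
            this.plugged = plugged;
            this.domainStatus = domainStatus;
        }
    }

    // Stand-in for getDisksListForChecks(): only plugged disks take part
    // in the validation.
    static List<Disk> getDisksListForChecks(List<Disk> allDisks) {
        return allDisks.stream()
                .filter(d -> d.plugged)
                .collect(Collectors.toList());
    }

    // Stand-in for validateStorage(): fails only if a *validated* disk
    // sits on a domain that is not active.
    static boolean validateStorage(List<Disk> allDisks) {
        return getDisksListForChecks(allDisks).stream()
                .allMatch(d -> d.domainStatus == DomainStatus.ACTIVE);
    }

    public static void main(String[] args) {
        // disk1 is plugged and on an active domain; disk2 is deactivated
        // and its domain is in maintenance, as in the reproduction steps.
        List<Disk> disks = List.of(
                new Disk("disk1", true, DomainStatus.ACTIVE),
                new Disk("disk2", false, DomainStatus.MAINTENANCE));

        // With the validation restricted to plugged disks, the maintenance
        // domain no longer blocks the snapshot.
        System.out.println(validateStorage(disks)); // prints true
    }
}
```

Validating all disks instead (the behavior introduced by the patch referenced above) would include disk2 and fail with the maintenance-status error seen in engine.log.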

Comment 2 Sandro Bonazzola 2015-12-23 13:44:02 UTC
oVirt 3.6.2 RC1 has been released for testing, moving to ON_QA

Comment 3 Raz Tamir 2015-12-29 16:38:33 UTC
Verified on 
rhevm-3.6.2-0.1.el6.noarch
vdsm-4.17.15-0.el7ev.noarch