Red Hat Bugzilla – Bug 1282957
LSM fails when one of the vm's disks is located on domain in maintenance
Last modified: 2016-02-18 06:20:06 EST
Created attachment 1095712 [details]
engine and vdsm logs
Description of problem:
When trying to perform live storage migration (File to File) of one of the VM's disks while another of its disks resides on a storage domain in maintenance, creation of the Auto-generated snapshot fails.
2015-11-17 22:36:52,905 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-7-thread-19) [disks_syncAction_fb0cbd8b-3c54-453c] Running command: LiveMigrateVmDisksCommand Task handler: LiveSnapshotTaskHandler internal: false. Entities affected : ID: 1fb1c29b-eab3-4443-b94f-e3c9198d7c11 Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER
2015-11-17 22:36:53,005 WARN [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-19) [4eab3fc8] CanDoAction of action 'CreateAllSnapshotsFromVm' failed for user admin@internal. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Maintenance
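The CanDoAction failure above corresponds to a storage-domain status check along these lines (a minimal sketch; the class and method names are hypothetical and stand in for the actual engine code):

```java
import java.util.List;

// Hypothetical sketch of the kind of check that produces
// ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2: snapshot creation
// is refused if any validated disk sits on a domain that is not Active
// (e.g. one in Maintenance).
public class SnapshotStorageCheck {

    public enum Status { ACTIVE, MAINTENANCE, INACTIVE }

    // Returns true only when every domain backing a validated disk is Active.
    public static boolean validateStorage(List<Status> domainStatuses) {
        for (Status s : domainStatuses) {
            if (s != Status.ACTIVE) {
                return false; // maps to the CanDoAction failure in the log
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // disk1's domain is Active, disk2's domain is in Maintenance:
        // validation fails, as seen in the engine log above.
        System.out.println(validateStorage(
                List.of(Status.ACTIVE, Status.MAINTENANCE))); // prints "false"
    }
}
```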
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Setup with 2 storage domains (NFS in my case), a VM with 1 bootable disk.
2 more disks attached to the VM, each on a different storage domain:
disk1 on nfs_sd1 and disk2 on nfs_sd2
1. Deactivate disk2 and move nfs_sd2 to maintenance
2. Live migrate disk1
Actual results:
The task "Creating VM Snapshot Auto-generated for Live Storage Migration for VM live_storage_migration_nfs" appears in the Tasks tab, and the snapshot creation fails.
Expected results:
LSM should work
After taking a look at the code, this is what I've found out:
1. The CDA of CreateAllSnapshotsFromVmCommand fails on its last step, validateStorage().
2. Patch I9f42f387781425d16f53a0e8a34d859365808ec0 changed the set of disks to validate from those returned by getDisksListForChecks() to all of the VM's disks (including the inactive one).
3. For some reason, the command receives the id of the disk we want to move (the active one) as the id of a disk to ignore (getParameters().getDiskIdsToIgnoreInChecks()), which may distort the result of getDisksListForChecks().
4. Worse, the command doesn't end with an error message; the disk stays locked and leaves the environment broken. The only workaround I found was to remove the disk from the db and then from the storage manually.
It might be enough to replace the call to getSnappableVmDisks() (the first line of validateStorage()) with getDisksListForChecks(), but since the patch mentioned above changed that a few months ago, we should dig a bit deeper to see what's going on there.
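The suggested fix amounts to validating only the disks that actually take part in the operation (minus the ids in getDiskIdsToIgnoreInChecks()) rather than every snappable disk of the VM. A minimal sketch of that filtering, with hypothetical types standing in for the engine's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class DiskFilterSketch {

    // Hypothetical model of a VM disk: id, plugged flag, and the
    // status of the storage domain backing it.
    public static class Disk {
        final UUID id;
        final boolean plugged;
        final String domainStatus;

        public Disk(UUID id, boolean plugged, String domainStatus) {
            this.id = id;
            this.plugged = plugged;
            this.domainStatus = domainStatus;
        }
    }

    // Analogue of getDisksListForChecks(): keep only plugged disks whose
    // id is not in the ignore set, instead of every snappable disk of
    // the VM (the getSnappableVmDisks() behaviour).
    public static List<Disk> disksForChecks(List<Disk> allDisks, Set<UUID> idsToIgnore) {
        List<Disk> result = new ArrayList<>();
        for (Disk d : allDisks) {
            if (d.plugged && !idsToIgnore.contains(d.id)) {
                result.add(d);
            }
        }
        return result;
    }

    // Validation passes when every checked disk sits on an Active domain.
    public static boolean validateStorage(List<Disk> disks) {
        for (Disk d : disks) {
            if (!"Active".equals(d.domainStatus)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Disk disk1 = new Disk(UUID.randomUUID(), true, "Active");
        // disk2 is deactivated and its domain is in Maintenance.
        Disk disk2 = new Disk(UUID.randomUUID(), false, "Maintenance");
        List<Disk> all = List.of(disk1, disk2);

        // Validating all snappable disks trips the Maintenance check...
        System.out.println(validateStorage(all));                           // false
        // ...while validating only the disks selected for the LSM passes.
        System.out.println(validateStorage(disksForChecks(all, Set.of()))); // true
    }
}
```

With this filtering, the unplugged disk2 on the Maintenance domain is excluded before validateStorage() runs, so live-migrating disk1 would pass validation; validating getSnappableVmDisks() instead pulls disk2 back in and trips the Maintenance check.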
oVirt 3.6.2 RC1 has been released for testing, moving to ON_QA