Bug 957498

Summary: engine: if vm is killed during live storage migration during the create snapshot part we end the command as if live migration succeeded.
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: acathrow, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, scohen, yeylon, ykaul
Target Milestone: --- Keywords: Regression
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: sf16 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-06-11 08:18:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 948448    
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-04-28 14:20:45 UTC
Created attachment 741167 [details]
logs

Description of problem:

I ran kill -9 on the VM's qemu PID during the create snapshot part of Live Storage Migration.
The create snapshot succeeds, and we report that the LiveStorageMigration has succeeded:

EndAction for action type LiveMigrateVmDisks succeeded, clearing tasks.

and this is the UI event log:

2013-Apr-28, 17:05 User admin@internal moving disk NFS-RHEL6_iSCSI_Disk1 to domain Dafna-32-02.
2013-Apr-28, 17:05 Snapshot 'Auto-generated for Live Storage Migration' creation for VM 'kill_me' has been completed.
2013-Apr-28, 17:05 VM kill_me is down. Exit message: Lost connection with qemu process.

In actuality, we did create the snapshot, but the move disk operation never started, and there are no tasks on the SPM to suggest that any were even sent:

[root@gold-vdsc ~]# vdsClient -s 0 getAllTasksInfo

[root@gold-vdsc ~]# 


Version-Release number of selected component (if applicable):

sf14

How reproducible:

100%

Steps to Reproduce:
1. Create and run a VM on a two-host cluster with iSCSI storage
2. Start a live storage migration of the VM's disk
3. While the snapshot is being created, kill -9 the VM's qemu PID on the host
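
Step 3 can be scripted. The following is a minimal sketch, assuming the VM's qemu PID is already known (for example, looked up by hand on the host). `kill_vm_process` and `demo` are illustrative names and not part of any oVirt or VDSM tooling; the demo kills a harmless child process rather than a real VM.

```python
import os
import signal
import subprocess
import sys


def kill_vm_process(pid):
    """Send SIGKILL (the signal behind `kill -9`) to the given PID."""
    os.kill(pid, signal.SIGKILL)


def demo():
    """Kill a throwaway sleeping child the same way and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    kill_vm_process(child.pid)
    # On POSIX, a child terminated by a signal reports the negated signal number.
    return child.wait()
```

Timing matters for the reproduction: the kill has to land while the snapshot task is still running on the SPM, so in practice it is issued right after the "moving disk" event appears.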
  
Actual results:

The live snapshot is completed, but although the UI reports that the disk is moving and the engine reports that the live migration succeeded, no such task is actually sent to the SPM and the disk is not migrated.

Expected results:

we should report failure in the migration

Additional info: logs


2013-04-28 17:05:39,395 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-26) vm kill_me running in db and not running in vds - add to rerun treatment. vds gold-vdsd
2013-04-28 17:05:40,140 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-40) Polling and updating Async Tasks: 1 tasks, 1 tasks to poll now
2013-04-28 17:05:40,170 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-40) SPMAsyncTask::PollTask: Polling task 57c84733-7b9f-46e8-a32e-47d8520cf10b (Parent Command LiveMigrateVmDisks, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status running.
2013-04-28 17:05:40,170 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-40) Finished polling Tasks, will poll again in 10 seconds.
2013-04-28 17:05:50,199 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-30) SPMAsyncTask::PollTask: Polling task 57c84733-7b9f-46e8-a32e-47d8520cf10b (Parent Command LiveMigrateVmDisks, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'success'.
2013-04-28 17:05:50,203 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-30) BaseAsyncTask::OnTaskEndSuccess: Task 57c84733-7b9f-46e8-a32e-47d8520cf10b (Parent Command LiveMigrateVmDisks, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-04-28 17:05:50,203 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-30) EntityAsyncTask::EndActionIfNecessary: All tasks of entity eddd2bbe-4df4-42b5-95b8-5ecec0fc6afc has ended -> executing EndAction
2013-04-28 17:05:50,203 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-30) EntityAsyncTask::EndAction: Ending action for 1 tasks (entity ID: eddd2bbe-4df4-42b5-95b8-5ecec0fc6afc): calling EndAction for action type LiveMigrateVmDisks.
2013-04-28 17:05:50,204 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-41) EntityAsyncTask::EndCommandAction [within thread] context: Attempting to EndAction LiveMigrateVmDisks, executionIndex: 0
2013-04-28 17:05:50,211 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (pool-4-thread-41) Ending command successfully: org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand
2013-04-28 17:05:50,214 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-4-thread-41) Ending command successfully: org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand
2013-04-28 17:05:50,217 INFO  [org.ovirt.engine.core.bll.CreateSnapshotCommand] (pool-4-thread-41) [1e0a3cb5] Ending command successfully: org.ovirt.engine.core.bll.CreateSnapshotCommand
2013-04-28 17:05:50,219 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (pool-4-thread-41) [1e0a3cb5] START, GetImageInfoVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = d326916a-89b6-41c8-9c14-4b9cf8a1c979, imageGroupId = 79874688-f94d-486b-b72a-191276d88a26, imageId = cc19aee6-e6a2-42e0-bdde-8b12633ba05c), log id: 6d71bf29
2013-04-28 17:05:50,830 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (pool-4-thread-41) [1e0a3cb5] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.DiskImage@e4e13ba, log id: 6d71bf29
2013-04-28 17:05:50,841 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-41) Unable to get value of property: glusterVolume for class org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand
2013-04-28 17:05:50,842 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-41) Unable to get value of property: vds for class org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand
2013-04-28 17:05:50,864 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (pool-4-thread-41) Running command: LiveMigrateVmDisksCommandTask handler: LiveMigrateDisksTaskHandler internal: false. Entities affected :  ID: 9f42231d-2904-4df9-9026-fe6d4fd5c7c1 Type: Disk,  ID: 81ef11d0-4c0c-47b4-8953-d61a6af442d8 Type: Storage
2013-04-28 17:05:50,924 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-41) [26138e91] Lock Acquired to object EngineLock [exclusiveLocks= key: 79874688-f94d-486b-b72a-191276d88a26 value: DISK
, sharedLocks= key: eddd2bbe-4df4-42b5-95b8-5ecec0fc6afc value: VM
]
2013-04-28 17:05:50,936 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-41) [26138e91] Running command: LiveMigrateDiskCommandTask handler: CreateImagePlaceholderTaskHandler internal: true. Entities affected :  ID: 79874688-f94d-486b-b72a-191276d88a26 Type: Disk,  ID: 81ef11d0-4c0c-47b4-8953-d61a6af442d8 Type: Storage
2013-04-28 17:05:50,937 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-41) [26138e91] Lock freed to object EngineLock [exclusiveLocks= key: 79874688-f94d-486b-b72a-191276d88a26 value: DISK
, sharedLocks= key: eddd2bbe-4df4-42b5-95b8-5ecec0fc6afc value: VM
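
The symptom in the log above is a "running in db and not running in vds" line followed by "Ending command successfully" for LiveMigrateVmDisksCommand. A rough way to scan an engine log for that sequence is sketched below. The two patterns are copied from the log lines in this report; the function name `find_false_success` is illustrative, and because the "Ending command" line does not name the VM, the match is only by ordering and can produce false positives on a busy engine.

```python
import re

# Patterns taken from the engine log lines quoted in this report.
VM_DOWN = re.compile(r"vm (\S+) running in db and not running in vds")
LSM_END_OK = re.compile(
    r"Ending command successfully: \S*LiveMigrateVmDisksCommand"
)


def find_false_success(lines):
    """Return VM names reported down before LiveMigrateVmDisks ended successfully.

    Correlation is by line order only; the log does not tie the command
    to a VM name, so treat the result as a hint, not proof.
    """
    down = []
    suspicious = []
    for line in lines:
        match = VM_DOWN.search(line)
        if match:
            down.append(match.group(1))
        elif LSM_END_OK.search(line) and down:
            suspicious.extend(down)
            down = []
    return suspicious


SAMPLE = [
    "2013-04-28 17:05:39,395 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] "
    "(QuartzScheduler_Worker-26) vm kill_me running in db and not running in vds "
    "- add to rerun treatment. vds gold-vdsd",
    "2013-04-28 17:05:50,211 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] "
    "(pool-4-thread-41) Ending command successfully: "
    "org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand",
]
```

Calling `find_false_success(SAMPLE)` on the two lines quoted from this report flags the VM `kill_me`.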

Comment 3 Ayal Baron 2013-05-01 09:21:38 UTC
Why is this a regression?

Comment 4 Dafna Ron 2013-05-01 10:34:47 UTC
we used to report failure

Comment 5 Ayal Baron 2013-05-01 18:51:06 UTC
(In reply to comment #4)
> we used to report failure

used to in what version?

Comment 6 Dafna Ron 2013-05-02 08:01:01 UTC
last time I checked LSM was in 3.1 - can't tell you the build.

Comment 7 Daniel Erez 2013-05-05 14:28:45 UTC
LiveMigrateVmDisks is only the wrapper command that invokes the LiveMigrateDiskCommand commands. For failure indication, added an appropriate event and an error log.
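
To illustrate the wrapper/child relationship described in comment 7, here is a small model in Python. It is not the engine code (which is Java) and does not claim to reproduce the actual fix; the class name, the `vm_is_up` flag, and the message texts are all assumptions made for the sketch. It only shows the idea that the wrapper ending its snapshot phase is distinct from the per-disk commands running, and that a failure to start them should surface as an event and an error log entry.

```python
from dataclasses import dataclass, field


@dataclass
class WrapperCommandModel:
    """Hypothetical stand-in for a wrapper such as LiveMigrateVmDisks."""

    vm_is_up: bool
    disks: list
    events: list = field(default_factory=list)  # what the UI would show
    log: list = field(default_factory=list)     # what the engine log would show

    def end_snapshot_phase(self):
        """Called when the snapshot task finishes; returns True on success."""
        if not self.vm_is_up:
            # Assumed behaviour: report failure instead of ending silently.
            self.events.append("Live storage migration failed: VM is down")
            self.log.append("ERROR: cannot start disk migration, VM is not running")
            return False
        for disk in self.disks:
            # Each disk would be handled by its own child command.
            self.log.append(f"INFO: starting per-disk migration for {disk}")
        return True
```

With `vm_is_up=False` the model records one failure event and one error log line and starts no per-disk work, which matches the expected result stated in this report.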

Comment 8 Dafna Ron 2013-05-12 12:53:04 UTC
verified on sf16

Comment 9 Itamar Heim 2013-06-11 08:18:29 UTC
3.2 has been released

Comment 10 Itamar Heim 2013-06-11 08:22:56 UTC
3.2 has been released