Created attachment 741425 [details]
logs

Description of problem:
I ran kill -9 on a VM during the cloneImageStructure step of storage live migration, which means the LVs had already been created on the other (target) domain. Trying to migrate the disk a second time (after re-running the VM) fails because the images already exist on the target.

Version-Release number of selected component (if applicable):
sf14

How reproducible:
100%

Steps to Reproduce:
1. kill -9 during the cloneImageStructure task of live storage migration
2. re-run the vm
3. try to migrate the disk to the same domain

Actual results:
The move fails because the image already exists on the target, and the only way to clean it up is by running lvremove on the VDS side.

Expected results:
The second move should succeed if the first move failed.

Additional info:
full logs

first move failure:

2013-04-29 10:44:36,412 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-42) [46ed926] Ending command with failure: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-04-29 10:44:36,417 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-42) Unable to get value of property: glusterVolume for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-04-29 10:44:36,429 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-42) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-04-29 10:44:36,444 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-42) EntityAsyncTask::HandleEndActionResult [within thread]: EndAction for action type LiveMigrateDisk completed, handling the result.
2013-04-29 10:44:36,445 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-42) EntityAsyncTask::HandleEndActionResult [within thread]: EndAction for action type LiveMigrateDisk succeeded, clearing tasks.
2013-04-29 10:44:36,448 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (pool-4-thread-42) SPMAsyncTask::ClearAsyncTask: Attempting to clear task f77778af-f651-4fee-9e4f-788e6987f658

second move failure:

2013-04-29 10:52:17,215 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-78) SPMAsyncTask::PollTask: Polling task 57d1151e-54b0-41ed-9365-a18a653e9fd8 (Parent Command LiveMigrateDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-04-29 10:52:17,218 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-78) BaseAsyncTask::LogEndTaskFailure: Task 57d1151e-54b0-41ed-9365-a18a653e9fd8 (Parent Command LiveMigrateDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = Volume already exists,
-- Exception: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = Volume already exists
2013-04-29 10:52:17,219 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-78) EntityAsyncTask::EndActionIfNecessary: All tasks of entity b634a653-7042-4d19-aa24-296484e59bd0 has ended -> executing EndAction
2013-04-29 10:52:17,219 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-78) EntityAsyncTask::EndAction: Ending action for 1 tasks (entity ID: b634a653-7042-4d19-aa24-296484e59bd0): calling EndAction for action type LiveMigrateDisk.
2013-04-29 10:52:17,219 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-42) EntityAsyncTask::EndCommandAction [within thread] context: Attempting to EndAction LiveMigrateDisk, executionIndex: 0
2013-04-29 10:52:17,234 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-42) [67b9687f] Ending command with failure: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-04-29 10:52:17,238 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-42) Unable to get value of property: glusterVolume for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-04-29 10:52:17,240 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-42) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-04-29 10:52:17,247 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-42) EntityAsyncTask::HandleEndActionResult [within thread]: EndAction for action type LiveMigrateDisk completed, handling the result.
2013-04-29 10:52:17,247 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-42) EntityAsyncTask::HandleEndActionResult [within thread]: EndAction for action type LiveMigrateDisk succeeded, clearing tasks.
2013-04-29 10:52:17,250 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (pool-4-thread-42) SPMAsyncTask::ClearAsyncTask: Attempting to clear task 57d1151e-54b0-41ed-9365-a18a653e9fd8

vdsm side failure:

57d1151e-54b0-41ed-9365-a18a653e9fd8::ERROR::2013-04-29 10:52:13,826::task::850::TaskManager.Task::(_setError) Task=`57d1151e-54b0-41ed-9365-a18a653e9fd8`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1769, in cloneImageStructure
    image.Image(repoPath).cloneStructure(sdUUID, imgUUID, dstSdUUID)
  File "/usr/share/vdsm/storage/image.py", line 649, in cloneStructure
    self._createTargetImage(sdCache.produce(dstSdUUID), sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 517, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/share/vdsm/storage/blockSD.py", line 610, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 418, in create
    raise se.VolumeAlreadyExists(volUUID)
VolumeAlreadyExists: Volume already exists: ('b634a653-7042-4d19-aa24-296484e59bd0',)
57d1151e-54b0-41ed-9365-a18a653e9fd8::DEBUG::2013-04-29 10:52:13,828::task::869::TaskManager.Task::(_run) Task=`57d1151e-54b0-41ed-9365-a18a653e9fd8`::Task._run: 57d1151e-54b0-41ed-9365-a18a653e9fd8 () {} failed - stopping task
57d1151e-54b0-41ed-9365-a18a653e9fd8::DEBUG::2013-04-29 10:52:13,828::task::1194::TaskManager.Task::(stop) Task=`57d1151e-54b0-41ed-9365-a18a653e9fd8`::stopping in state running (force False)
57d1151e-54b0-41ed-9365-a18a653e9fd8::DEBUG::2013-04-29 10:52:13,829::task::974::TaskManager.Task::(_decref) Task=`57d1151e-54b0-41ed-9365-a18a653e9fd8`::ref 1 aborting True
What happens here is:

1. CloneImageGroupStructure is sent:

2013-04-29 10:44:18,218 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CloneImageGroupStructureVDSCommand] (pool-4-thread-48) [46ed926] START, CloneImageGroupStructureVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = d326916a-89b6-41c8-9c14-4b9cf8a1c979, imageGroupId = 8220610b-1bee-4beb-918f-368cb7d4848a, dstDomainId = 81ef11d0-4c0c-47b4-8953-d61a6af442d8), log id: 5d6db40f

2. The VM crashes and the engine sends vmDestroy:

2013-04-29 10:44:27,802 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (QuartzScheduler_Worker-28) START, DestroyVDSCommand(HostName = gold-vdsd, HostId = 83834e1f-9e60-41b5-a9cc-16460a8a2fe2, vmId=6725d737-834d-4f41-b64d-78b2c4885ba2, force=false, secondsToWait=0, gracefully=false), log id: c15c965
2013-04-29 10:44:27,827 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (QuartzScheduler_Worker-28) FINISH, DestroyVDSCommand, log id: c15c965

3. The CloneImageGroupStructure task ends with failure:

2013-04-29 10:44:36,390 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-42) EntityAsyncTask::EndCommandAction [within thread] context: Attempting to EndAction LiveMigrateDisk, executionIndex: 0
2013-04-29 10:44:36,412 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-42) [46ed926] Ending command with failure: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand

CloneImageGroupStructure ends with failure because in LiveMigrateDiskCommand.java we check whether the VM isRunningOrPaused:

    public LiveMigrateDiskCommand(T parameters) {
        super(parameters);
        ...
        getParameters().setTaskGroupSuccess(getParameters().getTaskGroupSuccess()
                && getVm().getStatus().isRunningOrPaused());
    }

Since the VM is down, setTaskGroupSuccess is set to false, and that triggers internalEndWithFailure:

    private void internalEndWithFailure() {
        log.errorFormat("Ending command with failure: {0}", getClass().getName());

        if (hasTaskHandlers()) {
            if (hasStepsToRevert()) {
                getCurrentTaskHandler().endWithFailure();
                revertPreviousHandlers();
            }
            startPollingAsyncTasks();
        } else {
            endWithFailure();
        }
        rollbackQuota();
    }

For the current task handler (CloneImageGroupStructure) we call only endWithFailure(), instead of compensate(), which would remove the destination image.

One possible (or additional) fix is to call compensate() for the current task handler as well (which would be handy for ImportRepoImage too). In general, though, I think that removing:

    getParameters().setTaskGroupSuccess(getParameters().getTaskGroupSuccess()
            && getVm().getStatus().isRunningOrPaused());

would be a more appropriate fix, as there's no reason to fail the end action of CloneImageGroupStructure. We'd delegate the failure to the next handler (VmReplicateDiskStartTaskHandler) and benefit from compensate() being called by revertPreviousHandlers().
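To make the compensate() option concrete, here is a minimal sketch of that variant, based only on the snippets quoted above rather than the actual tree; the extra call is an assumption about how the handler API would be used, not a patch:

    private void internalEndWithFailure() {
        log.errorFormat("Ending command with failure: {0}", getClass().getName());

        if (hasTaskHandlers()) {
            if (hasStepsToRevert()) {
                getCurrentTaskHandler().endWithFailure();
                // Assumed addition: revert whatever the interrupted handler
                // already created (for CloneImageGroupStructure this would
                // remove the destination image), mirroring what
                // revertPreviousHandlers() already does for the handlers
                // that completed before it.
                getCurrentTaskHandler().compensate();
                revertPreviousHandlers();
            }
            startPollingAsyncTasks();
        } else {
            endWithFailure();
        }
        rollbackQuota();
    }

Either way the key point is the same: the destination LVs created by the interrupted CloneImageGroupStructure must be removed on failure, so that the second move does not hit VolumeAlreadyExists.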
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
is24.2 - disk successfully live migrated a second time after killing the qemu process during the first live migration
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0038.html