+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1443137 +++
======================================================================
Description of problem:
A Live Storage Migration (LSM) 'hung'. On the SPM, the task handling the internal (base) volume copy encountered an error, even though the copy itself (qemu-img convert) appeared to have completed successfully. A subsequent "UnknownTask" error occurred when trying to stop/clear the task.
On the engine, SyncImageGroupDataVDSCommand completed without an error, yet the LSM sequence simply stopped, so VmReplicateDiskFinishVDSCommand was never executed.
As a result, the base volume copy appeared to have completed, but the 'block copy' job for the active volume was still running.
Version-Release number of selected component (if applicable):
RHEV 3.6.9
RHEL 7.2 host:
vdsm-4.17.35-1.el7
libvirt-1.2.17-13.el7_2.5
qemu-kvm-rhev-2.3.0-31.el7_2.21
How reproducible:
Not reproducible.
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
(Originally by Gordon Watson)
The LSM implementation in the engine changed drastically in 4.0 (from the SEAT to the CoCo infrastructure). I couldn't reproduce the issue on the latest build. Moving to ON_QA for verification.
(Originally by Daniel Erez)
(In reply to Eyal Shenitzky from comment #12)
> Can you pls add steps to reproduce?
There are actually no exact reproduction steps. It seems to be a race that happens occasionally. If you haven't encountered this issue since 4.0, we can close the bug as insufficient_data and reopen it if the issue is reproduced.
(Originally by Daniel Erez)