Description of problem:
Special care needs to be taken when a VM goes down while a live merge job is underway on the host.
If the active layer is being merged, when the engine detects the host went down it needs to call the getVolumeInfo verb on the leaf volume to check its legality, which indicates whether the job was successful.
If an internal volume is being merged, when the engine detects the VM is back up, it can restart the merge job using a special set of parameters.
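The two recovery paths above can be sketched roughly as follows. This is only an illustration of the decision logic, not engine or VDSM code: `recover_merge`, `MergeOutcome`, and the injected `get_leaf_volume_info` callable are hypothetical names, the `"legality"` key and its `"ILLEGAL"` value are assumptions about what getVolumeInfo returns, and the mapping "leaf is ILLEGAL means the merge completed" is likewise an assumption that should be checked against the vdsm-side implementation.

```python
from enum import Enum
from typing import Callable, Mapping


class MergeOutcome(Enum):
    """Hypothetical outcomes the engine could act on after the VM went down."""
    SUCCEEDED = "succeeded"   # active layer merge completed before the failure
    FAILED = "failed"         # active layer merge did not complete
    RESTART = "restart"       # internal merge can be resubmitted now
    WAIT = "wait"             # internal merge must wait for the VM to come back


def recover_merge(
    merging_active_layer: bool,
    vm_is_up: bool,
    get_leaf_volume_info: Callable[[], Mapping[str, str]],
) -> MergeOutcome:
    """Decide how to recover an interrupted live merge job.

    get_leaf_volume_info stands in for a getVolumeInfo call on the leaf
    volume; how that call is actually issued is outside this sketch.
    """
    if merging_active_layer:
        info = get_leaf_volume_info()
        # Assumption: the leaf is marked ILLEGAL once the merge has
        # finished, so its legality tells us whether the job succeeded.
        if info.get("legality") == "ILLEGAL":
            return MergeOutcome.SUCCEEDED
        return MergeOutcome.FAILED

    # Internal volume merge: it can only be restarted once the VM is up.
    return MergeOutcome.RESTART if vm_is_up else MergeOutcome.WAIT
```

Passing the volume query in as a callable keeps the sketch independent of any particular transport, so the same decision logic can be exercised with a stub while the real lookup is wired in elsewhere.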
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start a live merge of a snapshot: either the most recent snapshot (the active layer case) or an older snapshot (to merge internal volumes).
2. Fence the host (or use some other way to cause the merge job to stop).
3. Observe the recovery as the VM changes state.
No consideration is given to recovery from this scenario today.
The engine should coordinate with VDSM, as described above, to cause the job to converge to success or failure in such a way that the merge job is repeatable and/or the VM is once again usable.
See bug 1127294 for the corresponding vdsm-side implementation.
oVirt 3.5 has been released and should include the fix for this issue.