Description of problem:
Special care needs to be taken when a VM goes down while a live merge job is underway on the host. If the active layer is being merged, when the engine detects the host went down it needs to call the getVolumeInfo verb on the leaf volume to detect its legality, and thus whether the job was successful. If an internal volume is being merged, when the engine detects the VM is back up, it can restart the merge job using a special set of parameters.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Start live merging snapshots, either the most recent snapshot for the active layer case, or an older snapshot to merge internal volumes.
2. Fence the host (or use some alternate way to cause the merge job to stop).
3. Observe the recovery as the VM changes state.

Actual results:
No provision is made today for recovery from this scenario.

Expected results:
The engine should coordinate with VDSM, as described above, to cause the job to converge to success or failure in such a way that the merge job is repeatable and/or the VM is once again usable.

Additional info:
See bug 1127294 for the corresponding VDSM-side implementation.
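
The recovery decision described in this report could be sketched roughly as follows. This is only an illustration, not the engine's actual code: `MergeKind`, `Action`, `next_recovery_action`, and `merge_succeeded` are hypothetical names, and `volume_info` stands in for whatever getVolumeInfo returns (assumed here to be a dict with a `legality` key). The report does not say which legality value means the merge succeeded, so the sketch takes that value as a parameter rather than guessing.

```python
from enum import Enum


class MergeKind(Enum):
    """Which part of the snapshot chain the interrupted job was merging."""
    ACTIVE_LAYER = "active_layer"
    INTERNAL = "internal"


class Action(Enum):
    """Hypothetical recovery steps the engine could take."""
    CHECK_LEAF_LEGALITY = "check_leaf_legality"  # call getVolumeInfo on the leaf
    RESTART_MERGE = "restart_merge"              # re-issue the merge with special parameters
    WAIT = "wait"                                # nothing to do yet


def next_recovery_action(kind, host_down_detected, vm_up):
    """Pick the next recovery step for an interrupted live merge."""
    if kind is MergeKind.ACTIVE_LAYER:
        # Active layer: once the host is known to be down, inspect the leaf volume.
        return Action.CHECK_LEAF_LEGALITY if host_down_detected else Action.WAIT
    # Internal volume: the merge can only be restarted once the VM is back up.
    return Action.RESTART_MERGE if vm_up else Action.WAIT


def merge_succeeded(volume_info, legality_on_success):
    """Interpret the leaf volume's legality (assumed dict shape).

    legality_on_success is supplied by the caller because this report
    does not specify which legality value indicates a completed merge.
    """
    return volume_info.get("legality") == legality_on_success
```

Keeping the decision separate from the getVolumeInfo call would let the engine re-evaluate it on every VM or host state change, which fits the requirement that the job converge to either success or failure.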
oVirt 3.5 has been released and should include the fix for this issue.