Bug 1129898 - Live Merge: Engine-side recovery flow for vm restart during merge
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Greg Padgett
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2014-08-13 21:54 UTC by Greg Padgett
Modified: 2016-02-10 19:44 UTC

Fixed In Version: ovirt-3.5.0_rc3
Clone Of:
Environment:
Last Closed: 2014-10-17 12:20:10 UTC
oVirt Team: Storage
Embargoed:



Links
System        ID     Private  Priority          Status  Summary                         Last Updated
oVirt gerrit  31850  0        master            MERGED  core: Live Merge recovery flow  Never
oVirt gerrit  33018  0        ovirt-engine-3.5  MERGED  core: Live Merge recovery flow  Never

Description Greg Padgett 2014-08-13 21:54:24 UTC
Description of problem:
Special care needs to be taken when a VM goes down while a live merge job is underway on the host.

If the active layer is being merged, then when the engine detects that the host went down, it needs to call the getVolumeInfo verb on the leaf volume to check the volume's legality and thereby determine whether the job was successful.

If an internal volume is being merged, then when the engine detects that the VM is back up, it can restart the merge job using a special set of parameters.
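
The two cases above can be sketched as a small decision function. This is only an illustration of the flow described in this report, not the engine's actual code: MergeType, RecoveryAction, and next_recovery_action are hypothetical names, and the sketch deliberately does not model how a legality value maps to success or failure, since that is not stated here.

```python
from enum import Enum


class MergeType(Enum):
    # Hypothetical labels for the two cases described in this report.
    ACTIVE_LAYER = "active_layer"
    INTERNAL = "internal"


class RecoveryAction(Enum):
    # Hypothetical action names; they are not engine command names.
    CHECK_LEAF_LEGALITY = "call getVolumeInfo on the leaf volume"
    RESTART_MERGE = "restart the merge job with recovery parameters"
    WAIT = "no recovery step applies yet"


def next_recovery_action(merge_type, host_down, vm_up):
    """Pick the recovery step for an interrupted live merge.

    Active layer merge: once the host is seen as down, inspect the
    leaf volume's legality to learn whether the job succeeded.
    Internal volume merge: once the VM is back up, restart the job.
    """
    if merge_type is MergeType.ACTIVE_LAYER and host_down:
        return RecoveryAction.CHECK_LEAF_LEGALITY
    if merge_type is MergeType.INTERNAL and vm_up:
        return RecoveryAction.RESTART_MERGE
    # Neither trigger has been observed; keep monitoring.
    return RecoveryAction.WAIT
```

For example, next_recovery_action(MergeType.INTERNAL, host_down=True, vm_up=False) returns RecoveryAction.WAIT, because an internal merge is restarted only after the VM is detected as up again.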


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Start a live merge of a snapshot: the most recent snapshot for the active layer case, or an older snapshot for the internal volume case.
2. Fence the host (or use some other way to make the merge job stop).
3. Observe the recovery as the VM changes state

Actual results:
The engine currently makes no attempt to recover from this scenario.

Expected results:
The engine should coordinate with VDSM, as described above, to cause the job to converge to success or failure in such a way that the merge job is repeatable and/or the VM is once again usable.

Additional info:
See bug 1127294 for the corresponding vdsm-side implementation.

Comment 1 Sandro Bonazzola 2014-10-17 12:20:10 UTC
oVirt 3.5 has been released and should include the fix for this issue.

