1129898 – Live Merge: Engine-side recovery flow for vm restart during merge

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	oVirt
Classification:	Retired
Component:	ovirt-engine-core
Sub Component:
Version:	3.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.5.0
Assignee:	Greg Padgett
QA Contact:	Kevin Alon Goldblatt
Docs Contact:
URL:
Whiteboard:	storage
Depends On:
Blocks:

Reported:	2014-08-13 21:54 UTC by Greg Padgett
Modified:	2016-02-10 19:44 UTC
CC List:	10 users
Fixed In Version:	ovirt-3.5.0_rc3
Clone Of:
Environment:
Last Closed:	2014-10-17 12:20:10 UTC
oVirt Team:	Storage
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
oVirt gerrit	31850	0	master	MERGED	core: Live Merge recovery flow	Never
oVirt gerrit	33018	0	ovirt-engine-3.5	MERGED	core: Live Merge recovery flow	Never

Description Greg Padgett 2014-08-13 21:54:24 UTC

Description of problem:
Special care needs to be taken when a VM goes down while a live merge job is underway on the host.

If the active layer is being merged, then when the engine detects that the host went down, it needs to call the getVolumeInfo verb on the leaf volume to check its legality and thereby determine whether the job succeeded.

If an internal volume is being merged, then when the engine detects that the VM is back up, it can restart the merge job using a special set of parameters.
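
The two cases above can be sketched as a small decision function. This is only an illustrative sketch, not the engine's actual code: the names `MergeKind`, `RecoveryAction`, and `decide_recovery` are hypothetical, `get_volume_info` stands in for a call to the getVolumeInfo verb and is assumed to return a mapping with a "legality" entry, and the legality value that indicates a successful merge is passed in as a parameter because this report does not state which value that is.

```python
from enum import Enum
from typing import Callable, Mapping


class MergeKind(Enum):
    ACTIVE_LAYER = "active_layer"  # merging the most recent snapshot
    INTERNAL = "internal"          # merging an older, internal volume


class RecoveryAction(Enum):
    MARK_SUCCEEDED = "mark_succeeded"
    MARK_FAILED = "mark_failed"
    RESTART_MERGE = "restart_merge"
    WAIT_FOR_VM = "wait_for_vm"


def decide_recovery(
    kind: MergeKind,
    vm_is_up: bool,
    get_volume_info: Callable[[str], Mapping[str, str]],
    leaf_volume_id: str,
    success_legality: str,
) -> RecoveryAction:
    """Pick a recovery step after a merge job was interrupted.

    Hypothetical helper: `get_volume_info` is a stand-in for the
    getVolumeInfo verb, and `success_legality` is the legality value
    the caller treats as "merge succeeded" (an assumption, see lead-in).
    """
    if kind is MergeKind.ACTIVE_LAYER:
        # Active layer case: the leaf volume's legality tells us
        # whether the interrupted job actually completed.
        info = get_volume_info(leaf_volume_id)
        if info["legality"] == success_legality:
            return RecoveryAction.MARK_SUCCEEDED
        return RecoveryAction.MARK_FAILED

    # Internal volume case: the merge can be restarted, but only
    # once the VM is running again.
    if vm_is_up:
        return RecoveryAction.RESTART_MERGE
    return RecoveryAction.WAIT_FOR_VM
```

Keeping the decision separate from the VDSM call makes each branch easy to exercise with a fake `get_volume_info`, which may help when reproducing the scenario described under Steps to Reproduce.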


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Start live merging snapshots: either the most recent snapshot (the active layer case) or an older snapshot (to merge internal volumes).
2. Fence the host (or stop the merge job in some other way).
3. Observe the recovery as the VM changes state.

Actual results:
The engine currently makes no attempt to recover from this scenario.

Expected results:
The engine should coordinate with VDSM, as described above, to cause the job to converge to success or failure in such a way that the merge job is repeatable and/or the VM is once again usable.

Additional info:
See bug 1127294 for the corresponding vdsm-side implementation.

Comment 1 Sandro Bonazzola 2014-10-17 12:20:10 UTC

oVirt 3.5 has been released and should include the fix for this issue.
