1702597 – [downstream clone - 4.3.4] When a live storage migration fails, the auto generated snapshot does not get removed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	4.2.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	ovirt-4.3.4
Target Release:	4.3.1
Assignee:	Benny Zlotnik
QA Contact:	Evelina Shames
Docs Contact:
URL:
Whiteboard:
Depends On:	1690475
Blocks:

Reported:	2019-04-24 08:20 UTC by RHV bug bot
Modified:	2020-08-03 15:29 UTC
CC List:	7 users
Fixed In Version:	ovirt-engine-4.3.4
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1690475
Environment:
Last Closed:	2019-06-20 14:48:33 UTC
oVirt Team:	Storage
Target Upstream Version:
Embargoed:
Flags:	lsvaty: testing_plan_complete-

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2019:1566	None	None	None	2019-06-20 14:48:45 UTC
oVirt gerrit	98919	'None'	MERGED	core: attempt to remove the auto-generated snapshot	2020-08-19 08:41:45 UTC
oVirt gerrit	99481	'None'	MERGED	core: attempt to remove the auto-generated snapshot	2020-08-19 08:41:45 UTC

Description RHV bug bot 2019-04-24 08:20:03 UTC

+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1690475 +++
======================================================================

Description of problem:
During a live storage migration from one storage domain to another (both backed by FC), several steps take place.
One of the first steps is to create a snapshot of the source vdisk, named "auto generated snapshot for migration".
If the migration fails (in our case due to broken paths to the destination SD), the "auto generated snapshot" does not get removed.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
(1) Trigger live migration of storage
(2) Wait until snapshot is created and you can see disk activity on the destination SD
(3) Cut access to the destination SD (e.g. pull the cable)
(4) The task in RHV fails
(5) The snapshot from step (2) is still there and does not get removed.

Actual results:
The snapshot is left untouched and keeps blocking storage.

Expected results:
RHV correctly detects that the migration did not take place and cleans up automatically.

Additional info:

(Originally by Steffen Froemer)

Comment 1 RHV bug bot 2019-04-24 08:20:04 UTC

Benny, I recall we have an RFE for this issue

(Originally by Tal Nisan)

Comment 2 RHV bug bot 2019-04-24 08:20:06 UTC

Not sure, I think we have an RFE for removing if the VM was shutdown

I am not entirely sure at which stage the cable is pulled?
Live storage migration consists of:
1. Create a snapshot
2. Create image placeholder
3. Start replication
4. Sync
5. Finish replication
6. Live merge

After stage 2 and until the end of stage 5, the "snapshot" is present on both source and destination. If the destination is blocked, we can't really clean it up and it will require manual intervention.
Still, we can add a best-effort attempt to remove the auto-generated snapshot after failures.
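
For illustration only, the best-effort idea can be sketched roughly as follows. This is not the actual ovirt-engine change. The function name, the stage names, and the returned status strings are all hypothetical, and the stage callables stand in for the real operations. The sketch assumes that a successful run removes the snapshot itself through the final live merge.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("lsm_sketch")

# Hypothetical stand-in: a stage is a (name, action) pair, where the action
# raises an exception when the stage fails.
Stage = Tuple[str, Callable[[], None]]


def run_live_storage_migration(
    stages: List[Stage],
    remove_snapshot: Callable[[], None],
) -> str:
    """Run the stages in order; on failure, try once to remove the snapshot.

    Returns one of the hypothetical statuses "succeeded", "failed_no_snapshot",
    "failed_cleaned" or "failed_manual_cleanup".
    """
    snapshot_created = False
    try:
        for name, action in stages:
            action()
            if name == "create_snapshot":
                snapshot_created = True
    except Exception as exc:
        log.error("live storage migration failed: %s", exc)
        if not snapshot_created:
            # The failure happened before a snapshot existed.
            return "failed_no_snapshot"
        try:
            # Best effort: a single attempt, with no retries.
            remove_snapshot()
        except Exception as cleanup_exc:
            # For example, the destination is still unreachable.
            log.warning("could not remove snapshot: %s", cleanup_exc)
            return "failed_manual_cleanup"
        return "failed_cleaned"
    # Assumption: on success the final live merge stage removed the snapshot.
    return "succeeded"
```

If the cleanup attempt itself fails, the sketch only logs a warning and reports that manual intervention is needed, which matches the blocked-destination case described above.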

(Originally by Benny Zlotnik)

Comment 3 RHV bug bot 2019-04-24 08:20:07 UTC

(In reply to Benny Zlotnik from comment #2)
> Not sure, I think we have an RFE for removing if the VM was shutdown
> 
> I am not entirely sure at which stage the cable is pulled?

In my case, the "pull cable" was caused by the issue described in [1].
The scenario was as follows:

The vDisk of the VM was to be moved from SD1 -> SD2. The snapshot was created on SD1 and the migration started. The migration failed due to [1] and the snapshot was not deleted.

The expectation is that, in this case, the automatically created snapshot is removed automatically.
I don't know at which stage the migration failed, but let me know how I can help with additional logs to clarify this.


[1]: https://access.redhat.com/solutions/3086271

(Originally by Steffen Froemer)

Comment 4 RHV bug bot 2019-04-24 08:20:09 UTC

I see, I will add a best-effort attempt to remove the auto-generated snapshot.

(Originally by Benny Zlotnik)

Comment 6 Evelina Shames 2019-05-30 10:58:58 UTC

Verified on engine 4.3.4.1-0.1.el7, vdsm 4.30.16-3.el7ev.x86_64.

Comment 8 errata-xmlrpc 2019-06-20 14:48:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1566

Comment 9 Daniel Gur 2019-08-28 13:14:39 UTC

sync2jira

Comment 10 Daniel Gur 2019-08-28 13:19:41 UTC

sync2jira
