Bug 1702597

Summary: [downstream clone - 4.3.4] When a live storage migration fails, the auto generated snapshot does not get removed
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-engine
Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED ERRATA
QA Contact: Evelina Shames <eshames>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.2.7
CC: aefrat, bzlotnik, lsurette, Rhev-m-bugs, srevivo, tnisan, ycui
Target Milestone: ovirt-4.3.4
Keywords: ZStream
Target Release: 4.3.1
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.3.4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1690475
Environment:
Last Closed: 2019-06-20 14:48:33 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1690475
Bug Blocks:

Description RHV bug bot 2019-04-24 08:20:03 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1690475 +++
======================================================================

Description of problem:
During a live storage migration from one storage domain to another (both backed by FC), several steps take place.
One of the first steps is to create a snapshot of the source vdisk, named "auto generated snapshot for migration".
If the migration fails (in our case due to broken paths to the destination SD), the auto-generated snapshot does not get removed.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
(1) Trigger a live storage migration
(2) Wait until the snapshot is created and disk activity is visible on the destination SD
(3) Cut access to the destination SD (e.g. pull the cable)
(4) The task in RHV fails
(5) The snapshot from step (2) is still there and does not get removed

Actual results:
The auto-generated snapshot is left in place and continues to occupy storage

Expected results:
RHV should detect that the migration did not take place and clean up the auto-generated snapshot automatically.

Additional info:

(Originally by Steffen Froemer)

Comment 1 RHV bug bot 2019-04-24 08:20:04 UTC
Benny, I recall we have an RFE for this issue

(Originally by Tal Nisan)

Comment 2 RHV bug bot 2019-04-24 08:20:06 UTC
Not sure; I think we have an RFE for removing it if the VM was shut down.

I am not entirely sure at which stage the cable was pulled.
Live storage migration consists of:
1. Create a snapshot
2. Create an image placeholder on the destination
3. Start replication
4. Sync
5. Finish replication
6. Live merge

After stage 2 and until the end of stage 5, the "snapshot" is present on both the source and the destination. If the destination is blocked, we can't really clean it up, and manual intervention will be required.
We can, however, add a best-effort attempt to remove the auto-generated snapshot after failures.

(Originally by Benny Zlotnik)
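
The failure mode described above — a stage after snapshot creation fails and nothing cleans the snapshot up — can be sketched as follows. This is a hypothetical Python illustration, not the actual ovirt-engine code (which is Java); all names (`live_migrate_disk`, `MigrationError`, the stage labels) are illustrative only.

```python
# Hypothetical sketch of the six-stage live storage migration (LSM) flow.
# Failing at any stage after 1 leaves the auto-generated snapshot behind,
# because there is no cleanup path on the failure branch.

class MigrationError(Exception):
    pass

def live_migrate_disk(vm, fail_at=None):
    """Walk the six LSM stages; raise MigrationError at stage `fail_at`."""
    vm["snapshots"].append("auto-generated for LSM")     # stage 1
    stages = ["create_placeholder", "start_replication",
              "sync", "finish_replication", "live_merge"]
    for n, stage in enumerate(stages, start=2):          # stages 2-6
        if fail_at == n:
            # No cleanup here: the snapshot from stage 1 remains.
            raise MigrationError(f"stage {n} ({stage}) failed")
    vm["snapshots"].remove("auto-generated for LSM")     # merged away on success

vm = {"snapshots": []}
try:
    live_migrate_disk(vm, fail_at=4)  # e.g. destination paths break mid-sync
except MigrationError:
    pass
print(vm["snapshots"])  # the auto-generated snapshot was left behind
```

On the success path the snapshot is removed by the live merge in stage 6; the bug is that the failure path has no equivalent step.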

Comment 3 RHV bug bot 2019-04-24 08:20:07 UTC
(In reply to Benny Zlotnik from comment #2)
> Not sure; I think we have an RFE for removing it if the VM was shut down.
> 
> I am not entirely sure at which stage the cable was pulled.

In my case, the "pulled cable" was caused by the issue described in [1].
The scenario was as follows:

The vDisk of the VM was to be moved from SD1 to SD2. The snapshot was created on SD1 and the migration started. The migration failed due to [1], and the snapshot was not deleted.

The expectation is that in this case the automatically created snapshot is removed.
I don't know in which state the migration failed, but let me know how I can help with additional logs to clarify this.


[1]: https://access.redhat.com/solutions/3086271

(Originally by Steffen Froemer)

Comment 4 RHV bug bot 2019-04-24 08:20:09 UTC
I see, I will add a best-effort attempt to remove the auto-generated snapshot

(Originally by Benny Zlotnik)
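
The best-effort cleanup proposed above could look roughly like this. Again a hypothetical Python sketch with illustrative names (`cleanup_after_failed_lsm`, `remove_snapshot`), not ovirt-engine's actual API: the key property is that the cleanup itself never raises, so a cleanup failure (e.g. the destination SD still unreachable) does not mask the original migration error.

```python
# Hypothetical sketch of best-effort snapshot cleanup after a failed LSM.
# If removal fails, the error is logged and swallowed; manual intervention
# is then still required, but the common case is cleaned up automatically.

import logging

log = logging.getLogger("lsm")

def remove_snapshot(vm, name):
    vm["snapshots"].remove(name)  # raises ValueError if not present

def cleanup_after_failed_lsm(vm, snapshot_name="auto-generated for LSM"):
    """Best-effort: try to delete the auto-generated snapshot, never raise."""
    try:
        remove_snapshot(vm, snapshot_name)
        return True
    except Exception as exc:
        log.warning("could not remove snapshot %r: %s; manual cleanup needed",
                    snapshot_name, exc)
        return False

vm = {"snapshots": ["auto-generated for LSM"]}
assert cleanup_after_failed_lsm(vm) is True   # normal case: snapshot removed
assert vm["snapshots"] == []
assert cleanup_after_failed_lsm(vm) is False  # cleanup failure is tolerated
```

The try/except-and-log pattern is what makes the attempt "best-effort": the failure handler stays on its original error path regardless of whether the cleanup succeeds.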

Comment 6 Evelina Shames 2019-05-30 10:58:58 UTC
Verified on engine 4.3.4.1-0.1.el7, vdsm 4.30.16-3.el7ev.x86_64.

Comment 8 errata-xmlrpc 2019-06-20 14:48:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1566

Comment 9 Daniel Gur 2019-08-28 13:14:39 UTC
sync2jira

Comment 10 Daniel Gur 2019-08-28 13:19:41 UTC
sync2jira