Bug 1056935

Summary: A failure in a merge operation should fallback to a partial merge, not a broken snapshot
Product: [Retired] oVirt Reporter: Allon Mureinik <amureini>
Component: ovirt-engine-core    Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE QA Contact: Ori Gofen <ogofen>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4    CC: acanan, adevolder, bugs, derez, gklein, iheim, lnatapov, matjaz.godec, rbalakri, scohen, tnisan, yeylon
Target Milestone: ---    Flags: amureini: ovirt_requires_release_note?
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: ovirt-3.5.0-alpha2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-17 12:40:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1054219    
Bug Blocks:    
Attachments:
Description Flags
log of failed BROKEN snapshot removal ... none

Description Allon Mureinik 2014-01-23 08:28:54 UTC
Description of problem:
When a snapshot merge operation fails, the snapshot is left in status BROKEN and is unusable.
Instead, we could merge the disk-snapshots we know have succeeded, and mark only the disk whose merge operation failed as broken.

Version-Release number of selected component (if applicable):
ovirt-3.4-beta1

How reproducible:
100%

Steps to Reproduce:
1. Create a DC
2. Create a cluster with compatibility level >=3.4
3. Add a host
4. Create a storage domain
5. Create a VM with three disks
6. Fill up one of the disks so it's considerably larger than the others
7. Take a snapshot
8. Fill up the same disk, so its snapshot is also considerably larger than the others
9. Merge the entire snapshot
10. Wait for the two small disks' merge operations to complete, then restart the host (note: since it's the only host, it's also the SPM)

Actual results:
The entire snapshot is broken.

Expected results:
The two smaller disks should be removed from the snapshot (as they have been merged successfully), and only the larger one should be marked as broken.
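The expected behavior above can be sketched as follows. This is a minimal illustration only; `finalize_snapshot_merge` and the field names are hypothetical and are not the actual ovirt-engine code:

```python
# Hypothetical sketch (illustrative names, not the ovirt-engine API):
# when a multi-disk snapshot merge fails partway, drop the disk-snapshots
# that merged successfully and mark only the failed disk's snapshot as
# BROKEN, instead of marking the whole snapshot BROKEN.

def finalize_snapshot_merge(snapshot, merge_results):
    """merge_results maps disk id -> True (merge succeeded) or False."""
    remaining = []
    for disk_snapshot in snapshot["disk_snapshots"]:
        if merge_results.get(disk_snapshot["disk_id"]):
            # Successfully merged disk-snapshots are removed from the
            # snapshot rather than left behind in a broken state.
            continue
        disk_snapshot["status"] = "BROKEN"
        remaining.append(disk_snapshot)
    # Only the failed disks stay, individually marked as BROKEN; the
    # snapshot itself is BROKEN only if any disk-snapshot remains.
    snapshot["disk_snapshots"] = remaining
    snapshot["status"] = "BROKEN" if remaining else "REMOVED"
    return snapshot
```

In the reproduction scenario above, the two small disks would report success and the large disk failure, leaving a snapshot that contains only the one broken disk-snapshot.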

Comment 1 Sandro Bonazzola 2014-03-04 09:25:59 UTC
This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 2 Daniel Erez 2014-03-27 09:57:21 UTC
*** Bug 996945 has been marked as a duplicate of this bug. ***

Comment 3 Daniel Erez 2014-05-15 15:01:44 UTC
*** Bug 1096508 has been marked as a duplicate of this bug. ***

Comment 4 Allon Mureinik 2014-06-26 08:09:58 UTC
*** Bug 1082655 has been marked as a duplicate of this bug. ***

Comment 5 Ori Gofen 2014-08-11 12:11:28 UTC
Verified on RC1.

Comment 6 gody1 2014-10-06 13:34:16 UTC
Hello,

I can also verify this problem on RHEV 3.4.

During snapshot removal, our SPM node's load skyrocketed and it was thrown out of the cluster.

After rebooting and activating the node, 3 VMs have BROKEN snapshots, which we can't remove.

We get this error in user interface:
Error while executing action RemoveSnapshot: Image does not exist in domain

Will provide log of ovirt-engine during this problem.

I really need some sort of solution; is there any workaround?

Comment 7 gody1 2014-10-06 13:35:25 UTC
Created attachment 944265 [details]
log of failed BROKEN snapshot removal ...

Attached is the ovirt-engine log during failed removal of BROKEN snapshot.

Comment 8 Allon Mureinik 2014-10-07 18:03:08 UTC
(In reply to gody1 from comment #6)
> Hello,
> 
> I can also verify this problem on RHEV 3.4.
> 
> During snapshot removal our SPM node load skyrocket and it was thrown out of
> cluster.
> 
> After rebooting and activating the node 3 VMs have BROKEN snapshots, which
> we can't remove.
> 
> We get this error in user interface:
> Error while executing action RemoveSnapshot: Image does not exist in domain
> 
> Will provide log of ovirt-engine during this problem.
> 
> I really need some sort of solution, is there any workaround ?

Daniel?

Comment 9 Daniel Erez 2014-10-08 07:51:15 UTC
(In reply to gody1 from comment #6)
> Hello,
> 
> I can also verify this problem on RHEV 3.4.
> 
> During snapshot removal our SPM node load skyrocket and it was thrown out of
> cluster.
> 
> After rebooting and activating the node 3 VMs have BROKEN snapshots, which
> we can't remove.
> 
> We get this error in user interface:
> Error while executing action RemoveSnapshot: Image does not exist in domain
> 
> Will provide log of ovirt-engine during this problem.
> 
> I really need some sort of solution, is there any workaround ?

This issue seems more related to bug 996945, which addresses failure of a broken snapshot removal. Up until 3.5, the behavior has been to mark a snapshot as broken on any failure while removing it. Deleting the broken snapshot could fail as well, since that failure is the reason it was marked as broken in the first place. Hence, the alternative is a manual cleanup of the snapshot. Are you looking for a workaround just to remove it from the snapshots list in the UI/REST API, or do the snapshot's disks still exist on the storage?

Comment 10 Sandro Bonazzola 2014-10-17 12:40:35 UTC
oVirt 3.5 has been released and should include the fix for this issue.

Comment 11 Allon Mureinik 2014-10-21 08:23:26 UTC
*** Bug 1149770 has been marked as a duplicate of this bug. ***

Comment 12 Matjaž Godec 2020-01-28 07:13:24 UTC
Works on 4.3