Bug 1056935 - A failure in a merge operation should fall back to a partial merge, not a broken snapshot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Daniel Erez
QA Contact: Ori Gofen
URL:
Whiteboard: storage
Duplicates: 996945 1082655 1096508 1149770
Depends On: 1054219
Blocks:
 
Reported: 2014-01-23 08:28 UTC by Allon Mureinik
Modified: 2020-01-28 07:13 UTC (History)
CC: 12 users

Fixed In Version: ovirt-3.5.0-alpha2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-17 12:40:35 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt_requires_release_note?


Attachments (Terms of Use)
log of failed BROKEN snapshot removal ... (7.70 KB, text/plain)
2014-10-06 13:35 UTC, gody1


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 27610 0 master MERGED core: extract PrepareSnapshotConfig method to ImagesHandler Never
oVirt gerrit 27611 0 master MERGED core: avoid broken status on snapshot removal Never
oVirt gerrit 27612 0 master MERGED core: SnapshotStatus - remove BROKEN status Never
oVirt gerrit 27681 0 master MERGED core: Deprecate broken snapshots upgrade script Never

Description Allon Mureinik 2014-01-23 08:28:54 UTC
Description of problem:
When a merge snapshot operation fails, the snapshot is left in the BROKEN status and is unusable.
Instead, we could merge the disk-snapshots that are known to have succeeded, and mark only the disk whose merge operation failed as broken.

Version-Release number of selected component (if applicable):
ovirt-3.4-beta1

How reproducible:
100%

Steps to Reproduce:
1. Create a DC
2. Create a cluster with compatibility level >=3.4
3. Add a host
4. Create a storage domain
5. Create a VM with three disks
6. Fill up one of the disks so it's considerably larger than the others
7. Take a snapshot
8. Fill up the same disk, so its snapshot is also considerably larger than the others
9. Merge the entire snapshot
10. Wait for the two small disks' merge operations to complete, then restart the host (note: since it's the only host, it's also the SPM)

Actual results:
The entire snapshot is broken.

Expected results:
The two smaller disks should be removed from the snapshot (as they were merged successfully), and only the larger one should be marked as broken.
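The expected fallback behavior can be sketched as follows. This is a minimal, self-contained illustration; the class, field, and method names are hypothetical and are not the actual ovirt-engine-core API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the expected behavior: attempt the merge per
// disk-snapshot, remove the ones that succeed, and mark only the failed
// disk as ILLEGAL/broken instead of breaking the whole snapshot.
public class PartialMergeSketch {

    enum DiskStatus { OK, ILLEGAL }

    static class DiskSnapshot {
        final String id;
        final boolean mergeWillFail; // simulated merge outcome
        DiskStatus status = DiskStatus.OK;

        DiskSnapshot(String id, boolean mergeWillFail) {
            this.id = id;
            this.mergeWillFail = mergeWillFail;
        }
    }

    /** Returns the ids of disk-snapshots that merged successfully and were removed. */
    static List<String> mergeSnapshot(List<DiskSnapshot> disks) {
        List<String> merged = new ArrayList<>();
        for (DiskSnapshot disk : disks) {
            if (disk.mergeWillFail) {
                // Fallback: only this disk is marked broken.
                disk.status = DiskStatus.ILLEGAL;
            } else {
                merged.add(disk.id);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Mirrors the reproduction steps: two small disks merge fine,
        // the large one fails because the SPM was restarted mid-merge.
        List<DiskSnapshot> disks = List.of(
                new DiskSnapshot("small-1", false),
                new DiskSnapshot("small-2", false),
                new DiskSnapshot("large-1", true));
        List<String> merged = mergeSnapshot(disks);
        System.out.println("merged=" + merged);               // merged=[small-1, small-2]
        System.out.println("broken=" + disks.get(2).status);  // broken=ILLEGAL
    }
}
```

Under this scheme, a retry of RemoveSnapshot would only have to deal with the single remaining broken disk, rather than a snapshot-wide BROKEN state.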

Comment 1 Sandro Bonazzola 2014-03-04 09:25:59 UTC
This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 2 Daniel Erez 2014-03-27 09:57:21 UTC
*** Bug 996945 has been marked as a duplicate of this bug. ***

Comment 3 Daniel Erez 2014-05-15 15:01:44 UTC
*** Bug 1096508 has been marked as a duplicate of this bug. ***

Comment 4 Allon Mureinik 2014-06-26 08:09:58 UTC
*** Bug 1082655 has been marked as a duplicate of this bug. ***

Comment 5 Ori Gofen 2014-08-11 12:11:28 UTC
Verified on RC1.

Comment 6 gody1 2014-10-06 13:34:16 UTC
Hello,

I can also verify this problem on RHEV 3.4.

During snapshot removal, our SPM node's load skyrocketed and it was thrown out of the cluster.

After rebooting and activating the node, 3 VMs have BROKEN snapshots which we can't remove.

We get this error in the user interface:
Error while executing action RemoveSnapshot: Image does not exist in domain

I will provide the ovirt-engine log from this incident.

I really need some sort of solution; is there any workaround?

Comment 7 gody1 2014-10-06 13:35:25 UTC
Created attachment 944265 [details]
log of failed BROKEN snapshot removal ...

Attached is the ovirt-engine log during failed removal of BROKEN snapshot.

Comment 8 Allon Mureinik 2014-10-07 18:03:08 UTC
(In reply to gody1 from comment #6)
> I really need some sort of solution, is there any workaround ?

Daniel?

Comment 9 Daniel Erez 2014-10-08 07:51:15 UTC
(In reply to gody1 from comment #6)
> I really need some sort of solution, is there any workaround ?

This issue seems more related to bug 996945, which addresses the failure of a broken snapshot removal. Up until 3.5, the behavior has been to mark a snapshot as broken on any failure while removing it. Deleting the broken snapshot could fail as well, since whatever caused the failure is the reason it was marked broken in the first place. Hence, the alternative is a manual cleanup of the snapshot. Are you looking for a workaround just to remove it from the snapshots list in the UI/REST API, or do the snapshot's disks still exist on the storage?

Comment 10 Sandro Bonazzola 2014-10-17 12:40:35 UTC
oVirt 3.5 has been released and should include the fix for this issue.

Comment 11 Allon Mureinik 2014-10-21 08:23:26 UTC
*** Bug 1149770 has been marked as a duplicate of this bug. ***

Comment 12 Matjaž Godec 2020-01-28 07:13:24 UTC
Works on 4.3

