Bug 1178010 - RHEV: Faulty storage allocation checks when merging a snapshot
Summary: RHEV: Faulty storage allocation checks when merging a snapshot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: Unspecified
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.0
Assignee: Vered Volansky
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard: storage
Duplicates: 1182222
Depends On: 1053733
Blocks: 960934 1117231
Reported: 2015-01-01 11:52 UTC by Allon Mureinik
Modified: 2016-02-10 20:42 UTC

Fixed In Version: ovirt-engine-3.5.0_vt6
Doc Type: Bug Fix
Doc Text:
Clone Of: 1053733
Environment:
Last Closed: 2015-02-16 19:11:49 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
logs 1.1.15 (1.06 MB, application/x-gzip)
2015-01-01 14:02 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 33610 0 'None' 'MERGED' 'core: Merge snapshots storage allocation' 2019-11-13 10:29:50 UTC
oVirt gerrit 33696 0 'None' 'MERGED' 'core: Merge snapshots storage allocation' 2019-11-13 10:29:51 UTC
oVirt gerrit 37014 0 'None' 'MERGED' 'core: Fix storage allocation check when merging a snapshot' 2019-11-13 10:29:51 UTC
oVirt gerrit 37255 0 'None' 'MERGED' 'core: Fix storage allocation check when merging a snapshot' 2019-11-13 10:29:51 UTC

Description Allon Mureinik 2015-01-01 11:52:57 UTC
+++ This bug was initially created as a clone of Bug #1053733 +++

+++ This bug was initially created as a clone of Bug #960934 +++

When merging a snapshot disk, we have a transient situation where the data of the merged snapshot exists twice - in the source and target.
We must make sure we have this space for a successful merge:

      | File Domain                          | Block Domain
 -----|--------------------------------------|---------------------------
 qcow | preallocated: 1.1 * disk capacity    | 1.1 * min(used, capacity)
      | sparse: 1.1 * min(used, capacity)    |
 -----|--------------------------------------|---------------------------
 raw  | preallocated: disk capacity          | disk capacity
      | sparse: min(used, capacity)          |
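
The table above can be read as a small lookup. The following is a minimal sketch of that reading, not the engine's actual code: the function name, parameter names, and string values are all hypothetical, and sizes are assumed to be in bytes.

```python
GIB = 1024 ** 3


def required_merge_space(domain_type, volume_format, allocation, used, capacity):
    """Free space (bytes) the table asks for before merging one disk.

    Hypothetical helper: domain_type is "file" or "block", volume_format is
    "qcow" or "raw", allocation is "preallocated" or "sparse".
    """
    smaller = min(used, capacity)
    if volume_format == "qcow":
        if domain_type == "file" and allocation == "preallocated":
            base = capacity
        else:
            # Sparse qcow on a file domain, and any qcow on a block domain.
            base = smaller
        # Apply the 1.1 factor with integer ceiling division to avoid
        # floating-point rounding.
        return -(-base * 11 // 10)
    if volume_format == "raw":
        if domain_type == "file" and allocation == "sparse":
            return smaller
        # Preallocated raw on a file domain, and any raw on a block domain.
        return capacity
    raise ValueError("unknown volume format: %r" % (volume_format,))


# A 10 GiB preallocated qcow disk on a file domain needs 11 GiB free.
print(required_merge_space("file", "qcow", "preallocated", 4 * GIB, 10 * GIB) // GIB)
```

Integer arithmetic is used here only so that the 1.1 factor does not introduce rounding noise; the table itself does not say how fractional results should be rounded.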

--- Additional comment from Vered Volansky on 2014-10-01 09:14:31 IST ---

The commands related to the above scenario are RemoveSnapshotCommand and RemoveDiskSnapshotsCommand.

Verify in two ways:
1. Remove a snapshot from vm tab, snapshots subtab.
2. Remove a disk snapshot from the storage tab, snapshots subtab.

Say we have B + S1 + S2, all 10G.
The storage domain (SD) should have an extra 20G of available space (the maximum size of the merge of the two snapshots). A domain with less than 20G of free space should fail the operation, and one with 20G or more should succeed.
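
The pass/fail rule for this example can be sketched as follows. This only illustrates the 20G boundary described above; the function name and parameters are hypothetical and sizes are in GB.

```python
def merge_fits(free_space_gb, snapshot_sizes_gb):
    """Return True if the domain has room for the merged snapshots.

    Hypothetical check: the required space is taken to be the sum of the
    snapshot sizes, as in the B + S1 + S2 example.
    """
    required_gb = sum(snapshot_sizes_gb)
    return free_space_gb >= required_gb


# S1 and S2 are 10G each, so 20G of free space is the boundary.
print(merge_fits(19, [10, 10]))
print(merge_fits(20, [10, 10]))
```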
-------------------------------------------------------------------------------
This bug is a RHEV tracker for the QE team to verify against RHEVM 3.5.0

Comment 1 Elad 2015-01-01 14:01:41 UTC
Working on verification, encountered the following:
Created a VM with a 10G preallocated disk on an FC domain and created a snapshot. The storage domain had 10G of free space. Initiated a snapshot merge (via VM tab -> snapshots subtab). The operation wasn't blocked, and right after, the domain was reported as having 0G of free space:

2015-01-01 15:28:53,002 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-85) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Critical, Low disk space. fc1 domain has 0 GB of free space

Vered, Allon, please advise.

Attaching logs from engine and vdsm

Comment 2 Elad 2015-01-01 14:02:16 UTC
Created attachment 974961 [details]
logs 1.1.15

Comment 3 Elad 2015-01-01 16:14:29 UTC
Also, the validation should consider the value of FreeSpaceCriticalLowInGB, which is 5G by default (can be changed). In my setup, the value is the default (5G).

Comment 4 Vered Volansky 2015-01-04 06:06:52 UTC
Elad, looking into this.
A. Please report the actual operations you executed; I guess deleting one snapshot out of ???. Please state exactly which snapshot that was.
B. Regarding the threshold, the validation does take care of it; you may take it into consideration as you see fit, with different numbers. How to verify the threshold is clear. We gave more details for the allocations because QE raised questions about them in the past.

Comment 5 Ori Gofen 2015-01-04 16:25:41 UTC
Vered, I think what Elad probably means is that during a snapshot merge operation, oVirt-engine doesn't take FreeSpaceCriticalLowInGB into account and reaches 0 free space, which is not allowed by definition.

I have managed to reproduce this after deleting one snapshot which was the only snapshot.
Rest of the steps were per Elad's comment #1.

2015-01-04 17:49:22,922 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-41) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Critical, Low disk space. FC_1 domain has 0 GB of free space

Comment 6 Vered Volansky 2015-01-05 08:17:20 UTC
Ori,
Regarding the threshold, please state the whole situation. It's not true that you can never go below the threshold.

If there was 10G free space, as stated by Elad, the threshold validation would and should not stop this action.

Threshold stops operations involving allocation only if the threshold has currently been met, with no regard to the specific storage-allocation-related operation we would like to execute.
If we're low on space, we won't do it.
If we're not low on space, we'll do it, even if we *will* be low on space afterwards (or even lack enough space for the operation).
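
The threshold behaviour described in this comment can be sketched as a check that looks only at the current free space. This is an illustration, not the engine's validator: the function name is hypothetical, the 5G default mirrors FreeSpaceCriticalLowInGB from comment 3, and treating "free space equal to the threshold" as passing is an assumption.

```python
def threshold_allows(free_space_gb, critical_low_gb=5):
    """Return True if the domain is not currently low on space.

    The size of the requested operation is deliberately not a parameter:
    per the comment above, the threshold check ignores it.
    """
    return free_space_gb >= critical_low_gb


# With 10G free the check passes, even if the operation then uses all 10G.
print(threshold_allows(10))
# With 4G free the domain is already low on space, so the check fails.
print(threshold_allows(4))
```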

Comment 7 Kevin Alon Goldblatt 2015-01-11 15:15:41 UTC
Verified with 13.6

Comment 8 Elad 2015-01-12 07:26:25 UTC
Did the following:
- Had a VM with a 10G disk attached.
- Created 1 snapshot
- After snapshot creation, domain had 10G free space
- Initiated snapshot merge

Vered, I think that threshold validation must stop the merge. User has no way to know that a simple merge operation would disable the storage domain.

Comment 9 Elad 2015-01-12 07:28:34 UTC
Sorry, moving back to VERIFIED based on comment #7 
Vered, setting need-info? for comment #8 for further discussion

Comment 10 Vered Volansky 2015-01-12 07:57:11 UTC
Elad -
Initiated snapshot merge - how? What button did you press on the webadmin?
IIUC, you had one snapshot, which you deleted.
That snapshot should be 1G, which plus the preallocated 10G disk of the VM makes 11G.
The space allocation check should have failed, since the required 11G exceeds the 10G of free space, regardless of the threshold validation, which should and did pass (10G >= 5G).

I don't understand what the bug's status is; Kevin marked it as verified in comment #7.
Please clarify.
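
The two validations discussed in this comment can be sketched as running one after the other. This is a simplified illustration with hypothetical names and return strings; the numbers in the usage line are the ones from this comment (11G required, 10G free, 5G threshold).

```python
def validate_merge(free_gb, required_gb, critical_low_gb=5):
    """Run the threshold check, then the space allocation check.

    Hypothetical sketch: returns a short string describing the outcome.
    """
    if free_gb < critical_low_gb:
        return "blocked: domain is already below the low-space threshold"
    if required_gb > free_gb:
        return "blocked: not enough free space for the merge"
    return "allowed"


# Threshold passes (10 >= 5), but the allocation check fails (11 > 10).
print(validate_merge(free_gb=10, required_gb=11))
```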

Comment 11 Elad 2015-01-12 08:08:58 UTC
(In reply to Vered Volansky from comment #10)
> Elad -
> Initiated snapshot merge - how? What button did you press on the webadmin?
> IIUC, you had one snapshot, which you deleted.
Virtual Machines tab -> Snapshots subtab -> delete
Had 1 snapshot. 
> That snapshot should be 1G, plus the preallocated 10G disk of the VM makes
> 11G.
> The space allocation check should have failed, since the required 11G
> exceeds the 10G of free space, regardless of the threshold validation,
> which should and did pass (10G >= 5G).
> 
The snapshot merge operation wasn't blocked.
> I don't understand what's the bug's status, Kevin marked as verified on
> comment #7.
> Please clarify.

Leaving it as VERIFIED based on Kevin's verification; if necessary, we'll change the status.

Comment 12 Vered Volansky 2015-01-14 12:49:25 UTC
Managed to reproduce only after the following (not yet merged) patches:
http://gerrit.ovirt.org/#/c/36892/
http://gerrit.ovirt.org/#/c/36889/

Make sure this bug is verified on a version that includes these patches.

Comment 13 Vered Volansky 2015-01-14 12:50:09 UTC
Managed to reproduce only after the following (not yet merged) patches:
http://gerrit.ovirt.org/#/c/36892/
http://gerrit.ovirt.org/#/c/36889/

Make sure this bug is verified on a version that includes these patches.

Comment 14 Vered Volansky 2015-01-19 06:05:25 UTC
Verification clarification:

Needed space for snapshot merge should be min(disk virtual size, deleted snapshot size + the snapshot child's size).
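
The formula above, written out as a minimal sketch. The function and parameter names are hypothetical; all sizes must be in the same unit.

```python
def needed_merge_space(disk_virtual_size, deleted_snapshot_size, child_snapshot_size):
    """Needed space for a snapshot merge, per the formula above."""
    return min(disk_virtual_size, deleted_snapshot_size + child_snapshot_size)


# The merged data can never exceed the disk's virtual size.
print(needed_merge_space(10, 3, 4))
print(needed_merge_space(10, 8, 6))
```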

Comment 15 Yaniv Lavi 2015-01-19 08:32:26 UTC
*** Bug 1182222 has been marked as a duplicate of this bug. ***

Comment 16 Allon Mureinik 2015-02-16 19:11:49 UTC
RHEV-M 3.5.0 has been released, closing this bug.


