Bug 960934

Summary: Tracker: Storage allocation checks for new disks are faulty throughout the system
Product: Red Hat Enterprise Virtualization Manager Reporter: Vered Volansky <vered>
Component: ovirt-engineAssignee: Vered Volansky <vered>
Status: CLOSED CURRENTRELEASE QA Contact: Aharon Canan <acanan>
Severity: high Docs Contact:
Priority: high    
Version: 3.3.0CC: acanan, amureini, bsettle, dfediuck, eedri, gklein, lpeer, nlevinki, ogofen, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon, ylavi
Target Milestone: ovirt-3.6.0-rcKeywords: Tracking
Target Release: 3.6.0   
Hardware: Unspecified   
OS: All   
Whiteboard: storage
Fixed In Version: 3.6.0-9 Doc Type: Bug Fix
Doc Text:
Certain storage related flows throughout the system were either not checked for enough space before the execution, had the wrong check, or had it right. In this bug dependents, new code was uniformly applied to all cases in the system where storage is allocated, taking into consideration the specific action and storage type (different actions/storage types consume different sizes of space), as well as a better estimate of the free space in the storage when the request is made. These validations are now performed beforehand, though we still cannot ensure the operation will not fail during the allocation process due to lack of space at the time of the action. We can, however, ensure that if there is not enough space before the operation takes place - the operation will immediately fail.
Story Points: ---
Clone Of:
: 1053704 1053709 1053733 1054175 (view as bug list) Environment:
Last Closed: 2015-11-19 14:28:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 917682, 970932, 1053704, 1053709, 1053740, 1053742, 1053746, 1053747, 1053751, 1053752, 1054175, 1098773, 1117231, 1119022, 1124143, 1124469, 1136717, 1136719, 1136721, 1136734, 1178010, 1178011, 1178012, 1178021, 1178022, 1179688, 1179690, 1181133, 1184807, 1185613    
Bug Blocks:    

Description Vered Volansky 2013-05-08 10:41:59 UTC
Description of problem:

Storage space allocation (or verification) of new disks should be done as follows:

For new empty disks:

     | File Domain                             | Block Domain
-----|-----------------------------------------|-------------
qcow | 1M (header size)                        | 1G
-----|-----------------------------------------|-------------
raw  | preallocated: disk capacity (getSize()) |disk capacity
     | thin (sparse): 1M (as qcow)             |


For cloned disks:
raw:  min(used, capacity).
qcow: 1.1*raw
  
Actual results:
Incorrect memory allocation (not compliant with the above) in all disks allocations throughout the system.

Expected results:
Correct memory allocation.

Additional info:

Comment 1 Alissa 2013-05-12 09:10:02 UTC
(In reply to comment #0)

Refining with Allon the requirements...

For new empty disks (unchanged from the original)
      | File Domain                             | Block Domain
 -----|-----------------------------------------|-------------
 qcow | 1M (header size)                        | 1G
 -----|-----------------------------------------|-------------
 raw  | preallocated: disk capacity (getSize()) | disk capacity
      | thin (sparse): 1M                       | (there is no raw sparse on 
                                                   block domains)


For cloned disks:
      | File Domain                             | Block Domain
 -----|-----------------------------------------|-------------
 qcow | preallocated : 1.1 * disk capacity      |1.1 * min(used ,capacity) 
      | sparse: 1.1 * min(used ,capacity)       |
 -----|-----------------------------------------|-------------
 raw  | preallocated: disk capacity             |disk capacity
      | sparse: min(used,capacity)

Comment 2 Allon Mureinik 2013-10-30 16:47:37 UTC
*** Bug 1024269 has been marked as a duplicate of this bug. ***

Comment 18 Allon Mureinik 2014-01-01 16:33:58 UTC
Sean, Ayal, I'd like your input as to how to treat this bug.

The two patches in the external tracker introduced all the infrastrcuture required to implement the solution for this bug:

1. http://gerrit.ovirt.org/#/c/15377/
Added the ability to CORRECTLY validate the storage requirements for a newly created volume, and applied it to the Add Disk flow.

2. http://gerrit.ovirt.org/#/c/22447/
Added the ability to CORRECTLY validate the storage requirements for a new volume that should contain data, and applied to the Move Disk (both cold and live) flows.

There are, of course, several other flows that are still lacking - create VM, create template, create snapshot, etc...
Should each have a separate bug?
Should we have a single bug and close it only when we're finished with all of them?

Comment 19 Ayal Baron 2014-01-01 19:59:57 UTC
(In reply to Allon Mureinik from comment #18)
> Sean, Ayal, I'd like your input as to how to treat this bug.
> 
> The two patches in the external tracker introduced all the infrastrcuture
> required to implement the solution for this bug:
> 
> 1. http://gerrit.ovirt.org/#/c/15377/
> Added the ability to CORRECTLY validate the storage requirements for a newly
> created volume, and applied it to the Add Disk flow.
> 
> 2. http://gerrit.ovirt.org/#/c/22447/
> Added the ability to CORRECTLY validate the storage requirements for a new
> volume that should contain data, and applied to the Move Disk (both cold and
> live) flows.
> 
> There are, of course, several other flows that are still lacking - create
> VM, create template, create snapshot, etc...
> Should each have a separate bug?
> Should we have a single bug and close it only when we're finished with all
> of them?

depends on when you want it checked.
in general I'd go for a single bug and specify all the scenarios that need to be covered and how to test them.
But this is only assuming that all the patches get in in a short time frame.

Comment 20 Allon Mureinik 2014-01-15 16:33:53 UTC
(In reply to Ayal Baron from comment #19)
> (In reply to Allon Mureinik from comment #18)
> > Sean, Ayal, I'd like your input as to how to treat this bug.
> > 
> > The two patches in the external tracker introduced all the infrastrcuture
> > required to implement the solution for this bug:
> > 
> > 1. http://gerrit.ovirt.org/#/c/15377/
> > Added the ability to CORRECTLY validate the storage requirements for a newly
> > created volume, and applied it to the Add Disk flow.
> > 
> > 2. http://gerrit.ovirt.org/#/c/22447/
> > Added the ability to CORRECTLY validate the storage requirements for a new
> > volume that should contain data, and applied to the Move Disk (both cold and
> > live) flows.
> > 
> > There are, of course, several other flows that are still lacking - create
> > VM, create template, create snapshot, etc...
> > Should each have a separate bug?
> > Should we have a single bug and close it only when we're finished with all
> > of them?
> 
> depends on when you want it checked.
> in general I'd go for a single bug and specify all the scenarios that need
> to be covered and how to test them.
> But this is only assuming that all the patches get in in a short time frame.

Due to priority considerations, we cannot guarantee that.
I'll leave this BZ as a tracker, and open a new BZ for each usecase, so we can tackle (and test!) them as they come.

Comment 21 Vered Volansky 2014-10-19 12:27:57 UTC
Some notes regarding verification of the dependent bugs:

First, we have two use cases for volume creation:
1. Empty Volumes - Create a new disk, create snapshot (a new, empty volume is created in this case).
2. Cloned volumes - An empty volume is created, immediately followed by filling in with the data (When moving a disk, cloning, exporting, creating from a template/snapshot etc.)
All the allocation-related scenarios in the system can be taken apart to consist one or both of these cases.

When verifying, both positive and negative test cases should be verified: Verify against a storage domain with and without enough space - just about the needed/excess space for the disks. Verify it succeeds when there's enough space and fails on CDA when it doesn't.

Usually threshold should also be verified for a complete verification (again, positive and negative).

If we're dealing with a VM with memory volumes, their space should also be taken into consideration (in some cases for every snapshot, depending on the use case).
The metadata volume's is always of size 10KB, Cow, Sparse. In file domains this size is negligible, but bare in mind that block domains have a minimum block size of 1G.
The memory volume's size varies, is always RAW, and is Sparse for file domains and Preallocated for block domains.

New volumes are always RAW, except thinly provisioned on block domains.
Snapshots are always qcow and consist of only a header. The domain's type (file/block) determines the minimum size for this header: 1M for File and 1G for Block.

Comment 22 Allon Mureinik 2014-11-07 14:42:26 UTC
This is a tracker bug.
Since all the dependent bugs are MODIFIED (or better), moving this one to MODIFIED too.

Comment 23 Allon Mureinik 2014-12-16 12:53:45 UTC
This bug depends on several RHEV bugs which are all ON_QA and several oVirt bugs in MODIFIED.
The oVirt bugs will be moved to ON_QA when oVirt 3.5.1 is built. However, this code is included in RHEV 3.5.0's recent builds, so the RHEV tracker can be moved to ON_QA too.

Comment 24 Vered Volansky 2015-01-04 06:14:23 UTC
More details as to how to verify were added in comment #21.
Note that the storage allocations implementation ALWAYS took into consideration the threshold limitations. Storage devel advised how this should be verified, since it seemed to be unclear. No advice was entered on thresholds since this is clear. Just so there are no misunderstandings - thresholds are being validated in the code throughout all these bugs.

Comment 25 Allon Mureinik 2015-01-18 08:21:40 UTC
Clearly there are some items here that failed qa and require fixing, but we won't delay 3.5.0 for this. Pushing out to 3.5.1 to handle properly.

Comment 26 Yaniv Lavi 2015-01-20 14:01:20 UTC
Moving to 3.6.0 since this will not converge to 3.5.z.

Comment 27 Eyal Edri 2015-08-13 10:37:09 UTC
moving old bug fixed before ovirt alpha release as fixed in current beta2, 
3.6.0-9.

Comment 28 Aharon Canan 2015-11-19 14:28:40 UTC
Moving tracker to verified as all "depends on" bugs on verified/closed