Bug 1184807

Summary: Storage thresholds should not be inclusive
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-engine
Assignee: Vered Volansky <vered>
Status: CLOSED ERRATA
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.4.3
CC: acanan, agkesos, amureini, ecohen, gklein, iheim, juwu, kgoldbla, lpeer, lsurette, pstehlik, rbalakri, Rhev-m-bugs, sbonazzo, tnisan, vered, yeylon, ylavi
Target Milestone: ---
Keywords: ZStream
Target Release: 3.5.1
Flags: ylavi: Triaged+
Hardware: All
OS: All
Whiteboard: storage
Fixed In Version: org.ovirt.engine-root-3.5.1-1
Doc Type: Bug Fix
Doc Text:
Previously, less than or equal to (<=) was used when monitoring storage free space, so low disk space alerts were triggered when they should not have been. With this update, less than (<) is now used when monitoring storage free space, and low disk space alerts are generated appropriately.
Story Points: ---
Clone Of: 1178480
Environment:
Last Closed: 2015-04-28 18:44:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1178480
Bug Blocks: 960934, 1186161, 1193058, 1197441
Attachments:
Description                     Flags
server, engine and vdsm logs    none
server, engine and vdsm logs    none

Comment 2 Allon Mureinik 2015-03-05 14:49:46 UTC
Steps to reproduce:

1. Create a block storage (iSCSI/FCP) domain of 10G.
2. Create 1 preallocated disk of virtualSize = actualSize = 4G.

Expected result without the fix:
 - warning in the audit log

Expected result with the fix:
 - no warning

Comment 4 Kevin Alon Goldblatt 2015-03-05 22:46:36 UTC
Created attachment 998595 [details]
server, engine and vdsm logs

Added logs

Comment 5 Vered Volansky 2015-03-08 06:03:04 UTC
Kevin, what is the FreeSpaceCriticalLow's value?
It's set to 5GB by default, so in your test it seems we have 2G < 5G, which is OK.
If you have 6G free space and want to test this bug, you should preallocate a 1G disk.

Allon, I don't understand your comment #2 either; it looks like there shouldn't be a warning at all in this scenario.

Comment 6 Allon Mureinik 2015-03-08 08:19:31 UTC
(In reply to Vered Volansky from comment #5)
> Kevin, what is the FreeSpaceCriticalLow's value?
> It's set to 5GB by default, so from your test it seems we have 2g<5G, which
> is OK.
> If you have 6G free space and want to test this bug, you should preallocate
> a 1G disk.
> 
> Allon, I don't understand your comment#2 as well,  looks like there
> shouldn't be a warning at all with this scenario.

In comment #2, the scenario will result in (6GB - epsilon) free space. Since we use ints, that's evaluated as 5, which will produce a warning if we use <= instead of <.
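The arithmetic behind comment #2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not engine code: free space is assumed to be converted from bytes to whole gigabytes with integer division, the threshold is the 5GB FreeSpaceCriticalLow default from comment #5, and the 1MB overhead standing in for "epsilon" is a hypothetical value.

```python
GB = 1024 ** 3

# Hypothetical values for the comment #2 scenario.
DOMAIN_SIZE_BYTES = 10 * GB
DISK_SIZE_BYTES = 4 * GB
EPSILON_BYTES = 1024 * 1024  # assumed metadata overhead; any 0 < epsilon < 1GB behaves the same
THRESHOLD_GB = 5             # FreeSpaceCriticalLow default, per comment #5

free_bytes = DOMAIN_SIZE_BYTES - DISK_SIZE_BYTES - EPSILON_BYTES
free_gb = free_bytes // GB   # integer division drops the fraction: 5.999... becomes 5

print(free_gb)                  # 5
print(free_gb <= THRESHOLD_GB)  # True: the inclusive check raises a warning
print(free_gb < THRESHOLD_GB)   # False: the strict check does not
```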

Comment 7 Vered Volansky 2015-03-08 11:47:58 UTC
Kevin, to conclude:
Please verify that there are no alerts using a domain with 5GB <= freeSpace < 6GB.
Any freeSpace < 5GB should yield a warning.
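These acceptance criteria can be written down as a small boundary table. The helper names below are hypothetical and the model is an assumption (free space as a fractional GB value that is truncated to an int before the check); the table contrasts the old inclusive check with the fixed strict one.

```python
THRESHOLD_GB = 5  # FreeSpaceCriticalLow default, per comment #5

def reported_gb(free_gb: float) -> int:
    # Assumed model: the fractional part is dropped before the comparison.
    return int(free_gb)

def alert_before_fix(free_gb: float) -> bool:
    return reported_gb(free_gb) <= THRESHOLD_GB  # inclusive check

def alert_after_fix(free_gb: float) -> bool:
    return reported_gb(free_gb) < THRESHOLD_GB   # strict check

# Boundary cases: no alert expected for 5 <= free < 6, alert expected for free < 5.
cases = [4.0, 4.99, 5.0, 5.5, 5.99, 6.0]
for free in cases:
    print(f"{free:5.2f} GB  before={alert_before_fix(free)}  after={alert_after_fix(free)}")
```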

Comment 8 Kevin Alon Goldblatt 2015-03-09 13:01:53 UTC
I ran the following scenario:

Created a LUN of 11g on the storage server

Created a Block storage domain using the 11g LUN

Storage domain displays Virtual size as 10G, Free space 6g (2 OVF disks were created on the LUN)

Created a 1g Preallocated block disk - PASSED
Storage domain displays Virtual size as 10G, Free space 5g (No warning is displayed)

Created a 1g Preallocated block disk - PASSED
Storage domain displays Virtual size as 10G, Free space 4g (No warning is displayed)

Created a 1g Preallocated block disk - FAILS
Storage domain displays Virtual size as 10G, Free space 4g (Low disk space error displayed)


THEN

Deleted one of the two 1g disks previously created - PASSED
Storage domain displays Virtual size as 10G, Free space 5g (No warning is displayed)


Created a 2g Preallocated block disk - PASSED
Storage domain displays Virtual size as 10G, Free space 3g (No warning is displayed) (This is problematic)

Comment 9 Kevin Alon Goldblatt 2015-03-09 13:22:56 UTC
Adding logs:
(In reply to Kevin Alon Goldblatt from comment #8)
> I ran the following scenario:
> 
> Created a LUN of 11g on the storage server
> 
> Created a Block storage domain using the 11g LUN
> 
> Storage domain displays Virtual size as 10G, Free space 6g (2 OVF disks were
> created on the LUN)
> 
> Created a 1g Preallocated block disk - PASSED
> Storage domain displays Virtual size as 10G, Free space 5g (No warning is
> displayed)
> 
> Created a 1g Preallocated block disk - PASSED
> Storage domain displays Virtual size as 10G, Free space 4g (No warning is
> displayed)
> 
> Created a 1g Preallocated block disk - FAILS
> Storage domain displays Virtual size as 10G, Free space 4g (Lo disk space
> error displayed)
> 
ERROR in engine.log:
2015-03-08 17:36:43,457 WARN  [org.ovirt.engine.core.bll.AddDiskCommand] (ajp-/127.0.0.1:8702-7) [2318e604] CanDoAction of action AddDisk failed for user admin@internal. Reasons: VAR__ACTION__ADD,VAR__TYPE__VM_DISK,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName block_11g
2015-03-08 17:38:49,508 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-30) [749faddb] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Critical, Low disk space. nfs2 domain has 4 GB of free space


> 
> THEN
> 
> Deleted one of the 2 1g disks previously created - PASSED
> Storage domain displays Virtual size as 10G, Free space 5g (No warning is
> displayed)

From engine.log:
2015-03-08 17:39:34,571 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-22) [2ff6b5a0] Correlation ID: 503b611e, Job ID: 0c0d01e1-cfe4-4252-b3a6-dcd646846b3d, Call Stack: null, Custom Event ID: -1, Message: Disk 1g_b was successfully removed from domain block_11g (User admin@internal).

> 
> 
> Created a 2g Preallocated block disk - PASSED
> Storage domain displays Virtual size as 10G, Free space 3g (No warning is
> displayed) (This is problematic)

From engine.log:
2015-03-08 17:55:51,972 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-7) Correlation ID: 44629d9f, Job ID: 4d4306b9-bb85-4913-8da3-226da28f2393, Call Stack: null, Custom Event ID: -1, Message: The disk '2g' was successfully added.



Several hours later, during ProcessOvfUpdateForStorageDomainCommand, the storage domain is reported as having low disk space.

From engine.log:

2015-03-09 00:11:30,824 INFO  [org.ovirt.engine.core.bll.ProcessOvfUpdateForStorageDomainCommand] (DefaultQuartzScheduler_Worker-64) [b525ba2] Lock freed to object 
EngineLock [exclusiveLocks= key: 88eb14b4-a7fc-40ab-be6b-e2ab7bc31dbc value: STORAGE
, sharedLocks= key: 6d96f52d-d791-4f66-83bd-2553ca0f3012 value: OVF_UPDATE
]
2015-03-09 00:23:46,058 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-33) [6d58ef36] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Warning, Low disk space.block_11g domain has 3 GB of free space

Comment 10 Kevin Alon Goldblatt 2015-03-09 13:24:28 UTC
Created attachment 999520 [details]
server, engine and vdsm logs

Adding logs

Comment 11 Vered Volansky 2015-03-12 13:18:31 UTC
Works for me, discussed with Kevin, who verified.
The issue with the verification was caused by the anti-flooding mechanism for log messages.
The behaviour was OK; a slightly different verification and monitoring method was used. Kevin will fill in the details and move the bug to VERIFIED.

Comment 12 Allon Mureinik 2015-03-12 13:27:59 UTC
Vered, this BZ implies a (slight) behavior change in RHEV. Please provide doctext that explains the change.
Thanks!

Comment 13 Kevin Alon Goldblatt 2015-03-12 13:32:37 UTC
Tested the following scenario:

Created a LUN of 11g on the storage server

Created a Block storage domain using the 11g LUN

Storage domain displays Virtual size as 10G, Free space 6g (2 OVF disks were created on the LUN)

Created a 1g Preallocated block disk - PASSED
Storage domain displays Virtual size as 10G, Free space 5g (No warning is displayed)

Created a 1g Preallocated block disk - PASSED
Storage domain displays Virtual size as 10G, Free space 4g - warning displayed as follows:
Warning, Low disk space.block1 domain has 4 GB of free space

I had no warning in the previous scenario for the following reason:

After a storage domain reports low disk space for the first time, the anti-flooding mechanism prevents recurring messages from being displayed.
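The suppression described here can be illustrated with a generic time-window deduplicator. This is a hypothetical sketch, not the engine's actual anti-flooding code: the class name, keying alerts by (event type, domain), and the one-hour window are all assumptions. An injectable clock makes the behavior easy to demonstrate.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class FloodFilter:
    """Hypothetical anti-flooding filter: after an event is emitted for a key,
    identical events for that key are suppressed until window_s seconds pass."""

    def __init__(self, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_emit(self, event_type: str, domain: str) -> bool:
        key = (event_type, domain)
        now = self.clock()
        last: Optional[float] = self._last_sent.get(key)
        if last is not None and now - last < self.window_s:
            return False  # suppressed: the same alert was emitted recently
        self._last_sent[key] = now
        return True

# Demonstration with a fake clock (seconds).
fake_now = [0.0]
flood = FloodFilter(window_s=3600, clock=lambda: fake_now[0])
results = [flood.should_emit("LOW_DISK_SPACE", "block_11g")]  # first alert is shown
fake_now[0] = 60
results.append(flood.should_emit("LOW_DISK_SPACE", "block_11g"))  # suppressed
fake_now[0] = 4000
results.append(flood.should_emit("LOW_DISK_SPACE", "block_11g"))  # window expired
print(results)  # [True, False, True]
```

A suppressed alert is simply not shown, so a check made inside the window can look like a pass even though the condition still holds.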


Moving to verified

Comment 14 Julie 2015-03-19 08:15:49 UTC
Hi Vered,
   I have updated the doc text. Please let me know if it is correct or not.

Kind regards,
Julie

Comment 15 Vered Volansky 2015-03-19 15:22:34 UTC
Hi Julie, 

I'd like to get rid of the term "threshold" here, because more changes have been planned since I wrote this doc text.
We DIDN'T change the number representation; we're still using integers.
The only change was from <= to <.
This solves the issue because the truncation no longer matters.
For example, before the fix, 5.5GB of free space was truncated to 5, and comparing it with 5 using <= returned true. So for 5.5GB of free space we would send an alert, even though we meant to do so only below 5GB.
Now the value is still truncated to 5, but comparing it with 5 using < yields false, so no alert is generated.
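Why the truncation stops mattering can be checked directly: for a non-negative free-space value x and an integer threshold T, int(x) < T holds exactly when x < T, whereas int(x) <= T holds whenever x < T + 1. The Python sketch below checks both equivalences on a grid of sample values; the 5GB threshold and the 0.01GB step are illustrative assumptions, not values taken from the engine.

```python
THRESHOLD_GB = 5  # illustrative threshold

def truncated_strict(x: float) -> bool:
    return int(x) < THRESHOLD_GB   # fixed check

def truncated_inclusive(x: float) -> bool:
    return int(x) <= THRESHOLD_GB  # old check

samples = [i / 100 for i in range(0, 1001)]  # 0.00 .. 10.00 GB

# A strict comparison after truncation behaves like comparing the exact value.
assert all(truncated_strict(x) == (x < THRESHOLD_GB) for x in samples)
# An inclusive comparison after truncation alerts for anything below THRESHOLD_GB + 1.
assert all(truncated_inclusive(x) == (x < THRESHOLD_GB + 1) for x in samples)

print(truncated_inclusive(5.5), truncated_strict(5.5))  # True False
```

In other words, the old check effectively alerted below 6GB, while the fixed check alerts below 5GB as intended.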

I suggest the following change, not including the </<=/truncation details I added before. I'll leave those to you, since I don't know how deep you want to dive into this...

Previously, less than or equal to (<=) was used when monitoring storage free space. In addition, integer numbers were used, which caused fractions to be truncated. This triggered low disk space alerts when they should not have been triggered. With this update, less than (<) is now used when checking storage free space. Alerts for low disk space are now generated appropriately.

Comment 16 errata-xmlrpc 2015-04-28 18:44:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0888.html