Bug 1119717

Summary: [Nagios] Status of quota monitoring service changed to warning from critical, even though the hard limit on a directory was reached
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Shruti Sampat <ssampat>
Component: gluster-nagios-addons Assignee: Sahina Bose <sabose>
Status: CLOSED ERRATA QA Contact: Shruti Sampat <ssampat>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.0 CC: dpati, esammons, kmayilsa, rhs-bugs, rhsc-qe-bugs
Target Milestone: ---   
Target Release: RHGS 3.0.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: gluster-nagios-addons-0.1.10-1.el6rhs Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-22 19:11:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Shruti Sampat 2014-07-15 10:58:38 UTC
Description of problem:
------------------------

While usage of one directory had reached the hard limit and the quota service was in the critical state, usage of another directory reached the soft limit. This caused the service to change state from critical to warning, which *should not* happen while usage of any directory is at the hard limit.

The status information of the service was as follows when the service was critical -

QUOTA:hard limit reached on /test1:

While usage of /test1 was still equal to the hard limit, usage of / exceeded the soft limit. At this point the status of the service changed from critical to warning, and the status information was as follows -

QUOTA: Usage is above soft limit: 57.1GB used by /

Notifications via e-mail and SNMP were sent for this change of state.

After a while the state of the service changed back to critical with the following status information - 

QUOTA:hard limit reached on /test1: soft limit exceeded on /

The above message is the expected status information. Notifications were again sent.

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.9-1.el6rhs.x86_64

How reproducible:
Saw it once.

Steps to Reproduce:
1. On the client machine, write data to directory test1 at the mount point until the soft limit is exceeded.
2. Continue to write data to test1 until the hard limit is reached. Writes are now disallowed inside test1.
3. Repeat step 1 for directory / (the mount point of the volume).

Actual results:
The result is as described above: the status of the service changed from critical to warning, even though usage on one directory had reached the hard limit.

Expected results:
The service should remain in the critical state, with the status information updated appropriately.
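
The expected behavior can be illustrated with a minimal sketch in which the worst per-directory condition decides the service state. This is not the add-on's actual code: the function name quota_status and the dictionary keys are hypothetical, and the message format is only modeled on the status lines quoted in this report.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2


def quota_status(dirs):
    """Aggregate per-directory quota flags into one service result.

    dirs is a list of dicts with the hypothetical keys
    'path', 'hard_exceeded' and 'soft_exceeded'.
    """
    hard = [d["path"] for d in dirs if d["hard_exceeded"]]
    # A directory at its hard limit is reported once, under "hard".
    soft = [d["path"] for d in dirs
            if d["soft_exceeded"] and not d["hard_exceeded"]]

    if hard:
        # Any hard-limit hit keeps the service critical.
        msg = "QUOTA:hard limit reached on " + ", ".join(hard)
        if soft:
            msg += ": soft limit exceeded on " + ", ".join(soft)
        return CRITICAL, msg
    if soft:
        return WARNING, "QUOTA:soft limit exceeded on " + ", ".join(soft)
    return OK, "QUOTA: OK"


result = quota_status([
    {"path": "/test1", "hard_exceeded": True, "soft_exceeded": True},
    {"path": "/", "hard_exceeded": False, "soft_exceeded": True},
])
```

With /test1 at its hard limit and / above its soft limit, the sketch stays critical and names both directories in the status information.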

Additional info:

Comment 1 Shruti Sampat 2014-07-15 11:40:34 UTC
Hi,

The following is the output of # gluster volume info for the volume on which I saw the above behavior.

[root@rhs ~]# gluster v info vol5 
 
Volume Name: vol5
Type: Distributed-Replicate
Volume ID: 2f735024-2972-4998-a39f-a55a82526ae2
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.182:/rhs/brick4/b1
Brick2: 10.70.37.73:/rhs/brick4/b1
Brick3: 10.70.37.112:/rhs/brick4/b1
Brick4: 10.70.37.79:/rhs/brick4/b1
Options Reconfigured:
features.quota: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
cluster.server-quorum-ratio: 80%


Note: The options alert-time, soft-timeout and hard-timeout were not set.

Comment 3 Sahina Bose 2014-07-16 05:49:44 UTC
http://review.gluster.org/#/c/8314 - Patch posted to disable NSCA notifications.

Since we currently have both an NRPE poll-based check and an NSCA push-based notification, the push notification was overriding the NRPE check result.

Disabling the push notification.
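
The override can be pictured with a toy model in which the service keeps only the most recent result, whatever its source. This is a sketch under assumptions, not Nagios or add-on code: the ServiceState class is hypothetical, and it assumes that a pushed result describes only the directory named in the log message that triggered it, while the polled check looks at all directories.

```python
OK, WARNING, CRITICAL = 0, 1, 2


class ServiceState:
    """Toy model of a service: the latest submitted result wins."""

    def __init__(self):
        self.state = OK
        self.info = ""
        self.history = []  # (source, old_state, new_state) per state change

    def submit(self, source, state, info):
        # Each state change would trigger a notification (e-mail, SNMP).
        if state != self.state:
            self.history.append((source, self.state, state))
        self.state, self.info = state, info


svc = ServiceState()
# Poll: /test1 is at its hard limit.
svc.submit("nrpe", CRITICAL, "QUOTA:hard limit reached on /test1:")
# Push: triggered by the soft-limit message for /, unaware of /test1.
svc.submit("nsca", WARNING,
           "QUOTA: Usage is above soft limit: 57.1GB used by /")
# Next poll: sees both directories and restores the critical state.
svc.submit("nrpe", CRITICAL,
           "QUOTA:hard limit reached on /test1: soft limit exceeded on /")
```

In this model the history records critical, then warning, then critical again, which matches the sequence of state changes and notifications described in this report.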

Comment 5 Sahina Bose 2014-07-24 15:40:28 UTC
The change is in /etc/rsyslog.d/glusternagios.conf - the filter that looks for quota messages has been removed.

Comment 6 Shruti Sampat 2014-08-05 10:20:54 UTC
Verified as fixed in gluster-nagios-addons-0.1.10-2.el6rhs.x86_64

Comment 7 errata-xmlrpc 2014-09-22 19:11:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1277.html