Bug 1119717 - [Nagios] Status of quota monitoring service changed to warning from critical, even though the hard limit on a directory was reached
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nagios-addons
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Sahina Bose
QA Contact: Shruti Sampat
Depends On:
Reported: 2014-07-15 10:58 UTC by Shruti Sampat
Modified: 2015-05-13 16:53 UTC
5 users

Fixed In Version: gluster-nagios-addons-0.1.10-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2014-09-22 19:11:50 UTC


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1277 normal SHIPPED_LIVE Red Hat Storage Console 3.0 enhancement and bug fix update 2014-09-22 23:06:30 UTC

Description Shruti Sampat 2014-07-15 10:58:38 UTC
Description of problem:

While the usage of a particular directory had reached the hard limit, and the quota service was in the critical state, usage on another directory reached the soft limit. This caused the service to change state from critical to warning, which *should not* happen while usage on any directory has reached the hard limit.

The status information of the service was as follows when the service was critical -

QUOTA:hard limit reached on /test1:

While usage of test1 was still at the hard limit, usage on / exceeded the soft limit. At this point the status of the service changed from critical to warning, and the status information was as follows -

QUOTA: Usage is above soft limit: 57.1GB used by /

Notifications via e-mail and SNMP were sent for this change of state.

After a while the state of the service changed back to critical with the following status information - 

QUOTA:hard limit reached on /test1: soft limit exceeded on /

The above message is the expected status information. Notifications were again sent.

Version-Release number of selected component (if applicable):

How reproducible:
Saw it once.

Steps to Reproduce:
1. On the client machine, write data to directory test1 at the mount point in such a way that the soft limit is exceeded.
2. Continue to write data to test1 such that the hard limit is reached. Writes are now disallowed inside test1.
3. Repeat step 1 for directory / (the mount of the volume).

Actual results:
As described above, the status of the service changed from critical to warning, even though usage on one directory had reached the hard limit.

Expected results:
The status of the service should remain critical, with the status information updated appropriately.
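The expected behavior amounts to worst-state aggregation: a soft-limit warning on one directory must never downgrade a hard-limit critical on another. A minimal sketch of that rule (illustrative only; quota_state and its input format are assumptions, not the actual gluster-nagios-addons code):

```python
# Illustrative sketch, NOT the actual gluster-nagios-addons plugin code.
# The check should report the *worst* state across all directories, so a
# new soft-limit warning never replaces an existing hard-limit critical.

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes

def quota_state(dirs):
    """dirs: mapping of path -> {'hard_reached': bool, 'soft_exceeded': bool}."""
    state = OK
    messages = []
    for path, q in sorted(dirs.items()):
        if q.get("hard_reached"):
            state = max(state, CRITICAL)
            messages.append("hard limit reached on %s" % path)
        elif q.get("soft_exceeded"):
            state = max(state, WARNING)
            messages.append("soft limit exceeded on %s" % path)
    summary = "QUOTA: " + ("; ".join(messages) if messages else "OK")
    return state, summary
```

With both conditions present, this yields CRITICAL and a combined message like the one in the report ("hard limit reached on /test1: soft limit exceeded on /"), rather than flipping to WARNING.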

Additional info:

Comment 1 Shruti Sampat 2014-07-15 11:40:34 UTC

Following is the output of # gluster volume info for the volume on which the above behavior was seen.

[root@rhs ~]# gluster v info vol5 
Volume Name: vol5
Type: Distributed-Replicate
Volume ID: 2f735024-2972-4998-a39f-a55a82526ae2
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Options Reconfigured:
features.quota: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
cluster.server-quorum-ratio: 80%

Note : The options alert-time, soft-timeout and hard-timeout were not set.

Comment 3 Sahina Bose 2014-07-16 05:49:44 UTC
http://review.gluster.org/#/c/8314 - Patch posted to disable NSCA notifications.

Since we currently have both an NRPE poll-based check and an NSCA push-based notification, the push notification was overriding the NRPE check result.

Disabling the push notification.

Comment 5 Sahina Bose 2014-07-24 15:40:28 UTC
The change is in /etc/rsyslog.d/glusternagios.conf - the filter that looked for quota messages has been removed.
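For illustration only, the removed filter would have been of roughly this shape; the match string and program path below are assumptions, not the actual contents of glusternagios.conf:

```conf
# HYPOTHETICAL sketch of the kind of rsyslog rule that was removed:
# forward syslog lines matching quota alerts to a program that submits
# a passive (NSCA) check result to Nagios. Match string and program
# path are invented for illustration.
:msg, contains, "crossed soft limit" ^/usr/local/bin/submit_quota_passive_check
```

Removing such a rule leaves the NRPE poll-based check as the sole source of the quota service state, so a push notification can no longer override it.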

Comment 6 Shruti Sampat 2014-08-05 10:20:54 UTC
Verified as fixed in gluster-nagios-addons-0.1.10-2.el6rhs.x86_64

Comment 7 errata-xmlrpc 2014-09-22 19:11:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

