Description of problem:
While usage of a particular directory had reached the hard limit, and the quota service was therefore in critical state, usage on another directory exceeded the soft limit. This caused the service to change state from critical to warning, which *should not* happen while usage on any directory is at the hard limit.
The status information of the service while it was critical was as follows:
QUOTA:hard limit reached on /test1:
While usage of test1 was still equal to the hard limit, usage on / exceeded the soft limit. At this point the service changed from critical to warning, with the following status information:
QUOTA: Usage is above soft limit: 57.1GB used by /
Notifications via e-mail and SNMP were sent for this change of state.
After a while, the service changed back to critical with the following status information:
QUOTA:hard limit reached on /test1: soft limit exceeded on /
This is the expected status information. Notifications were sent again.
Version-Release number of selected component (if applicable):
How reproducible:
Saw it once.
Steps to Reproduce:
1. On the client machine, write data to directory test1 under the mount point until usage exceeds the soft limit.
2. Continue writing data to test1 until the hard limit is reached. Writes inside test1 are now disallowed.
3. Repeat step 1 for directory / (the root of the volume).
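To size the writes in steps 1 and 2, a small calculation can help. The sketch below is a hypothetical helper, not part of any Gluster tool; it assumes the soft limit is configured as a percentage of the hard limit (80% is assumed here as the default) and that usage is measured in bytes.

```python
# Hypothetical helper for sizing test writes; not part of GlusterFS.
# Assumption: soft limit = hard limit * soft_pct / 100 (soft_pct assumed 80).
GiB = 1024 ** 3


def bytes_to_write(hard_limit: int, used: int, soft_pct: int = 80) -> tuple[int, int]:
    """Return (bytes needed to exceed the soft limit, bytes needed to reach the hard limit)."""
    soft_limit = hard_limit * soft_pct // 100
    to_soft = max(soft_limit - used + 1, 0)  # one byte past the soft limit
    to_hard = max(hard_limit - used, 0)      # exactly up to the hard limit
    return to_soft, to_hard


if __name__ == "__main__":
    # Example with an assumed 10 GiB hard limit on test1 and nothing written yet.
    print(bytes_to_write(10 * GiB, 0))
```

In practice the quota enforcer may stop writes slightly before or after the exact byte count, so these numbers are only a starting point.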
Actual result: as described above, the service changed from critical to warning even though usage on one directory had reached the hard limit.
Expected result: the service should remain in critical state, with its status information updated to report the additional soft-limit breach.
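The expected behavior amounts to taking the worst per-directory state. The following is a minimal sketch of that rule, not the gluster-nagios-addons implementation. The `quota_status` function, its input format, and the "QUOTA:OK" message are hypothetical; the state values follow the standard Nagios plugin return codes.

```python
from enum import IntEnum


class State(IntEnum):
    # Standard Nagios plugin return codes (UNKNOWN=3 omitted for brevity).
    OK = 0
    WARNING = 1
    CRITICAL = 2


def quota_status(dirs: dict[str, str]) -> tuple[State, str]:
    """Hypothetical aggregator.

    dirs maps a path to 'ok', 'soft' (soft limit exceeded) or 'hard' (hard limit reached).
    """
    hard = [path for path, s in dirs.items() if s == "hard"]
    soft = [path for path, s in dirs.items() if s == "soft"]

    parts = []
    if hard:
        parts.append("hard limit reached on " + ", ".join(hard))
    if soft:
        parts.append("soft limit exceeded on " + ", ".join(soft))

    # The overall state is the worst per-directory state, so a soft-limit
    # breach on another directory can never downgrade CRITICAL to WARNING.
    if hard:
        state = State.CRITICAL
    elif soft:
        state = State.WARNING
    else:
        state = State.OK

    message = ("QUOTA:" + ": ".join(parts)) if parts else "QUOTA:OK"
    return state, message
```

For the scenario in this report, `quota_status({"/test1": "hard", "/": "soft"})` stays CRITICAL and yields the message "QUOTA:hard limit reached on /test1: soft limit exceeded on /".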
The following is the output of # gluster volume info for the volume on which I saw this behavior:
[root@rhs ~]# gluster v info vol5
Volume Name: vol5
Volume ID: 2f735024-2972-4998-a39f-a55a82526ae2
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Note: the options alert-time, soft-timeout and hard-timeout were not set.
http://review.gluster.org/#/c/8314 - Patch posted to disable NSCA notifications.
Since we currently have both an NRPE poll-based check and NSCA push-based notifications, the push notification was overriding the NRPE check result.
The patch disables the push notification.
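The override can be illustrated with a simplified model in which the service always takes the most recent result it accepts, whether polled (NRPE) or pushed (NSCA). This is a hypothetical sketch of the interaction, not Nagios or gluster-nagios-addons code. The timestamps are made up, and `accept_passive=False` stands in for the fix that stops quota messages from being pushed.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    time: int    # hypothetical timestamp, used only to order results
    source: str  # "nrpe" (active poll) or "nsca" (passive push)
    state: str   # e.g. "CRITICAL", "WARNING"
    output: str  # status information


def current_state(results: list[CheckResult], accept_passive: bool = True) -> CheckResult:
    """Simplified model: the most recent accepted result sets the service state."""
    accepted = [r for r in results if accept_passive or r.source != "nsca"]
    return max(accepted, key=lambda r: r.time)


# Timeline from this report, with assumed timestamps.
timeline = [
    CheckResult(0, "nrpe", "CRITICAL", "QUOTA:hard limit reached on /test1:"),
    CheckResult(10, "nsca", "WARNING", "QUOTA: Usage is above soft limit: 57.1GB used by /"),
]
```

With push results accepted, the later NSCA WARNING replaces the NRPE CRITICAL. With them ignored, the NRPE result stands until the next poll.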
The change is in /etc/rsyslog.d/glusternagios.conf: the filter that matches quota messages has been removed.
Verified as fixed in gluster-nagios-addons-0.1.10-2.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.