Description of problem:
------------------------
The volume quota status monitoring service very often changes its status to CRITICAL with the following status information: "CHECK_NRPE: Socket timeout after 10 seconds." At other times the correct status is shown, but because the status switches to CRITICAL so often, the service starts flapping.

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.0-25.git25f0bba.el6.x86_64

How reproducible:
Very frequently

Steps to Reproduce:
1. Configure quota on a volume and start monitoring it using Nagios.
2. Keep checking the status and the status information for the volume status quota service.

Actual results:
The status changes to CRITICAL with the status information "CHECK_NRPE: Socket timeout after 10 seconds."

Expected results:
If 10 seconds is not enough time for this command to execute and return the result, then the default timeout should be increased.

Additional info:
http://review.gluster.org/#/c/7682/

check_vol_server.py provides a mechanism to set the timeout with the "-t <sec>" option. So if a service (say, quota) takes more time, gluster-commands.cfg can be modified to pass a -t value in the command for that service, and it should work fine.
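
To illustrate the idea of passing a per-service timeout through to NRPE, here is a minimal Python sketch. It is not the actual check_vol_server.py code: the function names, the option subset, and the NRPE command name `check_vol_quota_status` are assumptions made for illustration. The only external facts it relies on are that check_nrpe accepts `-H`, `-c`, and `-t`, and that the default timeout in this report is 10 seconds.

```python
import argparse

DEFAULT_TIMEOUT = 10  # seconds; the default mentioned in this report


def parse_args(argv):
    """Parse a hypothetical subset of plugin options, including -t."""
    parser = argparse.ArgumentParser(description="Timeout pass-through sketch")
    parser.add_argument("-H", "--host", required=True,
                        help="node running the NRPE daemon")
    parser.add_argument("-c", "--command", required=True,
                        help="NRPE command name (hypothetical in this sketch)")
    parser.add_argument("-t", "--timeout", type=int, default=DEFAULT_TIMEOUT,
                        help="seconds to wait for the NRPE reply")
    return parser.parse_args(argv)


def build_nrpe_command(host, command, timeout=DEFAULT_TIMEOUT):
    """Return a check_nrpe argument list that carries an explicit timeout."""
    return ["check_nrpe", "-H", host, "-c", command, "-t", str(timeout)]


if __name__ == "__main__":
    # Example: give the slow quota check 30 seconds instead of the default.
    args = parse_args(["-H", "node1", "-c", "check_vol_quota_status", "-t", "30"])
    print(" ".join(build_nrpe_command(args.host, args.command, args.timeout)))
```

With this shape, a service whose check is slow only needs a larger `-t` value in its command definition, while every other service keeps the 10-second default.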
Verified as fixed in nagios-server-addons-0.1.0-82.git77df8ca.el6rhs.x86_64.

Updated gluster-commands.cfg with a timeout value different from the default of 10 seconds using the -t option, and it works.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1277.html