Description of problem:
When glusterd goes down on all the nodes in the cluster, the volume status, self-heal, and geo-rep services display the status 'UNKNOWN' with the status information 'UNKNOWN: temporary error'. The status information needs to be improved: this is not a temporary error, since glusterd is down on all the nodes.

Version-Release number of selected component (if applicable):
nagios-server-addons-0.1.9-1.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install Nagios on an RHS node.
2. Run discovery.py and start monitoring the nodes.
3. Stop glusterd on all the nodes by running the command "service glusterd stop".

Actual results:
Volume Status, Volume Self-heal, and Volume Geo-Replication report the status 'UNKNOWN' with the status information "UNKNOWN: temporary error".

Expected results:
The status information for these services needs to be improved.

Additional info:
Similar behavior is seen for Volume Quota services too.
Enhance the message to suggest to the user that the issue may be with glusterd. Change "temporary error" to "Glusterd cannot be queried".
Patch sent to upstream for review: http://review.gluster.org/10421
Please set the Fixed In Version (FIV) for this bug.
Verified and works with build nagios-server-addons-0.2.0-1.el6rhs.noarch. When glusterd is down on all the nodes of the cluster, the Volume Geo Replication, Volume Status, and Volume Utilization services are shown as "UNKNOWN" with the status information "UNKNOWN: NO hosts(with state UP) found in the cluster". Brick status is shown as "UNKNOWN" with the status information "UNKNOWN: Status could not be determined as glusterd is not running".
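For anyone reading this later, the verified behavior can be sketched roughly as below. This is only an illustration, not the code from the upstream patch: the function names (volume_service_status, brick_service_status), their parameters, and the query callbacks are hypothetical. Only the two message strings and the standard Nagios exit code for UNKNOWN (3) are taken as given.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Message strings as reported in the verification comment above.
NO_HOSTS_UP_MSG = "UNKNOWN: NO hosts(with state UP) found in the cluster"
GLUSTERD_DOWN_MSG = (
    "UNKNOWN: Status could not be determined as glusterd is not running"
)


def volume_service_status(hosts_up, query_volume):
    """Hypothetical helper: status for a cluster-level volume service.

    hosts_up     -- list of host names currently in state UP (assumption)
    query_volume -- callable(host) returning (exit_code, message) (assumption)
    """
    if not hosts_up:
        # No host can answer the query, so report why instead of a
        # generic "temporary error".
        return UNKNOWN, NO_HOSTS_UP_MSG
    return query_volume(hosts_up[0])


def brick_service_status(glusterd_running, query_brick):
    """Hypothetical helper: status for a brick service on one node.

    glusterd_running -- True if glusterd is running on the node (assumption)
    query_brick      -- callable() returning (exit_code, message) (assumption)
    """
    if not glusterd_running:
        return UNKNOWN, GLUSTERD_DOWN_MSG
    return query_brick()
```

The point of the sketch is that the UNKNOWN state is kept, but the status information names the actual cause, so the user knows to check glusterd rather than wait for the error to clear.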
Tim, kindly review and sign off on the edited doc text.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-1494.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days