Description of problem:
------------------------
When a node in the cluster was brought down, to cause quorum to be lost, the status of cluster-quorum service changed to critical. Then the state changed to ok while the node was still down. The state should not have changed to OK as quorum was still lost.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
nagios-server-addons-0.1.6-1.el6rhs.noarch
gluster-nagios-addons-0.1.10-2.el6rhs.x86_64

How reproducible:
-----------------
Intermittently

Steps to Reproduce:
-------------------
1. Create a cluster of 2 RHS nodes and start monitoring it using nagios.
2. Create 2 volumes of distribute-replicate type, set server quorum and quorum ratio to 80%
3. Bring down one of the nodes, so quorum is lost.

Actual results:
----------------
The state of the service changed to critical, and then to ok.

Expected results:
------------------
The state of the service should have remained as critical as quorum was lost.

Additional info:
http://review.gluster.org/8740
The issue was caused by messages buffered in the plugin that processes the syslog messages. It happens when quorum is regained and then lost again: the earlier quorum-regained message was still in the buffer and was resent to nagios when the quorum-lost messages were received. This behaviour is fixed with the attached patch.
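To illustrate the failure mode described above, here is a minimal Python sketch. It is a toy model, not the plugin's code: the class names, the `buffer`/`on_event` methods and the exact flush ordering are all hypothetical, and the "latest only" variant is just one plausible way to avoid the stale resend, not necessarily what the patch does.

```python
# Toy model of the stale-buffer problem. All names here are hypothetical
# and do not come from gluster-nagios-addons.

class BufferedForwarder:
    """Forwards status messages and flushes any buffered ones on each event."""

    def __init__(self):
        self.pending = []    # messages held in the buffer
        self.delivered = []  # what the monitoring server received, in order

    def buffer(self, status):
        self.pending.append(status)

    def on_event(self, status):
        # Assumed buggy ordering: the new status is sent first, then the
        # stale buffered messages are resent after it and override it.
        self.delivered.append(status)
        self.delivered.extend(self.pending)
        self.pending.clear()

    def last_status(self):
        return self.delivered[-1] if self.delivered else None


class LatestOnlyForwarder(BufferedForwarder):
    """Variant that drops stale buffered messages before sending."""

    def on_event(self, status):
        self.pending.clear()
        self.delivered.append(status)


def simulate(forwarder):
    forwarder.on_event("CRITICAL")  # quorum lost
    forwarder.buffer("OK")          # quorum regained, message stays buffered
    forwarder.on_event("CRITICAL")  # quorum lost again
    return forwarder.last_status()
```

In this model, `simulate(BufferedForwarder())` ends with the service showing `"OK"` even though quorum is lost, which matches the reported symptom, while `simulate(LatestOnlyForwarder())` ends with `"CRITICAL"`.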
Verified as fixed in gluster-nagios-addons-0.1.12-1.el6rhs.x86_64. The quorum service remains in critical state when quorum is lost.
Please add doc text for this bug.
Hi Sahina, can you please review the edited doc text for technical accuracy and sign off?
Looks good
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0039.html