Bug 1141171 - [Nagios] Quorum service is seen to be OK when it should actually be in CRITICAL state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nagios-addons
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Sahina Bose
QA Contact: Shruti Sampat
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-09-12 11:33 UTC by Shruti Sampat
Modified: 2015-05-13 17:41 UTC

Fixed In Version: gluster-nagios-addons-0.1.12-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, the quorum service displayed an incorrect status. With this fix, a buffering issue is resolved and the quorum service displays the correct status.
Clone Of:
Environment:
Last Closed: 2015-01-15 13:49:46 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0039 0 normal SHIPPED_LIVE Red Hat Storage Console 3.0 enhancement and bug fix update #3 2015-01-15 18:46:40 UTC

Description Shruti Sampat 2014-09-12 11:33:44 UTC
Description of problem:
------------------------

When a node in the cluster was brought down to cause quorum to be lost, the status of the cluster-quorum service changed to CRITICAL. The state then changed to OK while the node was still down. The state should not have changed to OK, because quorum was still lost.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

nagios-server-addons-0.1.6-1.el6rhs.noarch
gluster-nagios-addons-0.1.10-2.el6rhs.x86_64

How reproducible:
-----------------

Intermittently

Steps to Reproduce:
-------------------

1. Create a cluster of 2 RHS nodes and start monitoring it using nagios.
2. Create 2 volumes of distribute-replicate type, set server quorum, and set the quorum ratio to 80%.
3. Bring down one of the nodes, so that quorum is lost.
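
To make the arithmetic behind step 3 explicit, here is a minimal sketch of why one node down out of two loses quorum at an 80% ratio. The function name and the exact comparison rule (the percentage of active nodes must be at least the ratio) are assumptions for illustration, not taken from the glusterd source.

```python
def server_quorum_met(active_nodes, total_nodes, ratio_percent):
    """Hypothetical check: quorum holds when the share of active nodes
    is at least ratio_percent (assumed rule, integer arithmetic only)."""
    return active_nodes * 100 >= total_nodes * ratio_percent


# Cluster from the steps above: 2 nodes, quorum ratio 80%.
print(server_quorum_met(2, 2, 80))  # both nodes up
print(server_quorum_met(1, 2, 80))  # one node down: 50% is below 80%
```

With one of two nodes down, only 50% of the cluster is active, so the service is expected to stay in CRITICAL state until the node returns.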

Actual results:
----------------

The state of the service changed to CRITICAL, and then to OK.

Expected results:
------------------

The state of the service should have remained CRITICAL, as quorum was lost.

Additional info:

Comment 1 Sahina Bose 2014-09-15 11:30:45 UTC
http://review.gluster.org/8740

Comment 2 Sahina Bose 2014-10-29 05:20:49 UTC
The issue was caused by messages buffered by the plugin that processes the syslog messages. It occurs when quorum is regained and then lost again.

The earlier quorum-regained message was still in the buffer and was resent to Nagios when the quorum-lost messages were received. The attached patch fixes this behaviour.
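
As an illustration only, the following sketch models how a stale buffered message could overwrite a newer status. It is not the actual patch or plugin code: the function, the buffer handling, and the assumption that the stale message is resent after the new one are all hypothetical.

```python
def final_status(events, clear_buffer_after_send):
    """Return the last status the monitoring server would see.

    Hypothetical model: each event is sent as soon as it arrives, and any
    message still sitting in the buffer is then sent again.
    """
    buffer = []     # messages kept by the (hypothetical) plugin
    delivered = []  # messages the server receives, in order

    for status in events:
        delivered.append(status)  # send the new message
        delivered.extend(buffer)  # resend whatever is still buffered
        if clear_buffer_after_send:
            buffer = []           # nothing stale survives (assumed fix shape)
        else:
            buffer = [status]     # stale copy lingers until the next event
    return delivered[-1]


events = ["CRITICAL", "OK", "CRITICAL"]  # quorum lost, regained, lost again
print(final_status(events, clear_buffer_after_send=False))
print(final_status(events, clear_buffer_after_send=True))
```

In this model the uncleared buffer leaves the service showing OK after the second loss of quorum, while clearing the buffer after each send leaves it at CRITICAL.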

Comment 3 Shruti Sampat 2014-11-14 11:56:43 UTC
Verified as fixed in gluster-nagios-addons-0.1.12-1.el6rhs.x86_64

Quorum service remains in critical state when quorum is lost.

Comment 4 Shalaka 2014-11-27 06:24:39 UTC
Please add doc text for this bug.

Comment 5 Pavithra 2014-12-17 06:34:24 UTC
Hi Sahina,

Can you please review the edited doc text for technical accuracy and sign off?

Comment 6 Sahina Bose 2014-12-24 09:12:06 UTC
Looks good

Comment 8 errata-xmlrpc 2015-01-15 13:49:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html

