1141171 – [Nagios] Quorum service is seen to be OK when it should actually be in CRITICAL state

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	gluster-nagios-addons
Sub Component:
Version:	rhgs-3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.0.3
Assignee:	Sahina Bose
QA Contact:	Shruti Sampat
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-09-12 11:33 UTC by Shruti Sampat
Modified:	2015-05-13 17:41 UTC
CC List:	8 users
Fixed In Version:	gluster-nagios-addons-0.1.12-1.el6rhs
Doc Type:	Bug Fix
Doc Text:	Previously, the quorum service displayed an incorrect status. With this fix, a buffering issue is resolved and the quorum service displays the correct status.
Clone Of:
Environment:
Last Closed:	2015-01-15 13:49:46 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:0039	0	normal	SHIPPED_LIVE	Red Hat Storage Console 3.0 enhancement and bug fix update #3	2015-01-15 18:46:40 UTC

Description Shruti Sampat 2014-09-12 11:33:44 UTC

Description of problem:
------------------------

When a node in the cluster was brought down so that quorum was lost, the status of the cluster-quorum service changed to CRITICAL. The status then changed to OK while the node was still down. It should not have changed to OK, because quorum was still lost.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

nagios-server-addons-0.1.6-1.el6rhs.noarch
gluster-nagios-addons-0.1.10-2.el6rhs.x86_64

How reproducible:
-----------------

Intermittently

Steps to Reproduce:
-------------------

1. Create a cluster of 2 RHS nodes and start monitoring it using Nagios.
2. Create 2 volumes of distribute-replicate type, enable server quorum on them, and set the quorum ratio to 80%.
3. Bring down one of the nodes, so that quorum is lost.
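
The arithmetic behind step 3 can be sketched as follows. This is only an illustration, not the glusterd implementation: the function name is hypothetical, and it assumes quorum holds when the share of active nodes is at least the configured ratio.

```python
def server_quorum_met(active_nodes, total_nodes, ratio_percent):
    """Hypothetical helper: True if the active share of nodes meets the ratio.

    Assumption (not taken from the gluster sources): quorum holds when
    active_nodes / total_nodes >= ratio_percent / 100.
    """
    # Integer comparison avoids floating-point rounding.
    return active_nodes * 100 >= ratio_percent * total_nodes


# Two-node cluster from the steps above, quorum ratio 80%.
print(server_quorum_met(2, 2, 80))  # both nodes up
print(server_quorum_met(1, 2, 80))  # one node down: 50% is below 80%
```

Under this assumption, a 2-node cluster with an 80% ratio loses quorum as soon as either node goes down.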

Actual results:
----------------

The state of the service changed to CRITICAL, and then to OK.

Expected results:
------------------

The service should have remained in the CRITICAL state, because quorum was still lost.

Additional info:

Comment 1 Sahina Bose 2014-09-15 11:30:45 UTC

http://review.gluster.org/8740

Comment 2 Sahina Bose 2014-10-29 05:20:49 UTC

The issue was caused by the plugin that processes the syslog messages buffering them. It occurs when quorum is regained and then lost again.

The earlier "quorum regained" message was still in the buffer and was resent to Nagios when the "quorum lost" messages were received. This behaviour is fixed with the attached patch.
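
The effect of a stale buffered message can be modelled with a small sketch. This is not the plugin's code or the patch: both classes are hypothetical, and the buggy one simply assumes that a message stays in the buffer until the next message arrives.

```python
OK, CRITICAL = "OK", "CRITICAL"


class LaggingForwarder:
    """Hypothetical buggy model: a message is held until the next one arrives."""

    def __init__(self):
        self.pending = None  # message still sitting in the buffer
        self.sent = []       # statuses the monitoring server has received

    def handle(self, status):
        if self.pending is not None:
            # The stale buffered message is delivered only now.
            self.sent.append(self.pending)
        self.pending = status


class ImmediateForwarder:
    """Hypothetical fixed model: every message is forwarded as it arrives."""

    def __init__(self):
        self.sent = []

    def handle(self, status):
        self.sent.append(status)


def last_reported(forwarder, events):
    """Feed events to the forwarder and return the last status it delivered."""
    for event in events:
        forwarder.handle(event)
    return forwarder.sent[-1] if forwarder.sent else None


# Quorum lost, regained, then lost again.
events = [CRITICAL, OK, CRITICAL]
print(last_reported(LaggingForwarder(), events))    # OK
print(last_reported(ImmediateForwarder(), events))  # CRITICAL
```

In this model the lagging forwarder leaves the service showing OK while quorum is lost, which matches the symptom reported here. The immediate forwarder ends on CRITICAL.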

Comment 3 Shruti Sampat 2014-11-14 11:56:43 UTC

Verified as fixed in gluster-nagios-addons-0.1.12-1.el6rhs.x86_64

Quorum service remains in critical state when quorum is lost.

Comment 4 Shalaka 2014-11-27 06:24:39 UTC

Please add doc text for this bug.

Comment 5 Pavithra 2014-12-17 06:34:24 UTC

Hi Sahina,

Can you please review the edited doc text for technical accuracy and sign off?

Comment 6 Sahina Bose 2014-12-24 09:12:06 UTC

Looks good

Comment 8 errata-xmlrpc 2015-01-15 13:49:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html
