Bug 1109744 - [Nagios] notifications are not sent when quorum is lost for multiple volumes, one after the other
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nagios-addons
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Sahina Bose
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks: 1087818 1202842
Reported: 2014-06-16 10:00 UTC by Shruti Sampat
Modified: 2015-07-29 05:26 UTC (History)

Fixed In Version: gluster-nagios-addons-0.2.0-1
Doc Type: Bug Fix
Doc Text:
Previously, the notification message misleadingly stated that quorum was lost for only one volume, even when multiple volumes had lost quorum. With this fix, the notification message informs the user that quorum is lost on the entire cluster.
Clone Of:
Environment:
Last Closed: 2015-07-29 05:26:21 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2015:1494 0 normal SHIPPED_LIVE Red Hat Gluster Storage Console 3.1 Enhancement and bug fixes 2015-07-29 09:24:02 UTC

Description Shruti Sampat 2014-06-16 10:00:11 UTC
Description of problem:
-------------------------

Consider a cluster having two volumes, say vol1 and vol2, both with server-side quorum enabled.

When quorum is lost for vol1, the status of the cluster-quorum service changes to critical, and the status information reads - 

QUORUM: Server quorum lost for volume vol1. Stopping local bricks.

A notification is sent via e-mail and SNMP traps.

If quorum is later lost for vol2 (before it is regained for vol1), the status of the service remains critical. The status information reads - 

QUORUM: Server quorum lost for volume vol2. Stopping local bricks.

Since the status of the quorum service does not change, no notifications are sent.
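
The behaviour described above can be illustrated with a small sketch. This is a toy model only, assuming that a notification is triggered purely by a change of service state; the class `QuorumService` and its method `submit_result` are hypothetical names and are not part of Nagios or gluster-nagios-addons.

```python
# Toy model (hypothetical, not Nagios code): one cluster-level service
# that notifies only when its state changes.
OK = "OK"
CRITICAL = "CRITICAL"


class QuorumService:
    def __init__(self):
        self.state = OK
        self.status_info = ""
        self.notifications = []  # (state, status_info) pairs that were "sent"

    def submit_result(self, state, status_info):
        # A notification is recorded only if the state differs from the previous one.
        if state != self.state:
            self.notifications.append((state, status_info))
        # The status text is always refreshed, even when nothing is sent.
        self.state = state
        self.status_info = status_info


service = QuorumService()
service.submit_result(
    CRITICAL, "QUORUM: Server quorum lost for volume vol1. Stopping local bricks."
)
service.submit_result(
    CRITICAL, "QUORUM: Server quorum lost for volume vol2. Stopping local bricks."
)
```

In this model the second result only overwrites the status information; the state stays critical, so the loss of quorum for vol2 never reaches the notification list.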

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.2-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create two distributed-replicate volumes, vol1 and vol2.
2. Cause quorum to be lost for vol1, observe that the status of the service changes to critical and that notifications are sent.
3. Cause quorum to be lost for vol2.

Actual results:
The status of the service remains critical, so no status change occurs and notifications are not sent.

Expected results:
Notifications should be sent whenever quorum is lost/regained for any volume in the cluster.

Additional info:

Comment 1 Shalaka 2014-06-18 05:58:27 UTC
Please add doc text for the known issue

Comment 2 Shalaka 2014-06-24 16:55:01 UTC
Please review and signoff edited doc text.

Comment 3 Sahina Bose 2014-09-04 05:16:24 UTC
Looks good to me

Comment 4 Sahina Bose 2015-02-09 07:28:50 UTC
As per the redesign, the notification is sent only once, as Quorum is a cluster-level service.

Comment 5 RamaKasturi 2015-06-01 11:33:51 UTC
Verified and works fine with build gluster-nagios-addons-0.2.0-1.

As per comment 7, notification is sent only once as Quorum is a cluster-level service. The notification looks like this:

** PROBLEM Service Alert: cluster1/Cluster - Quorum is CRITICAL **

***** Nagios *****

Notification Type: PROBLEM

Service: Cluster - Quorum
Host: cluster1
Address: cluster1
State: CRITICAL

Date/Time: Mon Jun 1 16:46:20 IST 2015

Additional Info:

QUORUM: Cluster server-side quorum lost.

When quorum is regained, the following email notification is sent:

** RECOVERY Service Alert: cluster1/Cluster - Quorum is OK **

***** Nagios *****

Notification Type: RECOVERY

Service: Cluster - Quorum
Host: cluster1
Address: cluster1
State: OK

Date/Time: Mon Jun 1 16:56:00 IST 2015

Additional Info:

QUORUM: Cluster server-side quorum regained.
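
The cluster-level reporting shown in the two notifications above can be sketched as follows. This is an illustration under assumptions, not the gluster-nagios-addons code: the helper `cluster_quorum_result` and its argument are hypothetical, and only the two status strings are taken from the notifications in this comment.

```python
# Status strings as they appear in the notifications above.
LOST_MSG = "QUORUM: Cluster server-side quorum lost."
REGAINED_MSG = "QUORUM: Cluster server-side quorum regained."


def cluster_quorum_result(volumes_without_quorum):
    """Hypothetical helper: map the set of affected volumes to one cluster-level result."""
    if volumes_without_quorum:
        # The message does not name a volume, so it is the same for any number of volumes.
        return "CRITICAL", LOST_MSG
    return "OK", REGAINED_MSG


results = [
    cluster_quorum_result({"vol1"}),
    cluster_quorum_result({"vol1", "vol2"}),
    cluster_quorum_result(set()),
]
```

Because the result is identical whether one or several volumes are affected, a single PROBLEM notification followed by a single RECOVERY notification describes the whole cluster.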

Comment 6 Divya 2015-07-26 05:27:27 UTC
Sahina,

Could you review the edited doc text and sign off?

Comment 7 Sahina Bose 2015-07-27 05:03:49 UTC
Acked

Comment 9 errata-xmlrpc 2015-07-29 05:26:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-1494.html

