Bug 1109744

Summary: [Nagios] notifications are not sent when quorum is lost for multiple volumes, one after the other
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Shruti Sampat <ssampat>
Component: gluster-nagios-addons Assignee: Sahina Bose <sabose>
Status: CLOSED ERRATA QA Contact: RamaKasturi <knarra>
Severity: high Docs Contact:
Priority: medium    
Version: rhgs-3.0 CC: asriram, asrivast, divya, dpati, kmayilsa, knarra, rhsc-qe-bugs, sabose
Target Milestone: ---   
Target Release: RHGS 3.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: gluster-nagios-addons-0.2.0-1 Doc Type: Bug Fix
Doc Text:
Previously, the notification message misleadingly stated that quorum was lost for only one volume even when multiple volumes had lost quorum. With this fix, the notification message informs the user that quorum is lost on the entire cluster.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-29 05:26:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1087818, 1202842    

Description Shruti Sampat 2014-06-16 10:00:11 UTC
Description of problem:
-------------------------

Consider a cluster with two volumes, say vol1 and vol2, both with server-side quorum enabled.

When quorum is lost for vol1, the status of the cluster-quorum service changes to critical, and the status information reads - 

QUORUM: Server quorum lost for volume vol1. Stopping local bricks.

A notification is sent via e-mail and SNMP traps.

If quorum is then lost for vol2 before it is regained for vol1, the status of the service remains critical. The status information reads - 

QUORUM: Server quorum lost for volume vol2. Stopping local bricks.

Because the status of the quorum service did not change, no notification is sent for vol2.
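
The behaviour described above can be illustrated with a minimal sketch. It is a simplified model, not Nagios's actual implementation: it assumes a notification is generated only when the service state changes, and it ignores soft/hard states and re-notification intervals. The class and method names are hypothetical.

```python
# Simplified model of state-change-driven notification.
# QuorumService and submit_result are hypothetical names for illustration only.
OK, CRITICAL = "OK", "CRITICAL"


class QuorumService:
    def __init__(self):
        self.state = OK
        self.status_info = ""
        self.notifications = []

    def submit_result(self, state, status_info):
        # Notify only when the state differs from the previous one.
        if state != self.state:
            self.notifications.append((state, status_info))
        # The status information is always updated, even without a notification.
        self.state = state
        self.status_info = status_info


svc = QuorumService()
svc.submit_result(
    CRITICAL, "QUORUM: Server quorum lost for volume vol1. Stopping local bricks."
)
svc.submit_result(
    CRITICAL, "QUORUM: Server quorum lost for volume vol2. Stopping local bricks."
)

# Only the vol1 result changed the state, so only one notification exists,
# while the displayed status information now mentions vol2.
print(len(svc.notifications))
print(svc.status_info)
```

In this model the second result overwrites the status information but produces no notification, which matches the observation in this report.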

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.2-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create two distributed-replicate volumes, vol1 and vol2.
2. Cause quorum to be lost for vol1, observe that the status of the service changes to critical and that notifications are sent.
3. Cause quorum to be lost for vol2.

Actual results:
The status of the service remains critical, so no status change occurs and notifications are not sent.

Expected results:
Notifications should be sent whenever quorum is lost/regained for any volume in the cluster.

Additional info:

Comment 1 Shalaka 2014-06-18 05:58:27 UTC
Please add doc text for the known issue

Comment 2 Shalaka 2014-06-24 16:55:01 UTC
Please review and signoff edited doc text.

Comment 3 Sahina Bose 2014-09-04 05:16:24 UTC
Looks good to me

Comment 4 Sahina Bose 2015-02-09 07:28:50 UTC
As per the redesign, a notification is sent only once, because Quorum is a cluster-level service.
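
A rough sketch of what a cluster-level status could look like follows. It is an assumption for illustration, not the actual gluster-nagios-addons code: the function name and the substring matching on the incoming message are hypothetical, and only the two output strings are taken from the notifications quoted in this report.

```python
# Hypothetical mapping from a per-volume quorum message to a cluster-level status.
# The function name and the matching rules are assumptions, not the add-on's API.
def cluster_quorum_status(message):
    text = message.lower()
    if "quorum lost" in text:
        return ("CRITICAL", "QUORUM: Cluster server-side quorum lost.")
    if "quorum regained" in text:
        return ("OK", "QUORUM: Cluster server-side quorum regained.")
    return None  # Not a quorum-related message.


first = cluster_quorum_status(
    "Server quorum lost for volume vol1. Stopping local bricks."
)
second = cluster_quorum_status(
    "Server quorum lost for volume vol2. Stopping local bricks."
)

# Both volumes yield the same cluster-level status, so the status text
# no longer names a single volume.
print(first == second)
```

Because the resulting status text does not name a volume, a second volume losing quorum leaves both the state and the message unchanged.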

Comment 5 RamaKasturi 2015-06-01 11:33:51 UTC
Verified and works fine with build gluster-nagios-addons-0.2.0-1.

As per comment 7, a notification is sent only once because Quorum is a cluster-level service. The notification looks like this:

** PROBLEM Service Alert: cluster1/Cluster - Quorum is CRITICAL **

***** Nagios *****

Notification Type: PROBLEM

Service: Cluster - Quorum
Host: cluster1
Address: cluster1
State: CRITICAL

Date/Time: Mon Jun 1 16:46:20 IST 2015

Additional Info:

QUORUM: Cluster server-side quorum lost.

When quorum is regained, the following email notification is sent:

** RECOVERY Service Alert: cluster1/Cluster - Quorum is OK **

***** Nagios *****

Notification Type: RECOVERY

Service: Cluster - Quorum
Host: cluster1
Address: cluster1
State: OK

Date/Time: Mon Jun 1 16:56:00 IST 2015

Additional Info:

QUORUM: Cluster server-side quorum regained.

Comment 6 Divya 2015-07-26 05:27:27 UTC
Sahina,

Could you review the edited doc text and sign off?

Comment 7 Sahina Bose 2015-07-27 05:03:49 UTC
Acked

Comment 9 errata-xmlrpc 2015-07-29 05:26:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-1494.html