Bug 1236290

Summary: [New] - Cluster-Quorum status does not change when one of the nodes in the cluster is powered off
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: RamaKasturi <knarra>
Component: nagios-server-addons
Assignee: Sahina Bose <sabose>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: asriram, asrivast, bmohanra, dpati, rnachimu, sabose, vagarwal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.1
Flags: knarra: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: gluster-nagios-addons-0.2.5-1
Doc Type: Bug Fix
Doc Text:
Previously, the nodes continued to send updates to the old service name even after the Cluster Quorum service was renamed. As a result, the Cluster Quorum service status was not reflected in Nagios. With this fix, the plugins on the nodes are updated so that notifications are pushed to the new service and the Cluster Quorum status is reflected correctly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-05 09:21:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1216951, 1251815

Description RamaKasturi 2015-06-27 11:28:27 UTC
Description of problem:
Cluster-Quorum status remains "ok" with status information "Server quorum turned on for vol4,vol1 " when one of the nodes in the cluster is powered off.

Version-Release number of selected component (if applicable):
nagios-server-addons-0.2.1-2.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add two nodes in the cluster.
2. Set quorum on any one of the volumes by running the command "gluster volume set <vol-name> cluster.server-quorum-type server".
3. Now run cluster auto-config.
4. Now power off one of the nodes.

Actual results:
Status remains "OK" with status information "Server quorum turned on for <vol_names>".

Expected results:
Cluster-Quorum status should change to CRITICAL with status information "QUORUM: Cluster server-side quorum lost."

Additional info:

Comment 2 RamaKasturi 2015-06-27 11:29:32 UTC
Seeing the following in nagios.log.

[1435403539] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cluster1;Cluster - Quorum;2;QUORUM: Cluster server-side quorum lost.
[1435403539] Warning:  Passive check result was received for service 'Cluster - Quorum' on host 'cluster1', but the service could not be found!
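
The warning above can be picked out of nagios.log mechanically. A minimal sketch, assuming the warning text has exactly the format shown in this comment; find_dropped_results is a hypothetical helper written for illustration and is not part of Nagios or gluster-nagios-addons.

```python
import re

# Matches the warning Nagios logs when a passive result names a service
# it does not know about. The pattern is derived from the log line above.
MISSING_SERVICE = re.compile(
    r"^\[(?P<ts>\d+)\] Warning:\s+Passive check result was received for service "
    r"'(?P<service>[^']+)' on host '(?P<host>[^']+)', "
    r"but the service could not be found!"
)


def find_dropped_results(lines):
    """Return (timestamp, host, service) for each passive result that was dropped."""
    dropped = []
    for line in lines:
        match = MISSING_SERVICE.match(line.strip())
        if match:
            dropped.append(
                (int(match.group("ts")), match.group("host"), match.group("service"))
            )
    return dropped


# The two log lines quoted in this comment.
sample = [
    "[1435403539] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cluster1;"
    "Cluster - Quorum;2;QUORUM: Cluster server-side quorum lost.",
    "[1435403539] Warning:  Passive check result was received for service "
    "'Cluster - Quorum' on host 'cluster1', but the service could not be found!",
]

print(find_dropped_results(sample))
```

Any entry returned here means a node pushed a result for a service description that the Nagios server does not have configured.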

Comment 4 Sahina Bose 2015-06-30 11:13:25 UTC
The issue is due to a change in the service name in Nagios: NSCA was still sending alerts to the older service name. The service name was changed to ensure that the command definition was modified on update, as a new freshness check was introduced.

http://review.gluster.org/#/c/11465 - posted to fix this
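
To illustrate why the rename matters: a passive result is only accepted when its host and service description match a service configured on the Nagios server. The sketch below is a rough model of that check, not the actual Nagios or plugin code. The helper names are hypothetical, and the new service description "Cluster - Quorum Status" is an assumption taken from the wording of the verification comment on this bug.

```python
# Standard Nagios plugin return codes.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_passive_result(command):
    """Split 'PROCESS_SERVICE_CHECK_RESULT;host;service;code;output' into fields."""
    name, host, service, code, output = command.split(";", 4)
    if name != "PROCESS_SERVICE_CHECK_RESULT":
        raise ValueError("not a passive service check result: %r" % name)
    return {
        "host": host,
        "service": service,
        "state": STATE_NAMES[int(code)],
        "output": output,
    }


def is_deliverable(result, configured_services):
    """True if the (host, service) pair exists in the server configuration."""
    return (result["host"], result["service"]) in configured_services


# Assumption: after the rename, only the new description is configured.
configured = {("cluster1", "Cluster - Quorum Status")}

old = parse_passive_result(
    "PROCESS_SERVICE_CHECK_RESULT;cluster1;Cluster - Quorum;2;"
    "QUORUM: Cluster server-side quorum lost."
)
new = parse_passive_result(
    "PROCESS_SERVICE_CHECK_RESULT;cluster1;Cluster - Quorum Status;2;"
    "QUORUM: Cluster server-side quorum lost."
)

print(old["state"], is_deliverable(old, configured))
print(new["state"], is_deliverable(new, configured))
```

In this model the result sent under the old name carries the right CRITICAL state but is never delivered, which matches the behaviour reported in this bug.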

Comment 8 monti lawrence 2015-07-23 14:34:54 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 9 Sahina Bose 2015-07-24 12:00:39 UTC
minor edit.

Comment 13 RamaKasturi 2015-08-28 11:28:39 UTC
Hi Sahina,

   After enabling server quorum on a volume, I powered off one node in the cluster, and now my quorum status goes to UNKNOWN with status information "Server quorum not turned on for any volume". Can you please check this?

Thanks
kasturi

Comment 14 RamaKasturi 2015-09-01 05:40:58 UTC
Verified on RHS+Nagios deployment and works fine with build gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64.

When one of the nodes in the cluster is powered off, Cluster - Quorum Status is marked as CRITICAL with status information "QUORUM: Cluster server-side quorum lost".

Comment 15 Bhavana 2015-09-21 08:24:26 UTC
Hi Sahina,

The doc text is updated. Please review it and share your technical review comments. If it looks OK, please sign off on it.

Comment 17 errata-xmlrpc 2015-10-05 09:21:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1848.html