Bug 1284874

Summary: cluster quorum status wrongly shows ok even when one of the nodes is powered down
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Triveni Rao <trao>
Component: nagios-server-addons
Assignee: Sahina Bose <sabose>
Status: CLOSED ERRATA
QA Contact: Triveni Rao <trao>
Severity: high
Priority: high
Version: rhgs-3.1
CC: asrivast, divya, knarra, sabose, sankarshan, sashinde
Keywords: ZStream
Target Release: RHGS 3.1.2
Hardware: x86_64
OS: Linux
Fixed In Version: nagios-server-addons-0.2.3-1
Doc Type: Bug Fix
Doc Text:
Previously, the Quorum service incorrectly displayed an OK status even when more than 50% of the nodes were down. This happened because the freshness check overwrote the status of the Quorum service. With this fix, the freshness check overrides a stale status only when that status is not Critical, so the Quorum service now displays the correct status.
Last Closed: 2016-03-01 06:12:46 UTC
Type: Bug
Bug Blocks: 1260783    
Attachments:
Description    Flags
power down     none
power up       none

Description Triveni Rao 2015-11-24 11:28:34 UTC
Description of problem:
The cluster quorum status starts flapping after one of the nodes is powered down.


Version-Release number of selected component (if applicable):

nagios-server-addons-0.2.2-1.el6rhs.noarch
gluster-nagios-common-0.2.3-1.el6rhs.noarch

How reproducible:
always

Steps to Reproduce:
1. Install RHSC and Nagios on a new 3.1.2 build.
2. Add RHGS nodes running RHEL 6.7 or RHEL 7.2.
3. Power down one of the nodes and check the status in the UI.
4. The cluster quorum status initially shows the quorum lost message, but after some time it starts flapping between states.


Actual results:
The cluster quorum status initially shows the quorum lost message, but after some time it starts flapping between states.

Expected results:

The cluster quorum status should consistently show the quorum lost message without changing state.

Additional info:

Comment 1 Sahina Bose 2015-11-25 06:54:47 UTC
The issue is caused by the active check overriding the Nagios output. The active check should override the status only when the service status is not Critical. Currently, the existing service status check does not return results, which causes the wrong output.
Fixed in patch - http://review.gluster.org/12735
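A minimal Python sketch of the override rule described in this comment and in the Doc Text follows. It is not the code from the patch: `Status`, `resolve_quorum_status`, and `resolve_quorum_status_before_fix` are hypothetical names, and the sketch assumes, as the summary describes, that the overriding check reports OK even when quorum is actually lost.

```python
from enum import IntEnum


class Status(IntEnum):
    """Nagios service states, using the standard plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def resolve_quorum_status_before_fix(current: Status, override: Status) -> Status:
    # Hypothetical pre-fix behavior: the override always wins, so a
    # CRITICAL "quorum lost" status can be replaced by a wrong OK.
    return override


def resolve_quorum_status(current: Status, override: Status) -> Status:
    # Hypothetical post-fix behavior: keep a CRITICAL status and let the
    # override replace the stale status only when it is not CRITICAL.
    if current == Status.CRITICAL:
        return current
    return override


if __name__ == "__main__":
    # Assumed scenario: a node is down, the passive result is CRITICAL,
    # and the overriding check wrongly reports OK.
    current, override = Status.CRITICAL, Status.OK
    print(resolve_quorum_status_before_fix(current, override).name)  # OK
    print(resolve_quorum_status(current, override).name)  # CRITICAL
```

Under this assumption, passive CRITICAL results alternating with OK overrides would account for the flapping described in the report, and keeping CRITICAL once it is set removes the alternation.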

Comment 3 Triveni Rao 2015-12-15 05:03:12 UTC
This bug is verified with the fixed version, nagios-server-addons-0.2.3-1.

Steps followed:
1. Install RHSC and Nagios on a new 3.1.2 build.
2. Add RHGS nodes running RHEL 6.7 or RHEL 7.2.
3. Power down one of the nodes and check the status in the UI.
4. The cluster quorum status shows the quorum lost message correctly, with no flapping.
5. Power up the node and check the UI; the services return to their normal states.

Attached are the two screenshots taken after power down and power up.

Version:
gluster-nagios-common-0.2.3-1.el6rhs.noarch
nagios-server-addons-0.2.3-1.el6rhs.noarch

Comment 4 Triveni Rao 2015-12-15 05:04:31 UTC
Created attachment 1105835
power down

Comment 5 Triveni Rao 2015-12-15 05:05:14 UTC
Created attachment 1105836
power up

Comment 6 Divya 2016-01-28 09:28:18 UTC
Sahina,

Could you review and sign off on the edited doc text?

Comment 7 Sahina Bose 2016-01-29 10:51:45 UTC
Looks good to me

Comment 9 errata-xmlrpc 2016-03-01 06:12:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0310.html