Bug 1284874 - cluster quorum status wrongly shows ok even when one of the nodes is powered down
Summary: cluster quorum status wrongly shows ok even when one of the nodes is powered ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nagios-server-addons
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Sahina Bose
QA Contact: Triveni Rao
URL:
Whiteboard:
Depends On:
Blocks: 1260783
Reported: 2015-11-24 11:28 UTC by Triveni Rao
Modified: 2016-05-16 04:38 UTC

Fixed In Version: nagios-server-addons-0.2.3-1
Doc Type: Bug Fix
Doc Text:
Previously, the Quorum service incorrectly displayed OK status even when more than 50% of the nodes were down. This was because the freshness check overwrote the Quorum service status. With the fix, the freshness check overrides a stale status only when the status is not Critical. Now, the Quorum service displays the correct status.
Clone Of:
Environment:
Last Closed: 2016-03-01 06:12:46 UTC
Embargoed:


Attachments
power down (445.36 KB, image/png), 2015-12-15 05:04 UTC, Triveni Rao
power up (800.17 KB, image/png), 2015-12-15 05:05 UTC, Triveni Rao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0310 | 0 | normal | SHIPPED_LIVE | Red Hat Gluster Storage Console 3.1 update 2 bug fixes | 2016-03-01 10:31:05 UTC

Description Triveni Rao 2015-11-24 11:28:34 UTC
Description of problem:
The cluster quorum status keeps flapping after one of the nodes is powered down.


Version-Release number of selected component (if applicable):

nagios-server-addons-0.2.2-1.el6rhs.noarch
gluster-nagios-common-0.2.3-1.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install RHSC+nagios on a new 3.1.2 build.
2. Add RHGS nodes (RHEL 6.7 or RHEL 7.2).
3. Power down one of the nodes and check the status in the UI.
4. Observe that the cluster quorum status initially shows the quorum lost message but starts flapping between states after some time.


Actual results:
The cluster quorum status initially shows the quorum lost message but starts flapping between states after some time.

Expected results:

The cluster quorum status should keep showing the quorum lost message, without flapping, for as long as the node is down.

Additional info:

Comment 1 Sahina Bose 2015-11-25 06:54:47 UTC
The issue is caused by the active check overriding the Nagios output. The active check should override the output only when the service status is not critical. Currently, the lookup of the existing service status is not returning results, which causes the wrong output.
Fixed in patch: http://review.gluster.org/12735
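
For illustration only, the intended override rule can be sketched as follows. This is not the code from the patch; the function name, its parameters, and the choice to report UNKNOWN when the existing status cannot be read are all assumptions made for the example. Only the numeric values are standard Nagios plugin return codes.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def resolve_stale_status(current_status, freshness_status):
    """Pick the status to report when the freshness (active) check fires.

    current_status   -- last known status of the quorum service, or None
                        if the lookup returned nothing (hypothetical
                        convention for this sketch)
    freshness_status -- status the freshness check would otherwise set
    """
    if current_status is None:
        # Assumption: if the existing status cannot be read, report
        # UNKNOWN instead of silently overriding it.
        return UNKNOWN
    if current_status == CRITICAL:
        # Quorum is already reported as lost; do not override it.
        return CRITICAL
    # Any non-critical status may be replaced by the freshness result.
    return freshness_status


if __name__ == "__main__":
    # A critical "quorum lost" state survives the freshness check.
    print(resolve_stale_status(CRITICAL, OK))
    # A stale non-critical state is replaced.
    print(resolve_stale_status(OK, UNKNOWN))
```

The point of the sketch is that the critical state is checked before any override, so a stale-result handler cannot flip a "quorum lost" service back to OK.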

Comment 3 Triveni Rao 2015-12-15 05:03:12 UTC
This bug is verified with the provided fixed version, nagios-server-addons-0.2.3-1.

Steps followed:
1. Installed RHSC+nagios on a new 3.1.2 build.
2. Added RHGS nodes (RHEL 6.7 or RHEL 7.2).
3. Powered down one of the nodes and checked the status in the UI.
4. The cluster quorum status shows the quorum lost message properly, with no flapping.
5. Powered up the node and checked the UI; the services came back to their normal states.

Attached are the two screenshots taken after power down and power up.

Version:
gluster-nagios-common-0.2.3-1.el6rhs.noarch
nagios-server-addons-0.2.3-1.el6rhs.noarch

Comment 4 Triveni Rao 2015-12-15 05:04:31 UTC
Created attachment 1105835: power down

Comment 5 Triveni Rao 2015-12-15 05:05:14 UTC
Created attachment 1105836: power up

Comment 6 Divya 2016-01-28 09:28:18 UTC
Sahina,

Could you review and sign off on the edited doc text?

Comment 7 Sahina Bose 2016-01-29 10:51:45 UTC
Looks good to me.

Comment 9 errata-xmlrpc 2016-03-01 06:12:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0310.html

