Bug 1284874

Summary:

cluster quorum status wrongly shows ok even when one of the nodes is powered down

Product:

[Red Hat Storage] Red Hat Gluster Storage

Reporter:

Triveni Rao <trao>

Component:

nagios-server-addons

Assignee:

Sahina Bose <sabose>

Status:

CLOSED ERRATA

QA Contact:

Triveni Rao <trao>

Severity:

high

Docs Contact:

Priority:

high

Version:

rhgs-3.1

CC:

asrivast, divya, knarra, sabose, sankarshan, sashinde

Target Milestone:

---

Keywords:

ZStream

Target Release:

RHGS 3.1.2

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

nagios-server-addons-0.2.3-1

Doc Type:

Bug Fix

Doc Text:

Previously, the Quorum service incorrectly displayed an OK status even when more than 50% of the nodes were down. This happened because the freshness check overwrote the status of the Quorum service. With the fix, the freshness check overrides a stale status only when that status is not Critical. Now, the Quorum service displays the correct status.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2016-03-01 06:12:46 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1260783

Attachments:

Description	Flags
power down	none
power up	none

Description Triveni Rao 2015-11-24 11:28:34 UTC

Description of problem:
Cluster quorum status keeps flapping after one of the nodes is powered down.


Version-Release number of selected component (if applicable):

nagios-server-addons-0.2.2-1.el6rhs.noarch
gluster-nagios-common-0.2.3-1.el6rhs.noarch

How reproducible:
always

Steps to Reproduce:
1. Install RHSC + Nagios on a new 3.1.2 build.
2. Add RHGS nodes (RHEL 6.7 or RHEL 7.2).
3. Power down one of the nodes and check the status in the UI.
4. Cluster quorum status initially shows the quorum lost message, but starts flapping between states after some time.


Actual results:
Cluster quorum status initially shows the quorum lost message, but starts flapping between states after some time.

Expected results:

It should keep showing the cluster quorum lost message, without the status changing.

Additional info:

Comment 1 Sahina Bose 2015-11-25 06:54:47 UTC

The issue is caused by the active check overriding the Nagios output. The active check should override the output only when the service status is not critical. Currently, the check of the existing service status does not return any results, which causes the wrong output.
Fixed in patch - http://review.gluster.org/12735
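
The intended behaviour described above can be sketched roughly as follows. This is a minimal illustration only, not the code from the patch: the function name resolve_stale_status, the fallback_status parameter, and the choice to report UNKNOWN when the lookup returns nothing are all assumptions made for the example. Only the numeric return codes are the standard Nagios plugin values.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def resolve_stale_status(current_status, fallback_status=OK):
    """Return the status to report when a passive result has gone stale.

    current_status: last known status of the service, or None if the
        lookup returned nothing (the failure mode described above).
    fallback_status: hypothetical status the active check would
        otherwise report for a stale service.
    """
    if current_status is None:
        # Assumption: when the lookup fails, report UNKNOWN rather than
        # risk masking a quorum loss with the fallback status.
        return UNKNOWN
    if current_status == CRITICAL:
        # Keep "quorum lost" until a fresh result replaces it.
        return CRITICAL
    # Any non-critical stale status may be overridden.
    return fallback_status
```

With this shape, a service that is already Critical stays Critical across repeated active checks, which is what prevents the flapping reported in the description.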

Comment 3 Triveni Rao 2015-12-15 05:03:12 UTC

This bug is verified with the provided fixed version, nagios-server-addons-0.2.3-1.

Steps followed:
1. Install RHSC + Nagios on a new 3.1.2 build.
2. Add RHGS nodes (RHEL 6.7 or RHEL 7.2).
3. Power down one of the nodes and check the status in the UI.
4. Cluster quorum status shows the quorum lost message properly, with no flapping.
5. Power up the node and check the UI; services come back to their normal states.

Attached are the two screenshots taken after power down and power up.

Version:
gluster-nagios-common-0.2.3-1.el6rhs.noarch
nagios-server-addons-0.2.3-1.el6rhs.noarch

Comment 4 Triveni Rao 2015-12-15 05:04:31 UTC

Created attachment 1105835
power down

Comment 5 Triveni Rao 2015-12-15 05:05:14 UTC

Created attachment 1105836
power up

Comment 6 Divya 2016-01-28 09:28:18 UTC

Sahina,

Could you review and sign off on the edited doc text?

Comment 7 Sahina Bose 2016-01-29 10:51:45 UTC

Looks good to me

Comment 9 errata-xmlrpc 2016-03-01 06:12:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0310.html