1236997 – [New] - Volume status is shown incorrect when glusterd is down on one of the node.

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	nagios-server-addons
Sub Component:
Version:	rhgs-3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Ramesh N
QA Contact:	RHS-C QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1216951

Reported:	2015-06-30 07:06 UTC by RamaKasturi
Modified:	2016-04-13 06:30 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	Bricks with an 'UNKNOWN' status are not treated as DOWN when volume status is calculated. When the glusterd service is down on one node, the brick status changes to 'UNKNOWN' while the volume status remains 'OK'. As a result, the volume may appear to be up and running even though some bricks may not be running, and the correct volume status cannot be detected. Workaround: Rely on the notifications that are sent when glusterd is down and when bricks are in an 'UNKNOWN' state.
Clone Of:
Environment:
Last Closed:	2016-04-13 06:30:11 UTC
Embargoed:
Dependent Products:

Description RamaKasturi 2015-06-30 07:06:10 UTC

Description of problem:
When glusterd is down on one of the nodes, the volume status should change accordingly. For example, create a distribute volume across two nodes, node1 and node2, and then stop glusterd on node2. The distribute volume status should be shown as critical, since one of the volume's bricks resides on node2.

But in the Nagios UI, when glusterd is down, the volume status is still marked as "OK".

Version-Release number of selected component (if applicable):
nagios-server-addons-0.2.1-3.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster with two nodes and monitor them using Nagios.
2. Bring down glusterd on one of the nodes.

Actual results:
Volume status always shows "OK" with status information "all bricks are up"

Expected results:
Volume status for the distribute volume should be marked critical, with status information indicating that one of the bricks is down.

Additional info:

Comment 2 Sahina Bose 2015-06-30 11:57:49 UTC

I think this is expected behaviour. Even if glusterd is down on one of the nodes, the bricks are still online and accessible. Until the bricks are marked down, the volume is not marked CRITICAL.

Is this a regression? I don't see any change to this plugin's behaviour. Removing devel_ack until confirmed.

Comment 6 RamaKasturi 2015-07-21 07:15:40 UTC

Brick status is marked as UNKNOWN in the Nagios UI when glusterd on that node goes down. IMO, the volume status should also change.
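
The behaviour discussed in the comments above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the actual nagios-server-addons plugin code: the function name `distribute_volume_status`, the brick labels, the state strings, and the wording of the CRITICAL message are all hypothetical. The sketch shows how a volume can stay "OK" when only bricks explicitly reported as DOWN are counted and UNKNOWN bricks are ignored.

```python
# Hypothetical sketch; NOT taken from the nagios-server-addons source.
OK = "OK"
CRITICAL = "CRITICAL"


def distribute_volume_status(brick_states):
    """Return (status, message) for a distribute volume.

    brick_states maps a brick label to "UP", "DOWN" or "UNKNOWN".
    Only bricks explicitly reported as DOWN count against the volume;
    UNKNOWN bricks are ignored, which mirrors the behaviour reported here.
    """
    down = [brick for brick, state in brick_states.items() if state == "DOWN"]
    if down:
        # Assumed message format, for illustration only.
        return CRITICAL, "%d of %d bricks down" % (len(down), len(brick_states))
    return OK, "all bricks are up"


# glusterd is stopped on node2, so its brick state cannot be read.
print(distribute_volume_status({"node1:/brick1": "UP", "node2:/brick1": "UNKNOWN"}))
```

With the brick on node2 in the UNKNOWN state, the sketch still reports the volume as "OK" with "all bricks are up", which matches the actual results described in this report. Whether UNKNOWN should instead map to CRITICAL, WARNING, or UNKNOWN at the volume level is the open question in this bug.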

Comment 7 monti lawrence 2015-07-22 20:56:41 UTC

Doc text is edited. Please sign off to be included in Known Issues.

Comment 8 Ramesh N 2015-07-24 11:04:45 UTC

doc text looks good.

Comment 9 Sahina Bose 2016-04-13 06:30:11 UTC

Nagios monitors brick status and glusterd status separately and sends notifications if these services are down. For this particular case, the volume status cannot be correctly determined, so even a change in volume status could be interpreted incorrectly.

Closing this. Please re-open if you can suggest the volume status that it should move to.
