Bug 1236997 - [New] - Volume status is shown incorrectly when glusterd is down on one of the nodes.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nagios-server-addons
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ramesh N
QA Contact: RHS-C QE
URL:
Whiteboard:
Depends On:
Blocks: 1216951
 
Reported: 2015-06-30 07:06 UTC by RamaKasturi
Modified: 2016-04-13 06:30 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Bricks with an 'UNKNOWN' status are not counted as DOWN when volume status is calculated. When the glusterd service is down on one node, the status of that node's bricks changes to 'UNKNOWN' while the volume status remains 'OK'. As a result, the volume may appear to be up and running even though some bricks may not be running, and the correct status cannot be determined from the volume status alone. Workaround: Nagios sends notifications when glusterd is down and when bricks are in an 'UNKNOWN' state; rely on these notifications to detect the condition.
Clone Of:
Environment:
Last Closed: 2016-04-13 06:30:11 UTC
Embargoed:



Description RamaKasturi 2015-06-30 07:06:10 UTC
Description of problem:
When glusterd is down on one of the nodes, the volume status should change accordingly. For example, create a distribute volume on two nodes, say node1 and node2, and then stop glusterd on node2. The distribute volume status should be shown as critical, since one of the bricks of the volume resides on node2.

But in the Nagios UI, when glusterd is down, the volume status is still marked as "OK".

Version-Release number of selected component (if applicable):
nagios-server-addons-0.2.1-3.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster with two nodes and monitor them using Nagios.
2. Bring down glusterd on one of the nodes.

Actual results:
Volume status always shows "OK" with status information "all bricks are up"

Expected results:
Volume status for the distribute volume should be marked critical, with status information indicating that one of the bricks is down.

Additional info:

Comment 2 Sahina Bose 2015-06-30 11:57:49 UTC
I think this is expected behaviour. Even if glusterd is down on one of the nodes, the bricks are still online and accessible. Until the bricks are marked down, the volume is not marked CRITICAL.

Is this a regression? I don't see any change in this plugin's behaviour. Removing devel_ack till confirmed.

Comment 6 RamaKasturi 2015-07-21 07:15:40 UTC
Brick status is marked as UNKNOWN in the Nagios UI when glusterd on that node goes down. IMO, the volume status should also change.
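
The behaviour discussed in the comments above and in the Doc Text (bricks in an UNKNOWN state are not counted as down when the volume status is calculated) can be illustrated with a minimal sketch. This is not the actual nagios-server-addons plugin code: the function name `volume_status`, the `unknown_counts_as_down` switch, and the "N of M bricks are down" message are hypothetical and exist only to contrast the reported behaviour with the one the reporter expects for a distribute volume.

```python
# Hypothetical sketch; NOT the actual nagios-server-addons plugin code.
OK, CRITICAL, UNKNOWN = "OK", "CRITICAL", "UNKNOWN"


def volume_status(brick_states, unknown_counts_as_down=False):
    """Aggregate per-brick states into a status for a distribute volume.

    unknown_counts_as_down is an assumed switch used only for illustration.
    """
    down = 0
    for state in brick_states:
        if state == CRITICAL:
            down += 1
        elif state == UNKNOWN and unknown_counts_as_down:
            down += 1
    if down == 0:
        return OK, "all bricks are up"
    return CRITICAL, "%d of %d bricks are down" % (down, len(brick_states))


# node1's brick is reachable; glusterd is stopped on node2, so its brick is UNKNOWN.
bricks = [OK, UNKNOWN]
print(volume_status(bricks))                               # behaviour reported in this bug
print(volume_status(bricks, unknown_counts_as_down=True))  # behaviour the reporter expects
```

With the default setting the UNKNOWN brick is ignored, which matches the "OK" / "all bricks are up" result reported in this bug. Whether UNKNOWN should be treated as down is exactly the open question in this thread, since an UNKNOWN brick may still be online.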

Comment 7 monti lawrence 2015-07-22 20:56:41 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 8 Ramesh N 2015-07-24 11:04:45 UTC
Doc text looks good.

Comment 9 Sahina Bose 2016-04-13 06:30:11 UTC
Nagios monitors brick status and glusterd status separately and sends notifications if these services are down. For this particular case, volume status cannot be correctly determined - hence even a change in volume status could be interpreted incorrectly.

Closing this - please re-open if you can suggest the volume status that it needs to move to.

