Bug 1136205 - [Nagios] Volume status is seen to be in warning status with status information "null" when glusterd is stopped on one RHS node. [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nagios-addons
Version: 3.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.0.3
Assigned To: Nishanth Thomas
QA Contact: Shruti Sampat
Keywords: ZStream
Depends On: 1109843
Blocks: 1087818
Reported: 2014-09-02 03:26 EDT by Shruti Sampat
Modified: 2015-05-13 13:41 EDT (History)
CC List: 10 users

Fixed In Version: nagios-server-addons-0.1.9-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, the Nagios plug-in sent the volume status request to the Red Hat Storage node without converting the Nagios host name to the respective IP Address. When the glusterd service was stopped on one of the nodes in a Red Hat Storage Trusted Storage Pool, the volume status displayed a warning and the status information was empty. With this fix, the error scenarios are handled properly and the system ensures that the glusterd service starts before it sends such a request to a Red Hat Storage node.
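To illustrate the handling the doc text describes, here is a minimal Python sketch. It is not the actual gluster-nagios-addons code; the helper query_volume_status and all messages are hypothetical.

import socket
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def query_volume_status(address):
    # Placeholder for the real remote status query (hypothetical);
    # here it simulates glusterd being down on the target node.
    raise ConnectionError("glusterd is not running on %s" % address)

def check_volume_status(nagios_host_name):
    try:
        # Resolve the Nagios host name to an IP address before sending
        # the request; the bug was that the unresolved name was sent.
        address = socket.gethostbyname(nagios_host_name)
    except socket.gaierror as err:
        print("UNKNOWN: cannot resolve host %s: %s" % (nagios_host_name, err))
        return UNKNOWN
    try:
        query_volume_status(address)
    except ConnectionError as err:
        # Report the failure explicitly instead of exiting with a
        # warning and empty ("null") status information.
        print("WARNING: %s" % err)
        return WARNING
    print("OK: Volume : DISTRIBUTE type - All bricks are Up")
    return OK

if __name__ == "__main__":
    sys.exit(check_volume_status(sys.argv[1]))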
Last Closed: 2015-01-15 08:49:17 EST
Type: Bug
Flags: psriniva: needinfo? (nthomas)




External Trackers:
Tracker: Red Hat Product Errata
Tracker ID: RHBA-2015:0039
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage Console 3.0 enhancement and bug fix update #3
Last Updated: 2015-01-15 13:46:40 EST

Description Shruti Sampat 2014-09-02 03:26:34 EDT
Description of problem:
-----------------------

When glusterd was stopped on one node of a monitored cluster, the volume status service for one of the volumes in the cluster was seen in the warning state with "null" as the status information. One of the bricks of this volume was present on the node where glusterd was stopped.

Occasionally the volume status service was seen to be unknown, with the status information displaying the message "Invalid host name rhs.4" (BZ #1109843).

Sometimes the volume status service was OK, with the status information reading "OK: Volume : DISTRIBUTE type - All bricks are Up".
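For context, Nagios derives a service's state from the plugin's exit code and takes the status information from the plugin's standard output; a plugin that exits non-zero without printing anything is one plausible way to end up with a warning state and empty ("null") status information. A minimal sketch of that contract (the brick counts are made-up values):

import sys

# Nagios plugin exit codes: the exit status selects the service state,
# and whatever the plugin prints becomes the "status information".
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def main():
    bricks_up, bricks_total = 1, 2  # made-up example values
    if bricks_up == bricks_total:
        print("OK: Volume : DISTRIBUTE type - All bricks are Up")
        return OK
    if bricks_up == 0:
        print("CRITICAL: Volume : DISTRIBUTE type - All bricks are down")
        return CRITICAL
    # Always print a message: exiting non-zero with no output is what
    # can leave the status information looking like "null".
    print("WARNING: Volume : DISTRIBUTE type - %d of %d bricks are Up"
          % (bricks_up, bricks_total))
    return WARNING

if __name__ == "__main__":
    sys.exit(main())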

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

gluster-nagios-addons-0.1.10-2.el6rhs.x86_64
nagios-server-addons-0.1.6-1.el6rhs.noarch

How reproducible:
Saw it once.

Steps to Reproduce:

1. Set up a cluster of 4 RHS nodes and configure it to be monitored by a Nagios server that is set up outside the RHS cluster.

2. Create a distribute volume with one brick each on 2 of the servers in the cluster.

3. Bring down glusterd on one of the nodes in the cluster; this node should host one of the bricks created above.

4. Observe the volume status service for this volume (for example, with a watcher like the sketch after these steps).
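One way to observe the flapping is to poll the check and log state transitions. The plugin path and arguments below are assumptions for illustration, not the actual packaged names:

import subprocess
import time

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Hypothetical plugin path and arguments -- substitute the real check
# command used for the volume status service on your Nagios server.
CHECK_CMD = ["/usr/lib64/nagios/plugins/check_vol_status", "cluster1", "dist-vol"]

def watch(interval=30):
    last = None
    while True:
        proc = subprocess.run(CHECK_CMD, capture_output=True, text=True)
        state = STATES.get(proc.returncode, "UNKNOWN")
        if state != last:
            # Log every state transition along with the status information.
            print("%s -> %s: %s" % (last, state, proc.stdout.strip() or "(null)"))
            last = state
        time.sleep(interval)

if __name__ == "__main__":
    watch()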

Actual results:

The volume status service is seen to be flapping between the OK, warning, and unknown states as explained above.


Expected results:

The volume status service should not be in the warning state.

Additional info:
Comment 2 Shalaka 2014-09-20 05:12:28 EDT
Please review and sign off on the edited doc text.
Comment 5 Kanagaraj 2014-10-15 06:15:10 EDT
Q1. Why are the Host and Address shown as Eskan, when Eskan is nothing but a cluster name?
ANS: In Nagios cluster is represented as dummy with name as cluster-name

Q2. For the NULL issue, this is the bug where the additional info reads NULL, am I correct?
ANS: SELinux in Enforcing mode can cause this issue. Moving SELinux to Permissive mode should solve the problem.

Q3. How can the customer stop these messages from filling up their inboxes? Is there any workaround?
ANS: Messages/notifications can be disabled using the Nagios UI, but it is worth checking the SELinux status before attempting this.
Comment 6 Kanagaraj 2014-10-15 06:19:04 EDT
Please read the first answer in Comment #5 as:

ANS: In Nagios, the cluster is represented as a dummy host whose name is the cluster name. This is done by the auto-discovery script.
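As a rough illustration of what such an auto-discovery script might emit for the cluster's dummy host; the host-definition template and its fields are assumptions, not the actual nagios-server-addons output:

# Rough sketch of emitting a Nagios dummy-host definition for a cluster,
# as the auto-discovery script is described as doing. The template and
# its fields are assumptions, not the actual nagios-server-addons output.

HOST_TEMPLATE = """define host {
    host_name  %(name)s   ; the cluster name, not a real node
    alias      %(name)s
    address    %(name)s   ; dummy address: there is no host behind it
}
"""

def cluster_host_definition(cluster_name):
    return HOST_TEMPLATE % {"name": cluster_name}

if __name__ == "__main__":
    print(cluster_host_definition("Eskan"))  # the cluster from comment 5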
Comment 7 Vikhyat Umrao 2014-10-15 06:41:12 EDT
Thanks, Kanagaraj, for your quick response; it will help a lot.
I will get back to you if anything else is needed from the customer end.
Comment 8 Kanagaraj 2014-10-20 04:01:42 EDT
In Comment #5, 

Nagios needs to be restarted ("service nagios restart") after moving SELinux to Permissive mode.

Vikhyat, please ask the customer to restart if not already done.
Comment 13 Ramesh N 2014-11-05 06:30:02 EST
Moving back to the assigned state, as there are some scenarios that are not covered in this bug.
Comment 14 Shruti Sampat 2014-11-27 06:28:19 EST
Verified as fixed in nagios-server-addons-0.1.9-1.el6rhs.

Tested with RHS + Nagios in a 4-node RHS cluster in the following scenarios -

1. glusterd stopped on one of the nodes, on which one of the bricks of a volume resided. Volume status was OK, with status information

"OK: Volume : DISTRIBUTE type - All bricks are Up"

2. On a cluster with server quorum enabled, brought down glusterd, causing quorum to be lost. The issue was not observed in this case either. Volume status of the volume with server quorum enabled was critical, with status information -

"CRITICAL: Volume : REPLICATE type - All bricks are down"

3. Stopped the nrpe service on one node. Volume status showed appropriate status information in this case too.

Marking as verified.
Comment 15 Pavithra 2014-12-17 01:29:19 EST
Nishanth,
Can you please review the edited doc text for technical accuracy and sign off?
Comment 17 errata-xmlrpc 2015-01-15 08:49:17 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html
