Bug 1109723 - [Nagios] Cluster auto-config service is warning with "(null)" status information when glusterd is down on some nodes in the cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nagios-addons
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Ramesh N
QA Contact: Shruti Sampat
URL:
Whiteboard:
Depends On:
Blocks: 1087818
 
Reported: 2014-06-16 08:55 UTC by Shruti Sampat
Modified: 2019-04-16 14:12 UTC
CC List: 10 users

Fixed In Version: nagios-server-addons-0.1.7-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, the Auto-config service did not work if the glusterd service was offline on any of the nodes in the Red Hat Storage trusted storage pool. With this fix, the Auto-config service works even if the glusterd service is down on some of the nodes in the trusted storage pool, provided the glusterd service is running on the node used as the sync host by the auto-config service.
Clone Of:
Environment:
Last Closed: 2015-01-15 13:48:14 UTC
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) 1218023 | 0 | None | None | None | Never
Red Hat Product Errata RHBA-2015:0039 | 0 | normal | SHIPPED_LIVE | Red Hat Storage Console 3.0 enhancement and bug fix update #3 | 2015-01-15 18:46:40 UTC

Description Shruti Sampat 2014-06-16 08:55:07 UTC
Description of problem:
-----------------------

When glusterd was stopped on a couple of nodes in the cluster, the cluster auto-config service went into warning status with "(null)" as its status information.
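
For context, a "(null)" status line generally means the check produced no output message at all. Below is a minimal Python sketch, purely illustrative and not the actual gluster-nagios-addons plugin code (the function names and the gluster invocation are assumptions), of the guard a Nagios-style check needs so that a failed glusterd query still yields a readable status line:

#!/usr/bin/env python
# Hypothetical sketch, not the shipped plugin: a Nagios check must always
# print a non-empty status line, even when the underlying glusterd query
# fails. Passing a null message through is what renders as "(null)" in
# the Nagios UI.
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios exit codes

def get_peer_status():
    # Ask glusterd for peer status; return None if glusterd is down or
    # the command fails.
    try:
        return subprocess.check_output(["gluster", "peer", "status"]).decode()
    except (OSError, subprocess.CalledProcessError):
        return None

def main():
    status = get_peer_status()
    if status is None:
        # Guard: emit an explicit message instead of a null one.
        print("UNKNOWN: unable to query glusterd for peer status")
        sys.exit(UNKNOWN)
    print("OK: %s" % status.splitlines()[0])
    sys.exit(OK)

if __name__ == "__main__":
    main()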

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.2-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster of RHS nodes (I had 7 nodes in the cluster).
2. Monitor the cluster using Nagios.
3. Stop glusterd on a couple of nodes. Observe the cluster auto-config service.

Actual results:
The status of the service is warning and the status information reads "(null)".

Expected results:
glusterd being down on some of the nodes should not affect the auto-config service.

Additional info:

Comment 1 Shruti Sampat 2014-06-16 08:58:45 UTC
Similar behavior was seen when some nodes in the cluster were powered off. See BZ #1109025.

Comment 2 Ramesh N 2014-06-16 10:23:15 UTC
Fixed in patch: http://review.gluster.org/#/c/8074/

Comment 3 Ramesh N 2014-06-16 13:49:11 UTC
Downstream patch https://code.engineering.redhat.com/gerrit/#/c/27038/

Comment 4 Shalaka 2014-06-26 05:32:53 UTC
Please review and sign off on the edited doc text.

Comment 5 Ramesh N 2014-06-26 05:56:42 UTC
Doc text looks good to me.

Comment 10 Shruti Sampat 2014-11-17 09:33:04 UTC
Verified as fixed in nagios-server-addons-0.1.8-1.el6rhs.noarch

When glusterd is down on some of the nodes in the cluster, the cluster auto-config service remains OK and runs successfully to sync the cluster configuration.

If glusterd is down on the node that is used to sync the cluster configuration via the discovery script, running the auto-config service causes it to go into CRITICAL state with the following status information:

Failed to execute NRPE command 'discover_volume_list' in host <hostname>

This is expected as the discovery script fails to run the required commands owing to glusterd being down.
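
For illustration, here is a hedged Python sketch of how a discovery step can turn an NRPE failure into the CRITICAL state and message described above. The host name and helper function are hypothetical; only check_nrpe's standard -H/-c options are assumed:

# Hypothetical sketch, not the shipped discovery script.
import subprocess
import sys

CRITICAL = 2  # standard Nagios exit code

def run_nrpe(host, command):
    # Invoke check_nrpe for `command` on `host`; raises on failure.
    return subprocess.check_output(["check_nrpe", "-H", host, "-c", command])

try:
    volumes = run_nrpe("rhs-node-1.example.com", "discover_volume_list")
except (OSError, subprocess.CalledProcessError):
    # glusterd down on the sync host: surface a clear CRITICAL message
    # rather than failing silently.
    print("Failed to execute NRPE command 'discover_volume_list' "
          "in host rhs-node-1.example.com")
    sys.exit(CRITICAL)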

Comment 12 Pavithra 2014-12-24 09:03:32 UTC
Hi Ramesh,

Can you please review the edited doc text for technical accuracy and sign off?

Comment 13 Ramesh N 2014-12-24 11:18:50 UTC
Doc text looks good to me.

Comment 15 errata-xmlrpc 2015-01-15 13:48:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html

