Bug 1109723 - [Nagios] Cluster auto-config service is warning with "(null)" status information when glusterd is down on some nodes in the cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nagios-addons
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Ramesh N
QA Contact: Shruti Sampat
URL:
Whiteboard:
Depends On:
Blocks: 1087818
 
Reported: 2014-06-16 08:55 UTC by Shruti Sampat
Modified: 2019-04-16 14:12 UTC
CC List: 10 users

Fixed In Version: nagios-server-addons-0.1.7-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, the Auto-config service did not work if the glusterd service was offline on any of the nodes in the Red Hat Storage trusted storage pool. With this fix, the Auto-config service works even if the glusterd service is down on some of the nodes in the trusted storage pool, provided the glusterd service is running on the node used as the sync host by the auto-config service.
Clone Of:
Environment:
Last Closed: 2015-01-15 13:48:14 UTC
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) 1218023 | 0 | None | None | None | Never
Red Hat Product Errata RHBA-2015:0039 | 0 | normal | SHIPPED_LIVE | Red Hat Storage Console 3.0 enhancement and bug fix update #3 | 2015-01-15 18:46:40 UTC

Description Shruti Sampat 2014-06-16 08:55:07 UTC
Description of problem:
-----------------------

When glusterd was stopped on a couple of nodes in the cluster, the cluster auto-config service went into warning status with "(null)" as its status information.
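
For context, a "(null)" status line generally means the check produced no output message at all. Below is a minimal Python sketch, purely illustrative and not the actual gluster-nagios-addons plugin code (the function names and the gluster invocation are assumptions), of the guard a Nagios-style check needs so that a failed glusterd query still yields a readable status line:

#!/usr/bin/env python
# Hypothetical sketch, not the shipped plugin: a Nagios check must always
# print a non-empty status line, even when the underlying glusterd query
# fails. Passing a null message through is what renders as "(null)" in
# the Nagios UI.
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios exit codes

def get_peer_status():
    # Ask glusterd for peer status; return None if glusterd is down or
    # the command fails.
    try:
        return subprocess.check_output(["gluster", "peer", "status"]).decode()
    except (OSError, subprocess.CalledProcessError):
        return None

def main():
    status = get_peer_status()
    if status is None:
        # Guard: emit an explicit message instead of a null one.
        print("UNKNOWN: unable to query glusterd for peer status")
        sys.exit(UNKNOWN)
    print("OK: %s" % status.splitlines()[0])
    sys.exit(OK)

if __name__ == "__main__":
    main()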

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.2-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster of RHS nodes (I had 7 nodes in the cluster).
2. Monitor the cluster using Nagios.
3. Stop glusterd on a couple of nodes. Observe the cluster auto-config service.

Actual results:
The status of the service is warning and the status information reads "(null)".

Expected results:
glusterd being down on some of the nodes should not affect the auto-config service.

Additional info:

Comment 1 Shruti Sampat 2014-06-16 08:58:45 UTC
Similar behavior was seen when some nodes in the cluster were powered off. See BZ #1109025.

Comment 2 Ramesh N 2014-06-16 10:23:15 UTC
Fixed in patch: http://review.gluster.org/#/c/8074/

Comment 3 Ramesh N 2014-06-16 13:49:11 UTC
Downstream patch https://code.engineering.redhat.com/gerrit/#/c/27038/

Comment 4 Shalaka 2014-06-26 05:32:53 UTC
Please review and sign off on the edited doc text.

Comment 5 Ramesh N 2014-06-26 05:56:42 UTC
Doc text looks good to me.

Comment 10 Shruti Sampat 2014-11-17 09:33:04 UTC
Verified as fixed in nagios-server-addons-0.1.8-1.el6rhs.noarch

When glusterd is down on some of the nodes in the cluster, the cluster auto-config service remains OK and runs successfully to sync the cluster configuration.

If glusterd is down on the node that is used to sync the cluster configuration via the discovery script, running the auto-config service causes it to go into CRITICAL state with the following status information:

Failed to execute NRPE command 'discover_volume_list' in host <hostname>

This is expected as the discovery script fails to run the required commands owing to glusterd being down.
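
For illustration, here is a hedged Python sketch of how a discovery step can turn an NRPE failure into the CRITICAL state and message described above. The host name and helper function are hypothetical; only check_nrpe's standard -H/-c options are assumed:

# Hypothetical sketch, not the shipped discovery script.
import subprocess
import sys

CRITICAL = 2  # standard Nagios exit code

def run_nrpe(host, command):
    # Invoke check_nrpe for `command` on `host`; raises on failure.
    return subprocess.check_output(["check_nrpe", "-H", host, "-c", command])

try:
    volumes = run_nrpe("rhs-node-1.example.com", "discover_volume_list")
except (OSError, subprocess.CalledProcessError):
    # glusterd down on the sync host: surface a clear CRITICAL message
    # rather than failing silently.
    print("Failed to execute NRPE command 'discover_volume_list' "
          "in host rhs-node-1.example.com")
    sys.exit(CRITICAL)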

Comment 12 Pavithra 2014-12-24 09:03:32 UTC
Hi Ramesh,

Can you please review the edited doc text for technical accuracy and sign off?

Comment 13 Ramesh N 2014-12-24 11:18:50 UTC
Doc text looks good to me.

Comment 15 errata-xmlrpc 2015-01-15 13:48:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html

