1122879 – nagios reports that gluster daemon is running while the node is down

Summary: nagios reports that gluster daemon is running while the node is down

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	gluster-nagios-addons
Sub Component:
Version:	rhgs-3.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Nishanth Thomas
QA Contact:	RHS-C QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-07-24 09:38 UTC by Martin Bukatovic
Modified:	2018-01-30 11:12 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-01-30 11:12:56 UTC
Embargoed:
Dependent Products:

Attachments
status of services on node which is down (59.66 KB, image/png) 2014-07-24 09:38 UTC, Martin Bukatovic	no flags

Description Martin Bukatovic 2014-07-24 09:38:54 UTC

Created attachment 920487
status of services on node which is down

Description of problem
======================

Nagios reports that "Process glusterd is running" even when the node is down.

Version-Release number of selected component (if applicable)
============================================================

On nodes:

# rpm -qa | grep -i nagios                                                   
nagios-plugins-1.4.16-10.el6rhs.x86_64                                       
nagios-plugins-procs-1.4.16-10.el6rhs.x86_64                                 
nagios-common-3.5.1-6.el6.x86_64                                             
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
gluster-nagios-addons-0.1.9-1.el6rhs.x86_64                                  
nagios-plugins-ide_smart-1.4.16-10.el6rhs.x86_64 

# rpm -qa | grep gluster                                                     
vdsm-gluster-4.14.7.2-1.el6rhs.noarch                                        
glusterfs-api-3.6.0.24-1.el6rhs.x86_64                                       
glusterfs-geo-replication-3.6.0.24-1.el6rhs.x86_64                           
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
samba-glusterfs-3.6.9-168.4.el6rhs.x86_64                                    
glusterfs-3.6.0.24-1.el6rhs.x86_64                                           
glusterfs-fuse-3.6.0.24-1.el6rhs.x86_64                                      
glusterfs-server-3.6.0.24-1.el6rhs.x86_64                                    
glusterfs-rdma-3.6.0.24-1.el6rhs.x86_64                                      
gluster-nagios-addons-0.1.9-1.el6rhs.x86_64                                  
glusterfs-libs-3.6.0.24-1.el6rhs.x86_64                                      
glusterfs-cli-3.6.0.24-1.el6rhs.x86_64 

On the Nagios/RHSC server:

# rpm -qa | grep nagios                                                      
nagios-plugins-1.4.16-10.el6rhs.x86_64                                       
nagios-server-addons-0.1.4-2.el6rhs.noarch                                   
nagios-plugins-dummy-1.4.16-10.el6rhs.x86_64                                 
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
nagios-3.5.1-6.el6.x86_64                                                    
nagios-plugins-nrpe-2.14-1.3.el6rhs.x86_64                                   
nagios-common-3.5.1-6.el6.x86_64                                             
nagios-plugins-ping-1.4.16-10.el6rhs.x86_64                                  
pnp4nagios-0.6.20-1.1.el6rhs.x86_64 

Steps to Reproduce
==================

1. Install RHS on 4 nodes and set up a volume on them with RHSC
2. Set up Nagios monitoring (as described in the RHS 3.0 documentation)
3. Kill all node servers (do a hard shutdown, virsh undefine or similar)

Actual results
==============

In the Nagios web interface, the "Services" page reports that the following
services are OK even though the node itself is down:

 * Gluster Management (Process glusterd is running)
 * NFS (OK: No gluster volume uses nfs)
 * Quota (OK: Quota not enabled)
 * SMB (OK: No gluster volume uses smb)
 * Self-Heal (Gluster Self Heal Daemon is running)

Expected results
================

All mentioned services are reported as CRITICAL.
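
For illustration only, the expected behaviour can be sketched as a small state
mapping. This is a minimal sketch under stated assumptions, not the
gluster-nagios-addons implementation: `effective_service_state` and the
`last_results` sample are hypothetical names introduced here. Only the numeric
codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios
plugin exit code convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def effective_service_state(host_is_up, last_plugin_state):
    """Return the state a service should show, given host reachability.

    Hypothetical helper for illustration; it is not part of Nagios or of
    gluster-nagios-addons.
    """
    if not host_is_up:
        # A daemon cannot be running on a node that is down, so any
        # previously stored OK result is stale and must not be shown.
        return CRITICAL
    return last_plugin_state


# Last stored results for the services listed in this report (all OK).
last_results = {
    "Gluster Management": OK,
    "NFS": OK,
    "Quota": OK,
    "SMB": OK,
    "Self-Heal": OK,
}

# With the node down, every service is expected to map to CRITICAL.
expected = {
    name: effective_service_state(False, state)
    for name, state in last_results.items()
}
```

With the node down, every entry in `expected` maps to CRITICAL, which is the
result this report asks for. With the node up, the last plugin state is passed
through unchanged.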

Additional info
===============

See the attached screenshot from the Nagios web interface.

Comment 1 Dusmant 2014-07-30 07:37:34 UTC

Hi Martin, can you confirm whether your Nagios server is set up on RHSC or on one of the RHS nodes?

Comment 2 Martin Bukatovic 2014-07-30 08:14:50 UTC

I used the following configuration:

 * 4 storage servers with gluster
 * one management server with RHSC and Nagios server

So the Nagios server is outside of the trusted storage pool, on the management server.

Comment 5 Sahina Bose 2018-01-30 11:12:56 UTC

Thank you for your report. However, this bug is being closed because it is logged against gluster-nagios monitoring, for which no further development is being undertaken.
