Bug 1122879

Summary: nagios reports that gluster daemon is running while the node is down
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: gluster-nagios-addons
Version: rhgs-3.0
Status: CLOSED CANTFIX
Severity: medium
Priority: medium
Keywords: ZStream
Hardware: x86_64
OS: Linux
Reporter: Martin Bukatovic <mbukatov>
Assignee: Nishanth Thomas <nthomas>
QA Contact: RHS-C QE <rhsc-qe-bugs>
CC: mbukatov, sankarshan, sgraf
Doc Type: Bug Fix
Type: Bug
Last Closed: 2018-01-30 11:12:56 UTC
Attachments: status of services on node which is down (flags: none)

Description Martin Bukatovic 2014-07-24 09:38:54 UTC
Created attachment 920487
status of services on node which is down

Description of problem
======================

Nagios reports that "Process glusterd is running" even when the node is down.

Version-Release number of selected component (if applicable)
============================================================

On nodes:

# rpm -qa | grep -i nagios                                                   
nagios-plugins-1.4.16-10.el6rhs.x86_64                                       
nagios-plugins-procs-1.4.16-10.el6rhs.x86_64                                 
nagios-common-3.5.1-6.el6.x86_64                                             
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
gluster-nagios-addons-0.1.9-1.el6rhs.x86_64                                  
nagios-plugins-ide_smart-1.4.16-10.el6rhs.x86_64 

# rpm -qa | grep gluster                                                     
vdsm-gluster-4.14.7.2-1.el6rhs.noarch                                        
glusterfs-api-3.6.0.24-1.el6rhs.x86_64                                       
glusterfs-geo-replication-3.6.0.24-1.el6rhs.x86_64                           
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
samba-glusterfs-3.6.9-168.4.el6rhs.x86_64                                    
glusterfs-3.6.0.24-1.el6rhs.x86_64                                           
glusterfs-fuse-3.6.0.24-1.el6rhs.x86_64                                      
glusterfs-server-3.6.0.24-1.el6rhs.x86_64                                    
glusterfs-rdma-3.6.0.24-1.el6rhs.x86_64                                      
gluster-nagios-addons-0.1.9-1.el6rhs.x86_64                                  
glusterfs-libs-3.6.0.24-1.el6rhs.x86_64                                      
glusterfs-cli-3.6.0.24-1.el6rhs.x86_64 

On the Nagios/RHSC server:

# rpm -qa | grep nagios                                                      
nagios-plugins-1.4.16-10.el6rhs.x86_64                                       
nagios-server-addons-0.1.4-2.el6rhs.noarch                                   
nagios-plugins-dummy-1.4.16-10.el6rhs.x86_64                                 
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
nagios-3.5.1-6.el6.x86_64                                                    
nagios-plugins-nrpe-2.14-1.3.el6rhs.x86_64                                   
nagios-common-3.5.1-6.el6.x86_64                                             
nagios-plugins-ping-1.4.16-10.el6rhs.x86_64                                  
pnp4nagios-0.6.20-1.1.el6rhs.x86_64 

Steps to Reproduce
==================

1. Install RHS on 4 nodes and set up a volume on them with RHSC.
2. Set up Nagios monitoring (as described in the RHS 3.0 documentation).
3. Kill all storage nodes (hard shutdown, virsh undefine or similar).

Actual results
==============

In the Nagios web interface, the "Services" page reports the following
services as OK even though the node itself is down:

 * Gluster Management (Process glusterd is running)
 * NFS (OK: No gluster volume uses nfs)
 * Quota (OK: Quota not enabled)
 * SMB (OK: No gluster volume uses smb)
 * Self-Heal (Gluster Self Heal Daemon is running)

Expected results
================

All of the services listed above are reported as CRITICAL.
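A minimal Python sketch of the rule the expected results imply: a cached OK result should not outlive the node it describes. The function `effective_service_state`, the `max_age` freshness threshold, and the choice of CRITICAL for stale results are hypothetical illustrations, not the gluster-nagios-addons implementation. Only the 0/1/2/3 state codes follow the standard Nagios plugin convention.

```python
import time
from enum import IntEnum
from typing import Optional


class NagiosState(IntEnum):
    # Standard Nagios plugin return codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def effective_service_state(last_state: NagiosState,
                            last_update: float,
                            host_up: bool,
                            max_age: float,
                            now: Optional[float] = None) -> NagiosState:
    """Hypothetical rule: never keep showing a cached OK for a dead node."""
    if now is None:
        now = time.time()
    if not host_up:
        # The node itself is unreachable, so every service on it is down.
        return NagiosState.CRITICAL
    if now - last_update > max_age:
        # Assumption: a result older than max_age means the node stopped
        # reporting, so the old OK is no longer trusted.
        return NagiosState.CRITICAL
    return last_state


# Node last reported "Process glusterd is running" at t=0 and is now down.
print(effective_service_state(NagiosState.OK, last_update=0.0,
                              host_up=False, max_age=300.0, now=600.0).name)
# prints CRITICAL
```

Whether a stale result should map to CRITICAL or UNKNOWN is a policy choice. CRITICAL is used here only because that is what the expected results ask for.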

Additional info
===============

See the attached screenshot from the Nagios web interface.

Comment 1 Dusmant 2014-07-30 07:37:34 UTC
Hi Martin, can you confirm whether your Nagios server is set up on RHSC or on one of the RHS nodes?

Comment 2 Martin Bukatovic 2014-07-30 08:14:50 UTC
I used the following configuration:

 * 4 storage servers with gluster
 * one management server with RHSC and Nagios server

So the Nagios server runs on the management server, outside the trusted storage pool.

Comment 5 Sahina Bose 2018-01-30 11:12:56 UTC
Thank you for your report. However, this bug is being closed as it's logged against gluster-nagios monitoring for which no further new development is being undertaken.