Bug 1122879

Summary:

nagios reports that gluster daemon is running while the node is down

Product:

[Red Hat Storage] Red Hat Gluster Storage

Reporter:

Martin Bukatovic <mbukatov>

Component:

gluster-nagios-addons

Assignee:

Nishanth Thomas <nthomas>

Status:

CLOSED CANTFIX

QA Contact:

RHS-C QE <rhsc-qe-bugs>

Severity:

medium

Priority:

medium

Version:

rhgs-3.0

CC:

mbukatov, sankarshan, sgraf

Target Milestone:

---

Keywords:

ZStream

Target Release:

---

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Story Points:

---

Last Closed:

2018-01-30 11:12:56 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

Category:

---

oVirt Team:

---

Cloudforms Team:

---

Attachments:

Description	Flags
status of services on node which is down	none

Description Martin Bukatovic 2014-07-24 09:38:54 UTC

Created attachment 920487: status of services on node which is down

Description of problem
======================

Nagios reports that "Process glusterd is running" even when the node is down.

Version-Release number of selected component (if applicable)
============================================================

On nodes:

# rpm -qa | grep -i nagios                                                   
nagios-plugins-1.4.16-10.el6rhs.x86_64                                       
nagios-plugins-procs-1.4.16-10.el6rhs.x86_64                                 
nagios-common-3.5.1-6.el6.x86_64                                             
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
gluster-nagios-addons-0.1.9-1.el6rhs.x86_64                                  
nagios-plugins-ide_smart-1.4.16-10.el6rhs.x86_64 

# rpm -qa | grep gluster                                                     
vdsm-gluster-4.14.7.2-1.el6rhs.noarch                                        
glusterfs-api-3.6.0.24-1.el6rhs.x86_64                                       
glusterfs-geo-replication-3.6.0.24-1.el6rhs.x86_64                           
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
samba-glusterfs-3.6.9-168.4.el6rhs.x86_64                                    
glusterfs-3.6.0.24-1.el6rhs.x86_64                                           
glusterfs-fuse-3.6.0.24-1.el6rhs.x86_64                                      
glusterfs-server-3.6.0.24-1.el6rhs.x86_64                                    
glusterfs-rdma-3.6.0.24-1.el6rhs.x86_64                                      
gluster-nagios-addons-0.1.9-1.el6rhs.x86_64                                  
glusterfs-libs-3.6.0.24-1.el6rhs.x86_64                                      
glusterfs-cli-3.6.0.24-1.el6rhs.x86_64 

On the Nagios/RHSC server:

# rpm -qa | grep nagios                                                      
nagios-plugins-1.4.16-10.el6rhs.x86_64                                       
nagios-server-addons-0.1.4-2.el6rhs.noarch                                   
nagios-plugins-dummy-1.4.16-10.el6rhs.x86_64                                 
gluster-nagios-common-0.1.3-2.el6rhs.noarch                                  
nagios-3.5.1-6.el6.x86_64                                                    
nagios-plugins-nrpe-2.14-1.3.el6rhs.x86_64                                   
nagios-common-3.5.1-6.el6.x86_64                                             
nagios-plugins-ping-1.4.16-10.el6rhs.x86_64                                  
pnp4nagios-0.6.20-1.1.el6rhs.x86_64 

Steps to Reproduce
==================

1. Install RHS on 4 nodes and set up a volume on them with RHSC
2. Set up Nagios monitoring (as described in the RHS 3.0 documentation)
3. Kill all node servers (do a hard shutdown, virsh undefine or similar)

Actual results
==============

In the Nagios web interface, the "Services" page reports the following
services as OK even though the node itself is down:

 * Gluster Management (Process glusterd is running)
 * NFS (OK: No gluster volume uses nfs)
 * Quota (OK: Quota not enabled)
 * SMB (OK: No gluster volume uses smb)
 * Self-Heal (Gluster Self Heal Daemon is running)

Expected results
================

All mentioned services are reported as CRITICAL.
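
To make the expected behaviour concrete, here is a minimal sketch of how the
displayed state of a service could be derived from its last check result
together with the host state. This is not the gluster-nagios-addons or Nagios
implementation. The function name effective_service_state and the
HOST_UP/HOST_DOWN labels are hypothetical and exist only for illustration; the
numeric codes 0-3 follow the standard Nagios plugin return-code convention.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical host-state labels used only in this sketch.
HOST_UP, HOST_DOWN = "UP", "DOWN"


def effective_service_state(host_state, last_service_state):
    """Return the state a service should be displayed in.

    Assumption: when the host is down, any previously recorded
    service result is stale and must not be shown as OK.
    """
    if host_state == HOST_DOWN:
        return CRITICAL
    return last_service_state


# Service states as shown in this report while the node was down (all OK).
last_results = {
    "Gluster Management": OK,
    "NFS": OK,
    "Quota": OK,
    "SMB": OK,
    "Self-Heal": OK,
}

# With the host down, every service is mapped to CRITICAL.
reported = {
    name: effective_service_state(HOST_DOWN, state)
    for name, state in last_results.items()
}
```

With the host marked down, every entry in reported ends up CRITICAL, which is
the result this report expects; with the host up, the last recorded state is
passed through unchanged.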

Additional info
===============

See the attached screenshot from the Nagios web interface.

Comment 1 Dusmant 2014-07-30 07:37:34 UTC

Hi Martin, can you confirm whether your Nagios server is set up on RHSC or on one of the RHS nodes?

Comment 2 Martin Bukatovic 2014-07-30 08:14:50 UTC

I used the following configuration:

 * 4 storage servers with gluster
 * one management server with RHSC and Nagios server

So the Nagios server is outside of the trusted storage pool, on the management server.

Comment 5 Sahina Bose 2018-01-30 11:12:56 UTC

Thank you for your report. However, this bug is being closed as it is logged against gluster-nagios monitoring, for which no further development is being undertaken.