Bug 1166602

Summary: [New] - Status information needs to be improved when glusterd goes down in all the nodes in the cluster.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: RamaKasturi <knarra>
Component: nagios-server-addons    Assignee: Timothy Asir <tjeyasin>
Status: CLOSED ERRATA QA Contact: RamaKasturi <knarra>
Severity: medium Docs Contact:
Priority: high    
Version: rhgs-3.0    CC: divya, dpati, rhsc-qe-bugs, sabose, ssampat, tjeyasin
Target Milestone: ---   
Target Release: RHGS 3.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nagios-server-addons-0.2.0-1.el6rhs.noarch Doc Type: Bug Fix
Doc Text:
Previously, when glusterd was down on all the nodes in the cluster, the status information for the volume status, self-heal, and geo-replication services was displayed as "temporary error" instead of indicating that no hosts in the cluster were up. As a consequence, users could be misled into thinking that there were issues with volume status, self-heal, or Geo-replication that needed to be fixed. With this fix, when glusterd is down on all the nodes of the cluster, the Volume Geo-Replication, Volume Status, and Volume Utilization statuses are displayed as "UNKNOWN" with the status information "UNKNOWN: NO hosts(with state UP) found in the cluster". The brick status is displayed as "UNKNOWN" with the status information "UNKNOWN: Status could not be determined as glusterd is not running".
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-29 05:26:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1202842    

Description RamaKasturi 2014-11-21 10:43:00 UTC
Description of problem:
When glusterd goes down on all the nodes in the cluster, the volume status, self-heal, and geo-rep services display the status 'UNKNOWN' with the status information 'UNKNOWN: temporary error'.

The status information needs to be improved, as this is not a temporary error when glusterd is down on all the nodes.
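For context, a Nagios plugin reports state through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line it prints. Below is a minimal sketch, not the actual nagios-server-addons code, of how a check could emit a more specific UNKNOWN message when glusterd is down everywhere; the function name and inputs are hypothetical, and the message strings follow the wording later chosen for the fix.

#!/usr/bin/env python
# Minimal sketch of a Nagios-style check; not the nagios-server-addons implementation.
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_volume_status(hosts_up, glusterd_running):
    # hosts_up: hypothetical list of cluster hosts currently marked UP.
    if not hosts_up:
        print("UNKNOWN: NO hosts(with state UP) found in the cluster")
        return UNKNOWN
    if not glusterd_running:
        print("UNKNOWN: Status could not be determined as glusterd is not running")
        return UNKNOWN
    print("OK: volume status retrieved")
    return OK

if __name__ == "__main__":
    # Example: glusterd stopped on all nodes, so no UP hosts are reported.
    sys.exit(check_volume_status([], False))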

Version-Release number of selected component (if applicable):
nagios-server-addons-0.1.9-1.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install Nagios on an RHS node.
2. Run discovery.py and start monitoring the nodes.
3. Stop glusterd on all the nodes by running the command "service glusterd stop" (see the helper sketch after this list).
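A hypothetical helper for step 3, assuming passwordless SSH from the Nagios server to each RHS node; the host names are placeholders:

#!/usr/bin/env python
# Hypothetical helper for step 3: stop glusterd on every node in the cluster.
# Assumes passwordless SSH to each RHS node; host names are placeholders.
import subprocess

NODES = ["rhs-node1", "rhs-node2", "rhs-node3"]

for node in NODES:
    # Equivalent to running "service glusterd stop" on each node by hand.
    subprocess.check_call(["ssh", node, "service", "glusterd", "stop"])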

Actual results:
Volume Status, Volume Self-Heal, and Volume Geo-Replication show the status 'UNKNOWN' with the status information "UNKNOWN: temporary error".

Expected results:
The status information for these services should indicate that glusterd is down on all the nodes rather than reporting a temporary error.

Additional info:

Comment 2 Shruti Sampat 2014-11-27 07:13:24 UTC
Similar behavior is seen for Volume Quota services too.

Comment 4 Sahina Bose 2015-02-09 07:04:45 UTC
Enhance the message to suggest to the user that the issue may be with glusterd. Change 'temporary error' to 'Glusterd cannot be queried'.

Comment 5 Timothy Asir 2015-04-28 11:48:38 UTC
Patch sent to upstream for review: http://review.gluster.org/10421

Comment 7 RamaKasturi 2015-05-28 12:52:55 UTC
Please set the Fixed In Version (FIV) for this bug.

Comment 8 RamaKasturi 2015-05-29 08:48:17 UTC
Verified and works with build nagios-server-addons-0.2.0-1.el6rhs.noarch.


When glusterd is down on all the nodes of the cluster, the Volume Geo-Replication, Volume Status, and Volume Utilization statuses are shown as "UNKNOWN" with the status information "UNKNOWN: NO hosts(with state UP) found in the cluster".

The brick status is shown as "UNKNOWN" with the status information "UNKNOWN: Status could not be determined as glusterd is not running".

Comment 9 Divya 2015-07-26 05:32:19 UTC
Tim,

Kindly review and sign-off the edited doc text.

Comment 11 errata-xmlrpc 2015-07-29 05:26:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-1494.html

Comment 12 Red Hat Bugzilla 2023-09-14 02:51:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days