Bug 1128007
Summary: | [Nagios] - When all the nodes in a cluster are down, cluster status shows 'UP' with status information as 'OK:None of the volumes are in critical state' | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | RamaKasturi <knarra> | ||||||
Component: | nagios-server-addons | Assignee: | Ramesh N <rnachimu> | ||||||
Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | rhgs-3.0 | CC: | asrivast, dpati, psriniva, rhsc-qe-bugs, rnachimu | ||||||
Target Milestone: | --- | Keywords: | ZStream | ||||||
Target Release: | RHGS 3.0.3 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | nagios-server-addons-0.1.9-1.el6rhs | Doc Type: | Bug Fix | ||||||
Doc Text: | Previously, when all the nodes in a Red Hat Storage trusted storage pool were offline, all the volumes moved to an "UNKNOWN" state, yet the cluster status was displayed as UP with the message 'OK:None of the volumes are in critical state'. With this fix, the status of all volumes is considered while computing the status of the Red Hat Storage trusted storage pool. |
Story Points: | --- | ||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2015-01-15 13:49:12 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
RamaKasturi
2014-08-08 05:59:42 UTC
Created attachment 925083 [details]
Screenshot when all the nodes in the cluster are down.
Cluster state is an aggregation of the states of the volumes inside the cluster. As per the current code, the cluster state will be:

CRITICAL - If all volumes in the cluster are in CRITICAL state
WARNING - If some volumes are in CRITICAL state and the others are in a NON-CRITICAL state (OK, WARNING, UNKNOWN, PENDING)
OK - If all the volumes are in a NON-CRITICAL state (OK, WARNING, UNKNOWN, PENDING)

Fixing this bug would require considering all possible states of the volumes and determining the cluster state based on them. Maybe something like the following:

CRITICAL - If all volumes are in CRITICAL state
WARNING - If some volumes are in CRITICAL state or all/some volumes are in WARNING state
UNKNOWN - If all the volumes are in UNKNOWN state
PENDING - If all the volumes are in PENDING state
OK - If all the volumes are in OK state

This change will affect the existing flow and will introduce newer flows.

PENDING state is something internal to Nagios and cannot be changed from outside. So the cluster PENDING state suggested in Comment 2 is not possible.

Further analysis from Kanagaraj
--------------------------------
Found a Nagios document which talks about the mappings: http://nagios.sourceforge.net/docs/3_0/hostchecks.html

Plugin Result | Preliminary Host State
---|---
OK | UP
WARNING | UP or DOWN*
UNKNOWN | DOWN
CRITICAL | DOWN

Going this way, the cluster can be marked as DOWN if all the volumes are in CRITICAL or UNKNOWN state.

Created attachment 953931 [details]
Status of services in the cluster, when all the nodes are down
Based on comments 3, 4 and 5, the following will be the new cluster state and state information.

Cluster State | State Information
---|---
UP | "OK : None of the Volumes in the cluster are in Critical State"
UP | "OK : No Volumes present in the cluster"
UP | "WARNING : Some Volumes in the cluster are in Critical State"
DOWN | "CRITICAL: All Volumes in the cluster are in Critical State"
DOWN | "CRITICAL: All Volumes in the cluster are in unknown State"

Upstream patch : http://review.gluster.org/#/c/9053/

The following will be the cluster state and state information with the fix.

Cluster State | State Information
---|---
UP | "OK : None of the Volumes in the cluster are in Critical State"
UP | "OK : No Volumes present in the cluster"
UP | "WARNING : Some Volumes in the cluster are in Critical State"
UP | "WARNING : Some Volumes in the cluster are in Unknown State"
UP | "WARNING : Some Volumes in the cluster are in Warning State"
UP | "WARNING : All Volumes in the cluster are in Warning State"
DOWN | "CRITICAL: All Volumes in the cluster are in Critical State"
DOWN | "CRITICAL: All Volumes in the cluster are in Unknown State"

Verified and works fine with build nagios-server-addons-0.1.9-1.el6rhs.

When all the nodes in the cluster go down, the cluster status is displayed as "DOWN" with status information "CRITICAL : All Volumes in the cluster are in Unknown state".

Hi Ramesh, can you review the edited doc text for technical accuracy and sign off?

Doc text looks good.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html
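For readers who want to see how the post-fix cluster state table could be computed, here is a minimal Python sketch. It is not the code from the upstream patch: the function name `cluster_state`, the string constants, and in particular the order in which mixed cases are checked (for example, some volumes CRITICAL and the rest UNKNOWN) are assumptions made for illustration, since the table in this report does not spell out a precedence.

```python
from collections import Counter

# Volume states as plain strings (hypothetical representation).
OK, WARNING, CRITICAL, UNKNOWN = "OK", "WARNING", "CRITICAL", "UNKNOWN"


def cluster_state(volume_states):
    """Return (host_state, status_information) for a list of volume states.

    The messages follow the post-fix table in this bug report.
    The precedence used for mixed states is an assumption.
    """
    total = len(volume_states)
    if total == 0:
        return "UP", "OK : No Volumes present in the cluster"

    counts = Counter(volume_states)

    # "All volumes" cases first: only these can mark the cluster DOWN.
    if counts[CRITICAL] == total:
        return "DOWN", "CRITICAL: All Volumes in the cluster are in Critical State"
    if counts[UNKNOWN] == total:
        return "DOWN", "CRITICAL: All Volumes in the cluster are in Unknown State"
    if counts[WARNING] == total:
        return "UP", "WARNING : All Volumes in the cluster are in Warning State"

    # "Some volumes" cases, checked from most to least severe (assumed order).
    if counts[CRITICAL] > 0:
        return "UP", "WARNING : Some Volumes in the cluster are in Critical State"
    if counts[UNKNOWN] > 0:
        return "UP", "WARNING : Some Volumes in the cluster are in Unknown State"
    if counts[WARNING] > 0:
        return "UP", "WARNING : Some Volumes in the cluster are in Warning State"

    return "UP", "OK : None of the Volumes in the cluster are in Critical State"


if __name__ == "__main__":
    # The scenario from this bug: every node is down, so every volume is UNKNOWN.
    print(cluster_state([UNKNOWN, UNKNOWN, UNKNOWN]))
```

Under the pre-fix logic described earlier in this report, UNKNOWN counted as a non-critical state, so the all-UNKNOWN input would have produced UP with an OK message. Checking the "all volumes" cases before the "some volumes" cases is what lets the sketch report DOWN in that scenario.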