Bug 1109025
| Summary: | [Nagios] Cluster services show weird behavior when some nodes in the cluster were taken down | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Shruti Sampat <ssampat> |
| Component: | gluster-nagios-addons | Assignee: | Nishanth Thomas <nthomas> |
| Status: | CLOSED ERRATA | QA Contact: | Shruti Sampat <ssampat> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | rhgs-3.0 | CC: | esammons, kmayilsa, nthomas, rhsc-qe-bugs |
| Target Milestone: | --- | ||
| Target Release: | RHGS 3.0.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | gluster-nagios-addons-0.1.4-1.el6rhs, nagios-server-addons-0.1.4-1.el6rhs | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-09-22 19:11:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Shruti Sampat
2014-06-13 05:22:10 UTC
Another observation is that the status of cluster auto-configuration service changes to WARNING with status information reading as 'null' when a couple of nodes were powered off. It returns to OK when the nodes are brought back up.

Verified as fixed in gluster-nagios-addons-0.1.4-1.el6rhs.x86_64, nagios-server-addons-0.1.4-1.el6rhs.x86_64

Performed the following steps -

1. Created a cluster of 7 RHS nodes, created a distributed-replicate volume with server-side quorum enabled and server-quorum-ratio set to 80%.
2. Brought down 2 of the RHS nodes, causing quorum to be lost for the volume.

The following results were seen -

- Cluster - Quorum service was critical as quorum was lost for the volume.
- Volume Utilization was unknown as the volume was down, because of quorum not being met.
- Volume status was critical as all bricks of the volume were down, owing to quorum not being met.
- Volume Self-Heal was in warning state as self-heal status could not be determined.
- Cluster utilization was unknown as volume utilization was unknown.

Marking as VERIFIED.

One more observation, the host representing the cluster itself in the Nagios UI is down, because all volumes are critical, which is expected behavior.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1277.html
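For readers checking the verification steps, the quorum loss with 2 of 7 nodes down follows from simple arithmetic. The sketch below is only an illustration: it assumes quorum is met when the number of active nodes is at least the ceiling of total nodes times the configured ratio. The function names `required_nodes` and `quorum_met` are hypothetical and are not taken from glusterd or the Nagios add-ons.

```python
def required_nodes(total_nodes: int, ratio_percent: int) -> int:
    """Smallest node count whose share of the cluster reaches ratio_percent.

    Assumption: the threshold is ceil(total_nodes * ratio_percent / 100).
    Computed with integers to avoid floating-point rounding.
    """
    return (total_nodes * ratio_percent + 99) // 100


def quorum_met(active_nodes: int, total_nodes: int, ratio_percent: int) -> bool:
    """True when enough nodes are active under the assumed rule above."""
    return active_nodes >= required_nodes(total_nodes, ratio_percent)


if __name__ == "__main__":
    total, ratio = 7, 80  # values from the verification steps in this report
    for down in range(3):
        active = total - down
        state = "met" if quorum_met(active, total, ratio) else "lost"
        print(f"{down} node(s) down, {active}/{total} active: quorum {state}")
```

Under this assumption, 5 active nodes out of 7 is about 71%, below the 80% ratio, which is consistent with the Cluster - Quorum service going critical in step 2.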