Bug 1109727
| Summary: | [Nagios] - when one brick in a replicate volume goes faulty and the other one is active, geo-replication volume status should be shown as 'PARTIAL_FAULTY' | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | RamaKasturi <knarra> |
| Component: | gluster-nagios-addons | Assignee: | Sahina Bose <sabose> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.0 | CC: | dpati, nsathyan, psriniva, rnachimu, sabose |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.0.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | gluster-nagios-addons-0.1.11-1.el6rhs | Doc Type: | Bug Fix |
| Doc Text: | Previously, when one of the bricks in a replica pair of a replicate volume was down, the status of the geo-replication session was set to FAULTY, which in turn caused the status of the Nagios plugin to be set to CRITICAL. With this fix, if only one of the bricks in a replica pair is down, the status of the geo-replication session is set to PARTIAL FAULTY, since the geo-replication session remains active on another Red Hat Storage node in that scenario. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-01-15 13:48:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
RamaKasturi
2014-06-16 09:08:42 UTC
Currently, we cannot determine the status of nodes sub-volume wise. There is no way to correlate the output of geo-rep status with that of gluster volume info, as geo-rep status uses the hostname of the node. We will be able to do this once the XML output for geo-rep, which returns the host UUID, is available.

The logic to determine Faulty is: count of passive + faulty nodes > (brick count / replica count). For instance, in a 3 x 2 volume with replica pairs B1 <-> B2, B3 <-> B4, B5 <-> B6 and statuses P - F, A - P, A - P, the count of P + F is 4 > (6/2) ==> Critical. The existing code had a >= comparison to handle both the replicate and distribute cases; the logic for these two volume types has been separated to fix this, in http://review.gluster.org/8443 (a sketch of this check appears at the end of this report).

From Kanagaraj, I understand that these bugs have been moved to ON_QA by errata. Since QE has not yet received the build, I am moving this bug back to the ASSIGNED state. Please move it to ON_QA once builds are attached to the errata.

Verified and works fine with build nagios-server-addons-0.1.8-1.el6rhs.noarch. In replicate and distribute-replicate volumes, when a passive node goes faulty, the geo-replication status is shown as "Warning" with the status information "Session Status: <vol_name> - PARTIAL_FAULTY".

Hi Sahina, can you please review the edited doc text and sign off on the technical accuracy?

Looks good.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html
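To make the threshold concrete, here is a minimal Python sketch of the FAULTY / PARTIAL_FAULTY check described in the comments above. It is a sketch under stated assumptions: the function name, argument names, and state strings ('A', 'P', 'F') are hypothetical illustrations, not the actual gluster-nagios-addons API.

```python
# Hypothetical sketch of the check described in this report; the names and
# state strings are illustrative and do not match gluster-nagios-addons code.

def geo_rep_session_status(brick_states, brick_count, replica_count):
    """Classify a geo-replication session from per-brick states.

    brick_states  -- one entry per brick: 'A' (Active), 'P' (Passive),
                     or 'F' (Faulty)
    brick_count   -- total number of bricks in the volume
    replica_count -- replica count (1 for a pure distribute volume)
    """
    not_active = sum(1 for s in brick_states if s in ('P', 'F'))
    faulty = brick_states.count('F')
    subvol_count = brick_count // replica_count

    if replica_count > 1:
        # Replicate / distribute-replicate: in a healthy session one brick
        # per replica pair is Passive, so only flag FAULTY when the count
        # of passive + faulty bricks *exceeds* the sub-volume count. A
        # faulty brick whose replica partner is still active only degrades
        # the session to PARTIAL_FAULTY (Nagios WARNING, not CRITICAL).
        if not_active > subvol_count:
            return 'FAULTY'
        if faulty:
            return 'PARTIAL_FAULTY'
    else:
        # Pure distribute: keep the original >= comparison mentioned above.
        if not_active >= subvol_count:
            return 'FAULTY'
        if faulty:
            return 'PARTIAL_FAULTY'
    return 'ACTIVE'


# The 3 x 2 example from the comments: pairs B1<->B2, B3<->B4, B5<->B6 with
# states P-F, A-P, A-P gives count of P + F = 4 > 6/2 = 3 ==> FAULTY.
print(geo_rep_session_status(['P', 'F', 'A', 'P', 'A', 'P'], 6, 2))
# One faulty passive brick stays at the threshold ==> PARTIAL_FAULTY.
print(geo_rep_session_status(['A', 'F', 'A', 'P', 'A', 'P'], 6, 2))
```

Per the comments above, the fix separates the comparison by volume type: the strict > for replicate volumes keeps a replica pair with one surviving active brick from tipping the whole session into FAULTY, while distribute volumes retain the original >= check.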