Bug 1081900 - [Nagios] [RFE] Alerting mechanism for split-brain from Nagios
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nagios-addons
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.0
Assigned To: Sahina Bose
QA Contact: RamaKasturi
Keywords: FutureFeature, Reopened
Depends On: 1100563
Blocks: 1033197 1202842
Reported: 2014-03-28 03:28 EDT by Prasanth
Modified: 2015-07-29 01:25 EDT

See Also:
Fixed In Version: gluster-nagios-addons-0.2.1-1
Doc Type: Enhancement
Doc Text:
Previously, there was no way to alert the user when split-brain was detected on a replicate volume. As a result, users were unaware of the issue and could not take timely corrective action. With this enhancement, the Nagios plugin for self-heal monitoring reports whether any entries are in split-brain state. The plugin has been renamed from "Volume Self-heal" to "Volume Split-brain status".
Story Points: ---
Clone Of: 1033197
Environment:
Last Closed: 2015-07-29 01:25:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
divya: needinfo+


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2015:1494 | normal | SHIPPED_LIVE | Red Hat Gluster Storage Console 3.1 Enhancement and bug fixes | 2015-07-29 05:24:02 EDT

Description Prasanth 2014-03-28 03:28:22 EDT
+++ This bug was initially created as a clone of Bug #1033197 +++

Description of problem:

Split-brains are inevitable in the field, either because of network issues or because of bugs in the software stack.

There is currently no way for storage administrators to be notified of split-brain situations so that they can take remedial action.

This is an RFE (Request For Enhancement) to provide a mechanism that alerts storage administrators to split-brain situations. Furthermore, a mechanism needs to be provided for storage administrators to diagnose the situation, identify the root cause, and take remedial action. This latter part is perhaps a different RFE, but it is combined here until we have a complete assessment of the entire request.
 
Version-Release number of selected component (if applicable):

RHSC 2.1 and RHS 2.1 


Additional info:

Alerts should be generated in case of split-brains in 

- client facing network
- server side network 
- or combinations of the above 

If connectivity is lost between the management network (where RHSC is located) and the clients and/or servers, an alert to that effect also needs to be in place.

--- Additional comment from RHEL Product and Program Management on 2013-11-21 12:24:43 EST ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.
Comment 2 Dusmant 2014-04-01 01:15:07 EDT

*** This bug has been marked as a duplicate of bug 1033197 ***
Comment 3 Prasanth 2014-04-01 06:36:53 EDT
Not sure why this bug was closed as duplicate as it was created specifically for having the feature included in Nagios as per the last Bug triage:

-----------------------------------------------
As discussed in the triage meeting, a new bug is now opened to track this feature through Nagios. ( Currently Alerts would not be shown in RHSC. They will be shown only in Nagios UI )
---------Note from triage meeting--------------
1033197 - Out, for now. A different bug will be created for monitoring split-brain using Nagios. (Bug 1081900 opened for the same)
-----------------------------------------------

Hence re-opening it.
Comment 4 Shubhendu Tripathi 2014-05-21 12:02:51 EDT
Currently there is way in gluster to identify a split brain and so in Nagios UI there is no way to alert the case of a split brain.
Currently in Nagios the split brain scenario is being identified based on the quorum check for the volume.
Comment 5 Shubhendu Tripathi 2014-05-22 10:44:19 EDT
Small correction in the comment earlier. Please read as below -

"Currently there is NO way in gluster to identify a split brain and so in Nagios UI there is no way to alert the case of a split brain.
Currently in Nagios the split brain scenario is being identified based on the quorum check for the volume."

Sorry for the typo.
Comment 6 Dusmant 2014-05-29 03:21:47 EDT
As discussed with Alok, Vijay and other key stakeholders over e-mail, I am taking this bug out of the Denali release.
Comment 7 Sahina Bose 2015-02-10 00:14:12 EST
We will be taking the following in for Everglades:

1. Alerting when files are in split-brain (using "gluster volume heal <VOLNAME> info split-brain")

2. When there's a network split-brain this is currently alerted using the Cluster-quorum plugin (this plugin will alert the administrator when volumes have lost quorum as long as server side quorum is turned on)
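For illustration only, the first item could be approached by counting the entries reported by "gluster volume heal <VOLNAME> info split-brain". The sketch below is not the plugin code from the posted patches. It assumes the command prints one "Number of entries in split-brain: N" (or "Number of entries: N") line per brick; the function name and the sample text are hypothetical, and the exact output format may differ between gluster versions.

```python
import re

# Assumed line format (not taken from this bug report); real gluster output
# may differ between versions.
_ENTRY_COUNT = re.compile(
    r"^Number of entries(?: in split-brain)?:\s*(\d+)\s*$", re.MULTILINE
)


def count_split_brain_entries(heal_info_output):
    """Sum the per-brick entry counts found in the command output.

    Note: a file in split-brain may be listed under more than one brick,
    so this sum can count the same file more than once.
    """
    return sum(int(n) for n in _ENTRY_COUNT.findall(heal_info_output))


# Hypothetical sample text in the assumed format.
SAMPLE_OUTPUT = """\
Brick server1:/bricks/b1
Number of entries in split-brain: 2

Brick server2:/bricks/b1
Number of entries in split-brain: 2
"""

if __name__ == "__main__":
    print(count_split_brain_entries(SAMPLE_OUTPUT))
```

The sketch only parses text that has already been captured; running the gluster command and handling its failures is left out on purpose.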
Comment 8 Sahina Bose 2015-03-02 04:20:07 EST
Patches http://review.gluster.org/9782 and http://review.gluster.org/9783 have been posted.
Comment 10 RamaKasturi 2015-06-19 09:23:06 EDT
Verified and works fine with gluster-nagios-addons-0.2.3-1.el6rhs.x86_64.

Currently, when Nagios detects that split-brain has occurred, it sets the "Volume Split-brain status - <vol_name>" service to CRITICAL and shows the number of files that are in split-brain.

When there is no split brain detected, Volume Split-brain status - <vol_name> remains in OK state with status information as "No split brain state entries found".

When the volume is stopped / deleted, Volume Split-brain status - <vol_name> displays the status as WARNING with status information as "split brain status could not be determined"
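The three verified states can be summarised as a small mapping from the split-brain entry count to a Nagios service state. This is a minimal sketch, not the shipped plugin: the function name is hypothetical, passing None to mean "count could not be determined" is an assumption, and the CRITICAL message wording is invented because the exact text is not quoted here. The OK and WARNING messages are the ones reported above, and 0, 1 and 2 are the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def split_brain_status(entry_count):
    """Map a split-brain entry count to (exit_code, status_message).

    entry_count is None when the count could not be obtained, for example
    because the volume is stopped or deleted (an assumed convention).
    """
    if entry_count is None:
        return WARNING, "split brain status could not be determined"
    if entry_count > 0:
        # Hypothetical wording; the real plugin message is not quoted in this bug.
        return CRITICAL, "%d entries found in split-brain state" % entry_count
    return OK, "No split brain state entries found"


if __name__ == "__main__":
    for count in (0, 3, None):
        code, message = split_brain_status(count)
        print(code, message)
```

A real plugin would print the message and exit with the code so that Nagios can pick up the state change.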
Comment 11 RamaKasturi 2015-06-19 09:27:49 EDT
Email and SNMP notifications are sent when the split-brain status changes to critical and when it returns to normal.
Comment 12 Divya 2015-07-26 01:09:48 EDT
Sahina,

Please review the edited doc text and sign-off.
Comment 13 Sahina Bose 2015-07-27 01:04:16 EDT
Acked
Comment 16 errata-xmlrpc 2015-07-29 01:25:34 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-1494.html
