Bug 1235651 - [New] - When no of bricks greater than the redundancy count goes offline disperse and distributed disperse should be marked critical.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nagios-server-addons
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.1
Assignee: Darshan
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks: 1216951 1251815
Reported: 2015-06-25 12:43 UTC by RamaKasturi
Modified: 2015-10-05 09:21 UTC (History)

Fixed In Version: nagios-server-addons-0.2.2-1, gluster-nagios-addons-0.2.5-1
Doc Type: Bug Fix
Doc Text:
Previously, the volume status service did not provide the status of disperse and distributed dispersed volumes. With this fix, the volume status service is modified to include the logic required for interpreting the volume status of disperse and distributed dispersed volumes and the volume status is now displayed correctly.
Clone Of:
Environment:
Last Closed: 2015-10-05 09:21:47 UTC
Embargoed:


Attachments
volume status goes to warning for disperse and distribute disperse volumes. (365.55 KB, image/png)
2015-06-25 12:44 UTC, RamaKasturi


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:1848 0 normal SHIPPED_LIVE Red Hat Gluster Storage Console 3.1 update 1 bug fixes 2015-10-05 13:19:50 UTC

Description RamaKasturi 2015-06-25 12:43:12 UTC
Description of problem:
For disperse and distributed disperse volumes, when the number of bricks that go offline is greater than the redundancy count, the volume should be marked as critical. Currently, the volume status only goes to the warning state.

Version-Release number of selected component (if applicable):
nagios-server-addons-0.2.1-2.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create disperse and distributed disperse volumes.
2. Run the configure-gluster-nagios command to monitor them.
3. Bring down more bricks than the redundancy count.

Actual results:
Volume status for disperse and distributed disperse volumes is marked as WARNING.

Expected results:
Volume status should be marked as CRITICAL when the number of bricks that go offline is greater than the redundancy count.

Additional info:

Comment 2 RamaKasturi 2015-06-25 12:44:42 UTC
Created attachment 1043081 [details]
volume status goes to warning for disperse and distribute disperse volumes.

Comment 3 Sahina Bose 2015-06-30 09:54:17 UTC
New volume types are not considered in nagios plugins - this is a functionality break.

Comment 4 Darshan 2015-07-02 08:24:38 UTC
Upstream fix patch link: http://review.gluster.org/#/q/topic:Bug-1235651

Comment 6 monti lawrence 2015-07-23 14:25:34 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 7 Darshan 2015-07-24 05:00:49 UTC
Looks Good.

Comment 12 RamaKasturi 2015-08-27 10:35:31 UTC
Verified; works fine with builds gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64 and nagios-server-addons-0.2.2-1.el6rhs.noarch.

In a disperse config of 1 x (4 + 2), when one brick goes down, volume status in Nagios goes to WARNING with status information "WARNING : Volume : DISPERSE type Brick(s) - <brickpath> is are down, but disperse pair(s) are up".

In a disperse config of 1 x (4 + 2), when the number of bricks that go down is greater than the redundancy count, volume status goes to CRITICAL with status information "CRITICAL : Volume: DISPERSE type Bricks - <brick path> are down, along with one or more disperse pair(s)".

In a distributed disperse config of 2 x (4 + 2), when one brick goes down, volume status in Nagios goes to WARNING with status information "WARNING :Volume: DISTRIBUTED_DISPERSE type Brick(s) - <brick path> is are down, but disperse pair(s) are up".

In a distributed disperse config of 2 x (4 + 2), when more bricks than the redundancy count go down in each of the distribute sets, volume status goes to CRITICAL with status information "CRITICAL:Volume:DISTRIBUTED_DISPERSE type Bricks - <brickpath> are down, along with one or more disperse pair(s)".

In a distributed disperse config of 2 x (4 + 2), when all the bricks in one of the distribute sets go down, volume status is marked as CRITICAL with status information "CRITICAL:Volume:DISTRIBUTED_DISPERSE type Bricks - <brickpath> are down, along with one or more disperse pair(s)".
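
For illustration, the behaviour verified above can be sketched in Python. This is a minimal sketch and not the code from the upstream patch: the function name disperse_volume_status and its parameters are hypothetical, and it assumes the bricks are listed so that each consecutive group of (data + redundancy) bricks forms one disperse set. The return values are the standard Nagios plugin exit codes.

```python
# Illustrative sketch only; NOT the code shipped in gluster-nagios-addons.
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def disperse_volume_status(brick_states, data_count, redundancy_count):
    """Return a Nagios status code for a disperse or distributed disperse volume.

    brick_states: list of booleans, True meaning the brick is online.
        Assumption: bricks are ordered so that every consecutive group of
        (data_count + redundancy_count) bricks is one disperse set.
    """
    set_size = data_count + redundancy_count
    if set_size <= 0 or len(brick_states) % set_size != 0:
        raise ValueError("brick count must be a multiple of data + redundancy")

    any_down = False
    for start in range(0, len(brick_states), set_size):
        disperse_set = brick_states[start:start + set_size]
        down = disperse_set.count(False)
        if down > redundancy_count:
            # This disperse set has lost more bricks than it can tolerate.
            return CRITICAL
        if down > 0:
            any_down = True

    # Some bricks are down, but every disperse set is still within redundancy.
    return WARNING if any_down else OK


# 1 x (4 + 2): one brick down, then three bricks down.
print(disperse_volume_status([False] + [True] * 5, 4, 2))      # 1 (WARNING)
print(disperse_volume_status([False] * 3 + [True] * 3, 4, 2))  # 2 (CRITICAL)
# 2 x (4 + 2): all bricks of the first distribute set down.
print(disperse_volume_status([False] * 6 + [True] * 6, 4, 2))  # 2 (CRITICAL)
```

In this sketch a single disperse set that exceeds its redundancy count is enough to make the whole volume CRITICAL, which matches the last scenario above.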

Comment 13 Bhavana 2015-09-22 10:27:33 UTC
Hi Darshan,

The doc text is updated. Please review it and share your technical review comments. If it looks ok, then sign-off on the same.

Comment 14 Darshan 2015-09-22 10:45:46 UTC
(In reply to Bhavana from comment #13)
> Hi Darshan,
> 
> The doc text is updated. Please review it and share your technical review
> comments. If it looks ok, then sign-off on the same.

Small suggestion:

Previously, the volume status service did provide the status of disperse and distributed disperse volumes, but the status it reported was incorrect.

Comment 16 errata-xmlrpc 2015-10-05 09:21:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1848.html

