Bug 1235651

Summary: [New] - When the number of bricks that go offline exceeds the redundancy count, disperse and distributed disperse volumes should be marked critical.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: RamaKasturi <knarra>
Component: nagios-server-addons Assignee: Darshan <dnarayan>
Status: CLOSED ERRATA QA Contact: RamaKasturi <knarra>
Severity: unspecified Docs Contact:
Priority: medium    
Version: rhgs-3.1 CC: asriram, asrivast, bmohanra, dnarayan, dpati, rnachimu, sabose, vagarwal
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.1.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nagios-server-addons-0.2.2-1, gluster-nagios-addons-0.2.5-1 Doc Type: Bug Fix
Doc Text:
Previously, the volume status service did not provide the status of disperse and distributed dispersed volumes. With this fix, the volume status service is modified to include the logic required for interpreting the volume status of disperse and distributed dispersed volumes and the volume status is now displayed correctly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-05 09:21:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1216951, 1251815    
Attachments:
Description Flags
volume status goes to warning for disperse and distribute disperse volumes. none

Description RamaKasturi 2015-06-25 12:43:12 UTC
Description of problem:
For disperse and distributed disperse volumes, the volume should be marked as critical when the number of offline bricks exceeds the redundancy count. Currently, the volume status goes to the WARNING state instead.

Version-Release number of selected component (if applicable):
nagios-server-addons-0.2.1-2.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create disperse and distributed disperse volumes.
2. Run the configure-gluster-nagios command to monitor them.
3. Bring down more bricks than the redundancy count.

Actual results:
Volume status for disperse and distributed disperse volumes is marked as WARNING.

Expected results:
Volume status should be marked as CRITICAL when the number of offline bricks exceeds the redundancy count.
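
The expected behavior amounts to comparing the number of offline bricks in a disperse set with the redundancy count. A minimal Python sketch of that rule follows. It is not the plugin's actual code, and the function name `disperse_set_state` is hypothetical. Treating exactly `redundancy` bricks down as WARNING is an assumption drawn from the "exceeds the redundancy count" wording.

```python
def disperse_set_state(bricks_down: int, redundancy: int) -> str:
    """Return a Nagios-style state for a single disperse set.

    Hypothetical helper for illustration only, not the plugin's API.
    """
    if bricks_down < 0 or redundancy < 0:
        raise ValueError("counts must be non-negative")
    if bricks_down == 0:
        return "OK"
    # More bricks offline than the set can tolerate: data is unavailable.
    if bricks_down > redundancy:
        return "CRITICAL"
    # Some bricks offline, but still within the redundancy count (assumed WARNING).
    return "WARNING"


if __name__ == "__main__":
    # A 1 x (4 + 2) disperse volume has a redundancy count of 2.
    for down in range(4):
        print(down, disperse_set_state(down, redundancy=2))
```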

Additional info:

Comment 2 RamaKasturi 2015-06-25 12:44:42 UTC
Created attachment 1043081
volume status goes to warning for disperse and distribute disperse volumes.

Comment 3 Sahina Bose 2015-06-30 09:54:17 UTC
New volume types are not considered in nagios plugins - this is a functionality break.

Comment 4 Darshan 2015-07-02 08:24:38 UTC
Upstream fix patch link: http://review.gluster.org/#/q/topic:Bug-1235651

Comment 6 monti lawrence 2015-07-23 14:25:34 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 7 Darshan 2015-07-24 05:00:49 UTC
Looks Good.

Comment 12 RamaKasturi 2015-08-27 10:35:31 UTC
Verified and works fine with build gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64 and nagios-server-addons-0.2.2-1.el6rhs.noarch

In a disperse config of 1 x (4 + 2), when one brick goes down, the volume status in nagios goes to WARNING with the status information "WARNING : Volume : DISPERSE type Brick(s) - <brickpath> is are down, but disperse pair(s) are up".

In a disperse config of 1 x (4 + 2), when the number of bricks that go down exceeds the redundancy count, the volume status goes to CRITICAL with the status information "CRITICAL : Volume: DISPERSE type Bricks - <brick path> are down, along with one or more disperse pair(s)".

In a distributed disperse config of 2 x (4 + 2), when one brick goes down, the volume status in nagios goes to WARNING with the status information "WARNING :Volume: DISTRIBUTED_DISPERSE type Brick(s) - <brick path> is are down, but disperse pair(s) are up".

In a distributed disperse config of 2 x (4 + 2), when more bricks than the redundancy count go down in each of the distribute sets, the volume status goes to CRITICAL with the status information "CRITICAL:Volume:DISTRIBUTED_DISPERSE type Bricks - <brickpath> are down, along with one or more disperse pair(s)".

In a distributed disperse config of 2 x (4 + 2), when all the bricks in one of the distribute sets go down, the volume status is marked as CRITICAL with the status information "CRITICAL:Volume:DISTRIBUTED_DISPERSE type Bricks - <brickpath> are down, along with one or more disperse pair(s)".
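
The verified scenarios can be reproduced with a small Python sketch that groups bricks into disperse sets and aggregates the per-set result. This is only an illustration under stated assumptions, not the code shipped in nagios-server-addons: the names `group_into_sets` and `volume_state` are hypothetical, bricks are assumed to be listed so that each consecutive run of (data + redundancy) bricks forms one disperse set, and a volume is assumed to stay at WARNING as long as no single set has more bricks down than its redundancy count.

```python
from typing import List, Sequence


def group_into_sets(brick_up: Sequence[bool], set_size: int) -> List[List[bool]]:
    """Split a flat list of brick states into consecutive disperse sets."""
    if set_size <= 0 or len(brick_up) % set_size != 0:
        raise ValueError("brick count must be a positive multiple of the set size")
    return [list(brick_up[i:i + set_size]) for i in range(0, len(brick_up), set_size)]


def volume_state(brick_up: Sequence[bool], data: int, redundancy: int) -> str:
    """Return a Nagios-style state for a disperse or distributed disperse volume."""
    sets = group_into_sets(brick_up, data + redundancy)
    down_per_set = [s.count(False) for s in sets]
    # Any set that lost more bricks than its redundancy count is unavailable.
    if any(down > redundancy for down in down_per_set):
        return "CRITICAL"
    # Bricks are down somewhere, but every set is still within its redundancy.
    if any(down > 0 for down in down_per_set):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # 2 x (4 + 2): twelve bricks, two disperse sets of six (True means online).
    one_down = [False] + [True] * 11
    three_down_same_set = [False] * 3 + [True] * 9
    whole_set_down = [False] * 6 + [True] * 6
    for label, bricks in [
        ("one brick down", one_down),
        ("three down in one set", three_down_same_set),
        ("one whole set down", whole_set_down),
    ]:
        print(label, "->", volume_state(bricks, data=4, redundancy=2))
```

Evaluating each set separately matters for distributed disperse volumes: three offline bricks spread as two in one set and one in the other would remain WARNING under this sketch, while three in the same set would be CRITICAL.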

Comment 13 Bhavana 2015-09-22 10:27:33 UTC
Hi Darshan,

The doc text is updated. Please review it and share your technical review comments. If it looks ok, then sign-off on the same.

Comment 14 Darshan 2015-09-22 10:45:46 UTC
(In reply to Bhavana from comment #13)
> Hi Darshan,
> 
> The doc text is updated. Please review it and share your technical review
> comments. If it looks ok, then sign-off on the same.

Small suggestion:

Previously, the volume status service did provide the status of disperse and distributed disperse volumes, but the reported status was incorrect.

Comment 16 errata-xmlrpc 2015-10-05 09:21:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1848.html