Bug 1473780

Summary: check_volume_status.py from gluster-nagios-addons crashes when requesting -t self-heal with a missing brick
Product: [Community] GlusterFS Reporter: Ted Miller <tmiller21>
Component: unclassified Assignee: Sahina Bose <sabose>
Status: CLOSED EOL QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: mainline CC: atumball, bugs, tmiller21
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-18 09:08:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ted Miller 2017-07-21 16:00:41 UTC
Description of problem: check_volume_status.py from gluster-nagios-addons crashes when -t self-heal is requested on a volume with a missing brick, e.g.:
./check_volume_status.py -v <volume_name> -t self-heal

Version-Release number of selected component (if applicable):
gluster-nagios-addons.rpm 1.1.0
gluster-nagios-common 1.1.0
glusterfs 3.7.20

How reproducible: 100%

Steps to Reproduce:
1. Use a replica 5 volume with one brick offline
2. ./check_volume_status.py -v <volume_name> -t self-heal

Actual results: Traceback (most recent call last):
  File "./check_volume_status.py", line 176, in <module>
    exitstatus, message = getVolumeSelfHealSplitBrainStatus(args)
  File "./check_volume_status.py", line 88, in getVolumeSelfHealSplitBrainStatus
    volume = glustercli.volumeHealSplitBrainStatus(args.volume)
  File "/usr/lib64/python2.7/site-packages/glusternagios/glustercli.py", line 639, in volumeHealSplitBrainStatus
    return _volumeHealCommandOutput(volumeName, command, remoteServer)
  File "/usr/lib64/python2.7/site-packages/glusternagios/glustercli.py", line 657, in _volumeHealCommandOutput
    value = _parseVolumeSelfHealInfo(out)
  File "/usr/lib64/python2.7/site-packages/glusternagios/glustercli.py", line 508, in _parseVolumeSelfHealInfo
    entries = int(line.split(':')[1])
ValueError: invalid literal for int() with base 10: '-'
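
The failure can be reproduced without a Gluster cluster. The following is a minimal sketch, assuming only the line format shown in this report; the variable names are illustrative and not taken from glustercli.py.

```python
# Line reported for the offline brick (copied from the output in this report).
line = 'Number of entries in split-brain: -'

# Same expression as the failing parser line: take the text after the first colon.
field = line.split(':')[1]

try:
    entries = int(field)
except ValueError as exc:
    # int() cannot convert a bare '-' to a number, so the parser aborts here.
    entries = None
    print('parse failed: %s' % exc)
```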

Expected results: (formatted for easy reading)
['Brick 10.130.12.121:/bricks/brick_songs1/songs1', 
'Status: Connected', 
'Number of entries in split-brain: 0', 
'', 
'Brick 10.130.12.131:/bricks/brick_songs1/songs1', 
'Status: Connected', 
'Number of entries in split-brain: 0', 
'', 
'Brick 10.130.12.111:/bricks/brick_songs1/songs1', 
'Status: Connected', 
'Number of entries in split-brain: 0', 
'', 
'Brick 10.130.12.109:/bricks/brick_songs1/songs1', 
'Status: Transport endpoint is not connected', 
'Number of entries in split-brain: -', 
'', 
'Brick 10.130.12.105:/bricks/brick_songs1/songs1', 
'Status: Connected', 
'Number of entries in split-brain: 0', 
'']
No split brain state entries found. (or maybe an error message?)

Additional info:
The problem is in _parseVolumeSelfHealInfo() in glustercli.py from gluster-nagios-common (the following is from the current GitHub source):

def _parseVolumeSelfHealInfo(out):
    value = {}
    splitbrainentries = 0
    for line in out:
        if line.startswith('Number of entries'):
            entries = int(line.split(':')[1])

As can be seen under "Expected results", the fourth "Number of entries in split-brain" line makes the last code line above crash, because it has a '-' instead of an integer after the colon.
(The list under "Expected results" was obtained by inserting a print statement near the beginning of the code snippet above.)
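
One possible way to tolerate the '-' value is sketched below. This is not the upstream fix: count_split_brain_entries is a hypothetical helper, and returning the number of unreachable bricks alongside the total is an assumption about what the caller might want to report. Only the line format is taken from this report.

```python
def count_split_brain_entries(lines):
    """Sum per-brick split-brain counts, tolerating unreachable bricks.

    Hypothetical helper, not part of glustercli.py.
    Returns (total, unreachable), where `unreachable` is the number of
    bricks whose count was not an integer (for example '-').
    """
    total = 0
    unreachable = 0
    for line in lines:
        if not line.startswith('Number of entries'):
            continue
        # Everything after the first colon is the raw count field.
        raw = line.partition(':')[2].strip()
        try:
            total += int(raw)
        except ValueError:
            # An offline brick reports '-' instead of a number.
            unreachable += 1
    return total, unreachable


# Two bricks from the output in this report: one connected, one offline.
sample = [
    'Brick 10.130.12.111:/bricks/brick_songs1/songs1',
    'Status: Connected',
    'Number of entries in split-brain: 0',
    '',
    'Brick 10.130.12.109:/bricks/brick_songs1/songs1',
    'Status: Transport endpoint is not connected',
    'Number of entries in split-brain: -',
    '',
]
total, unreachable = count_split_brain_entries(sample)
print('%d entries, %d unreachable brick(s)' % (total, unreachable))
```

With a count of unreachable bricks available, the plugin could report a warning or unknown state instead of crashing; which Nagios status is appropriate is left open here.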

Comment 1 Amar Tumballi 2018-09-18 09:08:14 UTC
The gluster-nagios plugin is no longer planned to be maintained. Please post any concerns and questions about alternative setups on our mailing lists.