Bug 1498112 - [GSS] new PID file location is not known to nagios, causing inaccurate reporting [NEEDINFO]
Summary: [GSS] new PID file location is not known to nagios, causing inaccurate reporting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nagios-addons
Version: rhgs-3.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: RHGS 3.3.1
Assignee: Darshan
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks: 1475688 1512609
Reported: 2017-10-03 13:59 UTC by Pan Ousley
Modified: 2017-11-29 03:24 UTC

Fixed In Version: gluster-nagios-addons-0.2.10-2
Doc Type: Bug Fix
Doc Text:
Previously, gluster brick process monitoring did not account for the new location of the brick PID file, so nagios could report a running brick as down. With this update, nagios monitoring checks the correct location of the gluster brick PID file.
Clone Of:
Clones: 1512609
Environment:
Last Closed: 2017-11-29 03:24:44 UTC
Target Upstream Version:
pmulay: needinfo? (dnarayan)



Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3236681 | None | None | None | 2017-11-09 14:02:19 UTC
Red Hat Product Errata | RHBA-2017:3272 | normal | SHIPPED_LIVE | gluster-nagios-addons bug fix and enhancement update | 2017-11-29 08:24:15 UTC

Description Pan Ousley 2017-10-03 13:59:57 UTC
Description of problem:

In RHGS 3.2, the PID files of the bricks have a path like:

/var/lib/glusterd/vols/<VOLNAME>/run/<FILE>.pid

(i.e. they are located in the "run" directory within a vols/<volname> location)

In 3.3, the PID file is no longer at that path; it is now located at:

/var/run/gluster/vols/<VOLNAME>/<FILE>.pid

It appears that /usr/lib64/nagios/plugins/gluster/check_gluster_proc_status.py is not aware of the new location.

When running the monitoring tool (with verified brick running):

 /usr/lib64/nagios/plugins/gluster/check_gluster_proc_status.py -t BRICK -v <VOLNAME> -b /gluster/<VOLNAME>/export
CRITICAL: Brick /gluster/<VOLNAME>/export is down

According to an strace, it looks explicitly for the pid file in the run folder:
---
open("/var/lib/glusterd/vols/<VOLNAME>/run/<FILE>.pid", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
----
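
For illustration, a lookup that knows about both locations can be sketched as below. This is a minimal sketch under stated assumptions, not the plugin's actual code and not the posted patch: the function names and the `pid_name` parameter (standing in for `<FILE>.pid`, whose real name is not given in this report) are hypothetical, and the two directory templates are copied from the paths quoted above. It tries the 3.3 location first and falls back to the 3.2 one.

```python
import errno
import os

# Directory templates taken from the paths quoted in this report.
OLD_PID_DIR = "/var/lib/glusterd/vols/{volname}/run"  # RHGS 3.2
NEW_PID_DIR = "/var/run/gluster/vols/{volname}"       # RHGS 3.3


def candidate_pid_files(volname, pid_name,
                        old_dir=OLD_PID_DIR, new_dir=NEW_PID_DIR):
    """Return the possible PID file paths, newest layout first."""
    return [
        os.path.join(new_dir.format(volname=volname), pid_name),
        os.path.join(old_dir.format(volname=volname), pid_name),
    ]


def read_brick_pid(volname, pid_name, **dirs):
    """Return the PID from the first readable PID file, or None."""
    for path in candidate_pid_files(volname, pid_name, **dirs):
        try:
            with open(path) as f:
                return int(f.read().strip())
        except (IOError, OSError, ValueError):
            # Missing, unreadable, or malformed file: try the next path.
            continue
    return None


def is_running(pid):
    """True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True
```

Checking both locations would keep such a check working on 3.2 and 3.3 alike. Whether the actual fix does this is not stated in this report.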


Version-Release number of selected component (if applicable): RHGS 3.3


How reproducible: Reproducible


Steps to Reproduce:
1. Use RHGS 3.3, verify bricks are up
2. Use monitoring tool
3. /usr/lib64/nagios/plugins/gluster/check_gluster_proc_status.py -t BRICK -v <VOLNAME> -b /gluster/<VOLNAME>/export

Actual results: "CRITICAL: Brick /gluster/<VOLNAME>/export is down"


Expected results: Accurate information


Additional info: could impact delivery of the customer's Gluster solution to business partners since operational monitoring is not working correctly.


*bug filed in "core" but please move it if necessary. I was not sure if it should be filed under "gluster-nagios-addons" or not*

Comment 2 Niels de Vos 2017-10-03 14:58:27 UTC
Patch posted, reviews welcome!

  https://review.gluster.org/18425

Comment 4 Niels de Vos 2017-10-09 11:22:17 UTC
I am not sure how stable a workaround would be. You could create a symlink for each .pid file, and it should work as long as the .pid files exist under the /var/run/ path. I have no idea how the tools respond to a symlink that points to a missing file, though.

    # ln -s /var/run/gluster/vols/<VOLNAME>/<FILE>.pid \
            /var/lib/glusterd/vols/<VOLNAME>/run/<FILE>.pid

The /var/lib/glusterd/vols/<VOLNAME>/run directory may not exist, so you would need to create that as well.

Please try it out in a test-environment before suggesting it to a customer.
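
The symlink workaround above could be scripted roughly as follows. This is a hedged sketch only: the helper names `link_pid_file` and `link_is_dangling` are hypothetical, the caller has to supply the real paths, and it has not been tried against a gluster installation. On the open question of dangling links: opening a symlink whose target is missing fails with ENOENT, the same error as in the strace output, so a check that simply opens the file would most likely still treat the brick as down.

```python
import os


def link_pid_file(new_path, old_path):
    """Make old_path a symlink to new_path, creating the parent directory.

    Returns True if a link was created, False if old_path already exists.
    """
    run_dir = os.path.dirname(old_path)
    if not os.path.isdir(run_dir):
        # The old "run" directory may not exist on 3.3.
        os.makedirs(run_dir)
    # lexists() is also True for a dangling symlink, unlike exists().
    if os.path.lexists(old_path):
        return False
    os.symlink(new_path, old_path)
    return True


def link_is_dangling(path):
    """True if path is a symlink whose target is missing."""
    return os.path.islink(path) and not os.path.exists(path)
```

Calling `link_is_dangling()` on each link would show which bricks currently have no PID file under /var/run/.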

Comment 9 Sweta Anandpara 2017-10-24 11:18:02 UTC
Tested and verified this on the builds gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64 and glusterfs-3.8.4-48.
When brick processes are killed, the brick status goes to CRITICAL and volume-status goes to WARNING. Once they recover, everything goes back to OK.

Moving this bug to verified for 3.3.1.

Comment 10 Pratik Mulay 2017-11-17 12:34:31 UTC
Adding Doc text to BZ as provided by Sahina in Errata Advisory 30965 (https://errata.devel.redhat.com/docs/show/30965)

Hi Darshan,

I've edited the Doc Text for its associated Errata.

Request you to review the same and let me know in case of any concerns.

If no changes are required, request you to provide your approval for the same.

Comment 13 errata-xmlrpc 2017-11-29 03:24:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3272

