Description of problem:
In RHGS 3.2, the PID files of the bricks have a path like:
/var/lib/glusterd/vols/<VOLNAME>/run/<FILE>.pid
(i.e. they are located in the "run" directory within a vols/<volname> location)

In 3.3, the PID file is no longer there; it is now in:
/var/run/gluster/vols/<VOLNAME>/<FILE>.pid

It appears that /usr/lib64/nagios/plugins/gluster/check_gluster_proc_status.py is not aware of the new location.

When running the monitoring tool (with the brick verified to be running):
/usr/lib64/nagios/plugins/gluster/check_gluster_proc_status.py -t BRICK -v <VOLNAME> -b /gluster/<VOLNAME>/export
CRITICAL: Brick /gluster/<VOLNAME>/export is down

According to an strace, it looks explicitly for the pid file in the run folder:
---
open("/var/lib/glusterd/vols/<VOLNAME>/run/<FILE>.pid", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
----

Version-Release number of selected component (if applicable):
RHGS 3.3

How reproducible:
Reproducible

Steps to Reproduce:
1. Use RHGS 3.3, verify bricks are up
2. Use monitoring tool
3. /usr/lib64/nagios/plugins/gluster/check_gluster_proc_status.py -t BRICK -v <VOLNAME> -b /gluster/<VOLNAME>/export

Actual results:
"CRITICAL: Brick /gluster/<VOLNAME>/export is down"

Expected results:
Accurate information

Additional info:
This could impact delivery of the customer's Gluster solution to business partners, since operational monitoring is not working correctly.

*Bug filed in "core", but please move it if necessary. I was not sure whether it should be filed under "gluster-nagios-addons" or not.*
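To illustrate the kind of lookup the report implies, here is a minimal Python 3 sketch that checks the 3.3 location first and falls back to the 3.2 location. It is not the code of check_gluster_proc_status.py or of the posted patch; the function names (`find_brick_pidfile`, `read_pid`, `is_running`) and the `roots` parameter are hypothetical, and only the two directory layouts are taken from this report. The PID file name is passed in by the caller because the report leaves it as <FILE>.pid.

```python
import os

# The two directory layouts below come from the bug report.
# Everything else in this sketch is illustrative, not the plugin's real code.
NEW_PID_DIR = "/var/run/gluster/vols/{volname}"       # RHGS 3.3 layout
OLD_PID_DIR = "/var/lib/glusterd/vols/{volname}/run"  # RHGS 3.2 layout


def find_brick_pidfile(volname, pid_filename, roots=(NEW_PID_DIR, OLD_PID_DIR)):
    """Return the first existing PID file path for the brick, or None."""
    for template in roots:
        candidate = os.path.join(template.format(volname=volname), pid_filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def read_pid(path):
    """Return the PID stored in the file, or None if unreadable or malformed."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Return True if a process with this PID exists."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Trying the new directory first keeps the check correct on 3.3 while still working on hosts that have not been upgraded.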
Patch posted, reviews welcome! https://review.gluster.org/18425
I am not sure how stable a workaround would be. You could create a symlink for each .pid file, and it should work as long as the .pid files exist under the /var/run/ path. I have no idea how the tools respond to a symlink that points to a missing file, though.

# ln -s /var/run/gluster/vols/<VOLNAME>/<FILE>.pid \
        /var/lib/glusterd/vols/<VOLNAME>/run/<FILE>.pid

The /var/lib/glusterd/vols/<VOLNAME>/run directory may not exist, so you would need to create that as well.

Please try it out in a test environment before suggesting it to a customer.
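The workaround above could be scripted roughly as follows. This is only a Python 3 sketch under the same caveats: `link_pidfile` and `is_dangling` are hypothetical helper names, the caller supplies both full paths, and nothing here has been tested against a real RHGS installation. On the open question about missing targets: opening a dangling symlink fails with ENOENT, the same error the strace in the description shows, and `is_dangling` lets a script detect that state.

```python
import os


def link_pidfile(new_path, old_path):
    """Make old_path a symlink to new_path; return True if a link was created.

    new_path: PID file under /var/run/gluster/vols/<VOLNAME>/ (3.3 layout)
    old_path: path the plugin still opens, under
              /var/lib/glusterd/vols/<VOLNAME>/run/ (3.2 layout)
    """
    # The "run" directory may not exist on 3.3, so create it if needed.
    os.makedirs(os.path.dirname(old_path), exist_ok=True)
    if os.path.islink(old_path):
        if os.readlink(old_path) == new_path:
            return False  # the correct link is already in place
        os.remove(old_path)  # replace a stale link that points elsewhere
    elif os.path.exists(old_path):
        # Never overwrite a regular file that something else created.
        raise FileExistsError(old_path)
    os.symlink(new_path, old_path)
    return True


def is_dangling(path):
    """Return True if path is a symlink whose target does not exist."""
    return os.path.islink(path) and not os.path.exists(path)
```

Because `link_pidfile` is idempotent, it can be re-run after each brick restart without accumulating errors.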
Tested and verified this on the builds gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64 and glusterfs-3.8.4-48. When a brick process is killed, its status goes to CRITICAL and volume-status goes to WARNING. When it recovers, everything goes back to OK. Moving this bug to verified for 3.3.1.
Adding Doc text to BZ as provided by Sahina in Errata Advisory 30965 (https://errata.devel.redhat.com/docs/show/30965)

Hi Darshan, I've edited the Doc Text for its associated Errata. Please review it and let me know if you have any concerns. If no changes are required, please provide your approval.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3272
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days