From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

Description of problem:
Nagios configuration runs fine for a couple of minutes... Then it starts to fail all the plugins with the plugin error -127 out of bounds... The error "Error: Cannot open resource file '/etc/nagios/private/resource.cfg' for reading!" is first reported in the log. (It received a SIGHUP from somewhere as well... or it thinks it did...). Default permissions on the file are 0440 with root.root as owner/group.

Version-Release number of selected component (if applicable):
nagios-2.0-0.2.rc2.fc4

How reproducible:
Always

Steps to Reproduce:
1. Clean install and rename *sample to *cfg (with minor edits)
2. Run Nagios for a little while... (2 minutes on my system)
3. Experience the resource failure..

Actual Results:
[1139103353] Caught SIGHUP, restarting...
[1139103353] Error: Cannot open resource file '/etc/nagios/private/resource.cfg' for reading!
[1139103353] Nagios 2.0rc2 starting... (PID=17066)
[1139103353] LOG VERSION: 2.0

Expected Results:
Continue running as normal...

Additional info:
Hmmm. Curious, what command do you use to start Nagios? Also, what version of the plugins do you have installed? One more thing: when you say it runs fine for a little while, what does the status look like on the Nagios status page? Do the services check as OK for a while and then fail? Or do they start as undetermined and then fail?
/etc/init.d/nagios start
Manual plugins... some Perl, some pieced together, most from nagiosplug.sourceforge.net
Services check as OK for a while, and then after 2 minutes they all switch to (-127 out of bounds, etc. etc.)... It looks like Nagios receives a SIGHUP after executing one of my scripts (which basically checks whether it is a HARD fail and, if so, runs /etc/init.d/sendmail restart)... Anyway, changing the permissions on resource.cfg to 0444 fixes it (the resource file shouldn't be readable by others, but in my case it's an internal secured server anyway).
That's interesting, I can't seem to recreate this issue. If you wouldn't mind, could you upload the plugin that you think is sending Nagios the SIGHUP?
There isn't a plugin that SIGHUPs the nagios process... When one service check fails, it goes through the regular process of switching to a SOFT FAIL and then eventually to a HARD FAIL. These events are handed to the event handler (a simple shell script that is basically a case statement, and one of the cases, if HARD, will execute /etc/init.d/sendmail restart). Once the event handler fires with the failure, that is when I see the SIGHUP... Anyway, here is a snippet of the event handler:
    #!/bin/sh
    case "$1" in
    OK) ;;
    WARNING) ;;
    UNKNOWN) ;;
    CRITICAL)
        case "$2" in
        SOFT) ;;
        HARD) sudo /etc/init.d/sendmail restart ;;
        esac
        ;;
    esac
    exit 0
Going back through the logs... it does occur at other times, for no apparent common reason... it's just more frequent with event handlers. Is there a way to increase logging to see where it's getting this SIGHUP signal from?
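One low-tech way to narrow this down, assuming the event handler really is involved, is to have the handler write its own timestamped lines and compare them with the "Caught SIGHUP" entries in the Nagios log. This is only a sketch: the log_event function and the HANDLER_LOG path are hypothetical and not part of Nagios. It shows when the handler ran, not which process sent the signal.

```shell
#!/bin/sh
# Hypothetical helper for the event handler; HANDLER_LOG is an assumed path.
HANDLER_LOG=${HANDLER_LOG:-/tmp/nagios-eventhandler.log}

# Append one line in the same "[epoch] message" shape seen in the Nagios log,
# so handler runs can be lined up against "Caught SIGHUP" timestamps.
log_event() {
    printf '[%s] %s\n' "$(date +%s)" "$*" >> "$HANDLER_LOG"
}

# Possible use inside the handler, around the restart:
#   log_event "handler pid=$$ state=$1 type=$2 before restart"
#   sudo /etc/init.d/sendmail restart
#   log_event "handler pid=$$ restart exit=$?"
```

If the handler's lines consistently appear in the same second as a "Caught SIGHUP" entry, that points at the restart step; if they don't, the signal is probably coming from somewhere else.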
Hmm, not sure if there's a way that would be helpful for this. Now that you have the perms on the file/directory set to world readable, are you still seeing the SIGHUPs? Do any of your scripts send SIGHUPs as part of restarting some failed service? I'm curious if this is just something that has always happened on your system but was only noticeable when the private directory was locked down.
Actually it doesn't matter at this point, I'm going to make private and resource.cfg-sample nagios group readable. I don't know what's sending the SIGHUPs. It's not happening on my box, but when I send a SIGHUP it does fail to open private/resource.cfg. And that shouldn't happen.
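For anyone needing a local workaround before the package is rebuilt, the change described above could be sketched roughly as follows. The fix_private_perms function is hypothetical. It assumes the daemon runs with a group named nagios, and that the reload on SIGHUP fails because the file is re-read without root privileges (which matches the symptom but is not confirmed in this report). The 0750/0640 modes simply keep the modes the package already ships.

```shell
#!/bin/sh
# Hypothetical helper: make the private directory and resource file(s)
# group-readable without opening them up to "other".
# Usage: fix_private_perms DIR GROUP
fix_private_perms() {
    dir=$1 group=$2
    # Hand the directory and any resource.cfg* files to the given group...
    chgrp "$group" "$dir" "$dir"/resource.cfg* &&
    # ...and keep owner rwx / group r-x on the directory, owner rw / group r on files.
    chmod 0750 "$dir" &&
    chmod 0640 "$dir"/resource.cfg*
}

# Example (as root, assuming the group is called "nagios"):
#   fix_private_perms /etc/nagios/private nagios
```

Unlike chmod 0444, this keeps the resource file unreadable by other local users.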
Mike,
Re-opening this ticket. The solution mentioned in the previous comment still hasn't been committed (the resource.cfg-sample file still belongs to the root group).
$ rpm -q nagios
nagios-2.5-1.fc5
$ rpm -qlv nagios | grep private
drwxr-x--- 2 root root 0 Jul 14 17:49 /etc/nagios/private
-rw-r----- 1 root root 1331 Jul 14 17:49 /etc/nagios/private/resource.cfg-sample
jpo
I don't know how this slipped my mind but it is now saved and built. Thanks for re-opening it.