Description of problem:
In the init script for nagios that is shipped with the RPM (/etc/rc.d/init.d/nagios), the "start" action runs the following right after starting the nagios binary:

pidof nagios > $NagiosRunFile

The nagios binary writes its own PID to /var/run/nagios.pid when it starts. The pidof line overwrites that file with the PIDs of every nagios process, including child processes spawned from the nagios parent process. Once those child processes have finished executing, a later "stop" or "reload" tries to signal PIDs that no longer exist and prints spurious errors.

Version-Release number of selected component (if applicable):
nagios-3.4.1-2.el6

How reproducible:

Steps to Reproduce:
1. Start the nagios daemon with the init script, with some hosts and services configured
2. Wait a few minutes for the initially spawned child processes to finish
3. Use the init script to reload, restart, or stop the nagios daemon

Actual results:
# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: /etc/init.d/nagios: line 74: kill: (30429) - No such process
/etc/init.d/nagios: line 74: kill: (30408) - No such process
done.
Starting nagios: done.

Expected results:
# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.

Additional info:
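The stale entries can be checked by hand. The following is a minimal sketch, assuming a POSIX shell; `live_pids` is a hypothetical helper, not part of the shipped init script. It relies on `kill -0`, which sends no signal and only tests whether a process exists and may be signalled by the caller.

```shell
#!/bin/sh
# Hypothetical helper (not in the packaged init script): print only those
# PIDs from the arguments that still refer to a running process.
live_pids() {
    alive=""
    for pid in "$@"; do
        # kill -0 delivers no signal; it only checks that the PID exists
        # and that we are allowed to signal it.
        if kill -0 "$pid" 2>/dev/null; then
            alive="${alive:+$alive }$pid"
        fi
    done
    echo "$alive"
}

# Example: compare what the pid file lists with what is still running.
# The path is the one from this report; the file may hold several PIDs.
pidfile=/var/run/nagios.pid
if [ -r "$pidfile" ]; then
    echo "listed:  $(tr '\n' ' ' < "$pidfile")"
    # Unquoted on purpose so that each PID becomes its own argument.
    echo "running: $(live_pids $(cat "$pidfile"))"
fi
```

If "listed" shows more PIDs than "running", the extra ones are the finished children that a later "stop" would fail to kill. Run it as root or as the nagios user, since `kill -0` also fails for processes the caller may not signal.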
Jason,

I can't reproduce the problem with nagios-3.5.0-1.el6:

# rpm -q nagios
nagios-3.5.0-1.el6.i686
# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.

Could you upgrade to the latest nagios version available in EPEL6 and see if you can reproduce the problem?

tia,
jpo
I'm actually seeing this with the 3.4.4 version, but after checking the 3.5.0 RPM from EPEL6, I believe this would still be the case.

Whether nagios.pid ends up with multiple PIDs depends on a race between the Nagios process spawning a child and pidof being executed by the init script.

- If the Nagios process spawns a child first, a second PID (possibly more) is seen in nagios.pid
- If pidof runs before a spawn, only one PID is found in nagios.pid

So the outcome depends on a lot of factors, particularly on whether nagios needs to spawn immediately (e.g. for a particular check to run). I would imagine that an empty or small Nagios config would favor correct behavior, as there's little to no need for it to spawn children, so the problem would be difficult to reproduce. As a case study, I've only seen this begin to happen after our config has grown significantly over the past year.

Regardless, here's the patch that should fix this, taken against nagios-3.5.0-1.el6.x86_64. Since Nagios manages its own PID file just fine, there's no need for the init script to overwrite the PID file with pidof.

diff -u /tmp/nagios.orig /etc/init.d/nagios
--- /tmp/nagios.orig 2013-08-20 22:19:50.158724164 +0000
+++ /etc/init.d/nagios 2013-08-20 22:19:59.501536667 +0000
@@ -138,7 +138,6 @@
 chown $NagiosUser:$NagiosGroup $NagiosRunFile
 [ -x /sbin/restorecon ] && /sbin/restorecon $NagiosRunFile
 $NagiosBin -d $NagiosCfgFile
-pidof nagios > $NagiosRunFile
 if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
 echo " done."
 exit 0
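Whether a given start lost the race can be read off the pid file itself. This is a rough sketch, assuming a POSIX shell; `count_pids` is a hypothetical helper, and the sample PIDs and file names below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print how many whitespace-separated entries a pid
# file holds. A file written by nagios itself should hold exactly one.
count_pids() {
    # set -- splits the file contents on whitespace into $1, $2, ...
    # (inside a function this only affects the function's own arguments)
    set -- $(cat "$1")
    echo "$#"
}

# Simulate both outcomes of the race with made-up PIDs.
tmpdir=$(mktemp -d)
echo "4321" > "$tmpdir/won.pid"             # pidof ran before any child existed
echo "4330 4325 4321" > "$tmpdir/lost.pid"  # pidof ran after children were spawned

for f in "$tmpdir/won.pid" "$tmpdir/lost.pid"; do
    n=$(count_pids "$f")
    if [ "$n" -gt 1 ]; then
        echo "$(basename "$f"): $n PIDs, the file was overwritten by pidof"
    else
        echo "$(basename "$f"): $n PID"
    fi
done
rm -rf "$tmpdir"
```

On an affected system, calling `count_pids /var/run/nagios.pid` right after a start would show whether that particular start hit the race.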
TODO list (starting point: git master branch):

1. The "pidof nagios > $NagiosRunFile" line is being added by the patch nagios-0001-from-rpm.patch.
2. The patch nagios-0002-SELinux-relabeling.patch also needs to be updated.
nagios-3.5.0-2.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/nagios-3.5.0-2.el6
Changes also in nagios-3.5.0-9.fc20 and nagios-3.5.0-9.fc21. Koji nagios builds: http://koji.fedoraproject.org/koji/packageinfo?packageID=2593
Package nagios-3.5.0-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing nagios-3.5.0-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11385/nagios-3.5.0-2.el6
then log in and leave karma (feedback).
nagios-3.5.1-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/nagios-3.5.1-1.el6
nagios-3.5.1-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.