Bug 983129 - nagios-3.4.1-2.el6 init script overwrites pid file unnecessarily
Status: CLOSED ERRATA
Product: Fedora EPEL
Classification: Fedora
Component: nagios
Version: el6
Hardware: All Linux
Priority: unspecified Severity: low
Assigned To: Jose Pedro Oliveira
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-07-10 10:57 EDT by Jason Kincl
Modified: 2013-09-15 14:33 EDT

See Also:
Fixed In Version: nagios-3.5.1-1.el6
Doc Type: Bug Fix
Last Closed: 2013-09-15 14:33:01 EDT
Type: Bug


Description Jason Kincl 2013-07-10 10:57:35 EDT
Description of problem:
In the init script shipped with the nagios RPM, /etc/rc.d/init.d/nagios, the "start" action runs the following command after launching the nagios binary:

pidof nagios > $NagiosRunFile

This overwrites the PID that the nagios binary itself writes to /var/run/nagios.pid at startup, replacing it with a list that also contains the PIDs of child processes spawned by the nagios parent process. When "stop" or "reload" is executed later, spurious errors are displayed because those child processes have since exited.
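
To make the symptom concrete, here is a minimal sketch that counts the entries in a pid file. The helper name count_pids_in_file is hypothetical and not part of the shipped init script, and the demo file uses sample PIDs rather than real pidof output. A pid file written by the daemon alone should hold one entry.

```shell
# Hypothetical helper (not part of the shipped init script):
# print how many whitespace-separated entries a pid file holds.
count_pids_in_file() {
    # wc -w counts words; tr strips the padding some wc builds emit.
    wc -w < "$1" | tr -d '[:space:]'
}

# Simulated pid file with several sample PIDs on one line, the shape of
# output pidof can produce when child processes exist.
demo_file=$(mktemp)
echo "30429 30408 30390" > "$demo_file"
echo "entries in pid file: $(count_pids_in_file "$demo_file")"
rm -f "$demo_file"
```

Running the same check against /var/run/nagios.pid right after a start should show whether the file was overwritten with more than one PID.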

Version-Release number of selected component (if applicable):

nagios-3.4.1-2.el6

How reproducible:



Steps to Reproduce:
1. Start nagios daemon with init script with some hosts and services configured
2. Wait a few minutes for the initial spawned child processes to finish
3. Use the init script to reload, restart, or stop the nagios daemon
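
After step 2, the stale entries can be spotted before running the init script. The sketch below is only an illustration: stale_pids is a hypothetical helper, and it relies on kill -0, which sends no signal and only reports whether the PID can be signalled. When run as an unprivileged user it would also report processes owned by other users as stale.

```shell
# Hypothetical helper: print every PID listed in a pid file that no longer
# refers to a process the current user can signal.
stale_pids() {
    for pid in $(cat "$1"); do
        # kill -0 delivers no signal; it only checks that the PID can be signalled.
        kill -0 "$pid" 2>/dev/null || echo "$pid"
    done
}

# Demo: record this shell's PID plus the PID of a child that has already exited.
sleep 0 &
dead_pid=$!
wait "$dead_pid"
demo_file=$(mktemp)
echo "$$ $dead_pid" > "$demo_file"
echo "stale: $(stale_pids "$demo_file")"
rm -f "$demo_file"
```

Any PID printed for /var/run/nagios.pid is one that "stop" would fail to kill with "No such process".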

Actual results:

# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: /etc/init.d/nagios: line 74: kill: (30429) - No such process
/etc/init.d/nagios: line 74: kill: (30408) - No such process
done.
Starting nagios: done.


Expected results:

# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.


Additional info:
Comment 1 Jose Pedro Oliveira 2013-07-13 20:36:04 EDT
Jason,

I can't reproduce the problem with nagios-3.5.0-1.el6:
  
   # rpm -q nagios
   nagios-3.5.0-1.el6.i686

   # /etc/init.d/nagios restart
   Running configuration check...done.
   Stopping nagios: done.
   Starting nagios: done.

Could you upgrade to the latest nagios version available in EPEL6 and see if you can reproduce the problem?

tia,
jpo
Comment 2 Kevin Sumner 2013-08-20 18:30:39 EDT
I'm actually seeing this with the 3.4.4 version, but after checking 3.5.0 RPM from EPEL6, I believe this would still be the case.  The creation of the nagios.pid file with multiple PIDs is dependent upon a race condition between the Nagios process spawning a child and pidof being executed by the init script.

- If the Nagios process spawns a child first, two or more PIDs are seen in nagios.pid
- If pidof runs before a spawn, only one PID is found in nagios.pid

So the outcome depends on a lot of factors, particularly whether nagios needs to spawn immediately (e.g. for a particular check to run).  I would imagine that an empty or small Nagios config would favor correct behavior, as there's little to no need for it to spawn children, so reproducing the problem would be difficult.  As a data point, I've only seen this begin to happen after our config has grown significantly over the past year.

Regardless, here's the patch that should fix this, taken against nagios-3.5.0-1.el6.x86_64.  Since Nagios manages its own PID file just fine, there's no need for the init script to overwrite the PID file with the output of pidof.

diff -u /tmp/nagios.orig /etc/init.d/nagios
--- /tmp/nagios.orig    2013-08-20 22:19:50.158724164 +0000
+++ /etc/init.d/nagios  2013-08-20 22:19:59.501536667 +0000
@@ -138,7 +138,6 @@
                        chown $NagiosUser:$NagiosGroup $NagiosRunFile
                        [ -x /sbin/restorecon ] && /sbin/restorecon $NagiosRunFile
                        $NagiosBin -d $NagiosCfgFile
-                        pidof nagios > $NagiosRunFile
                        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
                        echo " done."
                        exit 0
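
As a complement to the patch above, a script that consumes the pid file could refuse to act on anything other than a single numeric PID. This is only a sketch under that assumption, not what the packaged init script does; read_single_pid is a hypothetical helper and the PIDs in the demo are sample values.

```shell
# Hypothetical helper: print the PID only if the file holds exactly one
# numeric entry; return non-zero otherwise.
read_single_pid() {
    [ -r "$1" ] || return 1
    # Unquoted expansion splits the file contents on whitespace and
    # replaces the function's own positional parameters with the entries.
    set -- $(cat "$1")
    [ "$#" -eq 1 ] || return 1
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$1"
}

# Demo with sample contents: one PID is accepted, several are rejected.
demo_file=$(mktemp)
echo "30390" > "$demo_file"
if pid=$(read_single_pid "$demo_file"); then
    echo "would signal PID $pid"
fi
echo "30429 30408" > "$demo_file"
read_single_pid "$demo_file" || echo "refusing: pid file does not hold exactly one PID"
rm -f "$demo_file"
```

With the pidof line removed, the daemon's own pid file should always pass this check, so the guard mainly protects against files left behind by older packages.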
Comment 3 Jose Pedro Oliveira 2013-08-28 21:55:59 EDT
TODO list (starting point: git master branch):

 1. The "pidof nagios > $NagiosRunFile" line is being added by the patch
    nagios-0001-from-rpm.patch.

 2. The patch nagios-0002-SELinux-relabeling.patch also needs to be updated
Comment 4 Fedora Update System 2013-08-28 23:13:51 EDT
nagios-3.5.0-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/nagios-3.5.0-2.el6
Comment 5 Jose Pedro Oliveira 2013-08-28 23:21:35 EDT
Changes also in nagios-3.5.0-9.fc20 and nagios-3.5.0-9.fc21.

Koji nagios builds:
http://koji.fedoraproject.org/koji/packageinfo?packageID=2593
Comment 6 Fedora Update System 2013-08-29 13:42:25 EDT
Package nagios-3.5.0-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing nagios-3.5.0-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11385/nagios-3.5.0-2.el6
then log in and leave karma (feedback).
Comment 7 Fedora Update System 2013-08-30 18:31:26 EDT
nagios-3.5.1-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/nagios-3.5.1-1.el6
Comment 8 Fedora Update System 2013-09-15 14:33:01 EDT
nagios-3.5.1-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.
