Bug 1666209 - Nagios cannot start after system reboot because of missing directory
Summary: Nagios cannot start after system reboot because of missing directory
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: nagios
Version: epel7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Stephen John Smoogen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-15 08:28 UTC by Stefan Joosten
Modified: 2019-07-18 09:49 UTC (History)

Fixed In Version: nagios-4.4.3-1.fc28 nagios-4.4.3-1.fc29 nagios-4.4.3-1.el6 nagios-4.4.3-1.el7
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-30 01:31:56 UTC



Description Stefan Joosten 2019-01-15 08:28:49 UTC
Description of problem:

Nagios fails to start after a reboot (on a CentOS 7 system). This is caused by a missing directory in the path where Nagios wants to write its lock file (/var/run/nagios/nagios.pid).

The directory {,/var}/run/nagios is created by the RPM when the package is installed, and the service runs fine at that point. But /run is a tmpfs, so its contents are cleared on reboot. Starting the service via `systemctl start nagios` does not recreate the directory, so Nagios exits abnormally and requires manual intervention to get going again.


Version-Release number of selected component (if applicable):
nagios-4.3.4-5.el7.x86_64 (stable)
nagios-4.4.2-3.el7.x86_64 (testing)

How reproducible:

I was able to reproduce this on two CentOS 7 systems, and I also tried the latest nagios package from testing by enabling epel-testing. Install the "nagios" package, optionally start the service, and reboot the machine. Then log back in and try to start the "nagios" service. It fails to start.

# yum install nagios
# ls -laZ /var/run/nagios
drwxr-x---. nagios nagios system_u:object_r:nagios_var_run_t:s0 .
drwxr-xr-x. root   root   system_u:object_r:var_run_t:s0   ..
# 


Steps to Reproduce:
1. Install Nagios:
 # yum install nagios
2. Verify existence of lock file directory: 
 # if [[ -d /run/nagios ]]; then echo "Directory /run/nagios exists"; else echo "Directory /run/nagios does not exist"; fi
3. Optional: start the Nagios service
 # systemctl start nagios
4. Reboot the machine
 # reboot
5. Try to start the Nagios service again (will fail)
 # systemctl start nagios
6. Inspect error message
7. Verify existence of lock file directory again and discover it is missing.

Actual results:
# systemctl start nagios
Job for nagios.service failed because the control process exited with error code. See "systemctl status nagios.service" and "journalctl -xe" for details.

# journalctl -xe --unit nagios
..
Jan 15 08:30:43 <hostname> nagios[5823]: Failed to obtain lock on file /var/run/nagios/nagios.pid: No such file or directory
..


Expected results:
# systemctl start nagios
should exit normally and the Nagios service should be running.

The /run/nagios directory should be created upon start of the service.
Or the Nagios package configuration file could be changed to place the lock file directly in /run instead of in a subdirectory of it.


Additional info:

Out of the box, /etc/nagios/nagios.cfg contains:
# grep lock_file /etc/nagios/nagios.cfg 
lock_file=/var/run/nagios/nagios.pid
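The directory check from step 2 can also be derived from this setting instead of hard-coding /run/nagios. A minimal sketch, assuming the `lock_file=` line is written without surrounding whitespace as shown above; the function name `check_lock_dir` is hypothetical and not part of the nagios package:

```shell
# Report whether the directory that should hold the Nagios lock file exists.
# Usage: check_lock_dir [path-to-nagios.cfg]
# Returns 0 if the directory exists, 1 if it is missing,
# and 2 if no lock_file setting could be read.
check_lock_dir() {
    cfg="${1:-/etc/nagios/nagios.cfg}"
    # Extract the value of the last lock_file= line in the config file.
    lock_file=$(sed -n 's/^lock_file=//p' "$cfg" | tail -n 1)
    if [ -z "$lock_file" ]; then
        echo "No lock_file setting found in $cfg"
        return 2
    fi
    # The PID file itself is created by Nagios; only its parent must exist.
    lock_dir=$(dirname "$lock_file")
    if [ -d "$lock_dir" ]; then
        echo "Directory $lock_dir exists"
    else
        echo "Directory $lock_dir does not exist"
        return 1
    fi
}
```

Running `check_lock_dir` before and after the reboot should show the directory going missing, if the behaviour described in this report applies.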

Comment 1 Stefan Joosten 2019-01-15 09:45:33 UTC
I came up with two possible fixes:
A. Change nagios.cfg to point directly at /var/run by taking the `nagios` subdirectory out of the `lock_file` option.
B. Have the systemd service unit create the `nagios` runtime directory. This might be the easier option if the problem does not occur on EL6, for example (perhaps its init script already takes care of this; I haven't checked).

Solution A, changing nagios.cfg:

--- nagios.cfg	2019-01-15 10:14:06.346940829 +0100
+++ nagios.cfg.fix_lock_path	2019-01-15 10:14:16.884977509 +0100
@@ -166,7 +166,7 @@
 # This is the lockfile that Nagios will use to store its PID number
 # in when it is running in daemon mode.
 
-lock_file=/var/run/nagios/nagios.pid
+lock_file=/var/run/nagios.pid


Solution B, changing the systemd unit file:

--- nagios.service	2019-01-15 10:30:12.572302032 +0100
+++ nagios.service.fix_lock_path	2019-01-15 10:39:19.450194861 +0100
@@ -7,6 +7,8 @@
 Type=forking
 User=nagios
 Group=nagios
+RuntimeDirectory=nagios
+RuntimeDirectoryMode=0750
 PIDFile=/var/run/nagios/nagios.pid
 # Mimic older config file wants
 EnvironmentFile=-/etc/sysconfig/nagios

This creates the /run/nagios directory with the user and group ownership set; the mode is set to 0750, the same as when the directory is created by the RPM.
However, this does cause a new issue (!) upon removal of the package. The /run/nagios directory is now removed by systemd when the service stops, which causes `yum remove nagios` to print a warning:
   Erasing    : nagios-4.4.2-3.el7.x86_64
 warning: file /var/run/nagios: remove failed: No such file or directory
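
Another mechanism that might be worth considering, though it was not tested for this report, is a tmpfiles.d entry: systemd-tmpfiles would then recreate the directory at boot and would not remove it when the service stops. A minimal sketch, assuming the nagios user and group exist; the function name `write_nagios_tmpfiles` and the file name nagios.conf are hypothetical choices:

```shell
# Write a tmpfiles.d entry that recreates /run/nagios at boot.
# Usage: write_nagios_tmpfiles [tmpfiles.d directory]
write_nagios_tmpfiles() {
    conf_dir="${1:-/etc/tmpfiles.d}"
    mkdir -p "$conf_dir" || return 1
    # Fields: type path mode user group age
    # "d" creates the directory if it is missing; "-" disables age-based cleanup.
    printf 'd /run/nagios 0750 nagios nagios -\n' > "$conf_dir/nagios.conf"
}

# On a real system, as root, the entry could be written and applied
# without a reboot like this:
#   write_nagios_tmpfiles
#   systemd-tmpfiles --create /etc/tmpfiles.d/nagios.conf
```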


Of course, pick whichever you prefer; either seems to work for me, or perhaps you'll think of another solution.
I just hope this helps :)

Comment 2 Stefan Joosten 2019-01-15 09:57:11 UTC
Uh... I think I just assumed solution A (editing the path of the lock_file) would work. But I tested it, and of course it does not work, because user nagios has no permission to write to /run:

nagios[6017]: Failed to obtain lock on file /var/run/nagios.pid: Permission denied

So disregard solution A.

Comment 3 Fedora Update System 2019-01-17 00:14:39 UTC
nagios-4.4.3-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2

Comment 4 Fedora Update System 2019-01-17 00:25:19 UTC
nagios-4.4.3-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b

Comment 5 Fedora Update System 2019-01-17 00:43:00 UTC
nagios-4.4.3-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c

Comment 6 Fedora Update System 2019-01-17 00:55:16 UTC
nagios-4.4.3-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1

Comment 7 Fedora Update System 2019-01-18 01:00:24 UTC
nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2

Comment 8 Fedora Update System 2019-01-18 01:31:47 UTC
nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b

Comment 9 Fedora Update System 2019-01-18 03:04:54 UTC
nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1

Comment 10 Fedora Update System 2019-01-18 03:36:14 UTC
nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c

Comment 11 Fedora Update System 2019-01-30 01:31:56 UTC
nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2019-01-30 02:06:37 UTC
nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2019-02-02 00:36:20 UTC
nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.

Comment 14 Fedora Update System 2019-02-02 00:39:21 UTC
nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Stefan Joosten 2019-03-22 14:41:53 UTC
Thanks for working on this.
Unfortunately the problem still persists for me.

I'm on CentOS 7 using nagios-4.4.3-1.el7 from EPEL.
I extracted the RPM and had a look at the file usr/lib/systemd/system/nagios.service.
It does not seem to include my little patch that adds the two lines "RuntimeDirectory" and "RuntimeDirectoryMode".

I can still reproduce the error as originally reported. After a reboot, Nagios fails to start on CentOS 7 for me unless I manually create the directory for its lock/PID file.

This is my current patch to usr/lib/systemd/system/nagios.service :

--- nagios.service	2019-03-22 15:38:48.066376767 +0100
+++ nagios.service.lock_file.patch	2019-03-22 15:38:37.921396470 +0100
@@ -10,6 +10,8 @@
 ExecStop=/usr/bin/kill -s TERM ${MAINPID}
 ExecStopPost=/usr/bin/rm -f /var/spool/nagios/cmd/nagios.cmd
 ExecReload=/usr/bin/kill -s HUP ${MAINPID}
+RuntimeDirectory=nagios
+RuntimeDirectoryMode=0750
 
 [Install]
 WantedBy=multi-user.target

After these are added the nagios service works as intended for me.
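
The same two directives could also be supplied through a systemd drop-in instead of patching the packaged unit file, so the change is not overwritten by a package update. A minimal sketch: the function name `write_runtime_dir_dropin` and the file name runtime-dir.conf are hypothetical. systemd assigns the runtime directory to the unit's User= and Group=, so this assumes the unit sets those to nagios, as the 4.4.2 unit quoted in comment 1 does.

```shell
# Write a systemd drop-in asking systemd to create /run/nagios at service start.
# Usage: write_runtime_dir_dropin [drop-in directory]
write_runtime_dir_dropin() {
    dropin_dir="${1:-/etc/systemd/system/nagios.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # The quoted delimiter keeps the shell from expanding anything in the body.
    cat > "$dropin_dir/runtime-dir.conf" << "EOF"
[Service]
RuntimeDirectory=nagios
RuntimeDirectoryMode=0750
EOF
}

# On a real system, as root, the drop-in would be activated with:
#   write_runtime_dir_dropin
#   systemctl daemon-reload
#   systemctl restart nagios
```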
I changed the bug's status, hope that's OK.

Comment 16 Mike Surcouf 2019-07-18 09:48:01 UTC
On CentOS 7,

+RuntimeDirectory=nagios
+RuntimeDirectoryMode=0750

didn't work.

I used this workaround, which will survive package updates:

mkdir -p /etc/systemd/system/nagios.service.d
cat > /etc/systemd/system/nagios.service.d/overides.conf << "EOF"
[Service]
ExecStartPre=/usr/bin/mkdir -p /var/run/nagios
ExecStartPre=/usr/bin/chown nagios /var/run/nagios
EOF
systemctl daemon-reload
systemctl restart nagios

It would be nice if this were fixed in the service file, though.

Comment 17 Mike Surcouf 2019-07-18 09:49:13 UTC
BTW, I didn't put the plus signs in (they were copied from the commit).

