Bug 1666209

Summary: Nagios cannot start after system reboot because of missing directory
Product: [Fedora] Fedora EPEL
Reporter: Stefan Joosten <stefan+redhatbugs>
Component: nagios
Assignee: Guido Aulisi <guido.aulisi>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: epel7
CC: affix, athmanem, b.heden, herrold, jose.p.oliveira.oss, lemenkov, mike, redhat, shawn.starr, smooge, smooge, s, swilkerson
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: nagios-4.4.3-1.fc28 nagios-4.4.3-1.fc29 nagios-4.4.3-1.el6 nagios-4.4.3-1.el7 nagios-4.4.6-4.el8 nagios-4.4.6-4.el7
Last Closed: 2021-03-22 00:31:13 UTC
Type: Bug

Description Stefan Joosten 2019-01-15 08:28:49 UTC
Description of problem:

Nagios fails to start after a reboot (on a CentOS 7 system). This is caused by a missing directory in the path where Nagios wants to write its lock file (/var/run/nagios/nagios.pid).

Directory {,/var}/run/nagios is created by the RPM when the package is installed, and the service runs fine at that moment. But /run is a tmpfs, so its contents are cleared on reboot. Starting the service via `systemctl start nagios` does not recreate the directory, so Nagios exits abnormally and needs manual intervention to get going again.


Version-Release number of selected component (if applicable):
nagios-4.3.4-5.el7.x86_64 (stable)
nagios-4.4.2-3.el7.x86_64 (testing)

How reproducible:

I was able to reproduce this on two CentOS 7 systems, and also with the latest nagios package from testing (by enabling epel-testing). Install the "nagios" package, optionally start the service, and reboot the machine. Log back in and try to start the "nagios" service: it fails to start.

# yum install nagios
# ls -laZ /var/run/nagios
drwxr-x---. nagios nagios system_u:object_r:nagios_var_run_t:s0 .
drwxr-xr-x. root   root   system_u:object_r:var_run_t:s0   ..
# 


Steps to Reproduce:
1. Install Nagios:
 # yum install nagios
2. Verify existence of lock file directory: 
 # if [[ -d /run/nagios ]]; then echo "Directory /run/nagios exists"; else echo "Directory /run/nagios does not exist"; fi
3. Optional: start the Nagios service
 # systemctl start nagios
4. Reboot the machine
 # reboot
5. Try to start the Nagios service again (will fail)
 # systemctl start nagios
6. Inspect error message
7. Verify existence of lock file directory again and discover it is missing.
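The check in steps 2 and 7 can be wrapped in a small function so it is easy to run before and after the reboot. This is only a sketch: the function name `check_runtime_dir` is made up for illustration, and the default path is the one from this report.

```shell
# Hypothetical helper (not part of the nagios package): report whether a
# runtime directory exists. Defaults to the path from this report.
check_runtime_dir() {
    dir="${1:-/run/nagios}"
    if [ -d "$dir" ]; then
        echo "Directory $dir exists"
    else
        echo "Directory $dir does not exist"
        return 1
    fi
}
```

Running `check_runtime_dir` right after installation and again after the reboot should show the directory disappearing, if the system behaves as described above.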

Actual results:
# systemctl start nagios
Job for nagios.service failed because the control process exited with error code. See "systemctl status nagios.service" and "journalctl -xe" for details.

# journalctl -xe --unit nagios
..
Jan 15 08:30:43 <hostname> nagios[5823]: Failed to obtain lock on file /var/run/nagios/nagios.pid: No such file or directory
..


Expected results:
# systemctl start nagios
should exit normally and the Nagios service should be running.

The /run/nagios directory should be created upon start of the service.
Or the Nagios package configuration file could be changed to place the lock file directly in /run instead of in a subdirectory of it.


Additional info:

Out of the box, /etc/nagios/nagios.cfg contains:
# grep lock_file /etc/nagios/nagios.cfg 
lock_file=/var/run/nagios/nagios.pid
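
To see which directory has to exist for a given configuration, the parent directory of the `lock_file` value can be derived from the config file. A minimal sketch, assuming a single uncommented `lock_file=` line (if there are several, the last one is taken, which is an assumption about precedence); the helper name `lock_dir_from_cfg` is hypothetical.

```shell
# Hypothetical helper: print the directory that must exist for the
# lock_file configured in a Nagios main config file.
lock_dir_from_cfg() {
    cfg="${1:-/etc/nagios/nagios.cfg}"
    # Pick uncommented lock_file= lines and strip the key; keep the last one.
    lock_file="$(sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1)"
    # Fail if the option is not set at all.
    [ -n "$lock_file" ] || return 1
    dirname "$lock_file"
}
```

With the default configuration shown above, `lock_dir_from_cfg` would print the `nagios` subdirectory under /var/run, which is the directory that goes missing after a reboot.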

Comment 1 Stefan Joosten 2019-01-15 09:45:33 UTC
Two ways of fixing this that I came up with:
A. Change nagios.cfg to go directly to /var/run by taking the `nagios` subdirectory out of the `lock_file` option.
B. Have the systemd service unit create the `nagios` runtime directory. This might be easier if the problem does not occur on EL6, for example (perhaps its init script already takes care of this; I haven't checked).

Solution A, changing nagios.cfg:

--- nagios.cfg	2019-01-15 10:14:06.346940829 +0100
+++ nagios.cfg.fix_lock_path	2019-01-15 10:14:16.884977509 +0100
@@ -166,7 +166,7 @@
 # This is the lockfile that Nagios will use to store its PID number
 # in when it is running in daemon mode.
 
-lock_file=/var/run/nagios/nagios.pid
+lock_file=/var/run/nagios.pid


Solution B, changing the systemd unit file:

--- nagios.service	2019-01-15 10:30:12.572302032 +0100
+++ nagios.service.fix_lock_path	2019-01-15 10:39:19.450194861 +0100
@@ -7,6 +7,8 @@
 Type=forking
 User=nagios
 Group=nagios
+RuntimeDirectory=nagios
+RuntimeDirectoryMode=0750
 PIDFile=/var/run/nagios/nagios.pid
 # Mimic older config file wants
 EnvironmentFile=-/etc/sysconfig/nagios

This creates the /run/nagios directory with the user and group ownership set; the mode is set to 0750, matching the directory created by the RPM.
However, this does cause a new issue (!) upon removal of the package: systemd now removes /run/nagios when the service stops, so `yum remove nagios` prints a warning:
   Erasing    : nagios-4.4.2-3.el7.x86_64
 warning: file /var/run/nagios: remove failed: No such file or directory


Of course, pick whichever you prefer; either seems to work for me. Or perhaps you'll think of another solution.
I just hope this helps :)
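
One further option, not tried anywhere in this report, is a systemd-tmpfiles entry that recreates the directory at every boot. The sketch below only writes such an entry; the helper name `write_nagios_tmpfiles` and the file name `nagios.conf` are assumptions, and the owner, group and mode simply mirror what the RPM creates.

```shell
# Sketch only: write a tmpfiles.d entry that recreates /run/nagios at boot.
# The file name nagios.conf and this helper are assumptions, not something
# shipped by the nagios package discussed in this report.
write_nagios_tmpfiles() {
    dest="${1:-/etc/tmpfiles.d}"
    # Fields: type, path, mode, user, group, age ("-" means no age-based cleanup).
    printf 'd /run/nagios 0750 nagios nagios -\n' > "$dest/nagios.conf"
}
```

After `write_nagios_tmpfiles`, running `systemd-tmpfiles --create /etc/tmpfiles.d/nagios.conf` should create the directory immediately, without waiting for a reboot.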

Comment 2 Stefan Joosten 2019-01-15 09:57:11 UTC
Uh.. I think I just assumed solution A (editing the path of the lock_file) would work... But I tested it and of course it does not work, because user nagios has no permission to write to /run:

nagios[6017]: Failed to obtain lock on file /var/run/nagios.pid: Permission denied

So disregard solution A.

Comment 3 Fedora Update System 2019-01-17 00:14:39 UTC
nagios-4.4.3-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2

Comment 4 Fedora Update System 2019-01-17 00:25:19 UTC
nagios-4.4.3-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b

Comment 5 Fedora Update System 2019-01-17 00:43:00 UTC
nagios-4.4.3-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c

Comment 6 Fedora Update System 2019-01-17 00:55:16 UTC
nagios-4.4.3-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1

Comment 7 Fedora Update System 2019-01-18 01:00:24 UTC
nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2

Comment 8 Fedora Update System 2019-01-18 01:31:47 UTC
nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b

Comment 9 Fedora Update System 2019-01-18 03:04:54 UTC
nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1

Comment 10 Fedora Update System 2019-01-18 03:36:14 UTC
nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c

Comment 11 Fedora Update System 2019-01-30 01:31:56 UTC
nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2019-01-30 02:06:37 UTC
nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2019-02-02 00:36:20 UTC
nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.

Comment 14 Fedora Update System 2019-02-02 00:39:21 UTC
nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Stefan Joosten 2019-03-22 14:41:53 UTC
Thanks for working on this.
Unfortunately the problem still persists for me.

I'm on CentOS 7 using nagios-4.4.3-1.el7 from EPEL.
I extracted the RPM and had a look at file usr/lib/systemd/system/nagios.service
It does not seem to include my little patch of adding the two lines "RuntimeDirectory" and "RuntimeDirectoryMode".

I can still reproduce the error as originally reported. After a reboot Nagios fails to start on CentOS 7 for me unless I manually create the directory for its lock/PID file.

This is my current patch to usr/lib/systemd/system/nagios.service :

--- nagios.service	2019-03-22 15:38:48.066376767 +0100
+++ nagios.service.lock_file.patch	2019-03-22 15:38:37.921396470 +0100
@@ -10,6 +10,8 @@
 ExecStop=/usr/bin/kill -s TERM ${MAINPID}
 ExecStopPost=/usr/bin/rm -f /var/spool/nagios/cmd/nagios.cmd
 ExecReload=/usr/bin/kill -s HUP ${MAINPID}
+RuntimeDirectory=nagios
+RuntimeDirectoryMode=0750
 
 [Install]
 WantedBy=multi-user.target

After these are added the nagios service works as intended for me.
I changed the bug's status, hope that's OK.
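
Whether a given unit file already carries the directive can be checked with a one-line grep. A minimal sketch; the helper name `unit_has_runtime_dir` is hypothetical, and the pattern deliberately does not match `RuntimeDirectoryMode=` on its own.

```shell
# Hypothetical helper: succeed if a unit file sets RuntimeDirectory=.
# The "=" must follow the key directly (optional spaces allowed), so a
# lone RuntimeDirectoryMode= line does not count.
unit_has_runtime_dir() {
    grep -Eq '^[[:space:]]*RuntimeDirectory[[:space:]]*=' "$1"
}
```

For example, `unit_has_runtime_dir /usr/lib/systemd/system/nagios.service && echo patched` checks the installed unit; `systemctl cat nagios` additionally shows any drop-ins that apply.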

Comment 16 Mike Surcouf 2019-07-18 09:48:01 UTC
On CentOS 7, adding

+RuntimeDirectory=nagios
+RuntimeDirectoryMode=0750

didn't work.

I used this workaround, which will survive package updates:

mkdir -p /etc/systemd/system/nagios.service.d
cat > /etc/systemd/system/nagios.service.d/overides.conf << "EOF"
[Service]
ExecStartPre=/usr/bin/mkdir -p /var/run/nagios
ExecStartPre=/usr/bin/chown nagios /var/run/nagios
EOF
systemctl daemon-reload
systemctl restart nagios

It would be nice if this were fixed in the service file, though.
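
A variant of the drop-in above, offered as a sketch under assumptions rather than a tested fix: if the unit sets `User=nagios` (as the unit quoted in Comment 1 does), systemd runs `ExecStartPre=` commands as that user too, and Comment 2 shows that user cannot write to /run. `PermissionsStartOnly=true` is a systemd.service option that applies `User=`/`Group=` to `ExecStart=` only. The helper name `write_nagios_dropin` and the file name `runtime-dir.conf` are made up for illustration.

```shell
# Sketch only: write a drop-in that creates the runtime directory with
# owner, group and mode in one step. Helper name and file name are
# assumptions; whether PermissionsStartOnly= is needed depends on the unit.
write_nagios_dropin() {
    dest="${1:-/etc/systemd/system/nagios.service.d}"
    mkdir -p "$dest"
    cat > "$dest/runtime-dir.conf" << "EOF"
[Service]
# Run ExecStartPre with full privileges even if the unit sets User=.
PermissionsStartOnly=true
ExecStartPre=/usr/bin/install -d -o nagios -g nagios -m 0750 /var/run/nagios
EOF
}
```

As with the workaround above, `systemctl daemon-reload` followed by `systemctl restart nagios` is needed after writing the file.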

Comment 17 Mike Surcouf 2019-07-18 09:49:13 UTC
BTW, I didn't include the plus signs (they were copied from the commit).

Comment 18 Fedora Admin user for bugzilla script actions 2020-08-18 14:57:36 UTC
This package has changed maintainer in Fedora.
Reassigning to the new maintainer of this component.

Comment 19 Fedora Admin user for bugzilla script actions 2021-02-20 00:05:28 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 20 Fedora Update System 2021-03-07 11:14:17 UTC
FEDORA-EPEL-2021-e9c2beec98 has been submitted as an update to Fedora EPEL 8. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-e9c2beec98

Comment 21 Fedora Update System 2021-03-07 12:07:53 UTC
FEDORA-EPEL-2021-04cc5bcb08 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-04cc5bcb08

Comment 22 Fedora Update System 2021-03-07 15:26:37 UTC
FEDORA-EPEL-2021-04cc5bcb08 has been pushed to the Fedora EPEL 7 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-04cc5bcb08

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 23 Fedora Update System 2021-03-07 15:27:26 UTC
FEDORA-EPEL-2021-e9c2beec98 has been pushed to the Fedora EPEL 8 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-e9c2beec98

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 24 Fedora Update System 2021-03-22 00:31:13 UTC
FEDORA-EPEL-2021-e9c2beec98 has been pushed to the Fedora EPEL 8 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 25 Fedora Update System 2021-03-22 00:37:18 UTC
FEDORA-EPEL-2021-04cc5bcb08 has been pushed to the Fedora EPEL 7 stable repository.
If the problem still persists, please make note of it in this bug report.