Red Hat Bugzilla – Bug 1255747
salt-minion won't start anymore after system crashes
Last modified: 2016-10-06 14:48:45 EDT
Description of problem:
After a power outage of our Ceph cluster, none of the Calamari graphs for OSDs were displayed anymore. We found that the salt-minion daemon, and therefore the diamond data collector process, could no longer start on these systems because a leftover timestamp file (/var/cache/salt/minion/proc/20150820122333837170) caused /usr/bin/salt-minion to crash at startup. Deleting this file and restarting /usr/bin/salt-minion solved the issue. The file is regularly created and deleted, but if it cannot be deleted in time because of a system crash, it prevents salt-minion from starting.
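The manual workaround described above can be scripted. This is a minimal sketch that assumes salt-minion is already stopped, so that no file in the directory belongs to a running job, and that removing every regular file there is acceptable. `clear_stale_proc_files` is a hypothetical helper name; only the directory path comes from this report.

```shell
#!/bin/bash
# Remove leftover job-tracking files from the minion's proc cache.
# Hypothetical helper; run it only while salt-minion is stopped.
clear_stale_proc_files() {
  local proc_dir=${1:-/var/cache/salt/minion/proc}   # default: path from this report
  [ -d "$proc_dir" ] || return 0                     # nothing to clean up
  # Delete regular files directly inside the directory and print their names.
  # Assumption: with the minion stopped, none of them describe a live job.
  find "$proc_dir" -maxdepth 1 -type f -print -delete
}

# Usage on an affected host (as root):
#   systemctl stop salt-minion.service
#   clear_stale_proc_files
#   systemctl start salt-minion.service
```

Deleting only files, not the directory itself, leaves the cache layout in place for the minion's next start.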
Version-Release number of selected component (if applicable):
How reproducible:
Stop salt-minion, create the file /var/cache/salt/minion/proc/20150820122333837170, then try to restart salt-minion.
Steps to Reproduce:
1. systemctl stop salt-minion.service
2. touch /var/cache/salt/minion/proc/20150820122333837170
3. systemctl start salt-minion.service
Actual results:
salt-minion.service fails to start.
Expected results:
System startup should clear all leftover status files, timestamps, sockets, etc. that are not required or that would cause a service such as salt-minion to fail to start.
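On a systemd host, one way to have startup clear this leftover state is a systemd-tmpfiles rule. This is a sketch under the assumption that nothing in /var/cache/salt/minion/proc needs to survive a reboot; the file name salt-minion-proc.conf is hypothetical. Lines of type `r` accept shell-style globs and are applied at boot by systemd-tmpfiles-setup.service, which runs as part of sysinit.target, before ordinary services such as salt-minion.

```
# /etc/tmpfiles.d/salt-minion-proc.conf (hypothetical file name)
# At boot, remove leftover salt-minion job files so a crash cannot block startup.
# Assumption: no file in this directory is needed after a reboot.
r /var/cache/salt/minion/proc/*
```

This covers reboots only. A minion that dies and is restarted without a reboot would still need the files removed by hand, for example with `systemd-tmpfiles --remove /etc/tmpfiles.d/salt-minion-proc.conf` while the service is stopped.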
Gregory, is this limited to power outages? What happens during a normal shutdown?
It seems to be limited to abnormal termination.
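Only abnormal termination would leave the file behind if, as assumed here, the file is removed in the process's own exit path. A process killed outright (power loss, SIGKILL) never reaches that path. The following is an illustrative analogy in bash, not Salt's code; `run_tracked_job` and the `job.XXXXXX` file names are hypothetical.

```shell
#!/bin/bash
# Analogy only: a job records itself in a tracking file and deletes that file
# from an EXIT trap. run_tracked_job is a hypothetical name.
run_tracked_job() {
  local dir=$1 seconds=$2
  job_file=$(mktemp "$dir/job.XXXXXX")   # global on purpose: the trap reads it after return
  trap 'rm -f -- "$job_file"' EXIT       # fires when the (sub)shell exits normally
  sleep "$seconds"                       # stand-in for the actual work
  :                                      # job finished; cleanup happens on exit
}
```

Run as `( run_tracked_job "$dir" 0 )`, the subshell exits normally and the file disappears. Run in the background and stopped with `kill -KILL`, the trap never fires and the file stays, which mirrors the leftover file after a crash.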