Description of problem:
When systemd uses an LSB-header service definition (e.g. sshd.service) and the main process is killed, systemctl still reports that the service is "running" (and it cannot be started with systemctl start ...).

Version-Release number of selected component (if applicable):
rpm -q systemd
systemd-26-1.fc15.x86_64

How reproducible:
every time

Steps to Reproduce:
1. systemctl start sshd.service
2. systemctl status sshd.service (it is running, e.g. with Main PID: 1322)
3. kill -9 1322
4. systemctl status sshd.service (it still reports that the service is "running", but the main PID is described as killed)
5. systemctl start sshd.service (nothing changes)

Actual results:
$ systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
      Loaded: loaded (/etc/rc.d/init.d/sshd)
      Active: active (exited) since Thu, 05 May 2011 13:10:53 +0200; 24h ago
     Process: 1274 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
    Main PID: 1322 (code=killed, signal=KILL)
      CGroup: name=systemd:/system/sshd.service
$ systemctl start sshd.service
$ systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
      Loaded: loaded (/etc/rc.d/init.d/sshd)
      Active: active (exited) since Thu, 05 May 2011 13:10:53 +0200; 24h ago
     Process: 1274 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
    Main PID: 1322 (code=killed, signal=KILL)
      CGroup: name=systemd:/system/sshd.service

Expected results:
$ systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
      Loaded: loaded (/etc/rc.d/init.d/sshd)
      Active: failed since Fri, 06 May 2011 13:39:12 +0200; 6s ago
     Process: 1274 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
      CGroup: name=systemd:/system/sshd.service
$ systemctl start sshd.service
$ systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
      Loaded: loaded (/etc/rc.d/init.d/sshd)
      Active: active (running) since Fri, 06 May 2011 13:53:07 +0200; 2s ago
     Process: 30650 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
    Main PID: 30657 (sshd)
      CGroup: name=systemd:/system/sshd.service
              └ 30657 /usr/sbin/sshd

Additional info:
I know there is a problem with tracking services that double-fork, but if the PID is known, it is always possible to check whether that process is still running, so the service should not be reported as "running" when it is not. This works correctly when a native systemd service file is loaded.
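The point that a known main PID can always be checked can be illustrated with signal 0, which makes kill(2) perform its existence and permission checks without delivering any signal. This is a minimal Python sketch, not systemd code; `main_pid_is_alive` is a hypothetical helper name. It assumes a POSIX system and shares kill(2)'s limits: an unreaped zombie still counts as existing, and a recycled PID would give a false positive.

```python
import os


def main_pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists.

    Hypothetical helper for illustration only (not part of systemd).
    Signal 0 runs the existence/permission checks of kill(2) without
    sending a signal. Unreaped zombies and recycled PIDs are not
    distinguished from the original live process.
    """
    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no such process, so the main process is gone.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return True
    return True


if __name__ == "__main__":
    # The current interpreter is, by definition, still running.
    print(main_pid_is_alive(os.getpid()))
```

A supervisor that already knows the main PID (as systemd does here, "Main PID: 1322") could use a check like this, or better, the SIGCHLD it receives as the parent, to move the unit out of an "active" state once the process is gone.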
I've found bug #629040, which made this clearer to me, and I've realized that the status "running" in comment #0 should be replaced by the "active (exited)" state. Sorry for the confusion. Nevertheless, as I understand it, the problem only exists for services that use more than one main process. SSH uses one main process and systemd recognizes it (Main PID is found correctly), so is it still necessary to report "active (exited)" in this case?
Note that because systemd is not LSB compliant in Fedora 15 and Rawhide (F16), rgmanager, pacemaker, and pacemaker-cloud (an F16 feature) are DOA in these distributions.
I can confirm the original behavior for cman as well. This is effectively a regression from the previous init system and affects all services that have not yet been converted to native systemd units.
Upstream patch:
http://cgit.freedesktop.org/systemd/commit/?id=f8788303929c27d0b7f7e4b8ffe22767a3d0ff67
It improves detection of the service type for services whose SysV initscripts contain the 'pidfile:' header. With this change:

[root@f15 ~]# systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
      Loaded: loaded (/etc/rc.d/init.d/sshd)
      Active: active (running) since Tue, 05 Jul 2011 14:58:26 +0200; 14s ago
     Process: 1253 ExecStop=/etc/rc.d/init.d/sshd stop (code=exited, status=0/SUCCESS)
     Process: 1284 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
    Main PID: 1291 (sshd)
      CGroup: name=systemd:/system/sshd.service
              └ 1291 /usr/sbin/sshd
[root@f15 ~]# kill -9 1291
# systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
      Loaded: loaded (/etc/rc.d/init.d/sshd)
      Active: failed since Tue, 05 Jul 2011 14:59:07 +0200; 15s ago
     Process: 1349 ExecStop=/etc/rc.d/init.d/sshd stop (code=exited, status=0/SUCCESS)
     Process: 1284 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
    Main PID: 1291 (code=killed, signal=KILL)
      CGroup: name=systemd:/system/sshd.service
[root@f15 ~]# echo $?
3
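To make the 'pidfile:' idea concrete, here is a rough Python sketch of reading such a header and using the pid file to classify the service. It is not the code from the upstream commit. It assumes the chkconfig-style comment form `# pidfile: /var/run/sshd.pid` near the top of the initscript; `find_pidfile_header` and `lsb_status` are hypothetical names, and the returned codes follow the LSB init-script "status" convention (0 running, 1 dead but pid file exists, 3 not running, 4 unknown).

```python
import os
import tempfile
from typing import Optional

# Subset of the LSB init-script "status" exit codes.
LSB_RUNNING = 0
LSB_DEAD_PIDFILE_EXISTS = 1
LSB_NOT_RUNNING = 3
LSB_UNKNOWN = 4


def find_pidfile_header(script_text: str) -> Optional[str]:
    """Return the path from a '# pidfile: /path' header comment, or None.

    Only the leading comment block is scanned; blank lines are skipped and
    scanning stops at the first non-comment line.
    """
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("#"):
            break  # the header comment block has ended
        body = line.lstrip("#").strip()
        key, sep, value = body.partition(":")
        if sep and key.strip().lower() == "pidfile" and value.strip():
            return value.strip().split()[0]
    return None


def _pid_alive(pid: int) -> bool:
    # Signal 0 checks existence/permission without delivering a signal.
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def lsb_status(pidfile: str) -> int:
    """Classify a service from its pid file using LSB status codes."""
    try:
        with open(pidfile) as f:
            fields = f.read().split()
    except FileNotFoundError:
        return LSB_NOT_RUNNING
    except OSError:
        return LSB_UNKNOWN
    if not fields or not fields[0].isdigit():
        return LSB_UNKNOWN  # empty or garbled pid file
    pid = int(fields[0])
    return LSB_RUNNING if _pid_alive(pid) else LSB_DEAD_PIDFILE_EXISTS


if __name__ == "__main__":
    # Illustrative initscript excerpt (hypothetical, chkconfig-style header).
    sample = "#!/bin/bash\n# processname: sshd\n# pidfile: /var/run/sshd.pid\n"
    print(find_pidfile_header(sample))  # /var/run/sshd.pid
    # Demo with a temporary pid file that holds this interpreter's own PID.
    with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as f:
        f.write(f"{os.getpid()}\n")
    print(lsb_status(f.name))  # 0, because this process is alive
    os.unlink(f.name)
```

The design point is the one the reporter makes: once a main PID is known (here via the pid file), "running" can be checked directly instead of inferred from the initscript's own exit status.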
Michal, I owe you one; thanks for the rapid response! I did a scratch build against F15 and verified that the system operates as expected. Thanks! -steve
systemd-26-7.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/systemd-26-7.fc15
Package systemd-26-7.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-26-7.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/systemd-26-7.fc15
then log in and leave karma (feedback).
The patch caused a regression by exposing a latent bug that needs to be fixed first. systemd-26-8.fc15 drops the patch.
Michal, is it possible to fix the latent issue mentioned in comment #8 for F16? The freeze is fast approaching, and our high-availability feature set is broken in F15 and Rawhide.
The latent issues are the two bugs that this BZ "Depends on". Neither of them should be a problem in Rawhide, they are F15 only. The patch for systemd is already included in systemd-30.fc16. Are you sure this bug is present in the current Rawhide?
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Is this still a problem or can this bug be closed?