I disabled the slaughtering of ExecStartPre leftover processes to quickly fix a probable F17 blocker, but we should add it back when we find a proper fix.

+++ This bug was initially created as a clone of Bug #816842 +++

--- Additional comment from mschmidt on 2012-05-01 12:49:18 EDT ---

I have an explanation for the long shutdown. It is a regression caused by commit ecedd90 "service: place control command in subcgroup control/", which I backported to systemd-44-6.fc17 as a fix for blocker bug 805942.

To reproduce it, the system must have a service with these properties:
- It has at least one ExecStartPre command defined (to cause the "control/" subgroup to be created).
- Its main process spawns at least one child process.

When stopping such a service, the following can happen:
1. systemd sends SIGTERM to all the processes of the service.
2. The main process exits first (it is a race).
3. systemd checks the status of the cgroup. It still sees some live processes.
4. The remaining processes exit.
5. At this point systemd expects to receive a notification from systemd-cgroups-agent. The notification never arrives, though, because the cgroup is not really empty: the existing "control/" subdirectory (with no tasks in it) is enough to make it non-empty.

dbus.service is often the actual unit that triggers it. For testing, a simpler unit can be used, shutdownproblem.service:

[Unit]
Description=shutdown problem

[Service]
ExecStartPre=/bin/true
ExecStart=/bin/sh -c 'a(){ trap "sleep 3; exit 0" TERM; sleep 3600; }; a & sleep 3600'

--- Additional comment from mschmidt on 2012-05-02 09:04:49 EDT ---

I agree with this being a blocker. I'll let Lennart come up with a proper fix. In the meantime (for F17 GA) I'll revert commit ecedd90 "service: place control command in subcgroup control/" from F17. To avoid bug 805942, I will also apply a revert of 8f53a7b "service: brutally slaughter processes that are running in the cgroup when we enter START_PRE and START".
--- Additional comment from updates on 2012-05-02 18:08:19 EDT --- systemd-44-8.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/systemd-44-8.fc17
Hmm, so, normally a subgroup being around should not be enough to consider a group non-empty. If we do, that would be a bug...
(In reply to comment #1)
> If we do, this would be a bug...

Well, systemd doesn't, but the kernel does. Thus it will not run our release agent. It can be checked with:

cd /sys/fs/cgroup
mkdir test
mount -t cgroup -o none,name=test none test
cd test
echo "/usr/bin/logger" > release_agent
mkdir service
mkdir service/control
echo "1" > service/notify_on_release
sleep 10 & echo $! > service/tasks
# ... now wait 10 s. Check /var/log/messages. Nothing new there.
rmdir service/control
# Now check the logs again. Find this entry:
# [...] logger: /service
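The kernel rule demonstrated above can be modelled in a few lines of shell. This is a simplified sketch under stated assumptions, not kernel or systemd code: it treats a cgroup-v1-style directory as empty only when its `tasks` file lists no PIDs and it has no child directories, which is why the task-less `control/` subgroup keeps `service/` from being released. The helper name `cg_v1_is_empty` is hypothetical.

```shell
# Hypothetical helper modelling the cgroup v1 emptiness rule discussed above:
# a group counts as empty (and may trigger the release agent) only when it has
# no tasks AND no child cgroups. A leftover, task-less "control/" subdirectory
# is therefore enough to keep its parent "non-empty".
cg_v1_is_empty() {
    dir=$1
    # Any PID listed in the tasks file means the group is still in use.
    if [ -n "$(cat "$dir/tasks" 2>/dev/null)" ]; then
        return 1
    fi
    # Any child directory (e.g. control/) also keeps the group non-empty.
    for entry in "$dir"/*/; do
        [ -d "$entry" ] && return 1
    done
    return 0
}
```

It can be tried on a scratch directory built with `mkdir` and an empty `tasks` file, so no real cgroup mount is needed.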
Hmm, but shouldn't we have gotten the event for the subgroup and then checked up the tree?
I see you fixed this by "service: explicitly remove control/ subcgroup after each control command" (http://cgit.freedesktop.org/systemd/systemd/commit/?id=88f3e0c91f08c65a479e1aa09f171550b744d829).

The fix works fine for my test case, where the ExecStartPre is well-behaved and does not fork off leftover processes. The "control/" subcgroup is removed as expected. Stopping the service works.

The fix is not sufficient for naughty services that start daemons from ExecStartPre. cg_kill_recursive() sends SIGKILL to the daemon and then immediately tries to remove the subcgroup. There is no guarantee that the SIGKILL has already been delivered when rmdir() is called, so we may get EBUSY. When this happens, nothing else removes the subcgroup later.
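One way to tolerate the EBUSY race described above is to retry the removal for a short while after sending SIGKILL. The sketch below is hypothetical and is not how systemd resolved the issue: the helper name `rmdir_retry`, the default of 50 attempts and the 0.1 s delay are all assumptions, and it relies on a `sleep` that accepts fractional seconds (GNU coreutils and BusyBox do).

```shell
# Hypothetical helper: try to remove a (cgroup) directory, retrying on any
# rmdir failure, because rmdir() on a cgroup can fail with EBUSY until the
# killed tasks are really gone.
# Usage: rmdir_retry DIR [ATTEMPTS]   (waits 0.1 s between attempts)
rmdir_retry() {
    dir=$1
    attempts=${2:-50}
    while :; do
        rmdir "$dir" 2>/dev/null && return 0
        attempts=$((attempts - 1))
        # Give up after ATTEMPTS failed tries; the caller still needs some
        # later cleanup path in that case.
        [ "$attempts" -le 0 ] && return 1
        sleep 0.1
    done
}
```

A caller would first send SIGKILL to the PIDs listed in `control/tasks` and then call `rmdir_retry control`. A bounded loop only narrows the window; if the tasks outlive the retries, the subcgroup is still left behind. In a scratch directory, a non-empty directory (ENOTEMPTY) can stand in for a busy cgroup when trying it out.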
Guys, let's move this to rawhide if this is still the case, or close it otherwise. Thanks.
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle. Changing version to '23'. (As we did not run this process for some time, it could also affect bugs from before the Fedora 23 development cycle. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.) More information and the reason for this action are here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23
@Michal: Is this still an issue or was this bugzilla forgotten?
It was forgotten. I didn't check if the issue still exists.
I can't reproduce this on F22, even when running a daemon (sshd) in the ExecStartPre.