Bug 818381 - re-add slaughtering of ExecStartPre leftover processes
Product: Fedora
Classification: Fedora
Component: systemd
Hardware: Unspecified Unspecified
Priority: unspecified, Severity: low
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Depends On: 816842
Reported: 2012-05-02 18:20 EDT by Michal Schmidt
Modified: 2015-10-21 03:14 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 816842
Last Closed: 2015-10-21 03:14:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Michal Schmidt 2012-05-02 18:20:49 EDT
I disabled the slaughtering of ExecStartPre leftover processes to quickly fix
a probable F17 blocker, but we should add it back when we find a proper fix.

+++ This bug was initially created as a clone of Bug #816842 +++

--- Additional comment from mschmidt@redhat.com on 2012-05-01 12:49:18 EDT ---

I have an explanation for the long shutdown. It is a regression caused by commit ecedd90 "service: place control command in subcgroup control/" that I backported to systemd-44-6.fc17 as a fix for blocker bug 805942.

To reproduce it, the system must have a service with these properties:
 - It has at least one ExecStartPre command defined (to cause the "control/"
   subgroup to be created).
 - Its main process spawns at least one child process.

When stopping such a service, the following can happen:
1. systemd sends SIGTERM to all the processes of the service.
2. The main process exits first (it is a race).
3. systemd checks the status of the cgroup. It still sees some live processes.
4. The remaining processes exit.
5. At this point systemd expects to receive a notification from
   systemd-cgroups-agent. The notification never arrives though, because the
   cgroup is not really empty - the existing "control/" subdirectory (with no
   tasks in it) is enough to make it non-empty.
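
The mismatch in step 5 can be sketched with a few shell helpers. This is only an illustration: the function names (cg_has_tasks, cg_has_children, cg_releasable) are hypothetical and are not part of systemd or the kernel. They inspect any directory laid out like a cgroup v1 group (a "tasks" file, child groups as subdirectories) and approximate the release condition described above: no tasks and no child groups.

```shell
#!/bin/sh
# Hypothetical helpers (not systemd or kernel code). They only inspect a
# directory laid out like a cgroup v1 group: a "tasks" file in the group,
# child groups as subdirectories.

# True if the group's own "tasks" file lists at least one PID.
# The file is read rather than stat'ed, since pseudo-files report size 0.
cg_has_tasks() {
    [ -n "$(cat "$1/tasks" 2>/dev/null)" ]
}

# True if the group has at least one child group (subdirectory).
cg_has_children() {
    for d in "$1"/*/; do
        [ -d "$d" ] && return 0
    done
    return 1
}

# Approximates the release condition described above: the group counts
# as empty only if it has no tasks AND no child groups, so a task-less
# "control/" subdirectory is enough to block the notification.
cg_releasable() {
    ! cg_has_tasks "$1" && ! cg_has_children "$1"
}
```

With a task-less service/control/ directory present, cg_releasable service returns non-zero even though no process is left anywhere in the tree, which matches the situation in step 5.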

dbus.service is often the actual unit that triggers it. For testing, a simpler unit can be used, shutdownproblem.service:

[Unit]
Description=shutdown problem

[Service]
ExecStart=/bin/sh -c 'a(){ trap "sleep 3; exit 0" TERM; sleep 3600; }; a & sleep 3600'

--- Additional comment from mschmidt@redhat.com on 2012-05-02 09:04:49 EDT ---

I agree with this being a blocker.

I'll let Lennart come up with a proper fix.

In the meantime (for F17 GA) I'll revert commit ecedd90 "service: place control command in subcgroup control/" from F17.
To avoid bug 805942 I will also apply a revert of 8f53a7b "service: brutally slaughter processes that are running in the cgroup when we enter START_PRE and START".

--- Additional comment from updates@fedoraproject.org on 2012-05-02 18:08:19 EDT ---

systemd-44-8.fc17 has been submitted as an update for Fedora 17.
Comment 1 Lennart Poettering 2012-05-02 20:03:47 EDT
Hmm, so, normally a subgroup being around should not be enough to consider a group non-empty. If we do, this would be a bug...
Comment 2 Michal Schmidt 2012-05-03 12:22:51 EDT
(In reply to comment #1)
> If we do, this would be a bug...

Well, systemd doesn't, but the kernel does. Thus it will not run our release agent.

It can be checked with:

cd /sys/fs/cgroup
mkdir test
mount -t cgroup -o none,name=test none test
cd test
echo "/usr/bin/logger" > release_agent
mkdir service
mkdir service/control
echo "1" > service/notify_on_release
sleep 10 & echo $! > service/tasks
# ... now wait 10 s. Check /var/log/messages. Nothing new there.
rmdir service/control
# Now check the logs again. Find this entry:
# [...] logger: /service
Comment 3 Lennart Poettering 2012-05-03 12:52:31 EDT
hmm, but we should have gotten the event for the subgroup and then have checked up the tree?
Comment 4 Michal Schmidt 2012-05-04 06:52:20 EDT
I see you fixed this by "service: explicitly remove control/ subcgroup after each control command" (http://cgit.freedesktop.org/systemd/systemd/commit/?id=88f3e0c91f08c65a479e1aa09f171550b744d829)

The fix works fine for my testcase, where the ExecStartPre is well-behaved and does not fork off leftover processes. The "control/" subcgroup is removed as expected. Stopping the service works.

The fix is not sufficient for naughty services that start daemons from ExecStartPre. cg_kill_recursive() sends SIGKILL to the daemon and then it immediately tries to remove the subcgroup. There is no guarantee that the SIGKILL has already been delivered when rmdir() is called, so we may get EBUSY. When this happens, there's nothing else that would remove the subcgroup later.
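
One way to tolerate that race is to retry the rmdir() for a bounded time instead of giving up on the first failure. The following is a minimal sketch of that idea in shell, not what systemd does: rmdir_retry is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Hypothetical helper (not systemd code): retry rmdir a bounded number of
# times, so that a directory which is still busy right after SIGKILL was
# sent gets another chance once the killed process is really gone.
#   $1 = directory, $2 = max attempts (default 10), $3 = delay in seconds
#   (default 1; fractional delays depend on the local sleep implementation)
rmdir_retry() {
    dir=$1
    attempts=${2:-10}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        rmdir "$dir" 2>/dev/null && return 0
        [ -d "$dir" ] || return 0   # someone else removed it already
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                        # still present after all attempts
}
```

For example, rmdir_retry /sys/fs/cgroup/test/service/control 5 1 would try for roughly five seconds before reporting failure. The path reuses the test hierarchy from comment 2 and is only an example.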
Comment 5 Jóhann B. Guðmundsson 2013-06-15 11:51:08 EDT
Guys, let's move this to rawhide if this is still the case, or close it.

Comment 6 Jan Kurik 2015-07-15 11:09:34 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect bugs from development
cycles before Fedora 23. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
Comment 7 Jan Synacek 2015-10-20 09:41:19 EDT
@Michal: Is this still an issue or was this bugzilla forgotten?
Comment 8 Michal Schmidt 2015-10-20 09:57:12 EDT
It was forgotten. I didn't check if the issue still exists.
Comment 9 Jan Synacek 2015-10-21 03:14:13 EDT
I can't reproduce this on F22, even when running a daemon (sshd) from ExecStartPre.