818381 – re-add slaughtering of ExecStartPre leftover processes

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	23
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:	816842
Blocks:

Reported:	2012-05-02 22:20 UTC by Michal Schmidt
Modified:	2015-10-21 07:14 UTC
CC List:	9 users
Fixed In Version:
Clone Of:	816842
Environment:
Last Closed:	2015-10-21 07:14:13 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Michal Schmidt 2012-05-02 22:20:49 UTC

I disabled the slaughtering of ExecStartPre leftover processes to quickly fix
a probable F17 blocker, but we should add it back when we find a proper fix.


+++ This bug was initially created as a clone of Bug #816842 +++

--- Additional comment from mschmidt on 2012-05-01 12:49:18 EDT ---

I have an explanation for the long shutdown. It is a regression caused by commit ecedd90 "service: place control command in subcgroup control/" that I backported to systemd-44-6.fc17 as a fix for blocker bug 805942.

To reproduce it, the system must have a service with these properties:
 - It has at least one ExecStartPre command defined (to cause the "control/"
   subgroup to be created).
 - Its main process spawns at least one child process.

When stopping such a service, the following can happen:
1. systemd sends SIGTERM to all the processes of the service.
2. The main process exits first (it is a race).
3. systemd checks the status of the cgroup. It still sees some live processes.
4. The remaining processes exit.
5. At this point systemd expects to receive a notification from
   systemd-cgroups-agent. The notification never arrives though, because the
   cgroup is not really empty - the existing "control/" subdirectory (with no
   tasks in it) is enough to make it non-empty.

dbus.service is often the actual unit that triggers it. For testing, a simpler unit can be used, shutdownproblem.service:


[Unit]
Description=shutdown problem

[Service]
ExecStartPre=/bin/true
ExecStart=/bin/sh -c 'a(){ trap "sleep 3; exit 0" TERM; sleep 3600; }; a & sleep 3600'

--- Additional comment from mschmidt on 2012-05-02 09:04:49 EDT ---

I agree with this being a blocker.

I'll let Lennart come up with a proper fix.

In the meantime (for F17 GA) I'll revert commit ecedd90 "service: place control command in subcgroup control/" from F17.
To avoid bug 805942 I will also apply a revert of 8f53a7b "service: brutally slaughter processes that are running in the cgroup when we enter START_PRE and START".

--- Additional comment from updates on 2012-05-02 18:08:19 EDT ---

systemd-44-8.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/systemd-44-8.fc17

Comment 1 Lennart Poettering 2012-05-03 00:03:47 UTC

Hmm, so, normally a subgroup being around should not be enough to consider a group non-empty. If we do, this would be a bug...

Comment 2 Michal Schmidt 2012-05-03 16:22:51 UTC

(In reply to comment #1)
> If we do, this would be a bug...

Well, systemd doesn't, but the kernel does. Thus it will not run our release agent.

It can be checked with:

cd /sys/fs/cgroup
mkdir test
mount -t cgroup -o none,name=test none test
cd test
echo "/usr/bin/logger" > release_agent
mkdir service
mkdir service/control
echo "1" > service/notify_on_release
sleep 10 & echo $! > service/tasks
# ... now wait 10 s. Check /var/log/messages. Nothing new there.
rmdir service/control
# Now check the logs again. Find this entry:
# [...] logger: /service
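
The two notions of "empty" discussed above can be sketched in shell. This is only an illustration under assumptions: the helper names cgroup_has_no_tasks and cgroup_is_releasable are hypothetical (they are neither systemd nor kernel interfaces), the layout is the cgroup v1 one with a "tasks" file per directory, and find is assumed to support -mindepth (GNU, BSD and busybox do).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not systemd or kernel interfaces.

# "Empty" in the sense of comment 1: no task anywhere in the subtree.
# Reads the cgroup v1 "tasks" file of the directory and of every descendant.
cgroup_has_no_tasks() {
    # Succeeds when every tasks file at or below $1 is empty.
    [ -z "$(find "$1" -name tasks -type f -exec cat {} + 2>/dev/null)" ]
}

# "Empty" in the sense the kernel uses for notify_on_release:
# no tasks in the cgroup itself AND no child cgroup directories left.
cgroup_is_releasable() {
    [ -z "$(cat "$1/tasks" 2>/dev/null)" ] &&
    [ -z "$(find "$1" -mindepth 1 -type d 2>/dev/null)" ]
}
```

With the test hierarchy above, after the sleep has exited, cgroup_has_no_tasks service would succeed while cgroup_is_releasable service would keep failing until service/control is removed.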

Comment 3 Lennart Poettering 2012-05-03 16:52:31 UTC

Hmm, but shouldn't we have gotten the event for the subgroup and then checked up the tree?

Comment 4 Michal Schmidt 2012-05-04 10:52:20 UTC

I see you fixed this by "service: explicitly remove control/ subcgroup after each control command" (http://cgit.freedesktop.org/systemd/systemd/commit/?id=88f3e0c91f08c65a479e1aa09f171550b744d829)

The fix works fine for my testcase, where the ExecStartPre is well-behaved and does not fork off leftover processes. The "control/" subcgroup is removed as expected. Stopping the service works.

The fix is not sufficient for naughty services that start daemons from ExecStartPre. cg_kill_recursive() sends SIGKILL to the daemon and then it immediately tries to remove the subcgroup. There is no guarantee that the SIGKILL has already been delivered when rmdir() is called, so we may get EBUSY. When this happens, there's nothing else that would remove the subcgroup later.
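
One conceivable way to cope with this race, shown as a minimal shell sketch and not as what systemd does, is to retry the rmdir for a bounded time so that the killed process has a chance to exit. rmdir_retry is a hypothetical helper; the attempt count and delay are arbitrary defaults, and fractional sleep values assume GNU coreutils or busybox sleep.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): retry rmdir until it succeeds
# or the attempt budget is exhausted.
#   $1 = directory to remove
#   $2 = maximum number of attempts (default 50)
#   $3 = delay between attempts in seconds (default 0.1)
rmdir_retry() {
    dir=$1
    attempts=${2:-50}
    delay=${3:-0.1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # On a cgroup directory, rmdir fails with EBUSY while tasks remain.
        if rmdir "$dir" 2>/dev/null; then
            return 0
        fi
        # Directory already gone: nothing left to do.
        [ -d "$dir" ] || return 0
        i=$((i + 1))
        sleep "$delay"
    done
    # Still present after all attempts.
    return 1
}
```

For the case described here it could be called as rmdir_retry /sys/fs/cgroup/test/service/control after the SIGKILL has been sent. A bounded retry only narrows the window; a caller would still need a fallback for the case where the function returns non-zero.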

Comment 5 Jóhann B. Guðmundsson 2013-06-15 15:51:08 UTC

Guys, let's move this to rawhide if this is still the case, or close it otherwise.

Thanks.

Comment 6 Jan Kurik 2015-07-15 15:09:34 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 7 Jan Synacek 2015-10-20 13:41:19 UTC

@Michal: Is this still an issue or was this bugzilla forgotten?

Comment 8 Michal Schmidt 2015-10-20 13:57:12 UTC

It was forgotten. I didn't check if the issue still exists.

Comment 9 Jan Synacek 2015-10-21 07:14:13 UTC

I can't reproduce this on F22, even when running a daemon (sshd) in ExecStartPre.
