RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1766477 - Systemd does not attempt to kill child processes if ExecStopPost is set
Summary: Systemd does not attempt to kill child processes if ExecStopPost is set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.7
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: rc
: ---
Assignee: David Tardon
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On: 1761260 1766479
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-29 08:22 UTC by Vojtech Juranek
Modified: 2020-09-29 20:33 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1761260
Environment:
Last Closed: 2020-09-29 20:32:23 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github systemd systemd issues 14566 0 None closed KillMode=mixed can leave left over processes when used with ExecStopPost 2020-05-21 11:24:44 UTC
Red Hat Product Errata RHSA-2020:4007 0 None None None 2020-09-29 20:33:11 UTC

Description Vojtech Juranek 2019-10-29 08:22:56 UTC
+++ This bug was initially created as a clone of Bug #1761260 +++

Description of problem:

When using KillMode=mixed, and ExecStopPost is set, systemd does not deliver
the expected SIGKILL to child processes, leaving child processes running after
the service is stopped.

Version-Release number of selected component (if applicable):
systemd-239-14.git33ccd62.fc29.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install a service with KillMode=mixed and ExecStopPost set
2. Run a service starting child processes
3. Stop the service

Actual results:
systemd does not send SIGTERM or SIGKILL to child process.
Child process started by the service is running after the service is stopped.

Expected results:
Systemd sends SIGKILL to child process.
Child process started by the service is not running after service is stopped.

Here are example scripts and service file reproducing this issue.

## Service script

$ cat /usr/local/bin/kill-children-test
#!/usr/bin/python3 -u

import signal
import subprocess
import time

def terminate(signo, frame):
    print("Received termination signal: {}".format(signo))
    raise SystemExit

signal.signal(signal.SIGTERM, terminate)

try:
    p = subprocess.Popen(["/usr/local/bin/non-terminating-child"])
    print("started child process pid={}".format(p.pid))
    p.wait()
    print("child terminated rc={}".format(p.returncode))
except SystemExit:
    pass

print("Terminating")
time.sleep(2)


## Child process script

This script intentionally ignores SIGTERM, for testing that systemd
send a SIGKILL.

$ cat /usr/local/bin/non-terminating-child
#!/usr/bin/python3 -u

import signal
import time

def terminate(signo, frame):
    print("Child ignoring signal {}".format(signo))

signal.signal(signal.SIGTERM, terminate)

while 1:
    print("Child waiting")
    time.sleep(300)
    print("Child woke up")


## Service file

The service simulate pre/post scripts that take some time to run.

$ cat /etc/systemd/system/kill-children-test.service
[Unit]
Description=Testing killMode=mixed with ExecStopPost

[Service]
Type=simple
ExecStartPre=/usr/bin/true
ExecStart=/usr/local/bin/kill-children-test
ExecStopPost=/usr/bin/true
Restart=always
TimeoutStopSec=10
KillMode=mixed


## Reproduce leftover process

$ sudo systemctl daemon-reload

$ sudo systemctl start kill-children-test.service; sleep 5; sudo systemctl stop kill-children-test

$ ps -eF | grep non-terminating-child | grep -v grep
root      4931     1  0 57994  9024   5 22:35 ?        00:00:00 /usr/bin/python3 -u /usr/local/bin/non-terminating-child

$ sudo kill -9 4931

$ journalctl -f -u kill-children-test -n0
-- Logs begin at Sat 2019-09-28 20:04:15 IDT. --
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Trying to enqueue job kill-children-test.service/start/replace
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Installed new job kill-children-test.service/start as 25707
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Enqueued job kill-children-test.service/start as 25707
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: About to execute: /usr/bin/true
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Forked /usr/bin/true as 4928
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Changed dead -> start-pre
Oct 13 22:35:58 sparse-local systemd[1]: Starting Testing killMode=mixed with ExecStopPost...
Oct 13 22:35:58 sparse-local systemd[4928]: kill-children-test.service: Executing: /usr/bin/true
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Child 4928 belongs to kill-children-test.service.
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Control process exited, code=exited status=0
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Got final SIGCHLD for state start-pre.
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Passing 0 fds to service
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: About to execute: /usr/local/bin/kill-children-test
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Forked /usr/local/bin/kill-children-test as 4929
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Changed start-pre -> running
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Job kill-children-test.service/start finished, result=done
Oct 13 22:35:58 sparse-local systemd[1]: Started Testing killMode=mixed with ExecStopPost.
Oct 13 22:35:58 sparse-local systemd[1]: kill-children-test.service: Failed to send unit change signal for kill-children-test.service: Connection reset by peer
Oct 13 22:35:58 sparse-local systemd[4929]: kill-children-test.service: Executing: /usr/local/bin/kill-children-test
Oct 13 22:35:58 sparse-local kill-children-test[4929]: started child process pid=4931

    Child started here...

Oct 13 22:35:58 sparse-local kill-children-test[4929]: Child waiting
Oct 13 22:36:03 sparse-local systemd[1]: kill-children-test.service: Trying to enqueue job kill-children-test.service/stop/replace
Oct 13 22:36:03 sparse-local systemd[1]: kill-children-test.service: Installed new job kill-children-test.service/stop as 25814
Oct 13 22:36:03 sparse-local systemd[1]: kill-children-test.service: Enqueued job kill-children-test.service/stop as 25814
Oct 13 22:36:03 sparse-local kill-children-test[4929]: Received termination signal: 15

    Main process received SIGTERM

Oct 13 22:36:03 sparse-local kill-children-test[4929]: Terminating
Oct 13 22:36:03 sparse-local systemd[1]: kill-children-test.service: Changed running -> stop-sigterm
Oct 13 22:36:03 sparse-local systemd[1]: Stopping Testing killMode=mixed with ExecStopPost...
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Child 4929 belongs to kill-children-test.service.
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Main process exited, code=exited, status=0/SUCCESS

    Main process exited, system running ExecStopPost, ignoring child process

Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: About to execute: /usr/bin/true
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Forked /usr/bin/true as 4939
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Changed stop-sigterm -> stop-post
Oct 13 22:36:05 sparse-local systemd[4939]: kill-children-test.service: Executing: /usr/bin/true
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Child 4939 belongs to kill-children-test.service.
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Control process exited, code=exited status=0
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Got final SIGCHLD for state stop-post.
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Changed stop-post -> dead
Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Failed to destroy cgroup /system.slice/kill-children-test.service, ignoring: Device or resource busy

    Systemd fail fail to destroy the cgroup because it left the child
    running.

Oct 13 22:36:05 sparse-local systemd[1]: kill-children-test.service: Job kill-children-test.service/stop finished, result=done
Oct 13 22:36:05 sparse-local systemd[1]: Stopped Testing killMode=mixed with ExecStopPost.

    Service is reported as stopped, but it has running child processes

Oct 13 22:38:45 sparse-local systemd[1]: kill-children-test.service: Child 4931 belongs to kill-children-test.service.

    Child was killed externally

Oct 13 22:38:45 sparse-local systemd[1]: kill-children-test.service: cgroup is empty
Oct 13 22:38:45 sparse-local systemd[1]: kill-children-test.service: Collecting.


## Same flow without setting ExecStopPost

(Removed the ExecStopPost from the service file)

$ journalctl -f -u kill-children-test -n0
-- Logs begin at Sat 2019-09-28 20:04:15 IDT. --
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Trying to enqueue job kill-children-test.service/start/replace
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Installed new job kill-children-test.service/start as 26137
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Enqueued job kill-children-test.service/start as 26137
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: About to execute: /usr/bin/true
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Forked /usr/bin/true as 5332
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Changed dead -> start-pre
Oct 13 22:44:38 sparse-local systemd[1]: Starting Testing killMode=mixed with ExecStopPost...
Oct 13 22:44:38 sparse-local systemd[5332]: kill-children-test.service: Executing: /usr/bin/true
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Child 5332 belongs to kill-children-test.service.
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Control process exited, code=exited status=0
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Got final SIGCHLD for state start-pre.
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Passing 0 fds to service
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: About to execute: /usr/local/bin/kill-children-test
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Forked /usr/local/bin/kill-children-test as 5333
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Changed start-pre -> running
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Job kill-children-test.service/start finished, result=done
Oct 13 22:44:38 sparse-local systemd[1]: Started Testing killMode=mixed with ExecStopPost.
Oct 13 22:44:38 sparse-local systemd[1]: kill-children-test.service: Failed to send unit change signal for kill-children-test.service: Connection reset by peer
Oct 13 22:44:38 sparse-local systemd[5333]: kill-children-test.service: Executing: /usr/local/bin/kill-children-test
Oct 13 22:44:38 sparse-local kill-children-test[5333]: started child process pid=5335
Oct 13 22:44:38 sparse-local kill-children-test[5333]: Child waiting
Oct 13 22:44:43 sparse-local kill-children-test[5333]: Received termination signal: 15

    Main process received SIGTERM

Oct 13 22:44:43 sparse-local kill-children-test[5333]: Terminating
Oct 13 22:44:43 sparse-local systemd[1]: kill-children-test.service: Trying to enqueue job kill-children-test.service/stop/replace
Oct 13 22:44:43 sparse-local systemd[1]: kill-children-test.service: Installed new job kill-children-test.service/stop as 26244
Oct 13 22:44:43 sparse-local systemd[1]: kill-children-test.service: Enqueued job kill-children-test.service/stop as 26244
Oct 13 22:44:43 sparse-local systemd[1]: kill-children-test.service: Changed running -> stop-sigterm
Oct 13 22:44:43 sparse-local systemd[1]: Stopping Testing killMode=mixed with ExecStopPost...
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Child 5333 belongs to kill-children-test.service.
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Main process exited, code=exited, status=0/SUCCESS

    Main process terminated

Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Killing process 5335 (non-terminating) with signal SIGKILL.
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Changed stop-sigterm -> final-sigkill
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Child 5335 belongs to kill-children-test.service.

    Child process was terminated by SIGKILL

Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: cgroup is empty
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Changed final-sigkill -> dead
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Job kill-children-test.service/stop finished, result=done
Oct 13 22:44:45 sparse-local systemd[1]: Stopped Testing killMode=mixed with ExecStopPost.
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Collecting.
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Collecting.
Oct 13 22:44:45 sparse-local systemd[1]: kill-children-test.service: Collecting.


I think the expected behavior would be:

1. Stop main process (SIGTERM + SIGKILL if needed)
2. Stop child processes with SIGKILL
3. Run execStopPost if set

If systemd does not handle this, the only reliable way to ensure that
your child processes are killed is using prctl(). This is not easy to
do in many cases (e.g. python), and hard to get right, since the all
processes started by a service, including child processes started the
child processes need to use prctl() after forking the child, before
execve the child.

Comment 2 David Tardon 2020-03-11 13:12:28 UTC
PR: https://github.com/systemd-rhel/rhel-7/pull/88

Comment 4 Lukáš Nykrýn 2020-04-27 13:15:23 UTC
fix merged to github master branch -> https://github.com/systemd-rhel/rhel-7/pull/88

Comment 11 errata-xmlrpc 2020-09-29 20:32:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: systemd security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4007


Note You need to log in before you can comment on or make changes to this bug.