Bug 952634 - RFE: We should have a way for KillSignal to apply only to the main process but send SIGKILL to other processes
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Blocks: systemd-RFE
Reported: 2013-04-16 10:49 UTC by Jan Kaluža
Modified: 2015-02-16 19:33 UTC (History)
Last Closed: 2015-01-29 04:45:57 UTC
Type: Bug

Description Jan Kaluža 2013-04-16 10:49:14 UTC
Description of problem:
httpd.service sends SIGWINCH to $MAINPID in ExecStop, which should make httpd shut down gracefully [1]. However, systemd sends SIGTERM *right* after ExecStop finishes, with no configurable delay. httpd therefore receives SIGTERM before it has had any time to shut down gracefully or do some cleanup. I think there should be a configurable timeout between these two actions.

The truth is that httpd should ignore SIGTERM while SIGWINCH is being handled. I'm working with upstream on a fix for this, but I think it's still not ideal behaviour.


[1] http://httpd.apache.org/docs/2.2/stopping.html#gracefulstop

Comment 1 Michal Schmidt 2013-04-16 10:52:02 UTC
The ExecStop action ought to be synchronous, i.e. the service should already be stopped by the time ExecStop finishes.

Comment 2 Joe Orton 2013-04-16 11:05:58 UTC
This is what systemd.service(5) says:

TimeoutStopSec=
           Configures the time to wait for stop. If a service is asked to stop but does not terminate in the specified
           time, it will be terminated forcibly via SIGTERM, and after another delay of this time with SIGKILL (See

Is the man page wrong?  It says the SIGTERM is sent *after* a timeout.

Comment 3 Michal Schmidt 2013-04-19 15:32:24 UTC
(In reply to comment #0)
> Description of problem:
> httpd.service sends SIGWINCH to $MAINPID in ExecStop.

Would it solve your problem if instead of defining an ExecStop action you'd set "KillSignal=SIGWINCH" ? See systemd.kill(5).

(In reply to comment #2)
> Is the man page wrong?  It says the SIGTERM is sent *after* a timeout.

Not wrong, but apparently unclear. The timeout limits the time the ExecStop action is allowed to run.

There are two actual systemd issues:
1) The start timeout is applied by mistake in one place where the stop timeout
   should be used.
2) The documentation should be improved. Perhaps a man page describing
   the service state machine would help answer questions like this.

[ Reopening to resolve 1) ].

Comment 4 Jan Kaluža 2013-04-22 06:23:06 UTC
(In reply to comment #3)
> Would it solve your problem if instead of defining an ExecStop action you'd
> set "KillSignal=SIGWINCH" ? See systemd.kill(5).

No, because this would send SIGWINCH to all processes in the control group. I need to send it only to the main process, but at the same time I want SIGKILL to be sent to all processes (and not just to the main process). Otherwise child processes would remain running after systemctl stop if something went wrong during SIGWINCH handling.

So far I'm using this to "fix" it:

ExecStop=/usr/sbin/httpd $OPTIONS -k graceful-stop
# We want systemd to give httpd some time to finish gracefully, but still want
# it to kill httpd after TimeoutStopSec if something went wrong during the
# graceful stop. Normally, systemd sends SIGTERM signal right after the
# ExecStop, which would kill httpd. We are sending useless SIGCONT here to give
# httpd time to finish.
KillSignal=SIGCONT

Comment 5 Joe Orton 2013-04-22 07:47:05 UTC
I must say it seems odd to expect ExecStop to be synchronous.  Most daemon packages will surely implement this by sending some async signal to the parent.  What are we supposed to do instead?  wait() for the parent to die?  

Could we provide a shell script which does

#!/bin/sh
kill $1
wait $1

and then use that in ExecStop to work around this?

Comment 6 Michal Schmidt 2013-04-22 08:18:13 UTC
(In reply to comment #4)
Thanks for the explanation. Let's see if we can come up with nicer support for this use case in systemd upstream.

(In reply to comment #5)
"wait" won't work on a PID that's not a child of the shell.
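
For illustration, a synchronous variant of the script from comment #5 could poll the PID instead of calling "wait". This is only a minimal sketch: the function name stop_and_wait, its argument order and the 30-second default are assumptions made for this example and do not come from httpd or systemd. It relies on "kill -0", which delivers no signal and only checks whether the PID still exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of httpd or systemd):
# signal a process, then poll until it exits or a timeout elapses.
# Usage: stop_and_wait PID [SIGNAL] [TIMEOUT_SECONDS]
# Returns 0 if the process exited, 1 if it could not be signalled,
# 2 if it was still running when the timeout expired.
stop_and_wait() {
    pid=$1
    sig=${2:-TERM}       # signal name without the SIG prefix
    timeout=${3:-30}     # assumed default, in seconds

    # Deliver the signal; fail if the PID does not exist or is not ours.
    kill -s "$sig" "$pid" 2>/dev/null || return 1

    # Unlike "wait", "kill -0" also works for PIDs that are not
    # children of this shell.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$timeout" -le 0 ]; then
            return 2
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}
```

An ExecStop command built around such a helper would return only once the main process is gone, e.g. by calling "stop_and_wait $MAINPID WINCH". PID reuse during the polling loop is not handled here.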

Comment 7 Lennart Poettering 2013-05-06 16:39:59 UTC
(In reply to comment #5)
> I must say it seems odd to expect ExecStop to be synchronous.  Most daemon
> packages will surely implement this by sending some async signal to the
> parent.  What are we supposed to do instead?  wait() for the parent to die?  

Well, what you do in ExecStop= already had to be synchronous in old sysv, because "service foobar stop ; service foobar start" wouldn't work correctly otherwise. 

> Could we provide a shell script which does
> 
> #!/bin/sh
> kill $1
> wait $1
> 
> and then use that in ExecStop to work around this?

start-stop-daemon can do that, which is why people used that on SysV.

Comment 8 Lennart Poettering 2013-05-06 16:45:14 UTC
(In reply to comment #4)

> No, because this would send SIGWINCH to all processes in control-group. I
> need to send it just to main process but in the same time, I want
> SendSIGKILL to be sent to all processes (and not just to main process),
> otherwise children processes would remain running after systemctl stop if
> something went wrong during SIGWINCH handling.

This indeed sounds like a good use case and we should provide something for this in systemd.

Maybe SendSIGKILL should also accept "control-group" and "process" as arguments.

Hence:

SendSIGKILL=no               # don't send SIGKILL
SendSIGKILL=yes              # send SIGKILL to the same as set with KillMode=
SendSIGKILL=control-group    # send SIGKILL to the entire control group
SendSIGKILL=process          # Send SIGKILL to the main process only

I think this would be a pretty natural extension of the current logic.

Comment 9 Jan Kaluža 2013-05-07 08:11:42 UTC
Thanks, this would fix our problem.

Comment 10 Colin Guthrie 2013-06-19 20:45:55 UTC
Just adding myself to CC, but as I don't see any back reference to the bug that caused this one to be opened, here it is: https://bugzilla.redhat.com/show_bug.cgi?id=912288

Comment 11 Lennart Poettering 2014-02-12 11:45:00 UTC
This is fixed in git as "KillMode=mixed".
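
As a rough sketch of how the use case from comment #4 could be expressed with this setting: per systemd.kill(5), KillMode=mixed sends the stop signal (KillSignal=) to the main process only, while the subsequent SIGKILL goes to all remaining processes in the unit's control group. The helper name write_killmode_dropin and the file name killmode.conf below are made up for this example. The drop-in directory follows the usual /etc/systemd/system/UNIT.d convention.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that applies KillMode=mixed
# to httpd.service. The directory argument is optional and exists
# mainly so the function can be tried out in a scratch location.
write_killmode_dropin() {
    dropin_dir=${1:-/etc/systemd/system/httpd.service.d}
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/killmode.conf" <<'EOF'
[Service]
# Stop signal; with KillMode=mixed it is sent to the main process only.
KillSignal=SIGWINCH
# After the stop timeout (TimeoutStopSec=), SIGKILL is sent to every
# process still left in the unit's control group.
KillMode=mixed
EOF
}
```

After writing the file, "systemctl daemon-reload" is needed for systemd to pick it up. If the unit still defines an ExecStop command, that command keeps running before KillSignal is sent, so the unit may need adjusting to match.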

Comment 12 Fedora End Of Life 2015-01-09 22:26:26 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

