1352264 – systemd immediately sends SIGKILL after SIGTERM during shutdown

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1352264 - systemd immediately sends SIGKILL after SIGTERM during shutdown

Summary: systemd immediately sends SIGKILL after SIGTERM during shutdown

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	systemd
Sub Component:
Version:	7.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	systemd-maint
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1311170
TreeView+	depends on / blocked

Reported:	2016-07-03 05:14 UTC by Tomáš Kašpárek
Modified:	2016-11-30 13:11 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-30 13:11:43 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Tomáš Kašpárek 2016-07-03 05:14:26 UTC

Description of problem:
systemd immediately sends SIGKILL after SIGTERM during shutdown, there's no window of opportunity for processes to terminate  

Version-Release number of selected component (if applicable):
systemd-219-19.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. run following python script which illustrates our use case:

#!/usr/bin/python

import os
import sys
import signal
import time

def sigterm_handler(signum, frame):
  time.sleep(5) # emulate time needed for clean-up actions
  print 'caught sigterm'
  sys.exit(0)

if __name__ == '__main__':
  signal.signal(signal.SIGTERM, sigterm_handler)
  while True: # emulate workload
    time.sleep(1)

2. reboot / shutdown -r now / init 6 / poweroff

Actual results:
Code of sigterm_handler never gets to complete.
\
Expected results:
Signal handling code should complete and script should exit with 0 exit code.

Additional info:
I believe code reponsible for this behavior is in src/core/unit.c file, function unit_kill_context should return true.

Comment 2 Michal Sekletar 2016-07-04 16:51:26 UTC

How do you spawn your python script? Is it running as system service or as a part of a user session?

Comment 3 Tomáš Kašpárek 2016-07-07 04:03:17 UTC

In our use case it can be run manualy as part of user session, but also spawned by system deamon.

Comment 4 Tomas Lestach 2016-07-11 09:56:51 UTC

This problem introduces an annoying regression for Satellite 5 customers.
If they schedule a system reboot for a system on Satellite 5 server, the system ends up in a reboot loop as the client tooling has no chance to update Satellite server about the reboot action being performed. The client system keeps picking up and performing the reboot action.
Is there a way to get it fixed for RHEL7.3?

Comment 5 Michal Sekletar 2016-07-11 17:19:07 UTC

I was debugging this using systemtap and I could not reproduce, i.e. systemd never sent SIGKILL immediately after sending SIGTERM.

However, note that systemd does send SIGHUP immediately after sending SIGTERM, this is default behaviour for all user sessions. Does your program correctly handles SIGHUP? I guess that you want to ignore that signal.

Comment 6 Tomáš Kašpárek 2016-07-13 14:46:57 UTC

I've tried catching SIGHUP (doing same things in signal handling code) and I've received same result as in case of SIGTERM - whole signal handling code never got to finish. I've also tried to sigprocmask TERM signal (however not the SIGHUP you're mentioning), I will run additional tests next week and report results here.

Comment 7 Tomáš Kašpárek 2016-07-21 11:59:49 UTC

(In reply to Michal Sekletar from comment #5)
> I was debugging this using systemtap and I could not reproduce, i.e. systemd
> never sent SIGKILL immediately after sending SIGTERM.
> 
> However, note that systemd does send SIGHUP immediately after sending
> SIGTERM, this is default behaviour for all user sessions. Does your program
> correctly handles SIGHUP? I guess that you want to ignore that signal.

With following script ran at the time of reboot/shutdown I am not able intercept shutdown process as I should be.

#!/usr/bin/python

import os
import sys
import signal
import time

def sigterm_handler(signum, frame):
  print 'catching sigterm'
  time.sleep(5) # emulate time needed for clean-up actions
  print 'caught sigterm'
  sys.exit(0)

def sighup_handler(signum, frame):
  print 'catching sighup'
  time.sleep(5) # emulate time needed for clean-up actions
  print 'caught sighup'
  sys.exit(0)

if __name__ == '__main__':
  signal.signal(signal.SIGTERM, sigterm_handler)
  signal.signal(signal.SIGTERM, sighup_handler)
  while True: # emulate workload
    time.sleep(1)

Could you please try with this script and share results?

Reproduced on systemd-219-19.el7_2.12.x86_64

Comment 8 Michal Sekletar 2016-07-21 12:06:03 UTC

(In reply to Tomáš Kašpárek from comment #7)

> With following script ran at the time of reboot/shutdown I am not able
> intercept shutdown process as I should be.
<snip>
>   signal.signal(signal.SIGTERM, sighup_handler)

I haven't tested your script yet, because above line jumped out at me immediately. Shouldn't it be  "signal.signal(signal.SIGHUP, sighup_handler)" to actually handle SIGHUP?

Comment 9 Tomáš Kašpárek 2016-07-21 13:31:45 UTC

(In reply to Michal Sekletar from comment #8)
> (In reply to Tomáš Kašpárek from comment #7)
> 
> > With following script ran at the time of reboot/shutdown I am not able
> > intercept shutdown process as I should be.
> <snip>
> >   signal.signal(signal.SIGTERM, sighup_handler)
> 
> I haven't tested your script yet, because above line jumped out at me
> immediately. Shouldn't it be  "signal.signal(signal.SIGHUP, sighup_handler)"
> to actually handle SIGHUP?

It is and in my script I am correctly catching SIGHUP, this is just copy/paste error.
Please use signal.signal(signal.SIGHUP, sighup_handler)

Comment 10 Michal Sekletar 2016-07-21 16:43:24 UTC

I've retested with new version of your script but still no SIGKILL in sight (verified by running systemtap script in parallel session). Weird thing is that when signals are sent immediately one after the other (SIGTERM first) then I never see "catching sigterm" nor "caught sigterm" message.

Note that during user session shutdown also parent process of your scripts gets those signals. I could reproduce your issue when I spawned script from shell and then sent SIGTERM, SIGCONT (probably not necessary but systemd does that and so did I) and SIGHUP to both script and shell. Command I've used looks as follows,

 kill -s TERM 16695 && kill -s CONT 16695 && kill -s HUP 16695 && kill -s TERM 16674 && kill -s CONT 16674 && kill -s HUP 16674 

16695 and 16674 are PIDs of script and shell (parent process). I changed your script, so it logs to syslog where I can see timestamp for each message. After running above command, script produced following output,

Jul 21 11:18:17 qeos-236.lab.eng.rdu2.redhat.com sighup.py[16695]: catching sighup
Jul 21 11:18:17 qeos-236.lab.eng.rdu2.redhat.com sighup.py[16695]: caught sighup

Observe the timestamp. Process exited immediately.

Comment 12 Jiří Dostál 2016-11-30 13:09:13 UTC

Hello,

according to https://bugzilla.redhat.com/show_bug.cgi?id=1260527, we were able to solve our issue, which resulted in this Bugzilla. Thank you for your help and patience. You might close this bug.

Note You need to log in before you can comment on or make changes to this bug.