Bug 1352264

Summary:	systemd immediately sends SIGKILL after SIGTERM during shutdown
Product:	Red Hat Enterprise Linux 7	Reporter:	Tomáš Kašpárek <tkasparek>
Component:	systemd	Assignee:	systemd-maint
Status:	CLOSED NOTABUG	QA Contact:	qe-baseos-daemons
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	7.2	CC:	jdostal, msekleta, systemd-maint-list, tkasparek, tlestach
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-11-30 13:11:43 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1311170

Description Tomáš Kašpárek 2016-07-03 05:14:26 UTC

Description of problem:
systemd immediately sends SIGKILL after SIGTERM during shutdown, there's no window of opportunity for processes to terminate  

Version-Release number of selected component (if applicable):
systemd-219-19.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. run following python script which illustrates our use case:

#!/usr/bin/python

import os
import sys
import signal
import time

def sigterm_handler(signum, frame):
  time.sleep(5) # emulate time needed for clean-up actions
  print 'caught sigterm'
  sys.exit(0)

if __name__ == '__main__':
  signal.signal(signal.SIGTERM, sigterm_handler)
  while True: # emulate workload
    time.sleep(1)

2. reboot / shutdown -r now / init 6 / poweroff

Actual results:
Code of sigterm_handler never gets to complete.
\
Expected results:
Signal handling code should complete and script should exit with 0 exit code.

Additional info:
I believe code reponsible for this behavior is in src/core/unit.c file, function unit_kill_context should return true.

Comment 2 Michal Sekletar 2016-07-04 16:51:26 UTC

How do you spawn your python script? Is it running as system service or as a part of a user session?

Comment 3 Tomáš Kašpárek 2016-07-07 04:03:17 UTC

In our use case it can be run manualy as part of user session, but also spawned by system deamon.

Comment 4 Tomas Lestach 2016-07-11 09:56:51 UTC

This problem introduces an annoying regression for Satellite 5 customers.
If they schedule a system reboot for a system on Satellite 5 server, the system ends up in a reboot loop as the client tooling has no chance to update Satellite server about the reboot action being performed. The client system keeps picking up and performing the reboot action.
Is there a way to get it fixed for RHEL7.3?

Comment 5 Michal Sekletar 2016-07-11 17:19:07 UTC

I was debugging this using systemtap and I could not reproduce, i.e. systemd never sent SIGKILL immediately after sending SIGTERM.

However, note that systemd does send SIGHUP immediately after sending SIGTERM, this is default behaviour for all user sessions. Does your program correctly handles SIGHUP? I guess that you want to ignore that signal.

Comment 6 Tomáš Kašpárek 2016-07-13 14:46:57 UTC

I've tried catching SIGHUP (doing same things in signal handling code) and I've received same result as in case of SIGTERM - whole signal handling code never got to finish. I've also tried to sigprocmask TERM signal (however not the SIGHUP you're mentioning), I will run additional tests next week and report results here.

Comment 7 Tomáš Kašpárek 2016-07-21 11:59:49 UTC

(In reply to Michal Sekletar from comment #5)
> I was debugging this using systemtap and I could not reproduce, i.e. systemd
> never sent SIGKILL immediately after sending SIGTERM.
> 
> However, note that systemd does send SIGHUP immediately after sending
> SIGTERM, this is default behaviour for all user sessions. Does your program
> correctly handles SIGHUP? I guess that you want to ignore that signal.

With following script ran at the time of reboot/shutdown I am not able intercept shutdown process as I should be.

#!/usr/bin/python

import os
import sys
import signal
import time

def sigterm_handler(signum, frame):
  print 'catching sigterm'
  time.sleep(5) # emulate time needed for clean-up actions
  print 'caught sigterm'
  sys.exit(0)

def sighup_handler(signum, frame):
  print 'catching sighup'
  time.sleep(5) # emulate time needed for clean-up actions
  print 'caught sighup'
  sys.exit(0)

if __name__ == '__main__':
  signal.signal(signal.SIGTERM, sigterm_handler)
  signal.signal(signal.SIGTERM, sighup_handler)
  while True: # emulate workload
    time.sleep(1)

Could you please try with this script and share results?

Reproduced on systemd-219-19.el7_2.12.x86_64

Comment 8 Michal Sekletar 2016-07-21 12:06:03 UTC

(In reply to Tomáš Kašpárek from comment #7)

> With following script ran at the time of reboot/shutdown I am not able
> intercept shutdown process as I should be.
<snip>
>   signal.signal(signal.SIGTERM, sighup_handler)

I haven't tested your script yet, because above line jumped out at me immediately. Shouldn't it be  "signal.signal(signal.SIGHUP, sighup_handler)" to actually handle SIGHUP?

Comment 9 Tomáš Kašpárek 2016-07-21 13:31:45 UTC

(In reply to Michal Sekletar from comment #8)
> (In reply to Tomáš Kašpárek from comment #7)
> 
> > With following script ran at the time of reboot/shutdown I am not able
> > intercept shutdown process as I should be.
> <snip>
> >   signal.signal(signal.SIGTERM, sighup_handler)
> 
> I haven't tested your script yet, because above line jumped out at me
> immediately. Shouldn't it be  "signal.signal(signal.SIGHUP, sighup_handler)"
> to actually handle SIGHUP?

It is and in my script I am correctly catching SIGHUP, this is just copy/paste error.
Please use signal.signal(signal.SIGHUP, sighup_handler)

Comment 10 Michal Sekletar 2016-07-21 16:43:24 UTC

I've retested with new version of your script but still no SIGKILL in sight (verified by running systemtap script in parallel session). Weird thing is that when signals are sent immediately one after the other (SIGTERM first) then I never see "catching sigterm" nor "caught sigterm" message.

Note that during user session shutdown also parent process of your scripts gets those signals. I could reproduce your issue when I spawned script from shell and then sent SIGTERM, SIGCONT (probably not necessary but systemd does that and so did I) and SIGHUP to both script and shell. Command I've used looks as follows,

 kill -s TERM 16695 && kill -s CONT 16695 && kill -s HUP 16695 && kill -s TERM 16674 && kill -s CONT 16674 && kill -s HUP 16674 

16695 and 16674 are PIDs of script and shell (parent process). I changed your script, so it logs to syslog where I can see timestamp for each message. After running above command, script produced following output,

Jul 21 11:18:17 qeos-236.lab.eng.rdu2.redhat.com sighup.py[16695]: catching sighup
Jul 21 11:18:17 qeos-236.lab.eng.rdu2.redhat.com sighup.py[16695]: caught sighup

Observe the timestamp. Process exited immediately.

Comment 12 Jiří Dostál 2016-11-30 13:09:13 UTC

Hello,

according to https://bugzilla.redhat.com/show_bug.cgi?id=1260527, we were able to solve our issue, which resulted in this Bugzilla. Thank you for your help and patience. You might close this bug.