Bug 1352264
| Summary: | systemd immediately sends SIGKILL after SIGTERM during shutdown | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Tomáš Kašpárek <tkasparek> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-daemons |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.2 | CC: | jdostal, msekleta, systemd-maint-list, tkasparek, tlestach |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-30 13:11:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1311170 | ||
How do you spawn your Python script? Is it running as a system service or as part of a user session?

In our use case it can be run manually as part of a user session, but it can also be spawned by a system daemon.

This problem introduces an annoying regression for Satellite 5 customers. If they schedule a system reboot for a system on a Satellite 5 server, the system ends up in a reboot loop because the client tooling has no chance to update the Satellite server about the reboot action being performed. The client system keeps picking up and performing the reboot action. Is there a way to get it fixed for RHEL7.3?

I was debugging this using systemtap and I could not reproduce it, i.e. systemd never sent SIGKILL immediately after sending SIGTERM.

However, note that systemd does send SIGHUP immediately after sending SIGTERM; this is the default behaviour for all user sessions. Does your program handle SIGHUP correctly? I guess that you want to ignore that signal.

I've tried catching SIGHUP (doing the same things in the signal handling code) and I've received the same result as in the case of SIGTERM: the signal handling code never got to finish. I've also tried to sigprocmask the TERM signal (however not the SIGHUP you're mentioning). I will run additional tests next week and report the results here.

(In reply to Michal Sekletar from comment #5)
> I was debugging this using systemtap and I could not reproduce, i.e. systemd
> never sent SIGKILL immediately after sending SIGTERM.
>
> However, note that systemd does send SIGHUP immediately after sending
> SIGTERM, this is default behaviour for all user sessions. Does your program
> correctly handles SIGHUP? I guess that you want to ignore that signal.

With the following script running at the time of reboot/shutdown, I am not able to intercept the shutdown process as I should be.

    #!/usr/bin/python
    import os
    import sys
    import signal
    import time

    def sigterm_handler(signum, frame):
        print 'catching sigterm'
        time.sleep(5) # emulate time needed for clean-up actions
        print 'caught sigterm'
        sys.exit(0)

    def sighup_handler(signum, frame):
        print 'catching sighup'
        time.sleep(5) # emulate time needed for clean-up actions
        print 'caught sighup'
        sys.exit(0)

    if __name__ == '__main__':
        signal.signal(signal.SIGTERM, sigterm_handler)
        signal.signal(signal.SIGTERM, sighup_handler)
        while True: # emulate workload
            time.sleep(1)

Could you please try with this script and share the results?

Reproduced on systemd-219-19.el7_2.12.x86_64

(In reply to Tomáš Kašpárek from comment #7)
> With following script ran at the time of reboot/shutdown I am not able
> intercept shutdown process as I should be.
<snip>
> signal.signal(signal.SIGTERM, sighup_handler)

I haven't tested your script yet, because the above line jumped out at me immediately. Shouldn't it be "signal.signal(signal.SIGHUP, sighup_handler)" to actually handle SIGHUP?

(In reply to Michal Sekletar from comment #8)
> (In reply to Tomáš Kašpárek from comment #7)
>
> > With following script ran at the time of reboot/shutdown I am not able
> > intercept shutdown process as I should be.
> <snip>
> > signal.signal(signal.SIGTERM, sighup_handler)
>
> I haven't tested your script yet, because above line jumped out at me
> immediately. Shouldn't it be "signal.signal(signal.SIGHUP, sighup_handler)"
> to actually handle SIGHUP?

It should, and in my script I am correctly catching SIGHUP; this is just a copy/paste error. Please use signal.signal(signal.SIGHUP, sighup_handler)

I've retested with the new version of your script, but there is still no SIGKILL in sight (verified by running a systemtap script in a parallel session). The weird thing is that when the signals are sent immediately one after the other (SIGTERM first), I never see the "catching sigterm" or the "caught sigterm" message.

Note that during user session shutdown the parent process of your script also gets those signals. I could reproduce your issue when I spawned the script from a shell and then sent SIGTERM, SIGCONT (probably not necessary, but systemd does that and so did I) and SIGHUP to both the script and the shell. The command I used looks as follows:

    kill -s TERM 16695 && kill -s CONT 16695 && kill -s HUP 16695 && kill -s TERM 16674 && kill -s CONT 16674 && kill -s HUP 16674

16695 and 16674 are the PIDs of the script and the shell (parent process). I changed your script so that it logs to syslog, where I can see a timestamp for each message. After running the above command, the script produced the following output:

    Jul 21 11:18:17 qeos-236.lab.eng.rdu2.redhat.com sighup.py[16695]: catching sighup
    Jul 21 11:18:17 qeos-236.lab.eng.rdu2.redhat.com sighup.py[16695]: caught sighup

Observe the timestamps. The process exited immediately.

Hello, according to https://bugzilla.redhat.com/show_bug.cgi?id=1260527, we were able to solve our issue, which resulted in this Bugzilla. Thank you for your help and patience. You might close this bug.
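
For anyone who wants to try the TERM, CONT, HUP sequence from the comments outside of a real shutdown, here is a minimal sketch. It is not the fix referenced in bug 1260527 and not the reporter's actual client code. It assumes Python 3.7+ on Linux (the scripts in this bug are Python 2), and the names `CHILD`, `on_term` and `run_signal_sequence` are made up for illustration. The child follows the suggestion to ignore SIGHUP, and additionally keeps the SIGTERM handler short by only setting a flag, so the clean-up runs in the main loop instead of inside the handler.

```python
import os
import signal
import subprocess
import sys
import textwrap

# Child program (illustrative): ignores SIGHUP and defers the SIGTERM
# clean-up to the main loop instead of sleeping inside the handler.
CHILD = textwrap.dedent("""
    import signal, sys, time

    stop = False

    def on_term(signum, frame):
        # Only record the request; no slow work inside the handler.
        global stop
        stop = True

    signal.signal(signal.SIGHUP, signal.SIG_IGN)  # assumption: SIGHUP can be ignored
    signal.signal(signal.SIGTERM, on_term)
    print("ready", flush=True)                    # tell the parent handlers are installed

    while not stop:                               # emulate workload
        time.sleep(0.1)

    time.sleep(0.5)                               # emulate time needed for clean-up actions
    print("cleanup done", flush=True)
    sys.exit(0)
""")


def run_signal_sequence():
    """Spawn the child, send TERM, CONT and HUP, return (exit code, output)."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    # Wait until the child has installed its handlers before signalling it.
    if proc.stdout.readline().strip() != "ready":
        proc.kill()
        raise RuntimeError("child did not start correctly")
    # Same order as the kill command in the comment above.
    for sig in (signal.SIGTERM, signal.SIGCONT, signal.SIGHUP):
        os.kill(proc.pid, sig)
    out, _ = proc.communicate(timeout=10)
    return proc.returncode, out.strip()


if __name__ == "__main__":
    print(run_signal_sequence())
```

With SIGHUP ignored, the expectation is that the child finishes its clean-up and exits with code 0. This only signals the script itself; it does not emulate the parent shell receiving the same signals, which was part of the reproduction described above.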

Description of problem:
systemd immediately sends SIGKILL after SIGTERM during shutdown, so there's no window of opportunity for processes to terminate.

Version-Release number of selected component (if applicable):
systemd-219-19.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Run the following Python script, which illustrates our use case:

    #!/usr/bin/python
    import os
    import sys
    import signal
    import time

    def sigterm_handler(signum, frame):
        time.sleep(5) # emulate time needed for clean-up actions
        print 'caught sigterm'
        sys.exit(0)

    if __name__ == '__main__':
        signal.signal(signal.SIGTERM, sigterm_handler)
        while True: # emulate workload
            time.sleep(1)

2. reboot / shutdown -r now / init 6 / poweroff

Actual results:
The code of sigterm_handler never gets to complete.

Expected results:
The signal handling code should complete and the script should exit with exit code 0.

Additional info:
I believe the code responsible for this behavior is in the src/core/unit.c file: the function unit_kill_context should return true.