1595733 – containers are not gracefully stopped on shutdown/reboot

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	python-paunch
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	Upstream M3
Target Release:	14.0 (Rocky)
Assignee:	Steve Baker
QA Contact:	Marius Cornea
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1597797

Reported:	2018-06-27 13:01 UTC by Brent Eagles
Modified:	2021-12-10 16:28 UTC
CC List:	10 users
Fixed In Version:	python-paunch-3.2.0-0.20180921003258.6d2ec11.el7ost
Doc Type:	Bug Fix
Doc Text:	This update corrects an issue that prevented the system from properly shutting down and waiting for containers to stop on reboot. That issue could cause the containers to get killed before they stopped properly. This update adds a new service which ensures that the system waits for the containers to fully stop before continuing during the reboot.
Clone Of:
Clones:	1597797
Environment:
Last Closed:	2019-01-11 11:50:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1778913	None	None	None	2018-06-27 13:01:53 UTC
OpenStack gerrit	578614	None	MERGED	Implement stop_signal, stop_grace_period	2020-09-17 02:17:18 UTC
RDO	14540	None	None	None	2018-06-29 20:09:39 UTC
Red Hat Product Errata	RHEA-2019:0045	None	None	None	2019-01-11 11:50:37 UTC

Description Brent Eagles 2018-06-27 13:01:13 UTC

There is no mechanism to gracefully shut down containers when a node is shut down or rebooted. They keep running until systemd performs its process cleanup and sends SIGTERM to all running processes. Any process still running after some interval following the SIGTERM is then sent SIGKILL by systemd. The impact of killing the containers this way is unknown.

Example from console logs:

[ 1556.163360] type=1700 audit(1529955121.410:884): dev=qr-90058585-3f prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
[ 1558.874838] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 1558.886653] systemd-shutdown[1]: Sending SIGKILL to PID 2044 (docker-containe).
[ 1558.892701] systemd-shutdown[1]: Sending SIGKILL to PID 2079 (kolla_start).
[ 1558.897848] systemd-shutdown[1]: Sending SIGKILL to PID 2189 (ceilometer-agen).
[ 1558.903574] systemd-shutdown[1]: Sending SIGKILL to PID 2190 (docker-containe).
[ 1558.909220] systemd-shutdown[1]: Sending SIGKILL to PID 2217 (docker-containe).
[ 1558.914799] systemd-shutdown[1]: Sending SIGKILL to PID 2277 (docker-containe).
[ 1558.920219] systemd-shutdown[1]: Sending SIGKILL to PID 2317 (kolla_start).
[ 1558.924792] systemd-shutdown[1]: Sending SIGKILL to PID 2329 (kolla_start).
[ 1558.932073] systemd-shutdown[1]: Sending SIGKILL to PID 2376 (kolla_start).
[ 1558.943927] systemd-shutdown[1]: Sending SIGKILL to PID 2584 (aodh-notifier: ).
[ 1558.952055] systemd-shutdown[1]: Sending SIGKILL to PID 2588 (docker-containe).
[ 1558.956779] systemd-shutdown[1]: Sending SIGKILL to PID 2612 (cinder-schedule).
[ 1558.961131] systemd-shutdown[1]: Sending SIGKILL to PID 2690 (neutron-l3-agen).

It's worth noting that projects like neutron refined their systemd service behavior over the years, including the addition of order-dependent cleanup oneshot services like "neutron-ovs-cleanup". These refinements addressed actual issues, so some form of graceful shutdown mechanism is desirable to avoid unexpected, difficult-to-debug issues.
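
For illustration only, the SIGTERM-then-SIGKILL escalation described above can be sketched in Python for a single child process. This is a minimal sketch, not paunch's or systemd's implementation: the function name stop_gracefully and its parameters grace_period and stop_signal are hypothetical, chosen only to echo the title of the linked gerrit change.

```python
import signal
import subprocess


def stop_gracefully(proc, grace_period=10.0, stop_signal=signal.SIGTERM):
    """Ask a process to stop, then force-kill it if it does not comply.

    Hypothetical helper for illustration. `proc` is a subprocess.Popen
    object. Returns the name of the stop signal if the process exited
    within `grace_period` seconds, or "SIGKILL" if it had to be killed.
    """
    # Step 1: polite request. A well-behaved service cleans up and exits.
    proc.send_signal(stop_signal)
    try:
        # Step 2: give the process time to finish its cleanup.
        proc.wait(timeout=grace_period)
        return signal.Signals(stop_signal).name
    except subprocess.TimeoutExpired:
        # Step 3: grace period expired. SIGKILL cannot be caught, so any
        # cleanup the process still had pending is lost.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

A caller would use it as stop_gracefully(proc, grace_period=30). The console log above corresponds to step 3 being reached for every listed process, because nothing asked the containers to stop before systemd's final cleanup.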

Comment 1 Alex Schultz 2018-06-27 14:07:11 UTC

Yeah, not sure if this is super configurable from a container runtime perspective. We'll have to figure this out. That being said, it would be beneficial for the services not to rely on this, as other deployment mechanisms with containers might need to rely on different cleanup processes.

Comment 9 errata-xmlrpc 2019-01-11 11:50:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045
