Description of problem:
We would like to terminate services inside the container more gracefully. Despite STOPSIGNAL SIGRTMIN+3 being set in the Dockerfile, OCP ignores it and always sends SIGTERM. systemd then re-executes instead of shutting down, and when terminationGracePeriodSeconds expires, SIGKILL is sent.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
always

Steps to Reproduce:
1. Run journalctl -f inside the container
2. Scale down the sts to 0 replicas

Actual results:
Jul 21 09:36:49 ipshost-0 systemd: Received SIGTERM.
Jul 21 09:36:49 ipshost-0 systemd: Reexecuting.
Jul 21 09:36:49 ipshost-0 systemd: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Jul 21 09:36:49 ipshost-0 systemd: Detected virtualization other.
Jul 21 09:36:49 ipshost-0 systemd: Detected architecture x86-64.
Jul 21 09:36:49 ipshost-0 systemd: /usr/lib/systemd/system-generators/nfs-server-generator failed with error code 1.
Jul 21 09:36:49 ipshost-0 systemd: [/usr/lib/systemd/system/nfs-ganesha.service:28] Unknown lvalue 'LogsDirectory' in section 'Service'
Jul 21 09:36:49 ipshost-0 systemd: [/usr/lib/systemd/system/nfs-ganesha.service:29] Unknown lvalue 'StateDirectory' in section 'Service'
...
Jul 21 09:44:04 ipshost-0 su: (to nz) root on none
Jul 21 09:44:04 ipshost-0 dbus[107]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Jul 21 09:44:04 ipshost-0 dbus[107]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.login1.service': Unit is masked.
command terminated with exit code 137

Expected results:
OCP should send the signal specified by STOPSIGNAL to the container instead of SIGTERM.

Additional info:
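For reference, the reproduction steps expressed as commands; the StatefulSet name "ipshost" is assumed from the "ipshost-0" pod seen in the logs and is not stated in the report, so adjust it to the real environment:

-----
(terminal 1: follow the journal inside the container)
$ oc exec ipshost-0 -- journalctl -f

(terminal 2: scale the StatefulSet down to 0 replicas)
$ oc scale statefulset/ipshost --replicas=0
-----

The journal then shows "Received SIGTERM" / "Reexecuting" rather than an orderly shutdown, and once terminationGracePeriodSeconds expires the exec session ends with exit code 137 (128 + 9, i.e. SIGKILL).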
Can you provide us with a reproducer, including a Dockerfile?
You can use this example:

-----
$ cat Dockerfile
FROM fedora:latest

COPY ./echo-the-trapper.sh /
RUN chmod +x /echo-the-trapper.sh

ENTRYPOINT ["/echo-the-trapper.sh"]
STOPSIGNAL SIGRTMIN+3
-----
$ cat echo-the-trapper.sh
#!/bin/bash

echo "Starting $0"

# Signals to trap; add more here as needed, just drop the "SIG" prefix.
# To list all the signals use 'kill -l':
signal2trap=(INT TERM EXIT HUP RTMIN+3)

# Default output fd for the trap; fd 2 goes straight to the terminal for local tests
trap_stdout=2

func_trap() {
    echo "Trapped: $1" >&${trap_stdout}
}

# If there is no stdin we are probably inside the container, or
# someone is piping into the script o.0
if [[ ! -t 0 ]];then
    echo "In a container, adjusting!"
    trap_stdout=9
    mkfifo /tmp/bucket
    # Not sure why I cannot use $trap_stdout here,
    # so I just hard-coded it until further investigation.
    exec 9<> /tmp/bucket
    signal_timeout=600 # Seconds
fi

echo "Setting the traps for signals ${signal2trap[*]// /|}"
for s in ${signal2trap[@]};do
    trap "func_trap ${s}" "${s}"
done

echo "Traps set for:"
trap -p

echo "Send any of them to PID ${$} (Ignore this PID in a container), we should echo the signal."

# Check whether we are inside the container
if [[ $trap_stdout -ne 2 ]];then
    read -t $signal_timeout -u $trap_stdout msg
    echo $msg
fi

# In a container the 'read' below is ignored because stdin is closed
echo "Hit a key when you want to exit"
read
-----

You can run the script from the command line to test any change you make, using 'bash echo-the-trapper.sh'.

You can try this using podman with:
1- Build it with 'podman build -t echo-the-trapper .'
2- Run it: # sudo podman run -d --name echo-the-trapper-0 echo-the-trapper
3- Check the logs: # sudo podman logs -f echo-the-trapper-0
4- Stop the container: # sudo podman stop echo-the-trapper-0 --log-level debug

In the container logs you get the 'Trapped: RTMIN+3' line, and in the output of 'podman stop' you can check that the correct signal is sent.
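Not part of the original reproducer, but if you want to run the same image on OpenShift for comparison, something along these lines should work; the registry and namespace below are placeholders:

-----
$ podman tag echo-the-trapper quay.io/<namespace>/echo-the-trapper:latest
$ podman push quay.io/<namespace>/echo-the-trapper:latest
$ oc run echo-the-trapper --image=quay.io/<namespace>/echo-the-trapper:latest
$ oc logs -f echo-the-trapper
(in another terminal)
$ oc delete pod echo-the-trapper
-----

Under podman the logs show 'Trapped: RTMIN+3' when the container is stopped; on OpenShift, per this report, the first signal delivered is SIGTERM instead.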
We asked the customer to run another test using the signal number instead of the name, and the issue persists.
This is the expected behavior for OpenShift. The kubelet is in control of the signals passed to the runtime. The kubelet sends SIGTERM to allow for a graceful shutdown; if the process has not shut down by the time the termination grace period expires (terminationGracePeriodSeconds, default 30 seconds), the kubelet sends SIGKILL. The STOPSIGNAL in the Dockerfile will not change this behavior.
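For completeness: the grace period itself is tunable even though the first signal is not. A sketch, again assuming the StatefulSet is named "ipshost" as in the logs:

-----
$ oc get pod ipshost-0 -o jsonpath='{.spec.terminationGracePeriodSeconds}'
30    (the default mentioned above)
$ oc patch statefulset/ipshost -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":120}}}}'
-----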
(In reply to Ryan Phillips from comment #4)
> This is the expected behavior for OpenShift. The kubelet is in control of
> the signals passed to the runtime. The kubelet sends SIGTERM to allow for a
> graceful shutdown; if the process has not shut down by the time the
> termination grace period expires (terminationGracePeriodSeconds, default 30
> seconds), the kubelet sends SIGKILL. The STOPSIGNAL in the Dockerfile will
> not change this behavior.

Is there any way to force a different signal?
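Not something discussed in this bug, but one generic pattern when the runtime's first signal cannot be changed is a small wrapper entrypoint that translates the kubelet's SIGTERM into whatever the payload expects. A rough sketch (the payload command and target signal are placeholders; note that systemd itself normally insists on running as PID 1, so this does not map directly onto a systemd-based image):

-----
$ cat forward-stopsignal.sh
#!/bin/bash
# Hypothetical wrapper: start the real payload, then forward SIGTERM as SIGRTMIN+3.

/path/to/real-payload &         # placeholder for the actual service
child=$!

# When the kubelet (or podman) delivers SIGTERM, re-send it to the child as RTMIN+3
trap 'kill -s RTMIN+3 "$child"' TERM

# Keep waiting on the child until it is really gone; 'wait' returns early
# whenever a trapped signal is delivered.
while kill -0 "$child" 2>/dev/null; do
    wait "$child"
done
-----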
(In reply to Ryan Phillips from comment #4)
> This is the expected behavior for OpenShift. The kubelet is in control of
> the signals passed to the runtime. The kubelet sends SIGTERM to allow for a
> graceful shutdown; if the process has not shut down by the time the
> termination grace period expires (terminationGracePeriodSeconds, default 30
> seconds), the kubelet sends SIGKILL. The STOPSIGNAL in the Dockerfile will
> not change this behavior.

The problem is that the first signal sent is not the one specified by STOPSIGNAL when the example runs on OpenShift, although it is handled correctly when the example runs under podman. The second part (SIGKILL after the grace period) is the standard behaviour used to avoid keeping around a pod that is stuck and not listening to signals.

To be a bit more clear:

- Current behavior: the kubelet sends SIGTERM, waits for the termination grace period and then sends SIGKILL.
- Expected behavior: the kubelet sends the signal set as STOPSIGNAL (e.g. SIGRTMIN+3 or any other one), waits for the termination grace period and then sends SIGKILL.

With the example I made, the pod logs would show "Trapped: <STOPSIGNAL-VALUE>", and then after the grace period the pod would receive SIGKILL and die.

I dug a bit into the upstream k8s project and found this:
https://github.com/kubernetes/kubernetes/issues/30051

It mentions that this is supposed to be supported when using docker. Sadly I didn't have the chance to set up a test environment with OpenShift for this, because the lab is currently a bit unstable.
OpenShift uses CRI-O for the container runtime. The STOPSIGNAL behavior is a side effect of Docker and is not really supported by Kubernetes. Going to close this for now, since it is working as designed.
Checking with the customer, the image contains the correct StopSignal definition:

# sudo podman inspect f8c7cfe0750e | grep StopSignal
        "StopSignal": "SIGRTMIN+3"

STOPSIGNAL is part of the OCI image-spec:
https://github.com/opencontainers/image-spec/blob/main/config.md
and that specification is supported by CRI-O.

The Kubernetes documentation, under Pod Lifecycle - Termination of Pods, says:

"Typically, the container runtime sends a TERM signal to the main process in each container. Many container runtimes respect the STOPSIGNAL value defined in the container image and send this instead of TERM. Once the grace period has expired, the KILL signal is sent to any remaining processes, and the Pod is then deleted from the API Server."
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

So apparently all the scaffolding is there to support this behaviour in CRI-O, and Kubernetes states that it honours the signal customization. I brought this up on Slack to get some clarification, and it looks like the problem could be the specific signal used in this case, as it falls outside the signals supported by the Go package used to parse and send the signal.

I will ask the customer to test with a signal that is in that package's definition and will update the BZ with the results.
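As a quick sanity check of the "specific signal" theory, bash itself can show where the signal sits; the exact numbers assume a glibc-based image such as the fedora reproducer, where SIGRTMIN is reported as 34 (the raw kernel value is 32):

-----
$ kill -l TERM
15
$ kill -l RTMIN+3
37
-----

SIGTERM falls inside the classic 1-31 range, while SIGRTMIN+3 is above 31, which matches the suspicion that the signal-parsing package only covers the classic names.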
fixed in attached PR
JFTR: I confirmed with the customer that the issue is isolated to signals above 31.
PR merged
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056