Bug 2000877
Summary: | OCP ignores STOPSIGNAL in Dockerfile and sends SIGTERM | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Szymon.Sawis | |
Component: | Node | Assignee: | Peter Hunt <pehunt> | |
Node sub component: | CRI-O | QA Contact: | pmali | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | akretzsc, aos-bugs, dwalsh, ebrizuel, pehunt, rphillips, skclark, tsweeney | |
Version: | 4.6 | |||
Target Milestone: | --- | |||
Target Release: | 4.10.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause:
CRI-O was using an outdated signal-parsing library.
Consequence:
A stop signal set to anything greater than SIGRTMIN was ignored.
Fix:
The signal-parsing library was updated.
Result:
Stop signals greater than SIGRTMIN are now sent as the stop signal.
|
Story Points: | --- | |
Clone Of: | ||||
Clones: | 2084259 | Environment: | ||
Last Closed: | 2022-03-10 16:07:01 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2084259 |
Description
Szymon.Sawis
2021-09-03 09:25:37 UTC
Can you provide us with a reproducer that uses a Dockerfile? You can use this example:

-----
$ cat Dockerfile
FROM fedora:latest
COPY ./echo-the-trapper.sh /
RUN chmod +x /echo-the-trapper.sh
ENTRYPOINT ["/echo-the-trapper.sh"]
STOPSIGNAL SIGRTMIN+3
-----
$ cat echo-the-trapper.sh
#!/bin/bash
echo "Starting $0"

# Signals to trap, you can add here as need, just remove the "SIG"
# part, to list all the signals use 'kill -l':
signal2trap=(INT TERM EXIT HUP RTMIN+3)

# Default stdout for trap is standard stdout for local tests
trap_stdout=2

func_trap() {
    echo "Trapped: $1" >&${trap_stdout}
}

# If no stdin we're probably inside the container or
# someone pipe info to the script o.0
if [[ ! -t 0 ]];then
    echo "In a container, adjusting!"
    trap_stdout=9
    mkfifo /tmp/bucket
    # Not sure why I cannot use $trap_stdout here
    # So I just hard-coded it until further investigation.
    exec 9<> /tmp/bucket
    signal_timeout=600 # Seconds
fi

echo "Setting the traps for signals ${signal2trap[*]// /|}"
for s in ${signal2trap[@]};do
    trap "func_trap ${s}" "${s}"
done

echo "Traps set for:"
trap -p
echo "Send any of them to PID ${$} (Ignore this PID in a container), we should echo the signal."

# Checking if we're on a container
if [[ $trap_stdout -ne 2 ]];then
    read -t $signal_timeout -u $trap_stdout msg
    echo $msg
fi

# On a container the 'read' will be ignored as stdin is closed
echo "Hit a key when you want to exit"
read
-----

You can test any change you make to the script from the command line with 'bash echo-the-trapper.sh'.

You can try this with podman as follows:
1- Build the image with 'podman build -t echo-the-trapper .'
2- Run it:
   # sudo podman run -d --name echo-the-trapper-0 echo-the-trapper
3- Check the logs:
   # sudo podman logs -f echo-the-trapper-0
4- Stop the container:
   # sudo podman stop echo-the-trapper-0 --log-level debug

In the container logs you get the 'Trapped: RTMIN+3' line, and the output of 'podman stop' confirms that the correct signal is sent.
We asked the customer to run another test using the signal number instead of the name, and the issue persists.

This is the expected behavior for OpenShift. The Kubelet is in control of the signals passed to the runtime. The Kubelet sends SIGTERM to allow for a graceful shutdown; if the process has not shut down within the grace period (terminationGracePeriodSeconds, default 30 seconds), the Kubelet sends SIGKILL. The STOPSIGNAL in the Dockerfile will not change this behavior.

(In reply to Ryan Phillips from comment #4)
> This is the expected behavior for OpenShift. The Kubelet is in control of
> the signals passed to the runtime. The Kubelet sends SIGTERM to allow for
> a graceful shutdown; if the process has not shut down within the grace
> period (terminationGracePeriodSeconds, default 30 seconds), the Kubelet
> sends SIGKILL. The STOPSIGNAL in the Dockerfile will not change this
> behavior.

Is there any way to force a different signal?

(In reply to Ryan Phillips from comment #4)
> This is the expected behavior for OpenShift. The Kubelet is in control of
> the signals passed to the runtime. The Kubelet sends SIGTERM to allow for
> a graceful shutdown; if the process has not shut down within the grace
> period (terminationGracePeriodSeconds, default 30 seconds), the Kubelet
> sends SIGKILL. The STOPSIGNAL in the Dockerfile will not change this
> behavior.

The problem is that the first signal sent is not the one specified by the STOPSIGNAL value when the example runs on OpenShift, whereas it is handled correctly when the same example runs in podman. The rest is the standard behavior that prevents a stuck pod that is not listening for signals from lingering.

To be a bit clearer:
- Current behavior: the kubelet sends SIGTERM, waits for the grace period, and then sends SIGKILL.
- Expected behavior: the kubelet sends the signal set as STOPSIGNAL (e.g. SIGRTMIN+3 or any other one), waits for the grace period, and then sends SIGKILL.
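The ordering under discussion (stop signal first, then a grace period, then SIGKILL) can be sketched in plain shell. This is only an illustration of the sequence, not how the kubelet or CRI-O implement it; `stop_with_grace` and its parameters are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Illustrative only: stop_with_grace is a hypothetical helper, not a kubelet
# or CRI-O function.
# Returns 0 if the process exited within the grace period, 1 if SIGKILL was needed.
stop_with_grace() {
    local pid="$1"
    local stop_signal="${2:-TERM}"
    local grace_seconds="${3:-30}"
    local waited=0

    # Step 1: send the configured stop signal (SIGTERM unless overridden).
    # If it cannot be delivered (e.g. the process is already gone), stop here.
    kill -s "$stop_signal" "$pid" 2>/dev/null || return 0

    # Step 2: poll once per second until the process is gone or time runs out.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace_seconds" ]; then
            # Step 3: grace period expired, force termination.
            kill -s KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

For example, `stop_with_grace "$pid" RTMIN+3 30` would send SIGRTMIN+3 first and fall back to SIGKILL after 30 seconds, which is the expected behavior described above.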
With the example I provided, the pod logs will show "Trapped: <STOPSIGNAL-VALUE>", and then after the grace period the pod will receive SIGKILL and die.

I dug a bit into the upstream k8s project and found this:
https://github.com/kubernetes/kubernetes/issues/30051
It mentions that this is supposed to be supported when using Docker. Sadly, I have not had the chance to set up a test environment with OpenShift for this because the lab is currently a bit unstable.

OpenShift uses CRI-O as the container runtime. The STOPSIGNAL behavior is a side effect of Docker and is not really supported by Kubernetes. Going to close this for now, since it is working as designed.

Checking with the customer, the image contains the correct stop signal definition:
# sudo podman inspect f8c7cfe0750e | grep StopSignal
"StopSignal": "SIGRTMIN+3"

STOPSIGNAL is part of the OCI image-spec:
https://github.com/opencontainers/image-spec/blob/main/config.md
and that specification is supported by CRI-O.

The Kubernetes documentation, under Pod Lifecycle - Termination of Pods, says:
"Typically, the container runtime sends a TERM signal to the main process in each container. Many container runtimes respect the STOPSIGNAL value defined in the container image and send this instead of TERM. Once the grace period has expired, the KILL signal is sent to any remaining processes, and the Pod is then deleted from the API Server."
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

Apparently all the scaffolding needed to support this behavior is in place in CRI-O, and Kubernetes mentions that the signal customization is honored.

I brought this to the Slack chat for clarification, and it appears the issue could be specific to the signal used in this case, since it is outside the set of signals supported by the Go package used to parse and send the signal.
I will ask the customer to test with a signal that is in that package's definitions and will update the bz with the results.

Fixed in the attached PR.

JFTR: I confirmed with the customer that the issue is isolated to signals above 31.

PR merged.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
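To check locally whether a given stop signal falls in the affected range (above 31), the signal name can be resolved to its number with bash's builtin `kill -l`. The sketch below assumes bash on Linux; `sig_number` and `is_above_31` are hypothetical helper names, and the number a real-time signal resolves to depends on the platform's SIGRTMIN.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of CRI-O or podman), assuming bash on Linux.

# Print the number for a signal name; accepts both "SIGTERM" and "TERM".
sig_number() {
    local name="${1#SIG}"
    kill -l "$name"
}

# Succeed if the signal number is above 31, i.e. in the real-time range
# that the thread identifies as affected.
is_above_31() {
    local num
    num="$(sig_number "$1")" || return 2
    [ "$num" -gt 31 ]
}

# Report a few signals, including the one from the reproducer.
for sig in SIGTERM SIGHUP SIGRTMIN+3; do
    if is_above_31 "$sig"; then
        echo "$sig -> $(sig_number "$sig") (above 31)"
    else
        echo "$sig -> $(sig_number "$sig") (31 or below)"
    fi
done
```

A STOPSIGNAL reported as above 31 is one that would have been affected before the fix; one reported as 31 or below, such as SIGTERM, was not.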