Bug 2090609
Summary: ERRO[0009] Error forwarding signal 18 to container using rootless user with timeout+sleep in the podman run command

Field | Value
---|---
Product | Red Hat Enterprise Linux 8
Reporter | Sameer <snangare>
Component | podman
Assignee | Jindrich Novy <jnovy>
Status | CLOSED ERRATA
QA Contact | Alex Jia <ajia>
Severity | urgent
Priority | unspecified
Version | 8.6
CC | ahuchcha, ajia, bbaude, dornelas, dwalsh, falim, jligon, jnovy, lsm5, mheon, pghadge, pthomas, tsweeney, umohnani, ypu
Target Milestone | rc
Keywords | ZStream
Target Release | ---
Hardware | Unspecified
OS | Linux
Fixed In Version | podman-4.1.1-6.el8
Doc Type | If docs needed, set a value
Story Points | ---
Clones | 2097049 (view as bug list)
Last Closed | 2022-11-08 09:15:47 UTC
Type | Bug
Regression | ---
Mount Type | ---
Bug Blocks | 2097049
Deadline | 2022-08-23
Description (Sameer, 2022-05-26 06:30:54 UTC)
As a note, I would strongly recommend using `podman run --timeout` instead of the `timeout` command. `timeout` does not work as advertised with Podman when the container ignores the stop signal. In simple terms, `timeout` tries to manage Podman, but Podman is not the container. The container is not a direct child of Podman: it double-forks to daemonize so it can survive the early death of the Podman process (users can deliberately detach from the container via the detach keys, for example, which causes Podman to exit while the container continues running).

`timeout` is therefore sending signals to Podman, not to the container. By default Podman forwards most signals into the container (the notable exceptions being SIGSTOP and SIGKILL, which cannot be caught and forwarded), but PID 1 in the container is special: the kernel gives PID 1 in a namespace special treatment, and it ignores every signal for which it has not explicitly registered a handler. Thus the usual stop signal that `timeout` sends (SIGTERM, I believe) has no effect on any program that does not install its own SIGTERM handler, and many common tools do not register one. As a result, `timeout` simply spins, sending signals to Podman, which Podman dutifully forwards on to the container, which just ignores them.

The situation can be worse than this if the container is run with `--sig-proxy=false` or `timeout` is run with `-k`: `timeout` will successfully kill Podman, but the container will continue running in the background because of its double-fork. In short, while this approach may work for Python (and honestly, from what I'm seeing with the reproducer, it doesn't: Podman exits after the number of seconds in Python's sleep, not the 4-second timeout, seemingly indicating that the timeout is just being ignored), it should not be considered general-purpose, and some containers will simply refuse to exit when managed by `timeout`.
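The PID 1 behavior described above is the crux: a process running as PID 1 in a container must install its own SIGTERM handler, or forwarded stop signals are silently dropped. A minimal sketch (illustrative, not taken from the bug report; the handler and names are assumptions):

```python
import os
import signal

# Illustrative sketch: PID 1 in a PID namespace ignores any signal for
# which it has not registered a handler, so a containerized entrypoint
# must install one for SIGTERM if `podman stop` (or a signal forwarded
# from `timeout` via podman's sig-proxy) is supposed to stop it.

received = []

def handle_sigterm(signum, frame):
    # A real entrypoint would clean up and exit here (conventionally with
    # status 128 + 15 = 143); this demo just records that the signal landed.
    received.append(signum)

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate Podman forwarding SIGTERM into the container.
os.kill(os.getpid(), signal.SIGTERM)

print("handled:", received == [signal.SIGTERM])
```

Without the `signal.signal(...)` registration, the same SIGTERM would terminate an ordinary process but be ignored entirely by a PID 1, which is exactly why `timeout` appears to do nothing in the reproducer.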
You can potentially avoid this by setting the container's stop signal to SIGKILL, but this prevents any cleanup when a container is told to exit. Furthermore, we've seen occasional bad behavior from `timeout` in the past, where it spams Podman with signals repeatedly, causing occasional logs like this when the container transitions to exited.

`podman run --timeout 4` is the way to handle this and get the container to work correctly. Having Podman killed is an unexpected occurrence.

I am not sure what you are asking. If your customer wants to stop a container that runs longer than TIMEX, then `podman run --timeout TIMEX` is the way to go. If you are asking whether we should fix a bug where Podman deadlocks when killed, then the answer is yes. But I am not sure why the customer should care, or what kind of priority this gets, since the customer has a better solution.

Matt, do you know if `podman run|start` catches SIGTERM and triggers a `podman stop`? Matt, it might make sense to catch SIGTERM and then send the STOP_SIGNAL to the container (i.e., do a `podman stop`). That way, if SIGTERM is ignored, Podman and the container will still exit within 10 seconds.

Tested with podman-4.1.1-6.module+el8.7.0+15923+b0ec4f51.x86_64, and the test result looks good.

```
[root@sweetpig-20 ~]# yes | head -10 | xargs --replace -P5 timeout 2 podman run --rm --init registry.access.redhat.com/ubi8 sleep 5
ERRO[0000] container not running
ERRO[0000] container not running
ERRO[0000] container not running
[root@sweetpig-20 ~]# timeout 4 podman run --log-level=info -i --rm registry.redhat.io/ansible-automation-platform-21/ee-supported-rhel8 python3 -c 'import time; time.sleep(5)'
INFO[0000] podman filtering at log level info
INFO[0000] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled
INFO[0000] Setting parallel job count to 7
Trying to pull registry.redhat.io/ansible-automation-platform-21/ee-supported-rhel8:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob efe94c0f6ff6 [>-------------------------------------] 3.1MiB / 189.4MiB
Copying blob 1e09a5ee0038 [===>----------------------------------] 3.4MiB / 34.7MiB
Copying blob 47c1fb849539 [==>-----------------------------------] 1.8MiB / 19.9MiB
Copying blob 971ebcb22551 [--------------------------------------] 586.0KiB / 65.6MiB
Copying blob 0d725b91398e done
INFO[0004] Received shutdown signal "terminated", terminating! PID=30003
INFO[0004] Invoking shutdown handler "libpod" PID=30003
[root@sweetpig-20 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.7 Beta (Ootpa)
[root@sweetpig-20 ~]# rpm -q podman runc systemd kernel
podman-4.1.1-6.module+el8.7.0+15923+b0ec4f51.x86_64
runc-1.1.3-2.module+el8.7.0+15923+b0ec4f51.x86_64
systemd-239-60.el8.x86_64
kernel-4.18.0-408.el8.x86_64
```

And also verified on podman-4.1.1-6.module+el8.7.0+15895+a6753917.x86_64.

```
[root@sweetpig-20 ~]# yes | head -10 | xargs --replace -P5 timeout 2 podman run --rm --init registry.access.redhat.com/ubi8 sleep 5
ERRO[0000] container not running
[root@sweetpig-20 ~]# timeout 4 podman run --log-level=info -i --rm registry.redhat.io/ansible-automation-platform-21/ee-supported-rhel8 python3 -c 'import time; time.sleep(5)'
INFO[0000] podman filtering at log level info
INFO[0000] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled
INFO[0000] Setting parallel job count to 7
Trying to pull registry.redhat.io/ansible-automation-platform-21/ee-supported-rhel8:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob efe94c0f6ff6 [>-------------------------------------] 2.6MiB / 189.4MiB
Copying blob 0d725b91398e done
Copying blob 1e09a5ee0038 [==>-----------------------------------] 2.9MiB / 34.7MiB
Copying blob 47c1fb849539 [===>----------------------------------] 1.9MiB / 19.9MiB
Copying blob 971ebcb22551 [=>------------------------------------] 2.7MiB / 65.6MiB
INFO[0004] Received shutdown signal "terminated", terminating! PID=35277
INFO[0004] Invoking shutdown handler "libpod" PID=35277
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7457
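The recommendation repeated throughout the comments above, letting Podman enforce the time limit via `podman run --timeout` rather than wrapping Podman in coreutils `timeout`, can be sketched as follows. This is an illustration, not part of the report; the image is the one from the reproducer, and the guard keeps the script safe on hosts without podman installed:

```shell
#!/bin/sh
set -eu

# Illustrative sketch: have Podman enforce the time limit itself instead of
# wrapping it in coreutils `timeout`, which signals the podman process
# rather than the container.

if ! command -v podman >/dev/null 2>&1; then
    # Keep the sketch runnable on hosts without podman.
    echo "podman not available; the recommended invocation is:"
    echo "  podman run --rm --timeout 4 registry.access.redhat.com/ubi8 sleep 10"
    exit 0
fi

# --timeout makes conmon kill the container after N seconds, so the limit
# holds even when PID 1 in the container ignores its stop signal.
# `|| true` because a timed-out container exits non-zero by design.
podman run --rm --timeout 4 registry.access.redhat.com/ubi8 sleep 10 || true

echo "done"
```

Unlike the `timeout 4 podman run ...` pattern, this works regardless of `--sig-proxy` settings or the container's signal handling, because the kill is applied to the container itself rather than to the Podman client.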