Bug 2119072
Summary: | podman gating test issues in RHEL8.7 | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Jindrich Novy <jnovy> | |
Component: | conmon | Assignee: | Jindrich Novy <jnovy> | |
Status: | CLOSED ERRATA | QA Contact: | Joy Pu <ypu> | |
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 8.7 | CC: | bbaude, dwalsh, gscrivan, hshiina, jligon, jnovy, lfriedma, lsm5, mheon, msekleta, pthomas, santiago, tsweeney, umohnani, vrothber, ypu | |
Target Milestone: | rc | Keywords: | Triaged, ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | conmon-2.1.2-2.el8 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2154403 2154417 2154418 | Environment: | |
Last Closed: | 2022-11-08 09:16:44 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2002451, 2154403, 2154417, 2154418 |
Description
Jindrich Novy
2022-08-17 12:54:06 UTC
Trivial reproducer (as root on RHEL8.7 with MB 16321 installed):

    # podman create --pull=always --name=foo quay.io/libpod/testimage:20220615 top
    # podman generate systemd --new foo >/run/systemd/system/foo.service
    # podman rm foo
    # systemctl daemon-reload
    # systemctl start foo

So far, so good. Now:

    # systemctl stop foo
    # systemctl status foo
    ● foo.service - Podman container-cd68b1f76c9880d29ca5a0377848f72308d4ea1ee2965cc64924f77b864746a6.service
       Loaded: loaded (/run/systemd/system/foo.service; disabled; vendor preset: disabled)
       Active: failed (Result: signal) since Wed 2022-08-17 10:49:54 EDT; 32s ago
         Docs: man:podman-generate-systemd(1)
      Process: 13978 ExecStopPost=/usr/bin/podman rm -f --ignore --cidfile=/run/foo.service.ctr-id (code=exited, status=0/SUCCESS)
      Process: 13917 ExecStop=/usr/bin/podman stop --ignore --cidfile=/run/foo.service.ctr-id (code=exited, status=0/SUCCESS)
      Process: 13896 ExecStart=/usr/bin/podman run --cidfile=/run/foo.service.ctr-id --cgroups=no-conmon --rm --sdnotify=conmon -d --replace --pull=always --name=foo quay.io/libpod/testimage:20220615 top (code=killed, signal=USR1)
      Process: 13841 ExecStartPre=/bin/rm -f /run/foo.service.ctr-id (code=exited, status=0/SUCCESS)
     Main PID: 13896 (code=killed, signal=USR1)

    Aug 17 10:49:49 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com podman[13843]: Copying config sha256:f26aa69bb3f3de470d358b6c4e7f9de8ca165ef7f1689af1cdd446902dc27928
    Aug 17 10:49:49 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com podman[13843]: Writing manifest to image destination
    Aug 17 10:49:49 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com podman[13843]: Storing signatures
    Aug 17 10:49:49 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com systemd[1]: Started Podman container-cd68b1f76c9880d29ca5a0377848f72308d4ea1ee2965cc64924f77b864746a6.service.
    Aug 17 10:49:49 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com podman[13843]: 0804ed1173c14036e9813829e2a2a58ea836f4d964925ab018888eb6de36bd00
    Aug 17 10:49:54 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com systemd[1]: Stopping Podman container-cd68b1f76c9880d29ca5a0377848f72308d4ea1ee2965cc64924f77b864746a6.service...
    Aug 17 10:49:54 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com podman[13917]: 0804ed1173c14036e9813829e2a2a58ea836f4d964925ab018888eb6de36bd00
    Aug 17 10:49:54 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com systemd[1]: foo.service: Main process exited, code=killed, status=10/USR1
    Aug 17 10:49:54 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com systemd[1]: foo.service: Failed with result 'signal'.
    Aug 17 10:49:54 ci-vm-10-0-137-138.hosted.upshift.rdu2.redhat.com systemd[1]: Stopped Podman container-cd68b1f76c9880d29ca5a0377848f72308d4ea1ee2965cc64924f77b864746a6.service.

Reproduce at will via `podman start/stop foo`.

Valentin, please take a look when you return from PTO.

I am unable to reproduce with build 2119676 [1] on RHEL-8.7.0-20220820.0. What I get is the expected exit code 143.

> foo.service: Main process exited, code=exited, status=143/n/a

> foo.service: Main process exited, code=killed, status=10/USR1

I have never seen this issue so far and do not know who's sending the USR1 signal.

[1] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2119676

After chatting with Ed, I noticed a difference between the non-working 1minute VMs and the one I got from beaker.

Working: 4.18.0-418.el8.x86_64 and systemd-239-65.el8
Broken: 4.18.0-416.el8.x86_64 and systemd-239-62.el8

I ran killsnoop from the `bcc-tools` package to figure out who's sending the USR1 signal on the broken machine. What I observed is that systemd immediately sends SIGTERM on `systemctl stop`, but I couldn't spot USR1.

My conclusion is that something in the stack (outside of Podman) had a temporary hiccup that has been fixed in the meantime.

Tom, I suggest closing.
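
For readers comparing the two results quoted above: the "expected exit code 143" follows the common shell convention that a status above 128 means the process was terminated by signal (status - 128), so 143 corresponds to signal 15 (SIGTERM). The helper below is only an illustrative sketch; `status_to_signal` is a hypothetical name, not part of podman or systemd.

```shell
#!/bin/sh
# Hypothetical helper: decode a shell-style exit status.
# By convention, a status above 128 means "terminated by signal (status - 128)".
status_to_signal() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # kill -l N prints the name of signal number N (without the SIG prefix)
        kill -l "$((status - 128))"
    else
        echo "exit $status"
    fi
}

status_to_signal 143
```

On Linux this prints `TERM` for 143. The failing case in this bug is different in kind: systemd reports `code=killed, status=10/USR1`, meaning the main process itself was killed by signal 10 rather than exiting with a status.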

1minutetip has RHEL-8.7.0-20220816.0 available but not 0820 (the one Valentin is using). On 0816 I cannot dnf-upgrade to the systemd or kernel versions that Valentin sees. Until I get those versions and confirm the bug is fixed, I recommend leaving this open.

Thanks for digging into this quickly, folks. Let's keep this open until Ed gets a chance to dive a bit deeper.

Sorry Valentin, there's something different about your beaker setup. 1mt has just made 0823 available, with systemd-239-65.el8.x86_64 and kernel 4.18.0-419.el8.x86_64 (both greater than or equal to the ones you report). The bug is present in this setup.

(In reply to Ed Santiago from comment #7)
> Sorry Valentin, there's something different about your beaker setup. 1mt has
> just made 0823 available, with systemd-239-65.el8.x86_64 and kernel
> 4.18.0-419.el8.x86_64 (both greater than or equal to the ones you report).
> The bug is present in this setup.

That is very curious. Ed, could you spin up a beaker machine and test there? I will give 1mt a shot later today.

Pulling in Michal from the system team.

@Michal, Ed gave me access to a "broken" system. I made the following observations that make me believe it's a (temporary?) issue in systemd:

* `killsnoop` revealed that systemd sends TERM and KILL to the conmon process immediately on `systemctl stop`.
* Given `TimeoutStopSec=70`, systemd should not do that, and it does not on a "working" system.
* Setting `KillMode=none` resolves the issue, but it should not be required.

Is there a temporary issue in systemd or am I looking in the wrong direction?

Ed and I ran another debugging session. I couldn't reproduce because I had installed only the podman and podman-catatonit packages. Installing the entire module yielded the failure again. It turns out to be `conmon`. We bisected and found that `d91cc4321797eaa84dcfb7863e91632d2fe26861` [1] is the first commit introducing the issue. Note that we ship conmon v2.1.0 in Fedora while we're running conmon v2.1.3 in RHEL.

Assigning the issue to Giuseppe.
[1] https://github.com/containers/conmon/commit/d91cc4321797eaa84dcfb7863e91632d2fe26861

opened a PR: https://bugzilla.redhat.com/show_bug.cgi?id=2119072

Quick note that, for purposes of RHEL, it might be safest to revert conmon back to 2.1.2-2

sorry, the PR is here: https://github.com/containers/conmon/pull/352

(In reply to Ed Santiago from comment #13)
> Quick note that, for purposes of RHEL, it might be safest to revert conmon
> back to 2.1.2-2

Giuseppe, Tom, Jindrich, what do you think about Ed's suggestion? I concur with Ed but if the conmon experts feel confident, I will as well.

Just reverted conmon to 2.1.2. Hope it's the last change for 8.7.

*** Bug 2123152 has been marked as a duplicate of this bug. ***

Tested with conmon-2.1.2-2.module+el8.7.0+16493+89f82ab8.x86_64.rpm and ran the test case "podman generate - systemd - basic" 200 times. All runs finished successfully, as expected, so I am moving this bug to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7457
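
A repeated verification run like the 200 iterations mentioned above could be scripted roughly as follows. This is a minimal sketch, not the procedure QA actually used: `repeat_until_fail` is a hypothetical helper, and the command passed to it (for example a wrapper script that runs the "podman generate - systemd - basic" test once) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to COUNT times and stop at the first failure.
# Usage: repeat_until_fail COUNT command [args...]
repeat_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            # Report which iteration failed so an intermittent bug can be quantified
            echo "run $i of $count failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs passed"
}
```

For instance, `repeat_until_fail 200 ./run-one-test.sh` would stop at the first failing iteration, where `./run-one-test.sh` stands in for whatever script runs the single test case on the machine under test.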