Bug 1838081 - podman healthchecks fail on RHEL 8 when using rootless
Summary: podman healthchecks fail on RHEL 8 when using rootless
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: systemd
Version: 8.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
: 8.0
Assignee: David Tardon
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-20 12:49 UTC by Suhaas Bhat
Modified: 2023-12-15 17:58 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 02:02:13 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments: (none)


Links
GitHub systemd-rhel/rhel-8 pull 102 (closed): (#1838081) sd-bus: allow cross-uid-namespace connections (last updated 2020-11-25 16:46:25 UTC)
GitHub systemd/systemd pull 11785 (closed): sd-bus: allow cross-uid-namespace connections (last updated 2020-11-25 16:46:25 UTC)
Red Hat Knowledge Base (Solution) 5096391 (last updated 2020-05-21 12:31:05 UTC)
Red Hat Product Errata RHSA-2020:4553 (last updated 2020-11-04 02:02:30 UTC)

Description Suhaas Bhat 2020-05-20 12:49:23 UTC
Description of problem:
On RHEL 8, podman uses systemd to schedule container healthchecks, and healthchecks fail when podman is run rootless.

Version-Release number of selected component (if applicable):
podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64

How reproducible:
Fully reproducible

Steps to Reproduce:

[joe@vm249-81 ~]$ podman run -d --name test1 --health-cmd="/bin/sh -c 'echo hello'" --healthcheck-interval=30s --health-timeout=10s ubi8
Trying to pull registry.access.redhat.com/ubi8...
Getting image source signatures
Copying blob 78afc5364ad2 done
Copying blob 58e1deb9693d done
Copying config 8e0b0194b7 done
Writing manifest to image destination
Storing signatures
ERRO[0021] exit status 1                                
ERRO[0021] Unit 3ea606270fe65b703d94f03b710bf2182471cfe8c1f24ebec3130f8e126f40c8.service not found. 
3ea606270fe65b703d94f03b710bf2182471cfe8c1f24ebec3130f8e126f40c8

[joe@vm249-81 ~]$ podman healthcheck run test1
healthy

We have the fix here: https://github.com/containers/libpod/pull/3529/commits/c6c637da0012d54ddce32e06706c8903564b41cc

We need to backport this.

Actual results:
Healthchecks fail for rootless podman on RHEL 8

Expected results:
Healthchecks should succeed for rootless podman on RHEL 8

Additional info:

Comment 1 Matthew Heon 2020-05-20 13:21:57 UTC
That commit is present in Podman 1.5.0, so it's definitely in the 1.6.x versions we build & ship for RHEL 8. How certain are you that it is the cause?

Comment 2 Robb Manes 2020-05-20 14:13:16 UTC
(In reply to Matthew Heon from comment #1)
> That commit is present in Podman 1.5.0, so it's definitely in the 1.6.x
> versions we build & ship for RHEL 8. How certain are you that it is the
> cause?

We're not certain.  I made an error in my checking and apologize.

The behavior we do see on podman 1.6.4 is the following, however:

$ podman -v
podman version 1.6.4

$ podman run -d --name healthyboi --health-cmd="/bin/sh -c 'echo hello'" --healthcheck-interval=30s --health-timeout=10s ubi8
ERRO[0000] exit status 1                                
ERRO[0000] Unit e8cfe5ce36e59858f103ca863956caa1a10ae54a36e9e79af72e9d7014c7abe5.service not found. 
e8cfe5ce36e59858f103ca863956caa1a10ae54a36e9e79af72e9d7014c7abe5

I did the same exact thing on 2.0.0-dev and it succeeds, rootless, so please ignore my mistake regarding the github issue I gave to Suhaas.

Comment 3 Matthew Heon 2020-05-20 14:24:41 UTC
Thanks for the quick response. The 2.0 thing is interesting. We're shipping 1.9.2 in RHEL 8.2.1 - any chance you can try with that and verify if it's fixed there as well?

Comment 4 Robb Manes 2020-05-20 15:02:48 UTC
Disclaimer: previously I tried 2.0 on my ubuntu20 dev box and 1.6.4 on a rhel8 box, just because it was handy.  This time, I tried 2.0, 1.9.2, and 1.6.4 on my ubuntu20 box, and they all worked, all versions, with no errors.

---

Using only the RHEL 8 system that can reliably reproduce the error I posted in comment #2, it still fails on 2.0, so again I muddied the waters by jumping the gun.

So _all_ versions of podman fail on a brand new RHEL 8 installation when using rootless healthchecks.

$ ./bin/podman -v
podman version 2.0.0-dev

$ ./bin/podman run -d --name healthyboi --health-cmd="/bin/sh -c 'echo hello'" --healthcheck-interval=30s --health-timeout=10s ubi8
ERRO[0000] exit status 1                                
ERRO[0000] Unit cb300d9a0ecbcd72146fcd3b058fbf48f395f7d6bb64fb03061c2796659512cb.service not found. 
cb300d9a0ecbcd72146fcd3b058fbf48f395f7d6bb64fb03061c2796659512cb

Debugging further on my system...

Comment 5 Robb Manes 2020-05-20 15:26:50 UTC
Debug logging:

$ /usr/bin/podman --log-level=debug run -d --name healthyboi --health-cmd="/bin/sh -c 'echo hello'" --healthcheck-interval=30s --health-timeout=10s ubi8
- - - - 8< - - - -
DEBU[0000] /usr/bin/conmon messages will be logged to syslog 
DEBU[0000] running conmon: /usr/bin/conmon               args="[--api-version 1 -c 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c -u 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c -r /usr/bin/runc -b /home/rmanes/.local/share/containers/storage/overlay-containers/54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c/userdata -p /run/user/1000/overlay-containers/54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c/userdata/pidfile -l k8s-file:/home/rmanes/.local/share/containers/storage/overlay-containers/54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c/userdata/ctr.log --exit-dir /run/user/1000/libpod/tmp/exits --socket-dir-path /run/user/1000/libpod/tmp/socket --log-level debug --syslog --conmon-pidfile /run/user/1000/overlay-containers/54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/rmanes/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000 --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c]"
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied

DEBU[0000] Received: 111750                             
INFO[0000] Got Conmon PID as 111739                     
DEBU[0000] Created container 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c in OCI runtime 
DEBU[0000] creating systemd-transient files: systemd-run [--user --unit 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c --on-unit-inactive=30s --timer-property=AccuracySec=1s /usr/bin/podman healthcheck run 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c] 
ERRO[0000] exit status 1                                
DEBU[0000] Starting container 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c with command [/bin/bash] 
DEBU[0000] Started container 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c 
ERRO[0000] Unit 54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c.service not found. 
54866c61dafaa28ce372465e03f6f9162940252adfa8132c6eed92866c44834c

Looks like systemd-run is the one having issues:

$ ~/go/bin/dlv exec /usr/bin/podman -- run -d --name healthyboi --health-cmd="/bin/sh -c 'echo hello'" --healthcheck-interval=30s --health-timeout=10s ubi8

(dlv) b libpod.Container.createTimer
Breakpoint 1 set at 0x557e6e6b2faa for github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:52

(dlv) continue
> github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:52 (hits goroutine(1):1 total:1) (PC: 0x5555e21f8faa)
Warning: debugging optimized function
    47:		}
    48:		return dbus.NewSystemdConnection()
    49:	}
    50:	
    51:	// createTimer systemd timers for healthchecks of a container
=>  52:	func (c *Container) createTimer() error {
    53:		if c.disableHealthCheckSystemd() {
    54:			return nil
    55:		}
    56:		podman, err := os.Executable()
    57:		if err != nil {

Stepped through a bit...

(dlv) continue
> github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:67 (hits goroutine(1):1 total:1) (PC: 0x55725d35c2c3)
Warning: debugging optimized function
    62:		if rootless.IsRootless() {
    63:			cmd = append(cmd, "--user")
    64:		}
    65:		cmd = append(cmd, "--unit", c.ID(), fmt.Sprintf("--on-unit-inactive=%s", c.HealthCheckConfig().Interval.String()), "--timer-property=AccuracySec=1s", podman, "healthcheck", "run", c.ID())
    66:	
=>  67:		conn, err := getConnection()
    68:		if err != nil {
    69:			return errors.Wrapf(err, "unable to get systemd connection to add healthchecks")
    70:		}
    71:		conn.Close()
    72:		logrus.Debugf("creating systemd-transient files: %s %s", "systemd-run", cmd)

(dlv) p cmd
[]string len: 9, cap: 9, [
	"--user",
	"--unit",
	"2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5",
	"--on-unit-inactive=30s",
	"--timer-property=AccuracySec=1s",
	"/usr/bin/podman",
	"healthcheck",
	"run",
	"2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5",
]

(dlv) b 74
Breakpoint 6 set at 0x55725d35c3dd for github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:74

(dlv) p systemdRun
*os/exec.Cmd {
	Path: "/usr/bin/systemd-run",
	Args: []string len: 10, cap: 10, [
		"systemd-run",
		"--user",
		"--unit",
		"2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5",
		"--on-unit-inactive=30s",
		"--timer-property=AccuracySec=1s",
		"/usr/bin/podman",
		"healthcheck",
		"run",
		"2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5",
	],
	Env: []string len: 0, cap: 0, nil,
	Dir: "",
	Stdin: io.Reader nil,
	Stdout: io.Writer nil,
	Stderr: io.Writer nil,
	ExtraFiles: []*os.File len: 0, cap: 0, nil,
	SysProcAttr: *syscall.SysProcAttr nil,
	Process: *os.Process nil,
	ProcessState: *os.ProcessState nil,
	ctx: context.Context nil,
	lookPathErr: error nil,
	finished: false,
	childFiles: []*os.File len: 0, cap: 0, nil,
	closeAfterStart: []io.Closer len: 0, cap: 0, nil,
	closeAfterWait: []io.Closer len: 0, cap: 0, nil,
	goroutine: []func() error len: 0, cap: 0, nil,
	errch: chan error {},
	waitDone: chan struct {} {},}

At this point, since the command had already run in the debugger, I went to see if the unit `2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5` existed, and it did not:

$ systemctl status 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5.service
Unit 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5.service could not be found.

So perhaps the unit isn't being created?  Is it supposed to be created by systemd-run or earlier?  Still looking.

Comment 6 Matthew Heon 2020-05-20 15:32:39 UTC
Can you see if `systemctl --user` works? For rootless Podman, we interact with the user's systemd instance.
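
As an aside, a minimal sketch of that check, assuming a normal login session for UID 1000 (the commands and output below are illustrative, not taken from this report):

$ systemctl --user is-system-running      # is the user manager reachable at all?
running
$ echo $XDG_RUNTIME_DIR                   # runtime dir holding the user bus socket
/run/user/1000
$ systemctl --user list-timers --all      # healthcheck timers, active or not, would appear here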

Comment 7 Robb Manes 2020-05-20 15:38:47 UTC
When I run a systemd-run command manually, it creates the unit on the RHEL 8 system without issues:

$ systemd-run --user --unit myunit --on-unit-inactive=30s --timer-property=AccuracySec=1s echo 'hello'
Running timer as unit: myunit.timer
Will run service as unit: myunit.service

$ systemctl list-units --user | grep my
myunit.timer                         loaded active     elapsed   /usr/bin/echo hello

Even when I run an identical command it works:

$ systemd-run --user --unit 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5 --on-unit-inactive=30s --timer-property=AccuracySec=1s /usr/bin/podman healthcheck run 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5
Running timer as unit: 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5.timer
Will run service as unit: 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5.service

$ systemctl list-units --user | grep 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5.timer
2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5.timer loaded active     elapsed   /usr/bin/podman healthcheck run 2df5e01ded4a2063532c203d30fa1d1877f3bf7944de467a084ac23580197fe5

But when podman runs it, the unit definitely isn't created, or `podman healthcheck` is removing it far too fast for me to see after it fires once.
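
A rough way to tell those two cases apart, sketched here with an assumed container ID prefix (not part of the original report): an --on-unit-inactive timer that was actually created keeps re-arming, so it should still be listed, and even a short-lived transient unit leaves start/stop messages in the user journal.

$ systemctl --user list-timers --all | grep 2df5e01d    # is the timer loaded at all?
$ journalctl --user -b | grep 2df5e01d                  # did the unit ever start this boot?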

Comment 8 Robb Manes 2020-05-20 16:50:22 UTC
The healthcheck isn't removing the systemd unit; it doesn't appear to be created at all.  I've been trying to debug this a bit more with dlv, and it seems to fail at different points during os/exec.Cmd.Start when we try to launch systemd-run, but I can't get it to reliably fail in the same place every time.  I'm still not sure why our systemd-run invocation would be failing.

Comment 9 Robb Manes 2020-05-20 16:59:29 UTC
(dlv) b libpod.Container.createTimer
Breakpoint 1 set at 0x560333295faa for github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:52

(dlv) c
> github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:52 (hits goroutine(1):1 total:1) (PC: 0x560333295faa)
Warning: debugging optimized function
    47:		}
    48:		return dbus.NewSystemdConnection()
    49:	}
    50:	
    51:	// createTimer systemd timers for healthchecks of a container
=>  52:	func (c *Container) createTimer() error {
    53:		if c.disableHealthCheckSystemd() {
    54:			return nil
    55:		}
    56:		podman, err := os.Executable()
    57:		if err != nil {

(dlv) b 75
Breakpoint 3 set at 0x5603332963f0 for github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:75

(dlv) c
> github.com/containers/libpod/libpod.(*Container).createTimer() /usr/src/debug/podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64/_build/src/github.com/containers/libpod/libpod/healthcheck_linux.go:75 (hits goroutine(1):1 total:1) (PC: 0x5603332963f0)
Warning: debugging optimized function
    70:		}
    71:		conn.Close()
    72:		logrus.Debugf("creating systemd-transient files: %s %s", "systemd-run", cmd)
    73:		systemdRun := exec.Command("systemd-run", cmd...)
    74:		_, err = systemdRun.CombinedOutput()
=>  75:		if err != nil {
    76:			return err
    77:		}
    78:		return nil
    79:	}
    80:	

(dlv) p err.ProcessState.status
256

So I am not sure why we get this returned to us; if I print the systemdRun values, they all look exactly like what I ran at the command line, unless I'm missing something.

Comment 10 Giuseppe Scrivano 2020-05-20 19:36:04 UTC
A newer version of systemd than the one available on RHEL 8.1 is required.

The reproducer for the issue is "unshare -r systemd-run --user echo it works", can you verify if it works fine for you?

systemd is probably missing this patch: https://github.com/systemd/systemd/commit/1ed4723d38cd0d1423c8fe650f90fa86007ddf55

I am moving the issue to systemd for further triaging.

Comment 11 Robb Manes 2020-05-21 12:16:02 UTC
(In reply to Giuseppe Scrivano from comment #10)
> The reproducer for the issue is "unshare -r systemd-run --user echo it
> works", can you verify if it works fine for you?

It fails on RHEL 8.2:

$ rpm -q systemd
systemd-239-30.el8_2.x86_64

$ unshare -r systemd-run --user echo it works
Failed to start transient service unit: Access denied

And works upstream:

$ unshare -r systemd-run --user echo it works
Running as unit: run-rf9711da4a745461599342f6b1592f3aa.service

So you're probably right, thank you!

Comment 13 Lukáš Nykrýn 2020-06-08 14:01:37 UTC
Fix merged to the systemd-rhel/rhel-8 master branch on GitHub -> https://github.com/systemd-rhel/rhel-8/pull/102

Comment 20 errata-xmlrpc 2020-11-04 02:02:13 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: systemd security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4553
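
As a quick sanity check on an affected host after updating systemd from this advisory, the rootless reproducer from comment 10 should start a transient unit instead of failing with "Access denied" (sketch only; the run-*.service name is randomly generated):

$ unshare -r systemd-run --user echo it works
Running as unit: run-<random>.service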

