Bug 1897579

Summary:	rootless podman with cgroupV2 does not work
Product:	Red Hat Enterprise Linux 8	Reporter:	philipp.vymazal
Component:	systemd	Assignee:	Michal Sekletar <msekleta>
Status:	CLOSED MIGRATED	QA Contact:	Frantisek Sumsal <fsumsal>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	8.3	CC:	admiller, ajia, amarirom, aperotti, bbaude, ben.webber, bhenders, chhuang, computerboyzack, cpippin, David.Taylor, dornelas, dshumake, dtrainor, dwalsh, fadamo, fgiloux, gonzalo.vera, gscrivan, hartsjc, hasuzuki, hdaems, hytszk2, jaykim, jligon, jnovy, john.meyer, jwesterl, kajtzu, kmoriwak, kyoneyam, leiwang, lkolacek, lsm5, markmc, mheon, mhofmann, mhostinsky, msekleta, ossman, peter.kjellstrom, pthomas, qguo, richard.shaw, snangare, steveb, systemd-maint-list, tsweeney, umohnani, vasmith, wwurzbac, ypu, yujiang, zbyszek
Target Milestone:	rc	Keywords:	MigratedToJIRA, Reopened
Target Release:	8.0	Flags:	pm-rhel: mirror+
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-09-21 11:13:21 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1947432, 1960948, 2035227

Description philipp.vymazal 2020-11-13 14:03:09 UTC

Description of problem:
running podman in rootless mode (as user) with ubi8-init (systemd inside container) does not work. 
"The issue seems to be in podman setting a default pids limit, but the pids controller is not enabled by systemd for unprivileged users"

Version-Release number of selected component (if applicable):
$ podman version
Version:      2.0.5
API Version:  1
Go Version:   go1.14.7
Built:        Wed Sep 23 18:18:02 2020
OS/Arch:      linux/amd64

$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description:    Red Hat Enterprise Linux release 8.3 (Ootpa)
Release:        8.3
Codename:       Ootpa


How reproducible:
1) enable cgroup2: add cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1 to kernel parameter + reboot
2) set subuid / subgid for the user to be running podman rootless
3) sysctl increase namespaces: /proc/sys/user/max_user_namespaces
4) run podman as user in question: podman run --name=ubi-init-test1 --cgroup-manager=systemd -it --rm --systemd=true ubi8-init


Steps to Reproduce:
1. enable cgroup2 + set subuid/subguid + set namespaces 
2. reboot
3. podman as user in question: podman run --name=ubi-init-test1 --cgroup-manager=systemd -it --rm --systemd=true ubi8-init

Actual results:
Error: writing file `/sys/fs/cgroup/user.slice/user-992.slice/user/cgroup.subtree_control`: No such file or directory: OCI runtime command not found error

 ls -l /sys/fs/cgroup/user.slice/user-992.slice/user/cgroup.subtree_control
-rw-r--r--. 1 gitlab-runner gitlab-runner 0 Nov 13 14:57 /sys/fs/cgroup/user.slice/user-992.slice/user/cgroup.subtree_control



Expected results:
working container / running container


Additional info:
we also added the following, but it seems to have no effect/not being honored:

$ cat /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=memory pids 

-> delegation seems to not work


it can be made to work if the active session scope is chowned (by root) to the user id in question
example:
/usr/bin/chown -R 992 /sys/fs/cgroup/user.slice/user-992.slice/session-225.scope

podman info
host:
  arch: amd64
  buildahVersion: 1.15.1
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.20-2.module+el8.3.0+8221+97165c3f.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.20, commit: 77ce9fd1e61ea89bd6cdc621b07446dd9e80e5b6'
  cpus: 2
  distribution:
    distribution: '"rhel"'
    version: "8.3"
  eventLogger: file
  hostname: vie02s807
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 987
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 992
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-240.1.1.el8_3.x86_64
  linkmode: dynamic
  memFree: 182124544
  memTotal: 4118835200
  ociRuntime:
    name: crun
    package: crun-0.14.1-2.module+el8.3.0+8221+97165c3f.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.14.1
      commit: 598ea5e192ca12d4f6378217d3ab1415efeddefa
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/992/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.4-2.module+el8.3.0+8221+97165c3f.x86_64
    version: |-
      slirp4netns version 1.1.4
      commit: b66ffa8e262507e37fca689822d23430f3357fe8
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
  swapFree: 1072676864
  swapTotal: 1073737728
  uptime: 46h 9m 29.12s (Approximately 1.92 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
  - registry.gitlab.com
store:
  configFile: /home/gitlab-runner/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.2-3.module+el8.3.0+8221+97165c3f.x86_64
      Version: |-
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/gitlab-runner/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 2
  runRoot: /run/user/992/containers
  volumePath: /home/gitlab-runner/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 1600877882
  BuiltTime: Wed Sep 23 18:18:02 2020
  GitCommit: ""
  GoVersion: go1.14.7
  OsArch: linux/amd64
  Version: 2.0.5

Comment 1 Matthew Heon 2020-11-13 14:12:59 UTC

Adding Giuseppe to CC - Giuseppe, mind taking a peek at this one?

Comment 2 Giuseppe Scrivano 2020-11-13 17:19:46 UTC

I've not managed to get the pid controller enabled for unprivileged users on RHEL 8.3.

I've tried with a conf file like:

$ cat /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=memory pids 


but it seems to not make any effect.

Zbigniew, is it supposed to work on RHEL like on Fedora?

Comment 3 Tom Sweeney 2020-11-13 23:16:45 UTC

Giuseppe can you take a look please?

Comment 4 philipp.vymazal 2020-11-17 09:50:10 UTC

*bump

looking at the timestamps of the messages, it may be that some may have been "missed".

@Sweeney: Giuseppe already took a look it seems -> see his currently open question.
it would be good to know if this should work (according to the redhat blog: https://www.redhat.com/en/blog/world-domination-cgroups-rhel-8-welcome-cgroups-v2) -> it seems it should work? or if we are doing something wrong or this is a bug. 

best regards, 
philipp

Comment 5 Tom Sweeney 2020-11-17 22:30:21 UTC

Will try to needinfo Giuseppe at his right email.

Comment 6 Giuseppe Scrivano 2020-11-18 19:59:20 UTC

@Tom, I saw your message.  If you look above it, I've commented (https://bugzilla.redhat.com/show_bug.cgi?id=1897579#c2).

The issue happens because the pids controller is not enabled for the unprivileged user and even the explicit config file doesn't seem to enable it.

We need to check with the systemd folks why it behaves different than on Fedora.  If we cannot enable at least the pids and memory controller we cannot use rootless cgroups even with cgroup v2.

Comment 8 philipp.vymazal 2020-12-11 13:11:32 UTC

hello, 

thanks Giuseppe Scrivano for the bump, wanted to ask already if there has been any progress in this regard. we would like to use rootless containers with ubi8-init but this is a showstopper. knowing already if it *should* work on rhel8 (and thus might in the future) -> would already be a big help in deciding howto proceed with this project. 

best regards, 
philipp

Comment 9 Tom Sweeney 2020-12-11 20:06:38 UTC

Giuseppe anything further here?

Scott, I've cc'd you here to in case you've $.02.

Comment 10 Giuseppe Scrivano 2020-12-12 15:29:29 UTC

hi Philipp,

the issue is still being investigated with the systemd team.

Regards,
Giuseppe

Comment 11 philipp.vymazal 2020-12-21 10:30:43 UTC

Hello Giuseppe, 

Thanks! Will bump in the new year again -> any time (gu)estimates if possible are welcome as well. 
Until then, wishing marry xmas and happy new year! 

Best Regards, 
Philipp

Comment 12 John Meyer 2020-12-30 02:36:55 UTC

Hi all,

I was running into this issue as well and I think I discovered a workaround - assuming /etc/systemd/system/user@.service.d/delegate.conf is configured as above.  Simply running sudo systemctl daemon-reload after (every) boot enables the configured controllers.  Seems systemctl daemon-reload is loading /etc/systemd/system/user@.service.d/delegate.conf, but this same file is not loaded on boot.  I have tested this on a fresh 8.3 system as well as an up to date centos stream 8 vm.  Both see the same issue and respond to the same workaround.  Hopefully this is helpful - I am very new to cgroups v2.

Comment 13 philipp.vymazal 2020-12-30 09:21:35 UTC

Hello John,

I can confirm that this workaround also works for me. Thanks, this is very helpfull indeed! Will update again when i tested some more, next year ;)
Happy new year!

Comment 14 Zbigniew Jędrzejewski-Szmek 2020-12-30 10:53:24 UTC

Hi Giuseppe,

sorry, I don't know about the details in RHEL — I have literally never seen the spec file for RHEL or centos or now centos stream.
I think Michal is the right person to ask. He'll be back from PTO next week, so he should see the needinfo then.

Comment 19 philipp.vymazal 2021-02-02 11:22:08 UTC

hello together, 

just bumping to see if mayne new info has become available (outside this report) + so the report isnt lost in the void :)

thanks in advance!

Comment 20 John Meyer 2021-02-02 21:44:48 UTC

Philipp,

I tried setting up a daemon reload on boot via a unit file, but that was not successful (at least when set to execute Before=systemd-logind.service).  In that process I did discover this post: https://unix.stackexchange.com/a/625079.  That gets me a working system at boot, but I am somewhat worried about the performance concerns of enabling the CPU and IO accounting.  I tried it without these, but it did not work.  In a support case I asked RedHat about other options that would not enable CPU and IO accounting, but nothing has come through just yet.

Hopefully we get more info soon,

John

Comment 21 philipp.vymazal 2021-02-04 14:59:56 UTC

Hello John,

Thanks again for your input! I did run accross that post as well and use it currently. Not yet too worried about the overhead from enabling CPU and IO accounting, both are features I am/was kinda waiting for the be available in RHEL too. I suppose i may do some flame graphs or similiar to see if it has a significant impact, but for my use-cases, that is most likely not a real problem/real consideration (yet). 

Currently also considering moving to Fedora for container/appnodes, since I really want to be more bleeding edge in regards to podman. Building podman and its ecosystem may be too much effort, looking really towards podman 2.2.x, which i hope/maybe will be in 8.4? Then again, podman 3.0 is even more interesting and I doubt that it will see the light in Redhat8 (?).

Hoping for more info too, 

Philipp

Comment 22 Daniel Walsh 2021-02-04 17:37:21 UTC

podman 2.2.1 will be in RHEL8.3.1 in a couple of weeks.

podman 3.0.0 will be in RHEL8.4 In May.

Comment 23 philipp.vymazal 2021-02-04 20:33:14 UTC

(In reply to Daniel Walsh from comment #22)
> podman 2.2.1 will be in RHEL8.3.1 in a couple of weeks.
> 
> podman 3.0.0 will be in RHEL8.4 In May.


That is unexpected and most welcome news, the best kind of news!

Comment 24 Pierre Ossman 2021-02-25 08:27:37 UTC

What is the current status here? We're being bitten by this bug and is causing deployment issues with jenkins.

And what will be the fix? That we won't need to run daemon-reload? Or that we also won't need the delegation configuration?

Comment 25 Tom Sweeney 2021-02-25 15:51:35 UTC

Giuseppe or Dan, can you handle the questions in Pierre's last comment please?

Comment 26 Giuseppe Scrivano 2021-02-25 15:58:25 UTC

I believe the same question was answered here: https://github.com/containers/podman/issues/9410

Comment 27 Tom Sweeney 2021-03-02 22:28:57 UTC

*** Bug 1932739 has been marked as a duplicate of this bug. ***

Comment 28 Jindrich Novy 2021-03-04 07:59:27 UTC

*** Bug 1732957 has been marked as a duplicate of this bug. ***

Comment 29 David Taylor 2021-05-19 10:32:41 UTC

Having updated to RHEL 8.4 (and podman 3.0.1) I am still seeing this issue.

The workaround sort of works, but I end up with some worrying messages in dmesg:

[    4.098612] systemd[1]: /etc/systemd/system/user-0.slice:7: Failed to assign slice user.slice to unit user-0.slice, ignoring: Invalid argument
[    4.100042] systemd[1]: /etc/systemd/system/user-.slice.d/override.conf:4: Failed to assign slice user.slice to unit user-0.slice, ignoring: Invalid argument

Comment 30 Tom Sweeney 2021-08-06 15:41:23 UTC

*** Bug 1989481 has been marked as a duplicate of this bug. ***

Comment 31 Michal Sekletar 2021-12-20 13:00:50 UTC

I think this works today (with cgroupv2) as expected. I've tried following with latest systemd on RHEL-8 (systemd-239-54.el8.x86_64),

# Switch to cgroup-v2
grubby --update-kernel=ALL --args='systemd.unified_cgroup_hierarchy'
reboot

# Setup test user account and cgroup controller delegation
useradd test
echo redhat | passwd test --stdin
mkdir /etc/systemd/system/user@.service.d
cat > /etc/systemd/system/user@.service.d/delegate.conf<<EOF
[Service]
Delegate=cpu cpuset io memory pids
EOF
systemctl daemon-reload

# SSH as test user and run unprivileged container
ssh test@localhost
podman login registry.redhat.io
podman run --name=ubi-init-test1 --cgroup-manager=systemd -it --rm --systemd=true ubi8-init

In case you can still reproduce the issue please reopen the BZ.

Comment 32 Victoria 2022-01-21 18:34:47 UTC

I don't understand why this bug has been closed.  Can we reopen it please.

I am using cgroup v2 and I created the delegate.conf file (as shown in Michal's comment 31 above, 2021-12-20 13:00:50 UTC).  Upon reboot, my system is missing the PID cgroup controller.

If I run systemctl daemon-reload, the controller might be created -- but not always.  I cannot force the daemon-reload to fail to create the controller, and I do not know what to look for to find out why the failure is occurring.

Mine is an up to date RHEL8.5 system with systemd 239-51.el8_5.3

Is there a cgroup/systemd person who can enlighten me?

1. systemctl daemon-reload shouldn't have to be run after every reboot to force a re-read of /etc/systemd/system/user@.service.d/delegate.conf
2. systemctl deamon-reload shouldn't randomly fail to create the cgroup controllers listed in delegate.conf when it is manually invoked 


Reload fails after reboot:

$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
$ podman run hello-world
Error: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/user.slice/user-49028.slice/user/user.slice/libpod-ffce46ed07dd23c218308f83ef7d77b82a1fde6e4f8ecfcc64682051ff763c09.scope/pids.max: no such file or directory: OCI runtime attempted to invoke a command that was not found
$ sudo systemctl daemon-reload
$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
$ podman run hello-world
Error: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/user.slice/user-49028.slice/user/user.slice/libpod-c6cc46685210cbaef6ad3c835570a97629079bd14de0ff8f5223350e5e239449.scope/pids.max: no such file or directory: OCI runtime attempted to invoke a command that was not found
$ sudo systemctl reboot 

Reload works after reboot:
$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
$ podman run hello-world
Error: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: open /sys/fs/cgroup/user.slice/user-49028.slice/user/user.slice/libpod-f67a71068a3c983b7249776800184563a6cd9570a3e8df3ff3ecfb3b6c680561.scope/pids.max: no such file or directory: OCI runtime attempted to invoke a command that was not found
$ sudo systemctl daemon-reload
$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
cpuset io memory pids
$ podman run hello-world

Hello from Docker!
[...]

Comment 33 Gonzalo Vera 2022-01-22 21:10:03 UTC

I stumbled upon the same bug with 8.5 and systemd 239-51. 
Needed to reload twice to get it working.

Comment 34 Ladislav Kolacek 2022-01-26 16:43:50 UTC

Hi, could this issue be related also to our problem with using podman-remote cpu/memory limits on RHEL8?

I tested it on OpenStack VM with RHEL8.5 installed. Package versions:
* kernel-4.18.0-348.el8.x86_64
* systemd-239-51.el8.x86_64
* podman-3.3.1-9.module+el8.5.0+12697+018f24d7.x86_64

I tested it on OpenStack VM with RHEL8.6 installed. Package versions:
* kernel-4.18.0-359.el8.x86_64
* systemd-239-55.el8.x86_64
* podman-3.4.5-0.5.module+el8.6.0+13916+cd3e3727.x86_64

and also on Vagrant VM with Fedora 34. Package versions:
* systemd-248-2.fc34.x86_64
* podman-3.4.2-1.fc34.x86_64
* On Fedora34 podman-remote command with limit options will not fail. 



For both RHEL8, I'll receive the following error for the podman-remote build:

lkolacek@lkolacek-ThinkPad-T590 osbs-test-sandwich (layered-scratch-build *$%=) $ podman --remote --connection  remote-host-rhel8.6  build   --memory=2G  --cpu-quota=1  --memory-swap=0 --no-cache --pull-always --squash -t test-image3 .
Key Passphrase: 
STEP 1/5: FROM internal-registry/foo-ubi8
Trying to pull internal-registry/foo-ubi8:latest...
Getting image source signatures
Copying blob sha256:9a8d4c274a6f2a29bb5ecd9aea13502fc64e2087e6c51fd43b45a1414aa49dbb
Copying blob sha256:d9e72d058dc507f406dc9377495e3d29ce17596f885c09d0aba6b21e10e27ce6
Copying blob sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b
Copying blob sha256:9a8d4c274a6f2a29bb5ecd9aea13502fc64e2087e6c51fd43b45a1414aa49dbb
Copying blob sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b
Copying blob sha256:d9e72d058dc507f406dc9377495e3d29ce17596f885c09d0aba6b21e10e27ce6
Copying config sha256:d10ac7253c35a20a1ac5ee54aaed8354180b7d9a7a81a74c889366e22adfd699
Writing manifest to image destination
Storing signatures
STEP 2/5: LABEL component="foo"       name="footest"       version="1.0.foo"
STEP 3/5: RUN yum install -y stress-0.18.9-1.4.el8.x86_64.rpm
error running container: error from /usr/bin/runc creating container for [/bin/sh -c yum install -y stress-0.18.9-1.4.el8.x86_64.rpm]: time="2022-01-26T10:31:48-05:00" level=warning msg="unable to get oom kill count" error="no directory specified for memory.oom_control"
time="2022-01-26T10:31:48-05:00" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: cannot set memory limit: container could not join or create cgroup"
: exit status 1
Error: error building at STEP "RUN yum install -y stress-0.18.9-1.4.el8.x86_64.rpm": error while running runtime: exit status 1



I received same error if I used limit only for memory or cpu. When I removed limits completely podman-remote build worked as expected.

Thank you in advance for any help.

Comment 35 Giuseppe Scrivano 2022-02-02 08:37:21 UTC

(In reply to Michal Sekletar from comment #31)
> I think this works today (with cgroupv2) as expected. I've tried following
> with latest systemd on RHEL-8 (systemd-239-54.el8.x86_64),
> 
> # Switch to cgroup-v2
> grubby --update-kernel=ALL --args='systemd.unified_cgroup_hierarchy'
> reboot
> 
> # Setup test user account and cgroup controller delegation
> useradd test
> echo redhat | passwd test --stdin
> mkdir /etc/systemd/system/user@.service.d
> cat > /etc/systemd/system/user@.service.d/delegate.conf<<EOF
> [Service]
> Delegate=cpu cpuset io memory pids
> EOF
> systemctl daemon-reload
> 
> # SSH as test user and run unprivileged container
> ssh test@localhost
> podman login registry.redhat.io
> podman run --name=ubi-init-test1 --cgroup-manager=systemd -it --rm
> --systemd=true ubi8-init
> 
> In case you can still reproduce the issue please reopen the BZ.

Does it also work if you set a CPU limit for the container?

Comment 38 Tom Sweeney 2022-02-08 19:19:19 UTC

@amarirom can you please provide details of the environment that you're seeing the reproducer and what is happening when the issue occurs please?

Comment 39 Miroslav Hostinsky 2022-03-17 17:19:49 UTC

I added lines to the file /etc/systemd/system/user@.service.d/delegate.conf (as specified above)
For me it started to work after adding to additional file "/etc/systemd/system/user-.slice.d/override.conf" following:

[Slice]
Slice=user.slice

CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
TasksAccounting=yes

And reloading "systemctl daemon-reload".

Comment 40 Victoria 2022-03-30 16:43:59 UTC

Is there anything I can do/run to help debug or really fix this?  See Comment #32 for my original report.  

I have four identical (WRT hardware) servers, and I have installed and configured podman on three of them so far, and each one fails to start the cgroup controllers after reboot.  *Most* of the time I can systemctl daemon-reload so that the user@.service.d/delegate.conf file is re-read, but I prefer to have a lights-out operation where I don't have to manually do this.  (Adding a daemon-reload or two to my root crontab has not proven to be consistently successful on a different piece of hardware.) 

My next step is to add the override.conf as described in Comment #39, but I am unsure of the ramifications of implementing that.

OS: RHEL8.5 
kernel: 4.18.0-348.20.1.el8_5.x86_64
systemd: 239-51.el8_5.5 
podman info (when cgroup controllers missing):
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.32-1.module+el8.5.0+13852+150547f7.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.32, commit: 4b12bce835c3f8acc006a43620dd955a6a73bae0'
  cpus: 12
  distribution:
    distribution: '"rhel"'
    version: "8.5"
  eventLogger: file
  hostname: [...]

podman info (when cgroup controllers exist after systemctl daemon-reload):
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
[...]


Any input is appreciated.

Comment 41 Frederic Giloux 2022-04-29 09:01:45 UTC

For what it's worth beside what is mentioned here I also needed to disable the rtkit-daemon to get the cpu controller freed up, cf: https://access.redhat.com/solutions/6582021

Comment 43 Tom Sweeney 2022-06-02 18:49:38 UTC

There's a Podman BZ that I'm about to close as a dupe of this.  In a comment there: https://bugzilla.redhat.com/show_bug.cgi?id=2091505#c11, "Peter K" found:

"By chance we discovered that doing (as root) systemctl stop, start, enable on gpm.service "often" results in a system (RHEL8) that works as expected (the missing cgroup directories are correctly created.)

It seems clear to me that the systemd in RHEL8(.5,.6) does not properly handle it's cgroup-v2 duties / is buggy."

Comment 44 Tom Sweeney 2022-06-02 18:52:43 UTC

*** Bug 2091505 has been marked as a duplicate of this bug. ***

Comment 45 Peter K 2022-06-03 07:15:24 UTC

To add some of my main inputs from the BZ closed as duplicate by Tom above:

Minimum reproducer is simply: minimum vanilla RHEL8(.5 or .6) with podman and cgroup-v2 enabled then "podman run hello-world" will fail.

The constant seems to be systemd version not podman version (podman as old as 2.x on Fedora will get this right and podman as new as 4.x on RHEL will fail).

Both known workarounds (daemon-reload after changing a config and just stop/start/reload of unrelated gpm service) also seem to indicate a ordering/race type of bug in systemd.

Comment 50 Ben Webber 2022-09-09 21:16:34 UTC

I just experienced this issue on Rocky Linux 8.6 with systemd-239-58.el8_6.4 and the workaround that worked for me was creating /etc/systemd/system/user-.slice.d/override.conf with contents:

[Slice]
Slice=user.slice

CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
TasksAccounting=yes

And then rebooting as mentioned in comment 39.

As a side note, I did not need to create /etc/systemd/system/user@.service.d/delegate.conf or disable the rtkit-daemon

Comment 51 steveb 2022-11-25 09:57:07 UTC

My error on RockyLinux 8.6 was


Nov 24 10:41:25 noetchosts-control podman[14388]: Error: runc: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: open /sys/fs/cgroup/user.slice/user-1001.slice/user/user.slice/libpod-9318cccf13b9261bd8471fee063cc20bab965848d6ec98d4a6328d8ec8c06b1a.scope/pids.max: no such file or directory: OCI runtime attempted to invoke a command that was not found


Ben Webber's fix from comment 50 above worked, but experiments showed only a systemctl daemon-reload was required, not a reboot. And the fix survives a subsequent reboot.

$systemctl --version
systemd 239 (239-58.el8_6.8)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy

Comment 52 Zachary Sistrunk 2022-12-20 17:28:47 UTC

Confirming that I'm still running into this issue on RHEL 8.6. The workaround from https://access.redhat.com/solutions/6964319 does work, but is the underlying cause perhaps fixed with RHEL9? I don't have access to it myself to check.

> rpm --query redhat-release
redhat-release-8.6-0.1.el8.x86_64
> systemctl --version
systemd 239 (239-58.el8_6.7)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy

Comment 53 Blake Henderson 2023-01-27 12:59:08 UTC

Hello,

- The issue does seem to be resolved at least in RHEL 9.1, I no longer have a 9.0 system to test

/////Begin Test/////
- Red Hat release

    $ rpm -q redhat-release
    redhat-release-9.1-1.9.el9.x86_64

- systemd version

    $ systemctl --version
    systemd 250 (250-12.el9_1.1)
    +PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

- There is no override file

    $ ls -al /etc/systemd/system | grep user
    drwxr-xr-x.  2 root root 4096 Jun 13  2022 multi-user.target.wants

- The container does start as rootless user 

    $ podman run -it --rm ubi8
    [root@a2e950cb6012 /]# 
/////End Test/////

Thanks,
  Blake

Comment 54 Richard Shaw 2023-06-22 13:44:21 UTC

Seeing this on RHEL 8.6, using the pid workaround.

Comment 55 Derrick Ornelas 2023-06-22 21:27:05 UTC

I was able to reproduce this on a fresh install of RHEL 8.8 following the steps from Comment 31

yum -y install crun
grubby --update-kernel=ALL --args='systemd.unified_cgroup_hierarchy'
reboot

useradd test
(set password)


# rpm -q systemd podman runc crun
systemd-239-74.el8_8.x86_64
podman-4.4.1-12.module+el8.8.0+18735+a32c1292.x86_64
runc-1.1.4-1.module+el8.8.0+18060+3f21f2cc.x86_64
crun-1.8.4-2.module+el8.8.0+18669+fa5aca5a.x86_64

# mkdir /etc/systemd/system/user@.service.d
# cat > /etc/systemd/system/user@.service.d/delegate.conf<<EOF
[Service]
Delegate=cpu cpuset io memory pids
EOF

# reboot


ssh test@localhost

$ podman info
host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
	package: conmon-2.1.6-1.module+el8.8.0+18098+9b44df5f.x86_64
	path: /usr/bin/conmon
	version: 'conmon version 2.1.6, commit: 8c4ab5a095127ecc96ef8a9c885e0e1b14aeb11b'
  cpuUtilization:
	idlePercent: 97.45
	systemPercent: 0.93
	userPercent: 1.61
  cpus: 2
  distribution:
	distribution: '"rhel"'
	version: "8.8"
  eventLogger: file
  hostname: rhel8
  idMappings:
	gidmap:
	- container_id: 0
	  host_id: 1000
	  size: 1
	- container_id: 1
	  host_id: 100000
	  size: 65536
	uidmap:
	- container_id: 0
	  host_id: 1000
	  size: 1
	- container_id: 1
	  host_id: 100000
	  size: 65536
  kernel: 4.18.0-477.13.1.el8_8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 3635621888
  memTotal: 4112072704
  networkBackend: cni
  ociRuntime:
	name: runc
	package: runc-1.1.4-1.module+el8.8.0+18060+3f21f2cc.x86_64
	path: /usr/bin/runc
	version: |-
	  runc version 1.1.4
	  spec: 1.0.2-dev
	  go: go1.19.4
	  libseccomp: 2.5.2
  os: linux
  remoteSocket:
	path: /run/user/1000/podman/podman.sock
  security:
	apparmorEnabled: false
	capabilities: CAP_SYS_CHROOT,CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
	rootless: true
	seccompEnabled: true
	seccompProfilePath: /usr/share/containers/seccomp.json
	selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
	executable: /usr/bin/slirp4netns
	package: slirp4netns-1.2.0-2.module+el8.8.0+18060+3f21f2cc.x86_64
	version: |-
	  slirp4netns version 1.2.0
	  commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
	  libslirp: 4.4.0
	  SLIRP_CONFIG_VERSION_MAX: 3
	  libseccomp: 2.5.2
[...]
store:
  configFile: /home/test/.config/containers/storage.conf
  containerStore:
	number: 0
	paused: 0
	running: 0
	stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/test/.local/share/containers/storage
  graphRootAllocated: 18238930944
  graphRootUsed: 2716815360
  graphStatus:
	Backing Filesystem: xfs
	Native Overlay Diff: "true"
	Supports d_type: "true"
	Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
	number: 0
  runRoot: /run/user/1000/containers
[...]


$ podman --runtime=runc run --name=ubi-init-test1 --cgroup-manager=systemd -it --rm --systemd=true registry.access.redhat.com/ubi8-init
Trying to pull registry.access.redhat.com/ubi8-init:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob 04fdd1866203 done  
Copying blob 0fa65fe5c23e done  
Copying config 67ab454674 done  
Writing manifest to image destination
Storing signatures
Error: runc: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/user.slice/user-1000.slice/user/user.slice/libpod-b8321c1ccaf23ecdbf2b114fcd6fc4967370cdc6d48eb05a1610845a3024646a.scope/pids.max: no such file or directory: OCI runtime attempted to invoke a command that was not found


$ podman --runtime=crun run --name=ubi-init-test1 --cgroup-manager=systemd -it --rm --systemd=true registry.access.redhat.com/ubi8-init
Error: OCI runtime error: crun: the requested cgroup controller `pids` is not available


$ systemctl cat user 
# /usr/lib/systemd/system/user@.service
#  SPDX-License-Identifier: LGPL-2.1+
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=User Manager for UID %i
After=systemd-user-sessions.service
After=user-runtime-dir@%i.service
Requires=user-runtime-dir@%i.service

[Service]
User=%i
PAMName=systemd-user
Type=notify
ExecStart=-/usr/lib/systemd/systemd --user
Slice=user-%i.slice
KillMode=mixed
Delegate=pids memory
TasksMax=infinity
TimeoutStopSec=120s

# /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids


But, based on Comment 50 and https://access.redhat.com/solutions/5913671 I'm able to get this to work with just the Slice options

$ head -999 /etc/systemd/system/user@.service.d/delegate.conf  /etc/systemd/system/user-.slice.d/override.conf
head: cannot open '/etc/systemd/system/user@.service.d/delegate.conf' for reading: No such file or directory
==> /etc/systemd/system/user-.slice.d/override.conf <==
[Slice]
Slice=user.slice

CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
TasksAccounting=yes


$ podman --runtime=crun run --name=ubi-init-test1 --cgroup-manager=systemd -it --rm --systemd=true registry.access.redhat.com/ubi8-init
systemd 239 (239-74.el8_8) running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Red Hat Enterprise Linux 8.8 (Ootpa)!

Set hostname to <a09fec2711c7>.
Initializing machine ID from container UUID.
[...]


Michal, can you help us understand what is expected here? Are the slice settings the right answer for RHEL 8 cgroupsv2 + podman? Something to keep in mind is that this bug reported started with systemd in a container (a somewhat rare use-case), but AFAICT this actually prevents running any rootless container with cgroupsv2 on RHEL 8 (with podman's default pids-limit value).

Comment 59 Kazuo Moriwaka 2023-08-02 08:06:12 UTC

Hello, 

In my environment, with systemd LogLevel=debug, systemd outputs following error. This means systemd failed initialization of cgroup v2.
systemd[1]: user-1000.slice: Failed to enable controller memory for /user.slice/user-1000.slice (/sys/fs/cgroup/user.slice/user-1000.slice/cgroup.subtree_control): No such file or directory

This problem seems to be related systemd's cgroup v2 initalization problem.
https://github.com/systemd/systemd/issues/9512

And it is fixed in following upstream commit.
https://github.com/poettering/systemd/commit/fed374a05e6f0e4864eac99ab29c062a52e0d97a

Thanks,

Comment 61 RHEL Program Management 2023-09-21 11:08:22 UTC

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 62 RHEL Program Management 2023-09-21 11:13:21 UTC

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.