Bug 1747933

Summary: systemd does not work with podman and cgroupsV2
Product: [Fedora] Fedora
Reporter: Lukas Slebodnik <lslebodn>
Component: crun
Assignee: Giuseppe Scrivano <gscrivan>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 31
CC: bbaude, dwalsh, frantisek.kluknavsky, gscrivan, jnovy, lsm5, mheon, santiago, splinux25
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: crun-0.9.1-1.fc31
Last Closed: 2019-09-19 14:30:14 UTC
Type: Bug

Description Lukas Slebodnik 2019-09-02 09:50:15 UTC
Description of problem:
systemd does not work with podman on Fedora 31 because of the switch to cgroups v2.

Version-Release number of selected component (if applicable):
sh$ rpm -q podman crun
podman-1.5.1-2.17.dev.gitce64c14.fc31.x86_64
crun-0.8-1.fc31.x86_64

How reproducible:
Deterministic

Steps to Reproduce:
1. dnf install -y podman
2. podman pull registry.access.redhat.com/rhel7-init
3. podman run --name test -d registry.access.redhat.com/rhel7-init:latest && sleep 10 && podman exec test systemctl status

Actual results:
sh# podman run --name test -d registry.access.redhat.com/rhel7-init:latest && sleep 10 && podman exec test systemctl status
c8567461948439bce72fad3076a91ececfb7b14d469bfa5fbc32c6403185beff
Failed to get D-Bus connection: Operation not permitted
Error: non zero exit code: 1: OCI runtime error

Expected results:
sh# podman run --name test -d registry.access.redhat.com/rhel7-init:latest && sleep 10 && podman exec test systemctl status
6cda1824877a36e019c80528048d24ee5152c38bcb0cec7625f863669dd2881a

● 6cda1824877a
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Mon 2019-09-02 09:47:06 UTC; 10s ago
   CGroup: /machine.slice/libpod-6cda1824877a36e019c80528048d24ee5152c38bcb0cec7625f863669dd2881a.scope
           ├─ 1 /sbin/init
           ├─29 systemctl status
           └─system.slice
             ├─systemd-journald.service
             │ └─18 /usr/lib/systemd/systemd-journald
             └─dbus.service
               └─26 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

Additional info:

The workaround is to disable cgroups v2 with the kernel command-line parameter systemd.unified_cgroup_hierarchy=0.
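A minimal shell sketch for checking which cgroup layout a host is using before (or after) applying the workaround. It assumes GNU coreutils stat, whose "-f -c %T" form prints the filesystem type mounted at a path: "cgroup2fs" on a unified (v2-only) host and "tmpfs" on a legacy or hybrid host. The function name cgroup_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a mode.
#   cgroup2fs -> unified hierarchy (cgroups v2 only)
#   tmpfs     -> legacy or hybrid layout (controllers on cgroups v1)
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Query the live system only when the mount point exists.
if [ -d /sys/fs/cgroup ]; then
    echo "cgroup mode: $(cgroup_mode "$(stat -fc %T /sys/fs/cgroup)")"
fi

# To make the workaround persistent on Fedora (run as root, then reboot):
#   grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
```

After rebooting with the parameter, the same check is expected to report "v1" instead of "v2".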

Comment 1 Daniel Walsh 2019-09-02 11:12:01 UTC
Lukas, any idea what is being blocked?

Could you try with a --privileged container, to see whether security is blocking the creation?

Comment 2 Giuseppe Scrivano 2019-09-02 11:17:57 UTC
It requires support from systemd as well.  I don't think the version shipped with rhel7 has cgroups v2 support.

Could you try with a rhel8 image?

Also, exec with systemd containers is known to be broken on cgroups v2. On cgroups v2 it is not possible to join a parent node; since systemd modifies the cgroup hierarchy, the exec will fail with "Device or resource busy". I am not sure yet how to solve this issue.

Comment 3 Lukas Slebodnik 2019-09-02 12:08:14 UTC
(In reply to Giuseppe Scrivano from comment #2)
> It requires support from systemd as well.  I don't think the version shipped
> with rhel7 has cgroups v2 support.
> 
> Could you try with a rhel8 image?
> 
> Also, exec with systemd containers is known to be broken on cgroups v2.  On
> cgroups v2 it is not possible to join a parent node, since systemd modifies
> the cgroup hierarchy, the exec will fail with "Device or resource busy".  I
> am not sure yet how to solve this issue


yep,

sh# podman run --name test -d registry.access.redhat.com/ubi8-init:latest && sleep 10 && podman exec test systemctl status
e01001c8e5513b603dc8d752a22789f8d945f27367ed336f4e1b151eec0e5253
Error: writing file '/sys/fs/cgroup//machine.slice/libpod-e01001c8e5513b603dc8d752a22789f8d945f27367ed336f4e1b151eec0e5253.scope/cgroup.procs': Device or resource busy: OCI runtime error

But it is quite problematic if the new podman cannot run older images with systemd (rhel7, Fedora, or a random image from the net). People will either disable cgroups v2 or stop using podman altogether.

Comment 4 Lukas Slebodnik 2019-09-02 12:35:11 UTC
(In reply to Daniel Walsh from comment #1)
> Lukas, any idea what is beling blocked?
> 
> Could you try with a --privileged container, to see if it is security
> blocking the creation?

I think Giuseppe already provided an explanation, but just for the record: there is no difference with --privileged.

Comment 5 Giuseppe Scrivano 2019-09-02 12:47:47 UTC
> But that's quite problematic if new podman cannot run some older
> (rhel7/fedora/ random image from net)
> with systemd. People will either disable cgroupsV2 or even will not use
> podman at all.

The issue only happens when the container payload tries to access cgroups v1. It is a known issue; for example, cgroups v2 adoption was (and still is) also blocked by the Java VM, which reads cgroup stats.

There is not really much Libpod can do.  Cgroups are a kernel interface, so either the container payload supports cgroups v2 or you'll need to use cgroups v1.
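To make the payload point concrete: a program that hard-codes a cgroups v1 path breaks on a v2-only host, while one that probes both layouts keeps working. A minimal shell sketch, assuming the container's own cgroup is visible at the given root (whether memory.max appears at /sys/fs/cgroup inside a container depends on the cgroup namespace and how the runtime mounts cgroupfs). The function name memory_limit is hypothetical; memory.max (v2) and memory/memory.limit_in_bytes (v1) are the standard interface files.

```shell
#!/bin/sh
# Hypothetical helper: read the memory limit on either cgroup version.
# The optional argument overrides the cgroup root (default /sys/fs/cgroup).
memory_limit() {
    root=${1:-/sys/fs/cgroup}
    if [ -f "$root/memory.max" ]; then
        # cgroups v2: unified hierarchy; value is a byte count or "max".
        cat "$root/memory.max"
    elif [ -f "$root/memory/memory.limit_in_bytes" ]; then
        # cgroups v1: one hierarchy per controller, memory lives under memory/.
        cat "$root/memory/memory.limit_in_bytes"
    else
        echo "unknown"
    fi
}
```

A payload that only reads memory/memory.limit_in_bytes would fall into the "unknown" branch on a v2-only host, which is the class of failure described above.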

Comment 6 Lukas Slebodnik 2019-09-02 12:52:39 UTC
Please enhance the documentation (details about systemd would be good as well).

Comment 7 Lukas Slebodnik 2019-09-02 13:06:40 UTC
Moreover, I tried with a rawhide container, which definitely has the right version of systemd, and it did not help either:

sh-5.0# mkdir temp
sh-5.0# cat >temp/Dockerfile <<EOF
FROM fedora:rawhide

CMD ["/sbin/init"]

STOPSIGNAL SIGRTMIN+3

RUN dnf update -y --best && dnf clean all

#mask systemd-machine-id-commit.service - partial fix for https://bugzilla.redhat.com/show_bug.cgi?id=1472439


RUN systemctl mask systemd-remount-fs.service dev-hugepages.mount sys-fs-fuse-connections.mount systemd-logind.service getty.target console-getty.service systemd-udev-trigger.service systemd-udevd.service systemd-random-seed.service systemd-machine-id-commit.service

RUN dnf -y install procps-ng && dnf clean all
EOF

sh-5.0# podman build -t fedora-init-cgroupsv2 temp/

//snip

sh-5.0# podman run --name test -d fedora-init-cgroupsv2 && sleep 10 && podman exec test systemctl status
0eefd01dfaa8d9cc5b9abe4c46f60dbc7301eb0916e2c65cac074064310763f6
Error: writing file '/sys/fs/cgroup//machine.slice/libpod-0eefd01dfaa8d9cc5b9abe4c46f60dbc7301eb0916e2c65cac074064310763f6.scope/cgroup.procs': Device or resource busy: OCI runtime error

Comment 8 Giuseppe Scrivano 2019-09-02 13:19:13 UTC
Opened a PR here: https://github.com/containers/libpod/pull/3922

The error you are seeing is coming from exec. It is a known issue with joining an existing cgroup on cgroups v2, and I am still unsure how to fix it correctly. Basically, we cannot join the initial cgroup path because it will have subdirectories, so we will need to join a subdirectory instead.
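The constraint behind the "Device or resource busy" error is the cgroups v2 "no internal processes" rule: a non-root cgroup that has controllers enabled for its children (listed in its cgroup.subtree_control file) cannot accept new member processes, so writing a PID to its cgroup.procs fails with EBUSY. A minimal shell sketch of the "join a subdirectory instead" idea follows. The helper pick_exec_cgroup is hypothetical, it picks the first child cgroup arbitrarily, and it is an illustration only, not the logic merged in crun.

```shell
#!/bin/sh
# Hypothetical helper: choose the cgroup an exec'd process should join.
pick_exec_cgroup() {
    dir=$1
    # Read the file rather than testing its size: cgroupfs files report size 0.
    if [ -z "$(cat "$dir/cgroup.subtree_control" 2>/dev/null)" ]; then
        # No controllers delegated to children: the cgroup itself can be joined.
        echo "$dir"
        return 0
    fi
    # Controllers are enabled for children, so joining "$dir" would fail with
    # EBUSY. Fall back to the first child cgroup (arbitrary, for illustration).
    for child in "$dir"/*/; do
        if [ -d "$child" ]; then
            echo "${child%/}"
            return 0
        fi
    done
    return 1
}
```

Joining would then mean writing the PID to the chosen directory, for example: echo "$pid" > "$(pick_exec_cgroup "$dir")/cgroup.procs".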

Comment 9 Giuseppe Scrivano 2019-09-02 13:42:48 UTC
Also opened a PR for crun to address the exec issue: https://github.com/containers/crun/pull/81

Comment 10 Fedora Update System 2019-09-11 21:55:12 UTC
FEDORA-2019-e53d9e7494 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e53d9e7494

Comment 11 Fedora Update System 2019-09-12 14:44:54 UTC
crun-0.9-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e53d9e7494

Comment 12 Fedora Update System 2019-09-13 14:45:46 UTC
FEDORA-2019-f73801f1f2 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-f73801f1f2

Comment 13 Fedora Update System 2019-09-14 01:40:34 UTC
crun-0.9.1-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-f73801f1f2

Comment 14 Fedora Update System 2019-09-19 14:30:14 UTC
crun-0.9.1-1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.