RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September according to pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and tagged with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will also appear in a blue banner at the top of the page, informing you that the bug has been migrated.
Bug 1999925 - Support running older container images with older systemd
Summary: Support running older container images with older systemd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: podman
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Giuseppe Scrivano
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On: 1760645
Blocks:
 
Reported: 2021-09-01 03:51 UTC by Alex Jia
Modified: 2021-09-06 10:05 UTC
CC List: 29 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1760645
Environment:
Last Closed: 2021-09-06 10:05:03 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-95781 0 None None None 2021-09-01 03:51:48 UTC

Description Alex Jia 2021-09-01 03:51:04 UTC
+++ This bug was initially created as a clone of Bug #1760645 +++

Description

A CentOS 7 container running systemd fails to start on Fedora 31

Steps to reproduce the issue:

    podman run -ti centos7-with-systemd /sbin/init

    output shows the following:

Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.

    This occurs both when running as root and when running rootless

Describe the results you received:

Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.

Describe the results you expected:

Expect systemd to run

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version
Version: 1.6.1
RemoteAPI Version: 1
Go Version: go1.13
OS/Arch: linux/amd64

Output of podman info --debug:

podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.13
  podman version: 1.6.1
host:
  BuildahVersion: 1.11.2
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.1-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.1, commit: 5e0eadedda9508810235ab878174dca1183f4013'
  Distribution:
    distribution: fedora
    version: "31"
  MemFree: 18496446464
  MemTotal: 67443789824
  OCIRuntime:
    package: crun-0.10.1-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.1
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 16
  eventlogger: journald
  hostname: kubhost
  kernel: 5.3.0-1.fc31.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
    Version: |-
      slirp4netns version 0.4.0-beta.3+dev
      commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
  uptime: 186h 43m 44.97s (Approximately 7.75 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/greg/.config/containers/storage.conf
  ContainerStore:
    number: 13
  GraphDriverName: vfs
  GraphOptions: {}
  GraphRoot: /home/greg/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 21
  RunRoot: /run/user/1000
  VolumePath: /home/greg/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):
running ZFS

--- Additional comment from Daniel Walsh on 2019-10-11 13:02:07 UTC ---

Systemd guys, is there anything that could be done to make systemd on Centos work on a cgroupv2 file system?

--- Additional comment from Pasi Karkkainen on 2019-10-14 14:21:10 UTC ---

at least with docker when using centos7/systemd container one needs to mount the /sys/fs/cgroup volume:

docker run -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro centos7-with-systemd /sbin/init

Is it the same with podman?

--- Additional comment from Matthew Heon on 2019-10-14 14:25:43 UTC ---

No, Podman will automatically detect that a container was run with systemd as the entrypoint and add the volume (among other changes necessary to make systemd run well in a container).
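
For reference, this detection can also be controlled explicitly with podman's --systemd flag (true/false, plus "always" in newer releases), e.g.:

    podman run --systemd=always -ti centos7-with-systemd /sbin/init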

--- Additional comment from Martin Vala on 2019-10-15 04:39:53 UTC ---

I have the same problem on FC31. Is there anything that can be done? It is quite a blocker for me. Thank you.

--- Additional comment from space88man on 2019-10-21 23:40:20 UTC ---

Data point: F31 (upgraded from F30) cannot run a systemd-based F31 container (even though the latter is supposed to be cgroupv2-aware).

Actually I get a different situation from the reporter, and it is the same with CentOS 7 / CentOS 8.

I don't see any output from the container itself; instead, conmon stalls at:

INFO[0000] Running conmon under slice machine.slice and unitName libpod-conmon-807c8d059ee7276a163930b6de94e90d456398003bc5c8311f001dfa3a1b0f07.scope 

After a couple of minutes:

DEBU[0241] ExitCode msg: "container creation timeout: internal libpod error"
Error: container creation timeout: internal libpod error


Unlike in the reporter's case, systemd inside the container doesn't even get a chance to complain.

--- Additional comment from Adam Williamson on 2019-10-21 23:50:50 UTC ---

It would be useful to be precise about exactly what versions of relevant packages are on both ends there (on the host system and in the container), as there's been quite a lot of change in this area late in F31. There is a known bug https://bugzilla.redhat.com/show_bug.cgi?id=1763868 .

--- Additional comment from space88man on 2019-10-22 00:00:11 UTC ---

Brand new container

F31 Host:
podman-1.6.1-5.fc31.x86_64 / podman-1.6.2-2.fc31.x86_64
conmon-2.0.1-1.fc31.x86_64
crun-0.10.2-1.fc31.x86_64
systemd-243-3.gitef67743.fc31.x86_64

F31 Container: --entrypoint /bin/bash

systemd-243-2.gitfab6f01.fc31.x86_64

--- Additional comment from Matthew Heon on 2019-10-22 13:14:43 UTC ---

What is the exact command you are using to run the F31 container with systemd in it?

--- Additional comment from space88man on 2019-10-22 14:15:12 UTC ---

podman --log-level DEBUG run --rm -it --entrypoint /sbin/init fedora:31

Reaches:
DEBU[0000] /usr/bin/conmon messages will be logged to syslog 
DEBU[0000] running conmon: /usr/bin/conmon               args="[--api-version 1 -s -c a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531 -u a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531/userdata -p /var/run/containers/storage/overlay-containers/a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531/userdata/pidfile -l k8s-file:/var/lib/containers/storage/overlay-containers/a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531/userdata/ctr.log --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket --log-level debug --syslog -t --conmon-pidfile /var/run/containers/storage/overlay-containers/a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531]"
INFO[0000] Running conmon under slice machine.slice and unitName libpod-conmon-a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531.scope 

...stall for a few minutes...

DEBU[0240] Cleaning up container a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531 
DEBU[0240] Tearing down network namespace at /var/run/netns/cni-a5ee8c98-8d56-9658-cc99-086c88757c71 for container a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531 
INFO[0240] Got pod network &{Name:nostalgic_haslett Namespace:nostalgic_haslett ID:a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531 NetNS:/var/run/netns/cni-a5ee8c98-8d56-9658-cc99-086c88757c71 Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]} 
INFO[0240] About to del CNI network podman (type=bridge) 
DEBU[0240] unmounted container "a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531" 
DEBU[0240] unable to remove container a60d3436aca8e3c8633db2dfa60f186679eb6ed61a0a38ea4a2970ccaa10c531 after failing to start and attach to it 
DEBU[0240] ExitCode msg: "container creation timeout: internal libpod error" 
DEBU[0240] [graphdriver] trying provided driver "overlay" 
DEBU[0240] cached value indicated that overlay is supported 
DEBU[0240] cached value indicated that metacopy is being used 
DEBU[0240] backingFs=xfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=true 
Error: container creation timeout: internal libpod error

--- Additional comment from Daniel Walsh on 2019-10-22 14:24:23 UTC ---

Are you running this rootless or rootful?

If rootless, does it work when running as root?

--- Additional comment from Daniel Walsh on 2019-10-22 14:25:58 UTC ---

Works for me. 

$ rpm -q podman
podman-1.6.2-2.fc31.x86_64


$ podman run --rm -it --entrypoint /sbin/init fedora:31
Trying to pull docker.io/library/fedora:31...
Getting image source signatures
Copying blob 619d35b2bf84 done
Copying config 98c519110e done
Writing manifest to image destination
Storing signatures
systemd v243-2.gitfab6f01.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Fedora 31 (Container Image)!

Set hostname to <08a30fbf62e3>.
Initializing machine ID from random generator.
[  OK  ] Started Dispatch Password…ts to Console Directory Watch.
[  OK  ] Started Forward Password …uests to Wall Directory Watch.
[  OK  ] Reached target Local File Systems.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Process Core Dump Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
         Starting Rebuild Dynamic Linker Cache...
         Starting Journal Service...
         Starting Create System Users...
[  OK  ] Started Create System Users.
[  OK  ] Started Rebuild Dynamic Linker Cache.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
         Starting Rebuild Journal Catalog...
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Started Rebuild Journal Catalog.
         Starting Update is Completed...
[  OK  ] Started Update is Completed.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Permit User Sessions...
[  OK  ] Started Permit User Sessions.
[  OK  ] Reached target Multi-User System.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

--- Additional comment from Matthew Heon on 2019-10-22 14:46:14 UTC ---

You say this system was upgraded from F30 - was this container preexisting, or created after the upgrade?

--- Additional comment from space88man on 2019-10-22 14:47:21 UTC ---

Completely new container, running as root. At the stall, strace shows:

120414 futex(0x5649d6ac0d30, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
120342 <... futex resumed>)             = -1 ETIMEDOUT (Connection timed out)
120342 futex(0x56437ad58d30, FUTEX_WAKE_PRIVATE, 1) = 1
120309 <... futex resumed>)             = 0
120342 futex(0xc0003152c8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120309 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
120682 <... futex resumed>)             = 0
120342 <... futex resumed>)             = 1
120682 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
120342 futex(0x56437ad659e8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120309 <... nanosleep resumed>NULL)     = 0
120308 <... futex resumed>)             = 0
120342 <... futex resumed>)             = 1
120309 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
120308 futex(0x56437ad659e8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120682 <... nanosleep resumed>NULL)     = 0
120342 futex(0x56437ad6a920, FUTEX_WAIT_PRIVATE, 0, {tv_sec=4, tv_nsec=999010290} <unfinished ...>
120682 futex(0xc0003152c8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120309 <... nanosleep resumed>NULL)     = 0
120309 futex(0x56437ad58d30, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
120422 <... futex resumed>)             = -1 ETIMEDOUT (Connection timed out)
120422 futex(0x5649d6ac0d30, FUTEX_WAKE_PRIVATE, 1) = 1
120414 <... futex resumed>)             = 0
120422 futex(0xc00006a848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120414 sched_yield( <unfinished ...>
120422 <... futex resumed>)             = 1
120415 <... futex resumed>)             = 0
120414 <... sched_yield resumed>)       = 0
120422 futex(0xc00006b9c8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120415 futex(0xc00009a4c8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120414 futex(0x5649d6ac0c30, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120423 <... futex resumed>)             = 0
120422 <... futex resumed>)             = 1
120419 <... futex resumed>)             = 0
120415 <... futex resumed>)             = 1
120414 <... futex resumed>)             = 0
120423 futex(0xc00006b9c8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120422 futex(0x5649d6ad29a0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=4, tv_nsec=998981396} <unfinished ...>
120419 futex(0xc00009a4c8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120415 futex(0xc00006a848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>

--- Additional comment from space88man on 2019-10-22 14:49:25 UTC ---

Also something interesting here...

120398 execve("/usr/bin/conmon", ["/usr/bin/conmon", "--api-version", "1", "-s", "-c", "cf865286f1ae24faa69fd5371ad757fa"..., "-u", "cf865286f1ae24faa69fd5371ad757fa"..., "-r", "/usr/bin/crun", "-b", "/var/lib/containers/storage/over"..., "-p", "/var/run/containers/storage/over"..., "-l", "k8s-file:/var/lib/containers/sto"..., "--exit-dir", "/var/run/libpod/exits", "--socket-dir-path", "/var/run/libpod/socket", "--log-level", "debug", "--syslog", "-t", "--conmon-pidfile", "/var/run/containers/storage/over"..., "--exit-command", "/usr/bin/podman", "--exit-command-arg", "--root", "--exit-command-arg", "/var/lib/containers/storage", ...], 0xc000106480 /* 7 vars */ <unfinished ...>
120318 <... clone resumed>)             = 120398
120318 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
120318 close(17)                        = 0
120318 read(16, "", 8)                  = 0
120318 close(16)                        = 0
120318 epoll_ctl(4, EPOLL_CTL_DEL, 15, 0xc00074cd2c) = 0
120318 close(15)                        = 0
120318 futex(0xc000314bc8, FUTEX_WAKE_PRIVATE, 1) = 1
120315 <... futex resumed>)             = 0
120318 gettid()                         = 120318
120318 openat(AT_FDCWD, "/proc/self/task/120318/attr/exec", O_WRONLY|O_CLOEXEC <unfinished ...>
120315 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
120318 <... openat resumed>)            = 15
120318 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=884056568, u64=139686105424376}}) = -1 EPERM (Operation not permitted)
120315 <... nanosleep resumed>NULL)     = 0
120318 epoll_ctl(4, EPOLL_CTL_DEL, 15, 0xc00074cd44 <unfinished ...>
120315 futex(0xc0004a6f48, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120318 <... epoll_ctl resumed>)         = -1 EPERM (Operation not permitted)
120396 <... futex resumed>)             = 0
120315 <... futex resumed>)             = 1
120396 futex(0xc0004a6f48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120318 write(15, "", 0)                 = 0
120315 read(14,  <unfinished ...>
120318 close(15 <unfinished ...>
120315 <... read resumed>0xc0001cc000, 512) = -1 EAGAIN (Resource temporarily unavailable)
120318 <... close resumed>)             = 0
120315 futex(0xc000314bc8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120318 close(10)                        = 0
120318 close(13)                        = 0
120318 ioctl(2, TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
120318 write(2, "\33[36mINFO\33[0m[0000] Running conm"..., 161) = 161
120318 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 10
120318 setsockopt(10, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
120318 connect(10, {sa_family=AF_UNIX, sun_path="/var/run/dbus/system_bus_socket"}, 34 <unfinished ...>
120342 <... futex resumed>)             = -1 ETIMEDOUT (Connection timed out)
120342 futex(0xc000314bc8, FUTEX_WAKE_PRIVATE, 1) = 1
120315 <... futex resumed>)             = 0
120342 madvise(0xc000600000, 2097152, MADV_NOHUGEPAGE <unfinished ...>
120318 <... connect resumed>)           = 0
120342 <... madvise resumed>)           = 0
120315 futex(0xc000314bc8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120342 madvise(0xc000698000, 8192, MADV_FREE <unfinished ...>
120318 epoll_ctl(4, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=884056568, u64=139686105424376}} <unfinished ...>
120342 <... madvise resumed>)           = 0
120308 <... epoll_pwait resumed>[{EPOLLOUT, {u32=884056568, u64=139686105424376}}], 128, -1, NULL, 3) = 1
120318 <... epoll_ctl resumed>)         = 0
120308 epoll_pwait(4,  <unfinished ...>
120342 futex(0x56437ad6a920, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
120318 getsockname(10,  <unfinished ...>
120342 <... futex resumed>)             = 1
120313 <... futex resumed>)             = 0
120342 futex(0xc0004fa148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
120318 <... getsockname resumed>{sa_family=AF_UNIX}, [112->2]) = 0
120313 futex(0x56437ad6a920, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=1742780} <unfinished ...>
120318 getpeername(10,  <unfinished ...>
120398 <... execve resumed>)            = 0
120318 <... getpeername resumed>{sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, [112->30]) = 0
120398 brk(NULL <unfinished ...>
120318 getuid( <unfinished ...>
120398 <... brk resumed>)               = 0x9bc000
120318 <... getuid resumed>)            = 0
120398 arch_prctl(0x3001 /* ARCH_??? */, 0x7fffece7f960 <unfinished ...>
120318 getpid( <unfinished ...>
120398 <... arch_prctl resumed>)        = -1 EINVAL (Invalid argument)
120318 <... getpid resumed>)            = 120308
120318 getuid( <unfinished ...>

--- Additional comment from Daniel Walsh on 2019-10-22 14:57:15 UTC ---

Are you sure you are fully up to date on all packages?

rpm -q podman crun conmon fuse-overlayfs
podman-1.6.2-2.fc31.x86_64
crun-0.10.2-1.fc31.x86_64
conmon-2.0.1-1.fc31.x86_64
fuse-overlayfs-0.6.5-2.fc31.x86_64

--- Additional comment from space88man on 2019-10-22 15:04:37 UTC ---

Yes - I am matching these versions of the RPMs.

I am getting this on 3 separate F30 -> F31 upgraded machines. On one of the machines I started with a new /var/lib/containers and am seeing the same stall.

--- Additional comment from space88man on 2019-10-22 15:23:33 UTC ---

So this works on a new F31 virtual machine (running a fedora:31 systemd container as root).

I get the same result as @Daniel Walsh - now to figure out why the upgraded system fails to launch this container.

--- Additional comment from space88man on 2019-10-22 15:40:43 UTC ---

@Daniel Walsh I know the cause: I must remove oci-systemd-hook, oci-register-machine, and oci-umount.

During the upgrade from F30->F31 these packages were upgraded; the new F31 VM does not have these packages installed. 

These packages seem to interfere with the proper functioning of podman/conmon.

For reference these packages had to be removed:

oci-register-machine-0-11.git66fa845.fc31.x86_64
oci-systemd-hook-0.2.0-2.git05e6923.fc31.x86_64
oci-umount-2.5-3.gitc3cda1f.fc31.x86_64
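
For example (a rough sketch), they can be dropped with:

    sudo dnf remove oci-register-machine oci-systemd-hook oci-umount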

--- Additional comment from Zbigniew Jędrzejewski-Szmek on 2019-10-22 20:30:37 UTC ---

(In reply to Daniel Walsh from comment #1)
> Systemd guys, is there anything that could be done to make systemd on Centos
> work on a cgroupv2 file system?

cgroupv2 support was added in systemd-230. centos7 has systemd-219, so no support.

What we do in systemd-nspawn is check the guest to guess if it supports cgroupsv2.
If it has systemd >= 230, it does. See
https://github.com/systemd/systemd/blob/master/src/nspawn/nspawn.c#L445-L480.
If the guest has no support for v2, we try to mount v1. This is not very elegant,
but we couldn't come up with an approach that would allow us to use cgroup v2
without breaking old images.

--- Additional comment from Daniel Walsh on 2019-10-23 11:40:35 UTC ---

@space88man

I thought oci-register-machine-0-11.git66fa845.fc31.x86_64 was removed years ago. We no longer support it.
oci-systemd-hook is not used by podman and would not be used by docker-ce or moby-engine.  So it really serves no purpose.

oci-umount is really only for use with the devicemapper backend, but I'm not sure why that would cause issues with cgroup v2. Are you sure you needed to remove it?

Zbigniew Jędrzejewski-Szmek, maybe we should do the same for podman.

--- Additional comment from space88man on 2019-10-23 12:30:57 UTC ---

@Daniel Walsh

oci-umount / oci-register-machine are red herrings - they don't affect fedora:31 or centos:8 containers.

The blocker seems to be oci-systemd-hook: after reinstalling just that package I am seeing the stall in conmon as previously described. The problematic version is oci-systemd-hook-0.2.0-2.git05e6923.fc31.x86_64.rpm.

--- Additional comment from Matthew Heon on 2019-10-23 13:40:18 UTC ---

There was another bug related to Conmon hanging if an OCI hook exited non-0 - that could be what's going on here. I think it was resolved, but I forget which project the patch went into.

--- Additional comment from Matthew Heon on 2019-10-23 13:45:58 UTC ---

`conmon-2.0.1-1.fc31.x86_64`

This appears to be the problem. Conmon 2.0.2 is released upstream with a fix. It's pending in Bodhi at https://bodhi.fedoraproject.org/updates/FEDORA-2019-6353777bbd

Can you try the command:
`podman --runtime /bin/false run --rm  alpine true`

If that fails, you have the same issue as the one I'm describing, and 2.0.2 will fix it once it makes it to stable.
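
Until that reaches stable, the pending build can be pulled from updates-testing (a rough sketch; assumes the usual Fedora testing repo is enabled on the mirror):

    sudo dnf upgrade conmon --enablerepo=updates-testing --refresh
    rpm -q conmon    # should now report conmon-2.0.2 or newer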

--- Additional comment from space88man on 2019-10-23 23:24:41 UTC ---

@Matthew Heon: I can confirm that what I was seeing with oci-systemd-hook was in fact due to the conmon issue. Thanks.

--- Additional comment from WRH on 2020-05-15 14:57:59 UTC ---

Experiencing the same thing on F32 (fresh install):

$ sudo podman run --log-level debug -t --name centos --rm centos:7.2.1511 /sbin/init
...
DEBU[0000] Started container f573efe8280c33bf35380b34bb45c5c98c892b3834d0356582e09c44006e436d
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.


The problem doesn't appear to be linked to the presence of stale oci-* packages:

$ rpm -qa | rg oci | wc -l
0


$ rpm -q systemd podman crun conmon fuse-overlayfs

systemd-245.4-1.fc32.x86_64
podman-1.9.1-1.fc32.x86_64
crun-0.13-2.fc32.x86_64
conmon-2.0.15-1.fc32.x86_64
fuse-overlayfs-1.0.0-1.fc32.x86_64

--- Additional comment from Matthew Heon on 2020-05-15 15:19:44 UTC ---

We've investigated whether it is possible to support older versions of systemd requiring cgroups v1 (the versions shipping in CentOS/RHEL 7 being the most notable here) on cgroups v2 hosts, and it doesn't seem to be reasonable. The complexity involved is quite significant, and adding support for this is not on our priorities list at present. It seems like the best option to resolve this is to either swap the base image to RHEL/CentOS 8.x (which have a new enough systemd to support cgroups v2) or disable cgroups v2 on the host (which does have the disadvantage of removing the ability to set resource limits for rootless containers, among a few other things).
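
For the second option, the commonly documented way to switch a Fedora host back to cgroups v1 looks roughly like this (a sketch; it affects the whole host and needs a reboot):

    sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
    sudo reboot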

--- Additional comment from Zbigniew Jędrzejewski-Szmek on 2020-05-15 15:31:02 UTC ---

What about doing what is described in https://bugzilla.redhat.com/show_bug.cgi?id=1760645#c19 ?

--- Additional comment from Matthew Heon on 2020-05-15 16:31:43 UTC ---

We investigated that, but discarded it as prohibitively difficult. Giuseppe, who appears to already be on CC, was the one who looked into this, and might have more details.

--- Additional comment from Daniel Walsh on 2020-05-15 18:49:38 UTC ---

Seems like a big risk to have podman examining the first process inside of a container and attempting to figure out a specific version. systemd-nspawn
might expect the container to be running systemd, but we have no such assumption, and figuring this out for different distributions would be very difficult.

--- Additional comment from Zbigniew Jędrzejewski-Szmek on 2020-05-16 18:28:35 UTC ---

> Seems like a big risk to have podman examining the first process inside of a container and attempting to figure out a specific version

In general, I'd agree. But systemd is easier in this regard, because it never allowed support for
cgroupsv2 to be compiled out or otherwise disabled. Additionally, systemd is always installed
in the same locations. And the functionality in question is too big to be backported. Effectively,
this means that a very simple check for existence of libsystemd-shared-nnn. in one location
is enough to cover this case.

Anyway, I'm not saying that this is appropriate for podman... just that the check is relatively
simple in case of systemd in the container. Feel free to ignore my comment.
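
As an illustration only (not something podman does today), such a check could look roughly like this from the host, as root:

    ctr=$(podman create centos:7)
    mnt=$(podman mount "$ctr")
    ls "$mnt"/usr/lib/systemd/libsystemd-shared-*.so 2>/dev/null \
        || echo "no libsystemd-shared found, assume the guest only knows cgroup v1"
    # if a file is found, the number in its name says whether the guest is >= 230
    podman umount "$ctr" && podman rm "$ctr"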

--- Additional comment from Giuseppe Scrivano on 2020-05-18 06:48:24 UTC ---

(In reply to Matthew Heon from comment #28)
> We investigated that, but discarded it as prohibitively difficult. Giuseppe,
> who appears to already be on CC, was the one who looked into this, and might
> have more details.

to give more details, I think it is difficult to set it up using the OCI config file and it would require changes in the specs.

We can easily implement it in crun using an annotation.

It seems only the name=systemd hierarchy is needed to run an old version of systemd on a cgroup v2 system: https://github.com/containers/crun/pull/357

I am also not sure whether we should add any auto-detection logic into Podman, or whether it is enough to document the annotation:

podman run --annotation=run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup --rm -ti centos:7 /usr/lib/systemd/systemd

Another limitation is that it seems to work only for root, as rootless cannot mount a new hierarchy.

--- Additional comment from Jan Pazdziora on 2020-05-18 09:34:14 UTC ---

I agree that explicit annotations might be a more stable approach than automagically guessing based on the ENTRYPOINT. In https://github.com/freeipa/freeipa-container and in the rhel7/ipa-server container image, the ENTRYPOINT is a bash script which populates the data volume and only then execs systemd.

--- Additional comment from Daniel Walsh on 2020-05-18 19:48:33 UTC ---

I am fine with doing this via an annotation, although a label might be better, since the Docker v2 spec does not support annotations.

Then you could embed the label in the container/image and the user would not need to do anything special to get this to work.

If the image/container has a systemd v1 label, then crun does the right thing.

--- Additional comment from Jan Pazdziora on 2020-05-19 06:13:23 UTC ---

Sure, labels would work too.

--- Additional comment from Giuseppe Scrivano on 2020-05-19 07:24:27 UTC ---

The label must be handled by podman; the OCI runtime doesn't have access to this information, so we'll need both.

I think we still need an annotation to force this behaviour when the image doesn't specify the label, or when using --rootfs.

--- Additional comment from Daniel Walsh on 2020-05-19 18:02:53 UTC ---

I am not sure what the difference is?

I can still do

podman run --label run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup ...
or 
podman run --label run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup --rootfs ...

But if the centos7-init and rhel7-init containers have this label baked into the image,
then podman run ... will just work.

--- Additional comment from Matthew Heon on 2020-05-19 18:07:45 UTC ---

I think Giuseppe is saying that labels are not passed to the OCI runtime, only annotations. Images contain labels, not annotations.

This is a tricky/annoying distinction. Podman (and Docker before it) allows labels as arbitrary key-value metadata. The OCI runtime spec separately allows annotations as arbitrary key-value metadata. Unfortunately, these are entirely distinct - Podman uses labels largely for filtering in `ps`/`images` and annotations to trigger specific OCI runtime behaviour (like this). Labels are inherited from images, but annotations are not (images don't have annotations metadata).
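
As a small illustration of where each one ends up (a rough sketch):

    # the label is plain container metadata, usable for filtering:
    podman run -d --name lbl-test --label owner=qa alpine sleep 100
    podman ps --filter label=owner=qa
    # the annotation is passed through to the OCI runtime's config.json instead:
    podman run --annotation run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup --rm -ti centos:7 /usr/lib/systemd/systemd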

--- Additional comment from Daniel Walsh on 2020-05-19 19:45:43 UTC ---

OCI Images support annotations, but I don't think Dockerfile does at this time.

So you can do a buildah commit --annotation x=y container.

--- Additional comment from James Cassell on 2020-05-19 19:50:43 UTC ---

Is it reasonable to make an annotations <-> labels mapping, or would that break things?

--- Additional comment from Matthew Heon on 2020-05-19 20:01:10 UTC ---

I shied away from that when initially writing Podman because of the differing uses (I didn't want a chance of someone setting a label to identify their container and then accidentally triggering weird behavior in the OCI runtime) but we could potentially look into mapping some labels that we know are valid, well-supported OCI runtime annotations.

--- Additional comment from Daniel Walsh on 2020-09-15 20:23:30 UTC ---

Matt and Giuseppe, I say we go forward with this for Fedora. 
We could make up a hacky label

LABEL ANNOTATION:X=y, which tells podman to set the annotation X=y when running the OCI runtime.

--- Additional comment from Ben Cotton on 2020-11-03 15:38:42 UTC ---

This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 31 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

--- Additional comment from Adam Williamson on 2020-11-03 22:53:23 UTC ---

Discussion above seems to indicate this isn't obsolete, taking a guess that Rawhide is appropriate.

--- Additional comment from Daniel Walsh on 2021-01-29 10:20:15 UTC ---

Ok, the question is how important this is. RHEL 8.4 will have crun and people might be encouraged to move to cgroups v2 (at least we have been talking to customers about it). RHEL 9 will be cgroups v2 by default.

Will RHEL 7 systemd-based init containers still be important? Or will everyone have moved on to RHEL 8 and RHEL 9 images?

--- Additional comment from Pasi Karkkainen on 2021-01-29 15:16:17 UTC ---

EL7 still has many years of support lifetime left, so lots of people are still using EL7 container images, including ones running systemd.
At least I am :)

--- Additional comment from Ben Cotton on 2021-02-09 15:12:49 UTC ---

This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

--- Additional comment from Daniel Walsh on 2021-06-11 15:59:18 UTC ---

Giuseppe, can you make the change to crun, so that we just pass down the annotation? We can specify the annotation in an OCI image, and/or the user would have to specify it.

We just need crun to support it, and then to document it in the man pages.

--- Additional comment from Giuseppe Scrivano on 2021-08-04 08:55:19 UTC ---

this is implemented in crun: https://github.com/containers/crun/pull/357

Comment 1 Alex Jia 2021-09-01 03:55:07 UTC
[root@kvm-08-guest02 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 9.0 Beta (Plow) 

[root@kvm-08-guest02 ~]# rpm -q crun runc podman systemd kernel 
crun-1.0-1.module+el9beta+12444+200de489.x86_64 
runc-1.0.2-1.module+el9beta+12444+200de489.x86_64 
podman-3.3.1-6.module+el9beta+12444+200de489.x86_64 
systemd-249-4.el9.x86_64 
kernel-5.14.0-0.rc7.54.el9.x86_64

[root@kvm-08-guest02 ~]# grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,seclabel,nosuid,nodev,noexec,relatime 0 0

[root@kvm-08-guest02 ~]# podman run --rm -ti centos:7 /usr/lib/systemd/systemd
Resolved "centos" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull quay.io/centos/centos:7...
Getting image source signatures
Copying blob 2d473b07cdd5 done
Copying config 8652b9f0cb done
Writing manifest to image destination
Storing signatures
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.

[root@kvm-08-guest02 ~]# podman run --annotation=run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup --rm -ti centos:7 /usr/lib/systemd/systemd
Error: mount `cgroup` to `/sys/fs/cgroup/systemd`: Operation not permitted: OCI permission denied

Comment 2 Tom Sweeney 2021-09-01 12:39:27 UTC
Assigning to Giuseppe as he has the cloned bug too.

Comment 3 Giuseppe Scrivano 2021-09-01 14:17:33 UTC
You first need to mount /sys/fs/cgroup/systemd on the host:

# mkdir /sys/fs/cgroup/systemd
# mount none -t cgroup -o none,name=systemd /sys/fs/cgroup/systemd
# podman run --runtime /usr/bin/crun --annotation=run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup --rm -ti centos:7 /usr/lib/systemd/systemd
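
A quick host-side sanity check before starting the container (a rough sketch; the exact mount options will differ):

    grep name=systemd /proc/mounts
    # expect a cgroup (v1) entry mounted at /sys/fs/cgroup/systemd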

Comment 4 Alex Jia 2021-09-02 02:45:27 UTC
(In reply to Giuseppe Scrivano from comment #3)
> you first need to mount /sys/fs/cgroup/systemd on the host:
> 
> # mkdir /sys/fs/cgroup/systemd
> # mount none -t cgroup -o none,name=systemd /sys/fs/cgroup/systemd
> # podman run --runtime /usr/bin/crun
> --annotation=run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup --rm -ti
> centos:7 /usr/lib/systemd/systemd

Thank you Giuseppe! It works for me; we need to document this for end users.

