Bug 1913806

Summary: Can't start CentOS Stream 8 systemd-nspawn container on CentOS Stream 8
Product: Red Hat Enterprise Linux 8
Reporter: Gena Makhomed <makhomed>
Component: kernel
Assignee: systemd-maint
kernel sub component: Namespace
QA Contact: Frantisek Sumsal <fsumsal>
Status: CLOSED CURRENTRELEASE
Docs Contact:
Severity: urgent
Priority: unspecified
CC: agladkov, ajb, amyagi, bstinson, carl, dowdle, ebiederm, fhrdina, fweimer, jwboyer, markus.falb, phil, rhbug, riehecky, systemd-maint-list, toracat
Version: CentOS Stream
Target Milestone: rc
Target Release: 8.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-10 09:15:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Gena Makhomed 2021-01-07 16:20:51 UTC
Description of problem:

Cannot start a CentOS Stream 8 systemd-nspawn container on a CentOS Stream 8 host.

Version-Release number of selected component (if applicable):

kernel-4.18.0-240.1.1.el8_3.x86_64   - CentOS 8.3 kernel, this bug absent
kernel-4.18.0-257.el8.x86_64         - CentOS Stream 8 kernel, this bug present
kernel-4.18.0-259.el8.x86_64         - CentOS Stream 8 kernel, this bug present

How reproducible:

always

Steps to Reproduce:

1. Install CentOS Stream 8.

1.1. Disable SELinux.

1.2. Reboot.

2. Install the systemd-container package:

yum install systemd-container

3. Create a container named "test":

dnf --installroot=/var/lib/machines/test --releasever=8 install systemd mc centos-stream-release

4. Try to start the container:

machinectl start test

Actual results:

Job for systemd-nspawn failed because the control process exited with error code.
See "systemctl status systemd-nspawn" and "journalctl -xe" for details.

Expected results:

The container test should start without any problems, as it still does on the RHEL 8.3 / CentOS 8.3 release.

Additional info:

Fragment of system journal:

Jan 07 18:15:06 centos-stream systemd[1]: Starting Container test...
Jan 07 18:15:06 centos-stream systemd-nspawn[4888]: Selected user namespace base 1492320256 and range 65536.
Jan 07 18:15:06 centos-stream systemd-nspawn[4888]: Failed to mount sysfs on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): Operation not permitted
Jan 07 18:15:06 centos-stream systemd-nspawn[4888]: Failed to add new veth interfaces (ve-test:host0): No such process
Jan 07 18:15:06 centos-stream systemd[1]: systemd-nspawn: Main process exited, code=exited, status=1/FAILURE
Jan 07 18:15:06 centos-stream systemd[1]: systemd-nspawn: Failed with result 'exit-code'.
Jan 07 18:15:06 centos-stream systemd[1]: Failed to start Container test.
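
For anyone triaging a similar journal, the nspawn failure lines can be separated from the surrounding systemd messages with a small filter. This is only a sketch: the function name nspawn_failures is hypothetical, and the pattern assumes the "systemd-nspawn[PID]: Failed to ..." message shape seen in the fragment above.

```shell
# nspawn_failures: hypothetical helper (not part of systemd).
# Reads journal text on stdin and prints only the lines in which
# systemd-nspawn itself reports "Failed to ...".
nspawn_failures() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'systemd-nspawn\[[0-9]+\]: Failed to ' || true
}
```

It could be fed from the current boot's journal, for example: journalctl -b | nspawn_failures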

Comment 1 Gena Makhomed 2021-01-12 13:53:12 UTC
After installing the old kernel from CentOS 8.3 on the CentOS Stream 8 server:

# yum install \
http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/kernel-4.18.0-240.1.1.el8_3.x86_64.rpm \
http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/kernel-core-4.18.0-240.1.1.el8_3.x86_64.rpm \
http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/kernel-modules-4.18.0-240.1.1.el8_3.x86_64.rpm

systemd-nspawn can start the container.

So this looks like a CentOS Stream 8 kernel regression rather than a systemd bug?

kernel-4.18.0-240.1.1.el8_3.x86_64   - CentOS 8.3 kernel, this bug absent
kernel-4.18.0-257.el8.x86_64         - CentOS Stream 8 kernel, this bug present
kernel-4.18.0-259.el8.x86_64         - CentOS Stream 8 kernel, this bug present

Comment 2 Gena Makhomed 2021-01-21 14:06:34 UTC
This bug also reproduces with the latest kernel, kernel-4.18.0-269.el8.x86_64:

# uname -a
Linux centos-stream 4.18.0-269.el8.x86_64 #1 SMP Tue Jan 12 17:55:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

# machinectl start test
Job for systemd-nspawn failed because the control process exited with error code.
See "systemctl status systemd-nspawn" and "journalctl -xe" for details.

# systemctl status systemd-nspawn
Jan 21 15:55:12 centos-stream systemd[1]: Starting Container test...
Jan 21 15:55:12 centos-stream systemd-nspawn[1235]: Selected user namespace base 1492320256 and range 65536.
Jan 21 15:55:12 centos-stream systemd-nspawn[1235]: Failed to mount sysfs on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): Operation not permitted
Jan 21 15:55:12 centos-stream systemd-nspawn[1235]: Failed to add new veth interfaces (ve-test:host0): No such process
Jan 21 15:55:12 centos-stream systemd[1]: systemd-nspawn: Main process exited, code=exited, status=1/FAILURE
Jan 21 15:55:12 centos-stream systemd[1]: systemd-nspawn: Failed with result 'exit-code'.
Jan 21 15:55:12 centos-stream systemd[1]: Failed to start Container test.

Comment 3 David Tardon 2021-03-05 15:10:49 UTC
*** Bug 1935781 has been marked as a duplicate of this bug. ***

Comment 4 Frantisek Sumsal 2021-03-05 15:27:05 UTC
Also reproducible with the latest systemd upstream (v248-rc2-179-g3ee0cf339b at the time of writing) and kernel 4.18.0-291.el8.x86_64:

# mkdir test
# cd test
# lsinitrd --unpack
# ~/systemd/build/systemd-nspawn --version
systemd 248 (248)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS -OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
# ~/systemd/build/systemd-nspawn -U --register=no
Spawning container test on /root/test.
Press ^] three times within 1s to kill container.
Selected user namespace base 1492320256 and range 65536.
Failed to mount cgroup (type cgroup) on /sys/fs/cgroup/rdma (MS_NOSUID|MS_NODEV|MS_NOEXEC "rdma"): Operation not permitted
Child died too early.

strace around the failing mount():
# strace -s 500 -ff ~/systemd/build/systemd-nspawn -U --register=no
...
[pid 33278] unshare(CLONE_NEWCGROUP <unfinished ...>
[pid 33276] <... read resumed>"\1\0\0\0\0\0\0\0", 8) = 8
[pid 33278] <... unshare resumed>)      = 0
[pid 33276] ppoll([{fd=8, events=POLLHUP}, {fd=7, events=POLLIN}], 2, NULL, NULL, 8 <unfinished ...>
[pid 33278] stat("/sys/fs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 33278] mkdir("/sys/fs/cgroup", 0755) = -1 EEXIST (File exists)
[pid 33278] stat("/sys/fs/cgroup", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid 33278] openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 12
[pid 33278] openat(12, "sys", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 14
[pid 33278] fstat(14, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid 33278] close(12)                   = 0
[pid 33278] openat(14, "fs", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 12
[pid 33278] fstat(12, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 33278] close(14)                   = 0
[pid 33278] openat(12, "cgroup", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 14
[pid 33278] fstat(14, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid 33278] close(12)                   = 0
[pid 33278] close(14)                   = 0
[pid 33278] openat(AT_FDCWD, "/sys/fs", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 12
[pid 33278] statx(12, "cgroup", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, 0, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFDIR|0555, stx_size=0, ...}) = 0
[pid 33278] name_to_handle_at(12, "cgroup", {handle_bytes=128}, 0x7ffddb477e4c, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
[pid 33278] name_to_handle_at(12, "", {handle_bytes=128}, 0x7ffddb477e4c, AT_EMPTY_PATH) = -1 EOPNOTSUPP (Operation not supported)
[pid 33278] openat(12, "cgroup", O_RDONLY|O_CLOEXEC|O_PATH) = 14
[pid 33278] openat(AT_FDCWD, "/proc/self/fdinfo/14", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
[pid 33278] close(14)                   = 0
[pid 33278] newfstatat(12, "cgroup", {st_mode=S_IFDIR|0555, st_size=0, ...}, 0) = 0
[pid 33278] newfstatat(12, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
[pid 33278] close(12)                   = 0
[pid 33278] openat(AT_FDCWD, "/sys/fs/cgroup", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 12
[pid 33278] mount("tmpfs", "/proc/self/fd/12", "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_STRICTATIME, "mode=755,size=4m,nr_inodes=1k,uid=0,gid=0") = 0
[pid 33278] close(12)                   = 0
[pid 33278] openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 12
[pid 33278] fstat(12, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid 33278] read(12, "12:memory:/\n11:hugetlb:/\n10:blkio:/\n9:rdma:/\n8:pids:/\n7:cpuset:/\n6:freezer:/\n5:perf_event:/\n4:cpu,cpuacct:/\n3:devices:/\n2:net_cls,net_prio:/\n1:name=systemd:/\n", 1024) = 158
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] getpid()                    = 1
[pid 33278] gettid()                    = 1
[pid 33278] futex(0x7fd8dc27c690, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] ioctl(12, TCGETS, 0x7ffddb477f70) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 33278] read(12, "", 1024)          = 0
[pid 33278] close(12)                   = 0
[pid 33278] openat(AT_FDCWD, "/sys/fs/cgroup", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 12
[pid 33278] statx(12, "devices", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, 0, 0x7ffddb477f20) = -1 ENOENT (No such file or directory)
[pid 33278] close(12)                   = 0
[pid 33278] stat("/sys/fs/cgroup", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
[pid 33278] mkdir("/sys/fs/cgroup/devices", 0755) = 0
[pid 33278] openat(AT_FDCWD, "/sys/fs/cgroup/devices", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 12
[pid 33278] mount("cgroup", "/proc/self/fd/12", "cgroup", MS_NOSUID|MS_NODEV|MS_NOEXEC, "devices") = -1 EPERM (Operation not permitted)
[pid 33278] close(12)                   = 0
[pid 33278] writev(2, [{iov_base="Failed to mount cgroup (type cgroup) on /sys/fs/cgroup/devices (MS_NOSUID|MS_NODEV|MS_NOEXEC \"devices\"): Operation not permitted", iov_len=128}, {iov_base="\n", iov_len=1}], 2Failed to mount cgroup (type cgroup) on /sys/fs/cgroup/devices (MS_NOSUID|MS_NODEV|MS_NOEXEC "devices"): Operation not permitted
) = 129
[pid 33278] exit_group(1)               = ?
[pid 33276] <... ppoll resumed>)        = 1 ([{fd=8, revents=POLLHUP}])
[pid 33278] +++ exited with 1 +++
writev(2, [{iov_base="Child died too early.", iov_len=21}, {iov_base="\n", iov_len=1}], 2Child died too early.
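
In the trace above, the child reads /proc/self/cgroup and then fails on the first cgroup v1 mount it attempts. To see which hierarchies a process is attached to, the controller field of each line can be listed with a short loop. This is a rough sketch, not nspawn's own logic: list_v1_controllers is a hypothetical name, and it only assumes the "ID:controllers:path" line format visible in the read() call.

```shell
# list_v1_controllers: hypothetical helper (not part of systemd).
# Reads /proc/<pid>/cgroup-style lines ("ID:controllers:path") on stdin
# and prints the controller field of each line, in input order.
list_v1_controllers() {
    while IFS=: read -r _id controllers _path; do
        # Lines with an empty controller field (e.g. the cgroup v2
        # entry "0::/") are skipped.
        if [ -n "$controllers" ]; then
            printf '%s\n' "$controllers"
        fi
    done
}
```

For example: list_v1_controllers < /proc/self/cgroup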

Comment 6 Gena Makhomed 2021-09-10 06:48:29 UTC
I can't reproduce this bug with kernel-4.18.0-338.el8.x86_64.

The bug is already fixed, so this issue should be closed, but I don't have permission to close it.

Comment 7 Frantisek Sumsal 2021-09-10 09:15:05 UTC
Thanks for the verification, Gena! Closing as requested - if the issue reappears, feel free to reopen this one (or open a new one).
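
The kernel builds tested in this report can be summarized as a small lookup. This is a sketch only: kernel_status is a hypothetical helper, it covers just the builds explicitly named in the comments above, and any other build (including the one that first carried the fix, which this report does not identify) is reported as untested.

```shell
# kernel_status: hypothetical helper summarizing the builds tested in this
# report. Takes a kernel release string (as printed by `uname -r`) and prints
# "present", "absent", or "untested" for this bug.
kernel_status() {
    # Extract the build number after "4.18.0-", e.g. 269 from
    # 4.18.0-269.el8.x86_64 or 240 from 4.18.0-240.1.1.el8_3.x86_64.
    build=$(printf '%s\n' "$1" | sed -n 's/^4\.18\.0-\([0-9][0-9]*\)\..*/\1/p')
    case "$build" in
        240)             echo "absent" ;;    # CentOS 8.3 kernel
        257|259|269|291) echo "present" ;;   # CentOS Stream 8 kernels
        338)             echo "absent" ;;    # verified fixed in Comment 6
        *)               echo "untested" ;;  # not covered by this report
    esac
}
```

For example: kernel_status "$(uname -r)"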