Bug 1563025
| Summary: | docker container requires root or sysadmin priv to run systemd | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Mike Chang <mschang> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-daemons |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.4 | CC: | antoine.tran, bblaskov, msekleta, qe-baseos-daemons, systemd-maint-list, systemd-maint |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1392526 | Environment: | |
| Last Closed: | 2021-02-15 07:38:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1392526 | ||
| Bug Blocks: | |||
|
Description
Mike Chang
2018-04-03 00:29:22 UTC
I would like to point out that this issue is about docker-ce and systemd, not the docker package. Of course, I understand that Red Hat maintains its own packages and not external ones. But I suspect this particular issue comes from systemd requiring too much privilege (still to be determined) rather than from something that needs fixing in the docker/docker-ce package. I can reproduce it with systemd-219-42.el7_4.7.x86_64 on CentOS Linux release 7.4.1708 (Core), both on the host and in the docker image, with docker-ce-17.12.1.ce-1.el7.centos.x86_64. Does anyone know what the fix in the docker package is?

Is this a known defect and slated for a fix?

This is a known defect in the docker world: anyone who wants to use systemd adds at best the SYS_ADMIN capability, and at worst all privileges. Any Google search about systemd/docker shows this. As to the second part, I don't know the answer.

Thanks Antoine for the update. Any known workaround? Granting SYS_ADMIN exposes a lot of privileges just for systemd at startup. For example, is there a way to revoke the privilege from a running container?

Not giving SYS_ADMIN will make systemd fail, as shown in the previous post: "[!!!!!!] Failed to mount API filesystems, freezing."

There is currently no known workaround except giving this privilege. To my knowledge, systemd performs mounts at startup, and we use SYS_ADMIN to give it at least the mount privilege. If you know systemd very well, maybe you can help me determine exactly which mounts systemd performs? Then we might solve this with
docker run --tmpfs [MountPath]
or
docker run -v [MountPath]:[MountPath]:ro
I just couldn't run strace on /sbin/init since it needs to be PID 1.

(In reply to Antoine TRAN from comment #6)
> Not giving SYS_ADMIN will make systemd fail, as shown in previous post:
> "[!!!!!!] Failed to mount API filesystems, freezing."
>
> There is currently no known workaround except giving this privilege. To my
> knowledge, I believe systemd does something that causes it to mount, and we
> use SYS_ADMIN to give it at least mount privilege. If you know systemd very
> well, maybe you can help me determine what are the exact mounts systemd is
> doing? So that we might solve this by doing
> docker run --tmpfs [MountPath]
> or
> docker run -v [MountPath]:[MountPath]:ro

This is the table of API filesystems that systemd mounts during boot:
https://github.com/systemd/systemd/blob/master/src/core/mount-setup.c#L77
The failing mount will be in that table.

> I just couldn't do strace into /sbin/init since it needs PID 1.

I think you should be able to start the container with bash as PID 1, find the PID of that bash process from outside the container, attach strace to that PID, and then exec systemd from inside the shell.

Ok, thank you, I was able to trace exactly what systemd does with strace.
Without SYS_ADMIN:
name_to_handle_at(AT_FDCWD, "/sys", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 EPERM (Operation not permitted)
name_to_handle_at(AT_FDCWD, "/dev", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 EPERM (Operation not permitted)
access("/sys/fs/smackfs/", F_OK) = -1 ENOENT (No such file or directory)
name_to_handle_at(AT_FDCWD, "/dev/shm", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 EPERM (Operation not permitted)
name_to_handle_at(AT_FDCWD, "/run", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 EPERM (Operation not permitted)
name_to_handle_at(AT_FDCWD, "/sys/fs/cgroup/systemd", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 EPERM (Operation not permitted)
name_to_handle_at(AT_FDCWD, "/sys/fs/pstore", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 ENOSYS (Function not implemented)
access("/sys/firmware/efi", F_OK) = -1 ENOSYS (Function not implemented)
open("/dev/console", O_WRONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOSYS (Function not implemented)
ioctl(4, TCGETS, 0x7ffeae904060) = -1 ENOSYS (Function not implemented)
ioctl(4, TIOCGWINSZ, 0x7ffeae904100) = -1 ENOSYS (Function not implemented)
writev(4, [{"[", 1}, {"\33[1;31m!!!!!!\33[0m", 17}, {"] ", 2}, {"Failed to mount API filesystems,"..., 42}, {"\n", 1}], 5) = -1 ENOSYS (Function not implemented)
With SYS_ADMIN:
name_to_handle_at(AT_FDCWD, "/sys", {handle_bytes=128}, 0x7ffc15b88c00, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
name_to_handle_at(AT_FDCWD, "/", {handle_bytes=128}, 0x7ffc15b88c04, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
stat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/", {st_mode=S_IFDIR|0755, st_size=18, ...}) = 0
name_to_handle_at(AT_FDCWD, "/proc", {handle_bytes=128}, 0x7ffc15b88c00, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
name_to_handle_at(AT_FDCWD, "/", {handle_bytes=128}, 0x7ffc15b88c04, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/", {st_mode=S_IFDIR|0755, st_size=18, ...}) = 0
name_to_handle_at(AT_FDCWD, "/dev", {handle_bytes=128 => 12, handle_type=1, f_handle=0x3e96c45a159b410700000000}, [1183], AT_SYMLINK_FOLLOW) = 0
name_to_handle_at(AT_FDCWD, "/", {handle_bytes=128}, 0x7ffc15b88c04, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
name_to_handle_at(AT_FDCWD, "/sys/kernel/security", {handle_bytes=128}, 0x7ffc15b88c00, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
name_to_handle_at(AT_FDCWD, "/sys/kernel", {handle_bytes=128}, 0x7ffc15b88c04, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
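The two traces can also be compared mechanically. The following is a minimal Python sketch, not part of the original report, that lists the errno returned by each failing call to a given syscall. The helper name `failed_calls` and the regular expression are illustrative assumptions, and the sample strings are only a few lines copied from the traces above.

```python
import re
from collections import Counter

# Matches the syscall name, the first quoted path argument, and the errno
# symbol of a failed call (return value -1) in one line of strace output.
FAILED_CALL = re.compile(r'^(\w+)\(.*?"([^"]*)".*\)\s+=\s+-1\s+([A-Z]+)')


def failed_calls(trace, syscall="name_to_handle_at"):
    """Return (path, errno) pairs for failed calls to `syscall` in strace text."""
    results = []
    for line in trace.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match and match.group(1) == syscall:
            results.append((match.group(2), match.group(3)))
    return results


# A few lines copied from the trace taken without SYS_ADMIN.
WITHOUT_SYS_ADMIN = '''
name_to_handle_at(AT_FDCWD, "/sys", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 EPERM (Operation not permitted)
name_to_handle_at(AT_FDCWD, "/dev", {handle_bytes=128}, 0x7ffeae9040c0, AT_SYMLINK_FOLLOW) = -1 EPERM (Operation not permitted)
access("/sys/fs/smackfs/", F_OK) = -1 ENOENT (No such file or directory)
'''

# A few lines copied from the trace taken with SYS_ADMIN.
WITH_SYS_ADMIN = '''
name_to_handle_at(AT_FDCWD, "/sys", {handle_bytes=128}, 0x7ffc15b88c00, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
stat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
'''

for label, trace in (("without", WITHOUT_SYS_ADMIN), ("with", WITH_SYS_ADMIN)):
    counts = Counter(errno for _path, errno in failed_calls(trace))
    print(label, dict(counts))
```

Run against the full traces, this kind of tally makes the difference easy to spot: EPERM on the probed paths without SYS_ADMIN, EOPNOTSUPP followed by a stat() comparison with it.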
name_to_handle_at is a syscall that the default seccomp profile in docker-ce > 1.12 blocks unless the container has SYS_ADMIN. I guess the Red Hat maintained docker and docker-latest packages either do not implement seccomp or use a less restricted seccomp profile (https://github.com/justincormack/docker/blob/master/profiles/seccomp/seccomp_default.go).
From Docker's point of view, this is by design: a container running systemd needs at least SYS_ADMIN, which permits this syscall.
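The capability gating described above can be sketched in Python. This is a simplified model and not Docker's implementation: `SAMPLE_PROFILE` is a hand-written excerpt that only mimics the JSON layout of a seccomp profile, the function name `is_allowed` is hypothetical, and argument filters, `excludes` rules, and architecture handling are ignored.

```python
import json

# Hand-written excerpt for illustration only: it mimics the JSON layout of a
# Docker seccomp profile but is NOT the complete default profile.
SAMPLE_PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["access", "stat", "open"], "action": "SCMP_ACT_ALLOW"},
    {"names": ["mount", "name_to_handle_at"],
     "action": "SCMP_ACT_ALLOW",
     "includes": {"caps": ["CAP_SYS_ADMIN"]}}
  ]
}
""")


def is_allowed(profile, syscall, caps=()):
    """Return True if `syscall` is allowed for a container holding `caps`."""
    granted = set(caps)
    for rule in profile.get("syscalls", []):
        if syscall not in rule.get("names", []):
            continue
        required = set(rule.get("includes", {}).get("caps", []))
        # A rule gated by `includes.caps` is skipped unless the container
        # holds every capability it lists.
        if not required <= granted:
            continue
        return rule["action"] == "SCMP_ACT_ALLOW"
    # No rule applied: fall back to the profile's default action.
    return profile.get("defaultAction") == "SCMP_ACT_ALLOW"


print(is_allowed(SAMPLE_PROFILE, "name_to_handle_at"))
print(is_allowed(SAMPLE_PROFILE, "name_to_handle_at", caps=["CAP_SYS_ADMIN"]))
```

In this model the syscall falls through to the default `SCMP_ACT_ERRNO` action when CAP_SYS_ADMIN is missing, which is consistent with the EPERM results in the first trace.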
I looked at the systemd code, in particular https://github.com/systemd/systemd/blob/v238/src/core/mount-setup.c:
------------------------
r = path_is_mount_point(p->where, NULL, AT_SYMLINK_FOLLOW);
if (r < 0 && r != -ENOENT) {
        log_full_errno(priority, r, "Failed to determine whether %s is a mount point: %m", p->where);
        return (p->mode & MNT_FATAL) ? r : 0;
}
if (r > 0)
        return 0;

/* Skip securityfs in a container */
if (!(p->mode & MNT_IN_CONTAINER) && detect_container() > 0)
        return 0;
-------------------------
Do you think we could skip the mount-point probe in container mode by moving the container check up, like this:
------------------------
/* Skip securityfs in a container */
if (!(p->mode & MNT_IN_CONTAINER) && detect_container() > 0)
        return 0;

r = path_is_mount_point(p->where, NULL, AT_SYMLINK_FOLLOW);
if (r < 0 && r != -ENOENT) {
        log_full_errno(priority, r, "Failed to determine whether %s is a mount point: %m", p->where);
        return (p->mode & MNT_FATAL) ? r : 0;
}
if (r > 0)
        return 0;
-------------------------
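To see what the reordering would change, the control flow of the two snippets can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `mount_one`, `blocked`, and the flag values are hypothetical stand-ins, `probe` replaces path_is_mount_point(), and the flag combinations used in the examples are not taken from systemd's mount table.

```python
import errno

# Hypothetical flag values; only their distinctness matters for the model.
MNT_FATAL = 1 << 0
MNT_IN_CONTAINER = 1 << 1


def mount_one(mode, in_container, probe, container_check_first):
    """Model the control flow of the two snippets above.

    `probe` stands in for path_is_mount_point(): it returns a value > 0 for
    a mount point, 0 otherwise, or a negative errno on failure.
    Returns "skipped", "mount", or the negative errno that would be returned.
    """
    def container_skip():
        return not (mode & MNT_IN_CONTAINER) and in_container

    if container_check_first and container_skip():
        return "skipped"
    r = probe()
    if r < 0 and r != -errno.ENOENT:
        return r if mode & MNT_FATAL else "skipped"
    if r > 0:
        return "skipped"
    if not container_check_first and container_skip():
        return "skipped"
    return "mount"


def blocked():
    # The probe result when seccomp denies name_to_handle_at.
    return -errno.EPERM


# Entry flagged fatal but not meant for containers: the order matters.
print(mount_one(MNT_FATAL, True, blocked, container_check_first=False))
print(mount_one(MNT_FATAL, True, blocked, container_check_first=True))
# Entry flagged fatal and meant for containers: the probe still runs.
print(mount_one(MNT_FATAL | MNT_IN_CONTAINER, True, blocked, container_check_first=True))
```

In this model, moving the check up only changes the outcome for entries without MNT_IN_CONTAINER. Entries that carry the flag are still probed, so a denied probe on a fatal entry would still return the error.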
@Mike Chang: I believe I am asking the wrong entity about this kind of bug. I created an issue addressed to the systemd developers at https://github.com/systemd/systemd/issues/8657, which is a Request For Enhancement. Thank you for your time, you can close this issue.

@Mike Chang: according to systemd, this is fixed in their upstream version v238 (https://github.com/systemd/systemd/blob/v238/src/basic/mount-util.c). Currently, the latest version available in CentOS is 219-42.el7_4.10. Do you happen to know whether this version already integrates the upstream fix (with all the backports, now that the release version is "-42") and, if not, when v238 will be available in the Red Hat family? Thank you.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened. |