Bug 1769299
| Summary: | systemd: systemd-nspawn disables pkey_alloc system call by default | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Florian Weimer <fweimer> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED WONTFIX | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.2 | CC: | systemd-maint-list, zbyszek |
| Target Milestone: | rc | Flags: | msekleta: mirror+ |
| Target Release: | 8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-06 07:30:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1770154 | ||
| Bug Blocks: | |||
|
Description
Florian Weimer
2019-11-06 10:47:05 UTC
what is pkey_alloc() used by? is this something any common lib implicitly invokes? glibc? hmm, judging from https://codesearch.debian.net/search?q=pkey_alloc&literal=1&perpkg=1 no one is using pkey actually? if no one is, then it's probably little tested hence there's a good chance it has issues.
It's supposed to be used by in-memory databases in combination with DAX, to avoid corrupting persistent storage accidentally, and perhaps by certain cryptographic modules, to protect key material. I don't think there is much open-source software using it.
but isn't this something that would-be users should enable explicitly with --system-call-filter=@pkey rather than enable for everyone by default? i.e. the whole syscall filter thing is an exercise in minimizing attack surface, and something little used like this probably tips the balance more to the side of "opt-in" rather than "opt-out".
Out of curiosity: how did you run into this btw, where did you notice this?
(btw, afaics docker blocks it too: https://docs.docker.com/engine/security/seccomp/)
btw, none of the CPUs I have come across support pkeys (they lack the "pku" flag in /proc/cpuinfo at least). Is this available in any current Intel CPUs?
(In reply to Lennart Poettering from comment #4)
> but isn't this something that would-be users should enable explicitly with --system-call-filter=@pkey rather than enable for everyone by default?
I don't think users should be aware of individual system calls, it should just work. I'm not sure why systemd-nspawn tries to arbitrarily block system calls by default. Reducing attack surface is one thing, but breaking the userspace ABI is not something that users will expect.
> Out of curiosity: how did you run into this btw, where did you notice this?
It shows up when running the glibc test suite.
(In reply to Lennart Poettering from comment #5)
> btw, none of the CPUs I have come across support pkeys (they lack the "pku" flag in /proc/cpuinfo at least). Is this available in any current Intel CPUs?
Some Xeon Scalable Processors have support, but not all Skylake server processors. The EPERM vs ENOSYS difference causes failures even without pkeys support in the CPU.
> The EPERM vs ENOSYS difference causes failures even without pkeys support in the CPU.
So I'd claim we are right with returning EPERM here. I mean, this is a security profile, and EPERM sounds like the more appropriate error for that. ENOSYS sounds like the error to return for "not implemented", but in this case it might very well be implemented, but it's forbidden due to the selected policy. Or to say this differently: the syscall policies nspawn enforces are more like SELinux policies that prohibit access to APIs and objects, which also use EPERM, not ENOSYS. Yes, you can use seccomp for anything, but philosophically these policies are really about security, not about hiding functionality, and I don't think we should lie about that, it just makes stuff hard to debug.
In systemd's own codebase, when we use new fancy syscalls we generally assume EPERM could also mean "security policy doesn't allow this", and then implement a fallback to something else, much the same as for ENOSYS.
btw, docker's seccomp policies also return EPERM for blocked calls, exactly like we do.
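As an illustration of the caller-side pattern described in this comment (treat EPERM like ENOSYS and fall back), here is a minimal Python sketch. It is not taken from systemd or glibc. The helper names `should_fall_back`, `try_pkey_alloc` and `choose_protection` are hypothetical, and the sketch assumes a Linux system whose C library exports `pkey_alloc` and `pkey_free` wrappers.

```python
import ctypes
import errno
import os

# Assumption of this sketch: both errno values mean "do not use pkeys here".
#   ENOSYS: the kernel does not implement the system call.
#   EPERM:  a security policy (for example a seccomp filter) rejected it.
FALLBACK_ERRNOS = frozenset({errno.ENOSYS, errno.EPERM})


def should_fall_back(err: int) -> bool:
    """Return True if this errno should trigger the non-pkey code path."""
    return err in FALLBACK_ERRNOS


def try_pkey_alloc():
    """Try to allocate a protection key through the C library wrapper.

    Returns (pkey, 0) on success and (None, errno_value) on failure.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    pkey_alloc = getattr(libc, "pkey_alloc", None)
    if pkey_alloc is None:
        # No wrapper in this C library; report it like a missing syscall.
        return None, errno.ENOSYS
    pkey_alloc.argtypes = [ctypes.c_uint, ctypes.c_uint]
    pkey_alloc.restype = ctypes.c_int
    pkey = pkey_alloc(0, 0)  # flags=0, access_rights=0
    if pkey < 0:
        return None, ctypes.get_errno()
    return pkey, 0


def choose_protection() -> str:
    """Pick a code path: 'pkeys', 'fallback', or a description of the error."""
    pkey, err = try_pkey_alloc()
    if pkey is not None:
        # Only probing here, so release the key again right away.
        ctypes.CDLL(None, use_errno=True).pkey_free(pkey)
        return "pkeys"
    if should_fall_back(err):
        return "fallback"
    # Any other errno is reported instead of being silently ignored.
    return "error: " + os.strerror(err)
```

Calling `choose_protection()` inside a container whose filter rejects the call with EPERM would take the "fallback" branch under these assumptions. A caller that only checks for ENOSYS would instead see an unexpected error, which is the failure mode this bug reports.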
(In reply to Lennart Poettering from comment #7)
> > The EPERM vs ENOSYS difference causes failures even without pkeys support in the CPU.
> So I'd claim we are right with returning EPERM here. I mean, this is a security profile, and EPERM sounds like the more appropriate error for that. ENOSYS sounds like the error to return for "not implemented", but in this case it might very well be implemented, but it's forbidden due to the selected policy.
They serve different purposes. EPERM is appropriate if you want things to fail (so that applications break), ENOSYS is appropriate if you want to trigger fallback (like utimensat_time64 → utime) or just disable the feature (because the application assumes the kernel is too old to support it).
For a generic container runtime, there either have to be no filters by default (my preference), or filters for unknown system calls need to return ENOSYS. Everything else will break too many applications. If you have specific knowledge of the system call, you can return EPERM instead in a few cases (e.g. for clock_settime). But that's not really possible for an unknown system call.
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened. |