Bug 1927148
Summary: | systemd-oomd.service fails to start on host with systemd.unified_cgroup_hierarchy=0 | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Jan Pazdziora (Red Hat) <jpazdziora>
Component: | systemd | Assignee: | systemd-maint
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | rawhide | CC: | fedoraproject, filbranden, flepied, jpazdziora, kasong, lnykryn, luigic, msekleta, ssahani, s, systemd-maint, the.anitazha, yuwatana, zbyszek, z
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | systemd-248~rc3-1.fc35 systemd-248~rc4-3.fc34 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-03-25 00:18:31 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Jan Pazdziora (Red Hat)
2021-02-10 08:33:04 UTC
The logs don't say so exactly, but it seems to be the same issue.

*** This bug has been marked as a duplicate of bug 1926373 ***

I don't think it's the same problem. Digging into the journal some more, I see

    Feb 16 09:42:17 ipa.example.test systemd-oomd[35]: Requires the unified cgroups hierarchy
    Feb 16 09:42:17 ipa.example.test systemd[1]: systemd-oomd.service: Main process exited, code=exited, status=1/FAILURE
    Feb 16 09:42:17 ipa.example.test systemd[1]: systemd-oomd.service: Failed with result 'exit-code'.
    Feb 16 09:42:17 ipa.example.test systemd[1]: Failed to start Userspace Out-Of-Memory (OOM) Killer.

So it looks like systemd-oomd.service fails on installations with systemd.unified_cgroup_hierarchy=0 instead of bailing out silently. Sadly, Fedora rawhide currently does not provision for me, so I cannot check whether the problem also happens directly on a host.

I was now able to verify that the problem happens with systemd-oomd directly on the host as well:

1. Configure the system to boot with systemd.unified_cgroup_hierarchy=0 and reboot.
2. Verify: grep systemd.unified_cgroup_hierarchy=0 /proc/cmdline
3. Check: journalctl | grep systemd-oomd ; systemctl status systemd-oomd.service

    Feb 22 10:21:47 machine.example.com systemd-oomd[599]: Requires the unified cgroups hierarchy
    Feb 22 10:21:47 machine.example.com systemd[1]: systemd-oomd.service: Main process exited, code=exited, status=1/FAILURE
    Feb 22 10:21:47 machine.example.com systemd[1]: systemd-oomd.service: Failed with result 'exit-code'.
    Feb 22 10:21:47 machine.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-oomd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

    ● systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer
         Loaded: loaded (/usr/lib/systemd/system/systemd-oomd.service; enabled; vendor preset: enabled)
         Active: failed (Result: exit-code) since Mon 2021-02-22 05:22:16 EST; 9s ago
           Docs: man:systemd-oomd.service(8)
        Process: 856 ExecStart=/usr/lib/systemd/systemd-oomd (code=exited, status=1/FAILURE)
       Main PID: 856 (code=exited, status=1/FAILURE)
          Error: 95 (Operation not supported)

Of course, one way to look at the situation is to say that using systemd.unified_cgroup_hierarchy=0 is so special and exceptional that the admin doing so should know to disable systemd-oomd.service at the same time. The initial reproducer with containers, however, demonstrates that the admin configuring the host and the admin/developer creating the payload might be completely different people. Yet another way to look at it is -- is systemd-oomd ever useful in containers?

Yeah, we should not try to start oomd on v1 and when PSI is not available. We should add Condition* in the unit file. @anitazha?
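As a side note, a quick way to check the two preconditions mentioned in the comment above (the unified cgroup hierarchy and PSI) could look like the following sketch; the commands are standard, but the expected outputs are an assumption about a typical Fedora setup and are not taken from this bug.

    # On the unified (v2) hierarchy /sys/fs/cgroup is a cgroup2 mount;
    # PSI is only exposed when the kernel supports it (CONFIG_PSI).
    stat -fc %T /sys/fs/cgroup     # prints "cgroup2fs" on v2, "tmpfs" on the legacy/hybrid layout
    cat /proc/pressure/memory      # present only when PSI is available

Until a Condition* change lands, an admin who intentionally boots with systemd.unified_cgroup_hierarchy=0 can also simply turn the unit off with `systemctl disable --now systemd-oomd.service`.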
> Yet another way to look at it is -- is systemd-oomd ever useful in containers?

In principle, it could be. But I'm not sure if we want it at this point. I don't think the current configuration would make it useful anyway.
Condition*= on systemd-oomd is a good idea since it's intended primarily for bare metal. It can always be manually overridden for containers.

https://github.com/systemd/systemd/pull/18743 adds ConditionControlGroupController=v2. But now I'm not quite sure how to handle this. If we set ConditionControlGroupController=v2 in systemd-oomd.service, some users will be without a userspace OOM killer. That's probably better than failing... Also see https://bugzilla.redhat.com/show_bug.cgi?id=1931181.

I've put up https://github.com/systemd/systemd/pull/18751 with your suggestions.

oomd was only ever intended for cgroup v2. I think EarlyOOM is still an option for a userspace OOM killer for people who are not on v2 or have some other restriction.

FEDORA-2021-ea92e5703f has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f

FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-ea92e5703f` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 stable repository. If the problem still persists, please make note of it in this bug report.

Hi, I have just upgraded to 34 from 33. I plan to go to 35 in just over a month, when 35 is available in production. I know it's very late for 34 to report that I am still getting the above error. I have done the upgrade, which updated a few packages, but the problem is still present, even though the indications are that the update should have fixed it. From what I can see, I do not believe my systems will ever need the OOM killer while running (they really do not do very much), so I think it will not cause me any issues. I will report whether the problem persists with 35 or goes away, and will also report whether my various other systems (nearly identical from an OS/software point of view) have the issue. If someone has a suggestion I should try, I am more than happy to check it out.

Hi all, mine was a trivial fix once I actually tried to see what was going on. The installation had created a new user called systemd-oom; my self-build routines replaced the /etc/passwd type files, so it was missing. Simply adding the user back in as required (copied from another system that was not "repaired" by me) solved the problem. Hope this might help someone.
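For reference, the kind of manual override hinted at above ("it can always be manually overridden for containers") could look roughly like the following drop-in. This is a sketch based on systemd's general Condition*= semantics (an empty assignment resets the condition list), not a configuration taken from this bug.

    # Hypothetical drop-in, e.g. /etc/systemd/system/systemd-oomd.service.d/override.conf
    # (can be created with: systemctl edit systemd-oomd.service)
    [Unit]
    # The empty assignment resets the unit's conditions, so the shipped
    # ConditionControlGroupController=v2 no longer prevents startup.
    ConditionControlGroupController=

After creating the drop-in, run `systemctl daemon-reload` and restart the service; whether systemd-oomd then actually works in the container still depends on cgroup v2 and PSI being available to it.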
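Regarding the last comment about the missing account: on a stock install the systemd-oom user is normally created from a sysusers.d snippet shipped with systemd, so instead of copying the passwd entry by hand it may be enough to let systemd recreate it. This is a general suggestion under that assumption, not something verified in this bug, and the snippet name may vary by release.

    # Check whether a sysusers.d snippet defines the account
    grep -r systemd-oom /usr/lib/sysusers.d/
    # Re-apply all sysusers.d configuration; missing users and groups are created
    sudo systemd-sysusers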