Bug 1927148
Summary: | systemd-oomd.service fails to start on host with systemd.unified_cgroup_hierarchy=0 | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Jan Pazdziora (Red Hat) <jpazdziora>
Component: | systemd | Assignee: | systemd-maint
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | rawhide | CC: | fedoraproject, filbranden, flepied, jpazdziora, kasong, lnykryn, luigic, msekleta, ssahani, s, systemd-maint, the.anitazha, yuwatana, zbyszek, z
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | systemd-248~rc3-1.fc35 systemd-248~rc4-3.fc34 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-03-25 00:18:31 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Jan Pazdziora (Red Hat)
2021-02-10 08:33:04 UTC
The logs don't say so exactly, but it seems to be the same issue.

*** This bug has been marked as a duplicate of bug 1926373 ***

I don't think it's the same problem. Digging into the journal some more, I see

    Feb 16 09:42:17 ipa.example.test systemd-oomd[35]: Requires the unified cgroups hierarchy
    Feb 16 09:42:17 ipa.example.test systemd[1]: systemd-oomd.service: Main process exited, code=exited, status=1/FAILURE
    Feb 16 09:42:17 ipa.example.test systemd[1]: systemd-oomd.service: Failed with result 'exit-code'.
    Feb 16 09:42:17 ipa.example.test systemd[1]: Failed to start Userspace Out-Of-Memory (OOM) Killer.

So it looks like systemd-oomd.service fails on installations with systemd.unified_cgroup_hierarchy=0 instead of bailing out silently. Sadly, Fedora rawhide currently does not provision for me, so I cannot check whether the problem also happens directly on a host.

I was now able to verify that the problem happens with systemd-oomd directly on the host as well:

1. Configure the system to boot with systemd.unified_cgroup_hierarchy=0 and reboot.
2. Verify: grep systemd.unified_cgroup_hierarchy=0 /proc/cmdline
3. Check: journalctl | grep systemd-oomd ; systemctl status systemd-oomd.service

    Feb 22 10:21:47 machine.example.com systemd-oomd[599]: Requires the unified cgroups hierarchy
    Feb 22 10:21:47 machine.example.com systemd[1]: systemd-oomd.service: Main process exited, code=exited, status=1/FAILURE
    Feb 22 10:21:47 machine.example.com systemd[1]: systemd-oomd.service: Failed with result 'exit-code'.
    Feb 22 10:21:47 machine.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-oomd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

    ● systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer
         Loaded: loaded (/usr/lib/systemd/system/systemd-oomd.service; enabled; vendor preset: enabled)
         Active: failed (Result: exit-code) since Mon 2021-02-22 05:22:16 EST; 9s ago
           Docs: man:systemd-oomd.service(8)
        Process: 856 ExecStart=/usr/lib/systemd/systemd-oomd (code=exited, status=1/FAILURE)
       Main PID: 856 (code=exited, status=1/FAILURE)
          Error: 95 (Operation not supported)

Of course, one way to look at the situation is to say that using systemd.unified_cgroup_hierarchy=0 is so special and exceptional that the admin doing so should know to disable systemd-oomd.service at the same time. The initial reproducer with containers, however, demonstrates that the admin configuring the host and the admin/developer creating the payload might be completely different people. Yet another way to look at it is -- is systemd-oomd ever useful in containers?

Yeah, we should not try to start oomd on v1 and when PSI is not available. We should add Condition* in the unit file. @anitazha?
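As a side note, a quick way to check the two preconditions mentioned in the comment above (the unified cgroup hierarchy and PSI) could look like the following sketch; the commands are standard, but the expected outputs are an assumption about a typical Fedora setup and are not taken from this bug.

    # On the unified (v2) hierarchy /sys/fs/cgroup is a cgroup2 mount;
    # PSI is only exposed when the kernel supports it (CONFIG_PSI).
    stat -fc %T /sys/fs/cgroup     # prints "cgroup2fs" on v2, "tmpfs" on the legacy/hybrid layout
    cat /proc/pressure/memory      # present only when PSI is available

Until a Condition* change lands, an admin who intentionally boots with systemd.unified_cgroup_hierarchy=0 can also simply turn the unit off with `systemctl disable --now systemd-oomd.service`.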
> Yet another way to look at it is -- is systemd-oomd ever useful in containers?

In principle, it could be. But I'm not sure if we want it at this point. I don't think the current configuration would make it useful anyway.
Condition*= on systemd-oomd is a good idea since it's intended primarily for bare metal. It can always be manually overridden for containers.

https://github.com/systemd/systemd/pull/18743 adds ConditionControlGroupController=v2. But now I'm not quite sure how to handle this. If we set ConditionControlGroupController=v2 in systemd-oomd.service, some users will be without a userspace OOM killer. That's probably better than failing... Also see https://bugzilla.redhat.com/show_bug.cgi?id=1931181.

I've put up https://github.com/systemd/systemd/pull/18751 with your suggestions.

oomd was only ever intended for cgroup v2. I think EarlyOOM is still an option for a userspace OOM killer for people who are not on v2 or have some other restriction.

FEDORA-2021-ea92e5703f has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f

FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-ea92e5703f` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 stable repository. If the problem still persists, please make note of it in this bug report.

Hi, I have just upgraded to 34 from 33. I plan to go to 35 in just over a month, when 35 is available in production. I know it's very late for 34 to report that I am still getting the above error. I have done the upgrade, which updated a few packages, but the problem is still present, even though the indications are that the update should have fixed it. From what I can see, I do not believe my systems will ever need the OOM killer while running (they really do not do very much), so I think it will not cause me any issues. I will report whether the problem persists with 35 or goes away, and will also report whether my various other systems (nearly identical from an OS/software point of view) have the issue. If someone has a suggestion I should try, I am more than happy to check it out.

Hi all, mine was a trivial fix once I actually tried to see what was going on. The installation had created a new user called systemd-oom; my self-build routines replaced the /etc/passwd type files, so it was missing. Simply adding the user back in as required (copied from another system that was not "repaired" by me) solved the problem. Hope this might help someone.
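For reference, the kind of manual override hinted at above ("it can always be manually overridden for containers") could look roughly like the following drop-in. This is a sketch based on systemd's general Condition*= semantics (an empty assignment resets the condition list), not a configuration taken from this bug.

    # Hypothetical drop-in, e.g. /etc/systemd/system/systemd-oomd.service.d/override.conf
    # (can be created with: systemctl edit systemd-oomd.service)
    [Unit]
    # The empty assignment resets the unit's conditions, so the shipped
    # ConditionControlGroupController=v2 no longer prevents startup.
    ConditionControlGroupController=

After creating the drop-in, run `systemctl daemon-reload` and restart the service; whether systemd-oomd then actually works in the container still depends on cgroup v2 and PSI being available to it.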
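Regarding the last comment about the missing account: on a stock install the systemd-oom user is normally created from a sysusers.d snippet shipped with systemd, so instead of copying the passwd entry by hand it may be enough to let systemd recreate it. This is a general suggestion under that assumption, not something verified in this bug, and the snippet name may vary by release.

    # Check whether a sysusers.d snippet defines the account
    grep -r systemd-oom /usr/lib/sysusers.d/
    # Re-apply all sysusers.d configuration; missing users and groups are created
    sudo systemd-sysusers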