Bug 1933494 - systemd-oomd kills the whole session if it's started from console
Summary: systemd-oomd kills the whole session if it's started from console
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: sway
Version: 34
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Till Hofmann
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1935923
Blocks: 1913794
Reported: 2021-02-28 17:13 UTC by ojab
Modified: 2021-03-25 18:54 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Links
System ID: Github i3 i3 issues 4298
Private: 0
Priority: None
Status: open
Summary: Launch applications/subshells in new cgroups
Last Updated: 2021-03-23 10:58:43 UTC

Description ojab 2021-02-28 17:13:06 UTC
Description of problem:

I'm running my WM via `exec env $(systemctl --user show-environment) dbus-run-session -- sway` in console without any display manager and 
```
Feb 28 19:57:37 ojab-notebook systemd-oomd[903]: Swap used (7432306688) / total (8229220352) is more than 90.00%
Feb 28 19:57:37 ojab-notebook systemd-logind[1025]: Session 1 logged out. Waiting for processes to exit.
Feb 28 19:57:37 ojab-notebook systemd[1]: session-1.scope: systemd-oomd killed 514 process(es) in this unit.
Feb 28 19:57:37 ojab-notebook systemd[1]: getty@tty1.service: Deactivated successfully.
```
oomd kills the whole session instead of being more selective, which is not very handy for a laptop whose single purpose is running a user session.


Version-Release number of selected component (if applicable):
Installed Packages
Name         : systemd
Version      : 248~rc2
Release      : 1.fc34
Architecture : x86_64

Name         : systemd-oomd-defaults
Version      : 248~rc2
Release      : 1.fc34
Architecture : x86_64


How reproducible:
Three times already, and I don't see any other systemd-oomd actions (`earlyoom` killed firefox content processes previously and I saw it), so I suppose always?

Steps to Reproduce:
1. Start user session as described above
2. Fill the memory with some stuff

Actual results:
The session is killed


Expected results:
Some memory-hungry processes, like the browser, are killed and the session with the WM keeps running.


Additional info:

Comment 1 Anita Zhang 2021-03-02 05:52:03 UTC
It sounds like sway doesn't separate applications into their own cgroups as KDE and GNOME do; this is a prerequisite for ideal operation. Per the feedback section in https://fedoraproject.org/wiki/Changes/EnableSystemdOomd you can spawn applications yourself into separate cgroups using `systemd-run`, or you can opt out of the systemd-oomd policy by removing the systemd-oomd-defaults package.
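
A minimal sketch of the `systemd-run` suggestion above, assuming a running systemd user manager. The helper name `run_in_scope`, the `RUNNER` dry-run variable, and the choice of `app.slice` are illustrative assumptions and do not come from this report; only the `systemd-run` options `--user`, `--scope` and `--slice=` are existing flags.

```shell
# Hypothetical helper (not part of sway or systemd): start a command in its
# own transient scope under the user manager, so that it lives in a separate
# cgroup instead of directly inside the login session scope.
run_in_scope() {
    # RUNNER is an illustrative override; RUNNER=echo prints the command
    # instead of executing it.
    ${RUNNER:-} systemd-run --user --scope --slice=app.slice -- "$@"
}
```

For example, `run_in_scope firefox` would launch the browser in its own scope, and `RUNNER=echo run_in_scope firefox` only prints the command line. Whether systemd-oomd then selects that scope instead of the session depends on the policy in effect.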

Comment 2 ojab 2021-03-02 11:38:02 UTC
So basically systemd-oomd will kill the whole session in case of memory pressure or low memory for anyone not using GNOME/KDE, and it will disable earlyoom if it was configured?
IMHO it's not ideal and quite unexpected.

Comment 3 Anita Zhang 2021-03-05 19:59:45 UTC
I've filed an RFE for sway to organize processes into cgroups (https://bugzilla.redhat.com/show_bug.cgi?id=1935923).

The EarlyOOM package is still available, it just isn't installed by default. If cgroups end up not being supported in sway you can opt out of systemd-oomd and use EarlyOOM.

Comment 4 ojab 2021-03-05 20:50:56 UTC
Thanks.

I already configured the memory killers as they were before, but 
1. While I knew that systemd-oomd would be enabled in f34, it was unexpected to have earlyoom (potentially with its own non-default settings) replaced by it. I understand that having two userspace oom-killers is not ideal and having default/unified system services on more setups is desired (I was planning to switch to systemd-oomd), but silently disabling a manually installed/configured OOM handler is not good.
2. The session is killed without any notification pre/post kill. For me it looked like a `sway` bug (segfault or something) until I looked into `journalctl`, but I suppose that the average cinnamon/mate/xfce/[pick any user-friendly DE except GNOME/KDE] user will not look into `journalctl` and will just think `Oh well, fedora is just crashing sometimes`.

So while I agree that cgroups are the proper way forward to handle OOMs, I'm not sure that we're ready for that for workstation usage [except GNOME/KDE?].

`sway` is certainly affected, but I assume that the official desktop spins [based on https://spins.fedoraproject.org/], i.e. XFCE, LXQT, MATE-Compiz, Cinnamon, LXDE, SOAS (wtf is that) and i3 (new for f34), should be tested/covered first.

Comment 5 Davide Cavalca 2021-03-08 17:43:17 UTC
It looks like sway specifically should get sorted out in #1932728. As for other DEs, as discussed in the Change, it was decided to enable this by default to minimize fragmentation, leaving the choice to the DEs to opt out of oomd if needed (https://fedoraproject.org/wiki/Changes/EnableSystemdOomd#Should_spins_that_don.27t_put_processes_in_separate_cgroups_be_excluded_from_this_change.3F). I expect the upcoming test day (https://fedoraproject.org/wiki/Test_Day:2021-03-18_Systemd-OOMd_Test_Week) will help flush out potential issues as well.

Comment 6 Davide Cavalca 2021-03-08 17:45:34 UTC
Note: I've reassigned this to systemd, as the oomd component is the standalone version of oomd, which is unrelated to systemd-oomd itself.

Comment 7 Zbigniew Jędrzejewski-Szmek 2021-03-09 13:45:30 UTC
I'll reassign this to sway, it needs to be handled there somehow. Until 1935923 is done, some docs to
disable systemd-oomd and enable earlyoom might be appropriate. I don't think we can handle this meaningfully
on the systemd side, since we don't really know too much about various guis.
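
A rough sketch of the workaround mentioned in this thread (opt out of the systemd-oomd policy by removing systemd-oomd-defaults and use earlyoom instead). The function name `switch_to_earlyoom` and the `RUN` dry-run variable are assumptions for illustration; the unit name `earlyoom.service` is assumed to be what the Fedora earlyoom package ships.

```shell
# Sketch only: drop the systemd-oomd default policy and enable earlyoom.
# RUN is a hypothetical switch; RUN=echo prints the commands instead of
# running them with sudo.
switch_to_earlyoom() {
    ${RUN:-sudo} dnf remove -y systemd-oomd-defaults &&
    ${RUN:-sudo} dnf install -y earlyoom &&
    ${RUN:-sudo} systemctl enable --now earlyoom.service
}
```

Running `RUN=echo switch_to_earlyoom` first shows what would be executed without changing the system.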

Comment 8 hakavlad 2021-03-23 10:28:43 UTC
I think this is not a bug of systemd-oomd - it's just a wrong tool selection. 

If your DE is Xfce (or MATE, LXDE etc.), use alternative killers. If you need a userspace OOM killer with PSI support but your DE doesn't launch apps in separate cgroups, you could use nohang-desktop (which supports PSI and doesn't kill the whole session).

Comment 9 Artem 2021-03-23 10:58:43 UTC
JFYI: cgroups support has already been requested for i3/Sway: https://github.com/i3/i3/issues/4298

Comment 10 Orestis Floros 2021-03-23 14:13:40 UTC
FYI i3 and sway are different projects.

AFAICT this has not been reported to sway: https://github.com/swaywm/sway/issues

Comment 11 Artem 2021-03-23 14:15:02 UTC
FYI #2. Quote from Sway project:

> We are not accepting any new window management features unless they get implemented by i3.

Comment 12 hakavlad 2021-03-25 11:41:25 UTC
Same result with F34 XFCE:

[ 1448.161951] systemd-oomd[563]: Swap used (3865575424) / total (4115656704) is more than 90.00%
[ 1452.012366] polkitd[652]: Unregistered Authentication Agent for unix-session:6 (system bus name :1.139, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.utf8) (disconnected from bus)
[ 1452.013904] at-spi2-registryd[3882]: X connection to :0 broken (explicit kill or server shutdown).
[ 1452.355238] systemd[1]: session-6.scope: systemd-oomd killed 82 process(es) in this unit.
[ 1452.357987] systemd[1]: session-6.scope: Deactivated successfully.
[ 1452.358238] systemd[1]: session-6.scope: Consumed 20.134s CPU time.
[ 1452.358524] systemd-logind[675]: Session 6 logged out. Waiting for processes to exit.
[ 1452.359377] systemd-logind[675]: Removed session 6.
[ 1452.359533] unknown[3884]: xfce4-screensaver: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
[ 1452.359837] systemd[909]: dbus-:1.2-org.xfce.ScreenSaver@1.service: Main process exited, code=exited, status=1/FAILURE
[ 1452.360373] systemd[909]: dbus-:1.2-org.xfce.ScreenSaver@1.service: Failed with result 'exit-code'.
[ 1452.360506] systemd[909]: dbus-:1.26-org.a11y.atspi.Registry@1.service: Main process exited, code=exited, status=1/FAILURE
[ 1452.360638] systemd[909]: dbus-:1.26-org.a11y.atspi.Registry@1.service: Failed with result 'exit-code'.
[ 1456.406449] audit[4334]: CRED_ACQ pid=4334 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_permit acct="lightdm" exe="/usr/sbin/lightdm" hostname=? addr=? terminal=:0 res=success'
[ 1456.422679] lightdm[4334]: pam_unix(lightdm-greeter:session): session opened for user lightdm(uid=983) by (uid=0)
[ 1456.604325] systemd-logind[675]: New session c3 of user lightdm.
[ 1456.611381] systemd[1]: Created slice User Slice of UID 983.

And high CPU usage: 0.7-1%.

Comment 13 hakavlad 2021-03-25 11:45:37 UTC
And also there is no systemd-oomd entries in the journal except this one:

[ 1448.161951] systemd-oomd[563]: Swap used (3865575424) / total (4115656704) is more than 90.00%

Comment 14 Anita Zhang 2021-03-25 18:54:44 UTC
(In reply to hakavlad from comment #13)
> And also there is no systemd-oomd entries in the journal except this one:
> 
> [ 1448.161951] systemd-oomd[563]: Swap used (3865575424) / total
> (4115656704) is more than 90.00%

The relevant journal line in your case was "[ 1452.355238] systemd[1]: session-6.scope: systemd-oomd killed 82 process(es) in this unit.". There is a "ManagedOOMPreference=avoid" option you can set on the session-.scope so it's less likely to be targeted, but I think you'll want to file a separate RFE for XFCE if it's not separating processes out from session-.scope.
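
Since the kill is reported by systemd[1] rather than by systemd-oomd itself, a small filter can help spot it in the journal. This is a sketch; the function name `oomd_kills` is hypothetical, and the pattern is taken from the message wording quoted in this report, which may differ in other systemd versions.

```shell
# Hypothetical filter: keep only the lines where systemd reports that
# systemd-oomd killed processes in a unit. Reads journal text on stdin.
oomd_kills() {
    grep -E 'systemd-oomd killed [0-9]+ process\(es\) in this unit'
}
```

It could be used as `journalctl -b --no-pager | oomd_kills` to list the affected units for the current boot.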

