Bug 1964153 - systemd-oomd does not prevent the OOM killer from being triggered and/or kills the wrong process
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2021-05-24 20:14 UTC by Sergio Belkin
Modified: 2022-06-08 00:55 UTC
Last Closed: 2022-06-08 00:55:19 UTC
Type: Bug

Attachments
journal -r --no-pager relevant output (733.74 KB, text/plain)
2021-05-24 20:14 UTC, Sergio Belkin

Description Sergio Belkin 2021-05-24 20:14:09 UTC
Created attachment 1786647 [details]
journal -r --no-pager relevant output

Description of problem:

systemd-oomd fails to act on memory pressure reported via pressure stall information (PSI).

Version-Release number of selected component (if applicable):

systemd 248 (v248.3-1.fc34)

How reproducible:


Steps to Reproduce:
1. sudo mkdir /etc/systemd/system/-.slice.d/
2. printf "[Slice]\nManagedOOMSwap=auto" | sudo tee /etc/systemd/system/-.slice.d/99-test.conf && sudo systemctl daemon-reload
3. systemd-run --user --scope /usr/bin/stress-ng --brk 2 --stack 2 --bigheap 2 --timeout 90s
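
For a dry run of steps 1 and 2, the drop-in can be written under a scratch prefix first. This is only a sketch: `write_oomd_dropin` is a hypothetical helper name, and the default prefix assumes the same /etc/systemd/system location used in the steps above.

```shell
# Hypothetical helper: write the ManagedOOMSwap drop-in for the root slice.
# With no argument it targets /etc/systemd/system (requires root);
# passing a scratch directory allows a dry run without touching the system.
write_oomd_dropin() {
    prefix="${1:-/etc/systemd/system}"
    dir="$prefix/-.slice.d"
    # -p makes the step repeatable if the directory already exists.
    mkdir -p -- "$dir" || return 1
    printf '[Slice]\nManagedOOMSwap=auto\n' > "$dir/99-test.conf"
}
```

When the real prefix is used, the function has to run from a root shell, and `sudo systemctl daemon-reload` is still needed afterwards, as in step 2. The `oomctl` output can then be checked to see which cgroups systemd-oomd is monitoring.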

Actual results:

The kernel OOM killer is triggered, and in one case I tested, systemd-oomd also killed firefox.

Expected results:

systemd-oomd should send SIGKILL to all processes under a selected cgroup when total memory pressure on all tasks exceeds 50% for 20 seconds.

Additional info:

- systemd version:
systemd 248 (v248.3-1.fc34)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

- kernel, memory, zram, oom info:

uname -a; free -m; zramctl --output-all; oomctl 
Linux munster.belkin.home 5.11.21-300.fc34.x86_64 #1 SMP Fri May 14 17:43:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:           31914        2540       25965         624        3408       28333
Swap:           8191        1240        6951
NAME       DISKSIZE  DATA  COMPR ALGORITHM STREAMS ZERO-PAGES  TOTAL MEM-LIMIT MEM-USED MIGRATED MOUNTPOINT
/dev/zram0       8G  1,2G 370,6M lzo-rle         8      53071 522,8M        0B     1,1G   142,8K [SWAP]
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 20s
System Context:
        Swap: Used: 1.2G Total: 7.9G
Swap Monitored CGroups:
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-1000.slice/user
                Memory Pressure Limit: 50.00%
                Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.38 Total: 2min 24s
                Current Memory Usage: 4.2G
                Memory Min: 0B
                Memory Low: 0B
                Pgscan: 1332115063
                Last Pgscan: 1332115063

Also I haven't found any memory.oom.group file set to 1:

find /sys -name "memory.oom.group" -exec grep  -v '0$'  '{}' \;
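
Note that the find command above prints only the matching values, not the files they came from, so a hit would be hard to locate. A variant that prints the owning cgroup directories might look like the sketch below. `list_oom_groups` is a hypothetical helper name, and the default root assumes the unified hierarchy is mounted at /sys/fs/cgroup.

```shell
# Hypothetical helper: print every cgroup directory under $1
# (default: /sys/fs/cgroup) whose memory.oom.group contains exactly "1".
list_oom_groups() {
    root="${1:-/sys/fs/cgroup}"
    # grep -l prints the file name, -x matches the whole line only.
    find "$root" -name memory.oom.group -exec grep -l -x 1 {} + 2>/dev/null |
        sed 's|/memory\.oom\.group$||'
}
```

Running `list_oom_groups` with no argument scans the live cgroup tree; empty output means no cgroup has the attribute set to 1.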

https://www.freedesktop.org/software/systemd/man/systemd-oomd.html claims:

«When an action needs to happen, it will only be performed on the descendant cgroups of the enabled units. More precisely, only cgroups with memory.oom.group set to 1 and leaf cgroup nodes are eligible candidates. Action will be taken recursively on all of the processes under the chosen candidate.»

Comment 1 Chris Murphy 2021-05-25 01:30:06 UTC
We can tell this is the kernel OOM killer, not systemd-oomd, from two hints:

May 24 16:39:21 munster.belkin.home systemd[1]: user: A process of this unit has been killed by the OOM killer.
...
May 24 16:39:22 munster.belkin.home kernel: Out of memory: Killed process 76760 (stress-ng) total-vm:8651936kB, anon-rss:6268576kB, file-rss:0kB, shmem-rss:4kB, UID:1000 pgtables:16948kB oom_score_adj:1000

It looks like stress-ng itself sets a very high oom_score_adj (1000 in the log line above), which makes the kernel OOM killer far more likely to kill this process than any other. I'm not sure there's much systemd-oomd can do about it, but I've cc'd Anita for an expert opinion. I'm uncertain if systemd-oomd will react fast enough to this particular class of memory hogging process that also has such a high oom_score_adj. Stress-ng is so unusually aggressive compared to the typical memory hungry process.
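
To pull the victim and its oom_score_adj out of a longer journal, the kernel's "Killed process" lines can be filtered with something like the following. This is a sketch that assumes the log format shown above; `parse_oom_kills` is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "PID COMM OOM_SCORE_ADJ" for each "Killed process" line.
parse_oom_kills() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*oom_score_adj:\(-\{0,1\}[0-9][0-9]*\).*/\1 \2 \3/p'
}
```

For example, `journalctl -k | parse_oom_kills` lists the kernel's OOM kills from the journal. The live value for a running process can be read from `/proc/<pid>/oom_score_adj`.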

Comment 2 Chris Murphy 2021-05-25 01:33:27 UTC
>Also I haven't found any memory.oom.group file set to 1:
>find /sys -name "memory.oom.group" -exec grep  -v '0$'  '{}' \;

Same here, on two Fedora 34 Workstation edition systems. Not sure what's up here, Anita?

Comment 3 Anita Zhang 2021-05-25 06:18:11 UTC
> I'm uncertain if systemd-oomd will react fast enough to this particular class of memory hogging process that also has such a high oom_score_adj. Stress-ng is so unusually aggressive compared to the typical memory hungry process.

Pressure based killing wasn't designed to handle applications that quickly eat memory (esp. ones with oom_score_adj 1000) and I think for those processes invoking the kernel OOM killer makes sense. PSI counters are trailing so it has to take some time to build up pressure before a kill happens. I'm assuming this was from running the systemd-oomd test day test case which needs an update since the kill values have changed a little since then.
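
The trailing behaviour can be observed directly in the PSI files. The sketch below extracts the 10-second "full" average and compares it with a limit. The helper names are hypothetical, and the choice of the "full" avg10 value is an assumption about which figure is relevant here, not a statement of how systemd-oomd is implemented.

```shell
# Hypothetical helper: print the "full" avg10 percentage from
# PSI-formatted text on stdin, e.g. /proc/pressure/memory or a
# cgroup's memory.pressure file.
psi_full_avg10() {
    awk '$1 == "full" { sub(/^avg10=/, "", $2); print $2 }'
}

# Hypothetical helper: succeed if value $1 exceeds limit $2.
# Both are percentages and are compared numerically.
psi_over_limit() {
    awk -v v="$1" -v lim="$2" 'BEGIN { exit !(v + 0 > lim + 0) }'
}
```

For example, `psi_over_limit "$(psi_full_avg10 < /proc/pressure/memory)" 50` succeeds only while the average is above 50%. Because avg10 is an average over a 10-second window and the limit has to be exceeded for the configured duration, a process that exhausts memory within a few seconds can hit the kernel OOM killer first.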

> Also I haven't found any memory.oom.group file set to 1

I don't think this is set for any Fedora spins by default. Setting memory.oom.group to 1 is something I would expect to see a container manager do when setting up containers such that the whole container will get killed if a process in it OOMs. It is also set when OOMPolicy=kill in systemd units.

> in one case I tested systemd-oomd also killed firefox.

I notice in your attached logs it shows systemd-oomd killing `<...>/app.slice/app-firefox-b3772ae6afbf47ef930a59419c36b515.scope`, which seems to imply that cgroupify is not splitting the firefox processes into their own cgroups. Do you have the uresourced RPM installed?
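
One way to check whether the firefox processes share a single cgroup is to count them per cgroup path. This is a sketch that assumes the unified hierarchy, where the relevant entry in `/proc/<pid>/cgroup` has the form `0::<path>`; both helper names are hypothetical.

```shell
# Hypothetical helper: print the cgroup v2 path from
# /proc/<pid>/cgroup-formatted text on stdin.
cgroup_v2_path() {
    sed -n 's/^0:://p'
}

# Hypothetical helper: count how many of the given PIDs sit in each
# cgroup. Reads /proc, so the PIDs must belong to running processes.
count_pids_per_cgroup() {
    for pid in "$@"; do
        cgroup_v2_path < "/proc/$pid/cgroup"
    done | sort | uniq -c
}
```

For example, `count_pids_per_cgroup $(pgrep firefox)` prints one line per cgroup. A single line covering every PID would mean that killing that cgroup takes the whole browser down with it.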

Comment 4 Chris Murphy 2021-05-25 20:41:49 UTC
uresourced is in the workstation-product group, which is only pulled in by Workstation edition.
https://pagure.io/fedora-comps/blob/main/f/comps-f34.xml.in#_5613

What Fedora edition/spin is installed?

Comment 5 Sergio Belkin 2021-05-26 16:00:05 UTC
(In reply to Anita Zhang from comment #3)
> > I'm uncertain if systemd-oomd will react fast enough to this particular class of memory hogging process that also has such a high oom_score_adj. Stress-ng is so unusually aggressive compared to the typical memory hungry process.
> 
> Pressure based killing wasn't designed to handle applications that quickly
> eat memory (esp. ones with oom_score_adj 1000) and I think for those
> processes invoking the kernel OOM killer makes sense. PSI counters are
> trailing so it has to take some time to build up pressure before a kill
> happens. I'm assuming this was from running the systemd-oomd test day test
> case which needs an update since the kill values have changed a little since
> then.


Yes, I ran the test day case. I got a similar result following https://fedoraproject.org/wiki/Changes/EnableSystemdOomd#How_to_test

> 
> > Also I haven't found any memory.oom.group file set to 1
> 
> I don't think this is set for any Fedora spins by default.

Does this include Fedora Workstation?

> Setting
> memory.oom.group to 1 is something I would expect to see a container manager
> do when setting up containers such that the whole container will get killed
> if a process in it OOMs. It is also set when OOMPolicy=kill in systemd units.



> 
> > in one case I tested systemd-oomd also killed firefox.
> 
> I notice in your attached logs it shows systemd-oomd killing
> `<...>/app.slice/app-firefox-b3772ae6afbf47ef930a59419c36b515.scope`, which
> seems to imply that cgroupify is not splitting the firefox processes into
> their own cgroups. Do you have the uresourced RPM installed?


What do you mean by "unresourced" RPM?

This is the info about firefox installed:

rpm -qi firefox
Name        : firefox
Version     : 88.0.1
Release     : 1.fc34
Architecture: x86_64
Install Date: Sun 16 May 2021 13:29:13
Group       : Unspecified
Size        : 263508769
License     : MPLv1.1 or GPLv2+ or LGPLv2+
Signature   : RSA/SHA256, Tue 11 May 2021 07:03:35, Key ID 1161ae6945719a39
Source RPM  : firefox-88.0.1-1.fc34.src.rpm
Build Date  : Mon 10 May 2021 05:45:37
Build Host  : buildhw-x86-13.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://www.mozilla.org/firefox/
Bug URL     : https://bugz.fedoraproject.org/firefox
Summary     : Mozilla Firefox Web browser
Description :
Mozilla Firefox is an open-source web browser, designed for standards
compliance, performance and portability.

Comment 6 Sergio Belkin 2021-05-26 16:03:48 UTC
(In reply to Chris Murphy from comment #4)
> uresourced is in the workstation-product group, which is only pulled in by
> Workstation edition.
> https://pagure.io/fedora-comps/blob/main/f/comps-f34.xml.in#_5613
> 
> What Fedora edition/spin is installed?

It is the Fedora KDE Plasma Desktop spin.

Comment 7 Chris Murphy 2021-05-26 22:51:33 UTC
(In reply to Sergio Belkin from comment #5)
> What do you mean by "unresourced" RPM?

$ rpm -q uresourced
uresourced-0.4.0-1.fc34.x86_64

benzea, is it safe to opt into uresourced on KDE? Or is there still some work pending?

Comment 8 Sergio Belkin 2021-05-26 23:04:49 UTC
(In reply to Chris Murphy from comment #7)
> (In reply to Sergio Belkin from comment #5)
> > What do you mean by "unresourced" RPM?
> 
> $ rpm -q uresourced
> uresourced-0.4.0-1.fc34.x86_64
> 
> benzea, is it safe to opt into uresourced on KDE? Or is there still some
> work pending?

Oops, I'm sorry, Anita and Chris, I read too quickly... I thought Anita was talking about a kind of software, not a specific package.
I can confirm that I don't have that package installed:

rpm -q uresourced
package uresourced is not installed

Comment 9 Benjamin Berg 2021-05-26 23:47:32 UTC
> benzea, is it safe to opt into uresourced on KDE? Or is there still some work pending?

I think that installing uresourced in the KDE spin makes sense. It should give the same improvements for KDE users as it does for GNOME.

Comment 10 Chris Murphy 2021-05-27 03:01:21 UTC
Opened a ticket for KDE SIG folks to consider.
https://pagure.io/fedora-kde/SIG/issue/81

Comment 11 Ben Cotton 2022-05-12 16:35:33 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 12 Ben Cotton 2022-06-08 00:55:19 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

