Bug 1941170 - Systemd-oomd very aggressive in killing apps [NEEDINFO]
Summary: Systemd-oomd very aggressive in killing apps
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 34
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Anita Zhang
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-20 15:49 UTC by Isaac Bernadus
Modified: 2021-05-07 10:09 UTC (History)
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-07 10:09:30 UTC
Type: Bug
the.anitazha: needinfo? (mikhail.v.gavrilov)


Attachments
htop (974.12 KB, image/png), 2021-04-07 14:20 UTC, Mikhail
system log (107 bytes, text/plain), 2021-04-07 14:21 UTC, Mikhail
system log (1.47 MB, text/plain), 2021-04-07 14:33 UTC, Mikhail

Description Isaac Bernadus 2021-03-20 15:49:58 UTC
Description of problem:

Systemd-oomd is very aggressive when it comes to memory management. In Fedora 33 I was able to run quite a few apps without a problem, but in Fedora 34 the apps get killed far too quickly. Here's an example of Atom getting killed by systemd-oomd:

Mar 20 22:18:34 x505za systemd-oomd[1020]: Memory pressure for /user.slice/user-1000.slice/user@1000.service is greater than 10 for more than 10 seconds and there was reclaim activity
Mar 20 22:18:34 x505za systemd[1604]: app-gnome-atom-11930.scope: systemd-oomd killed 47 process(es) in this unit.
Mar 20 22:18:36 x505za systemd[1604]: app-gnome-atom-11930.scope: Deactivated successfully.
Mar 20 22:18:36 x505za systemd[1604]: app-gnome-atom-11930.scope: Consumed 7.557s CPU time

This event is triggered when around 70-80% of my memory is filled up despite still having space in swap.
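
For anyone collecting these events from the journal, here is a minimal Python sketch that pulls the cgroup path, limit, and duration out of a message like the one above. The regular expression is an assumption based only on the wording of that single log line, so other systemd versions may phrase it differently. The function name `parse_oomd_pressure_line` is made up for illustration.

```python
import re

# Pattern derived from the single log line quoted above (assumption: the
# wording is stable; it may differ between systemd versions).
PRESSURE_RE = re.compile(
    r"Memory pressure for (?P<cgroup>\S+) is greater than "
    r"(?P<limit>\d+(?:\.\d+)?) for more than (?P<seconds>\d+) seconds"
)


def parse_oomd_pressure_line(line):
    """Return (cgroup, limit, seconds), or None if the line does not match."""
    match = PRESSURE_RE.search(line)
    if match is None:
        return None
    return match["cgroup"], float(match["limit"]), int(match["seconds"])


if __name__ == "__main__":
    sample = (
        "Mar 20 22:18:34 x505za systemd-oomd[1020]: Memory pressure for "
        "/user.slice/user-1000.slice/user@1000.service is greater than 10 "
        "for more than 10 seconds and there was reclaim activity"
    )
    print(parse_oomd_pressure_line(sample))
```

Feeding it the output of `journalctl -u systemd-oomd` line by line would give a quick list of which cgroups crossed which limit.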

Version-Release number of selected component (if applicable): systemd 248 (v248~rc4-1.fc34)


How reproducible:
Always

Steps to Reproduce:
1. Load up a bunch of apps to fill up memory
2. Wait for systemd-oomd to trigger reclaim activity

Actual results:
Apps get killed very quickly

Expected results:
Apps should run normally until memory and swap are almost full

Additional info:

System Specs:

Ryzen 3 2200u
4GB RAM
Kernel 5.11.7-300.fc34.x86_64

Comment 1 Isaac Bernadus 2021-03-20 16:44:30 UTC
It seems that if memory fills up fast enough, systemd will even decide to kill GNOME.

Comment 2 Davide Repetto 2021-03-23 14:59:38 UTC
Same problem here.
The Fedora defaults are too aggressive. They make systemd-oomd very trigger-happy (ManagedOOMMemoryPressureLimit=10% for 10 seconds).

With 4GB and zram, you can barely use anything. With 2GB it is a task-massacre all the time.

The defaults suggested in the manual (60% & 30s) still prevent excessive spinning while working way more predictably.
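
To illustrate why the limit and the duration both matter, here is a rough Python sketch that checks whether a pressure trace stays above a limit for longer than a given duration, and compares the two settings mentioned above. This is not systemd-oomd's actual algorithm (it ignores the reclaim-activity check seen in the log, for instance), and the trace values are hypothetical. The function name `exceeds_limit` is made up for illustration.

```python
def exceeds_limit(samples, limit, duration_sec):
    """Return True if pressure stays above `limit` for more than `duration_sec`.

    `samples` is a chronological list of (timestamp_seconds, pressure_percent).
    """
    start = None  # timestamp at which the current run above the limit began
    for timestamp, pressure in samples:
        if pressure > limit:
            if start is None:
                start = timestamp
            if timestamp - start > duration_sec:
                return True
        else:
            start = None  # pressure dropped, so the run is broken
    return False


if __name__ == "__main__":
    # Hypothetical trace: one sample per second, 40% pressure for 15 seconds.
    trace = [(t, 40.0) for t in range(16)]
    for limit, duration in [(10, 10), (60, 30)]:
        print(f"limit={limit}% duration={duration}s ->",
              exceeds_limit(trace, limit, duration))
```

Under these assumptions, a short burst of moderate pressure trips the 10%/10s setting but not the 60%/30s one.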

Comment 3 iolo 2021-03-24 22:59:27 UTC
I've noticed this too while testing Fedora Workstation 34 this week. I'll leave NetBeans or Brave running to go do something else, and I've got maybe about 4 GB of free RAM out of 8 GB total at that point. Then, some time later, I will try to go back to NetBeans or Brave, or whatever it is, only to find that it's been killed. I've never had anything like this happen before.

Comment 4 Anita Zhang 2021-03-25 07:12:19 UTC
I will work on updating the pressure defaults now that the test week results have come in. I agree that the defaults are a bit aggressive, but that's what the test week and beta were meant to iron out.

Comment 5 Anita Zhang 2021-03-30 09:04:29 UTC
I've submitted https://src.fedoraproject.org/rpms/systemd/pull-request/58# to bump pressure defaults to 50% for 20s. Hopefully these more conservative values will perform better for most people.

Comment 6 Fedora Update System 2021-03-31 09:18:33 UTC
FEDORA-2021-8595b30af3 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-8595b30af3

Comment 7 Fedora Update System 2021-04-01 02:04:06 UTC
FEDORA-2021-8595b30af3 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-8595b30af3`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-8595b30af3

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 8 Fedora Update System 2021-04-03 01:28:14 UTC
FEDORA-2021-8595b30af3 has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 9 Mikhail 2021-04-07 14:20:40 UTC
Created attachment 1769904
htop

Today oomd again killed my container.


$ cat /usr/lib/systemd/oomd.conf.d/10-oomd-defaults.conf
[OOM]
DefaultMemoryPressureDurationSec=20s


$ cat /usr/lib/systemd/system/user@.service.d/10-oomd-user-service-defaults.conf
[Service]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
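
To check which values are in effect on several machines, drop-ins like the two above can be read with a short Python sketch. It assumes the files are simple enough for `configparser`; systemd's real unit-file syntax is richer (repeated keys, line continuations), so this is only an approximation. The function name `read_oomd_settings` is made up for illustration.

```python
import configparser


def read_oomd_settings(text):
    """Parse a simple systemd-style drop-in into {section: {key: value}}."""
    # interpolation=None keeps values such as "50%" from being treated as
    # configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case instead of lowercasing
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    user_service_dropin = (
        "[Service]\n"
        "ManagedOOMMemoryPressure=kill\n"
        "ManagedOOMMemoryPressureLimit=50%\n"
    )
    print(read_oomd_settings(user_service_dropin))
```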

Comment 10 Mikhail 2021-04-07 14:21:13 UTC
Created attachment 1769916
system log

Comment 11 Mikhail 2021-04-07 14:33:13 UTC
Created attachment 1769918
system log

Comment 12 Anita Zhang 2021-04-08 22:42:10 UTC
@Mikhail Was the system responsive and performing well at 54% pressure on the user service cgroup? Also, can you try stopping systemd-oomd (sudo systemctl stop systemd-oomd) and recording the highest tolerable pressure value from `/sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure` while your container is running? We can't control for all workloads, but it's worthwhile to see what pressure is tolerable or not.
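
One possible way to record the peak value is a small Python sketch that polls the `memory.pressure` file and keeps the highest 10-second average. It assumes the usual kernel PSI text format (`some`/`full` lines with `avg10=`, `avg60=`, `avg300=`, `total=` fields). Tracking the `full` line by default is an assumption about which figure is most relevant here. The function names, polling interval, and round count are made up for illustration.

```python
import os
import time


def parse_memory_pressure(text):
    """Parse PSI text into {"some": {"avg10": float, ...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def watch_peak(path, kind="full", interval_sec=1.0, rounds=60):
    """Poll `path` and return the highest avg10 value seen for `kind`."""
    peak = 0.0
    for _ in range(rounds):
        with open(path) as handle:
            stats = parse_memory_pressure(handle.read())
        peak = max(peak, stats[kind]["avg10"])
        time.sleep(interval_sec)
    return peak


if __name__ == "__main__":
    uid = os.getuid()
    path = (f"/sys/fs/cgroup/user.slice/user-{uid}.slice/"
            f"user@{uid}.service/memory.pressure")
    if os.path.exists(path):
        print("peak avg10:", watch_peak(path))
    else:
        print("memory.pressure not found at", path)
```

Running it while the container is under load, and noting when the system starts to feel sluggish, would give a number to compare against the configured limit.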

