Description of problem:

systemd-oomd is very aggressive when it comes to memory management. In Fedora 33 I was able to run quite a few apps without a problem, but in Fedora 34 apps get killed far too quickly. Here's an example of Atom getting killed by systemd-oomd:

Mar 20 22:18:34 x505za systemd-oomd[1020]: Memory pressure for /user.slice/user-1000.slice/user@1000.service is greater than 10 for more than 10 seconds and there was reclaim activity
Mar 20 22:18:34 x505za systemd[1604]: app-gnome-atom-11930.scope: systemd-oomd killed 47 process(es) in this unit.
Mar 20 22:18:36 x505za systemd[1604]: app-gnome-atom-11930.scope: Deactivated successfully.
Mar 20 22:18:36 x505za systemd[1604]: app-gnome-atom-11930.scope: Consumed 7.557s CPU time

This event is triggered when around 70-80% of my memory is filled up, despite there still being free space in swap.

Version-Release number of selected component (if applicable):
systemd 248 (v248~rc4-1.fc34)

How reproducible:
Always

Steps to Reproduce:
1. Load up a bunch of apps to fill up memory
2. Wait for systemd-oomd to trigger reclaim activity

Actual results:
Apps get killed very quickly

Expected results:
Apps run normally until memory and swap are almost full

Additional info:
System Specs:
Ryzen 3 2200u
4GB RAM
Kernel 5.11.7-300.fc34.x86_64
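For context, the "greater than 10" in the first log line is a PSI (pressure stall information) average that systemd-oomd reads from the cgroup's memory.pressure file. A minimal sketch of pulling that figure out of PSI-formatted text, for anyone who wants to watch the value themselves (the sample numbers below are invented, not taken from this machine):

```shell
# PSI output in the format the kernel exposes at
# /sys/fs/cgroup/.../memory.pressure (sample values, not real):
psi='some avg10=12.34 avg60=5.67 avg300=1.89 total=123456
full avg10=10.00 avg60=4.00 avg300=1.00 total=98765'

# Pull out the "full" avg10 figure -- as far as I can tell, the
# short-window average that systemd-oomd compares against its limit:
full_avg10=$(echo "$psi" | awk '/^full/ { sub("avg10=", "", $2); print $2 }')
echo "$full_avg10"   # 10.00
```

On a live system, substitute `cat /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure` for the hardcoded sample.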
It seems that if memory fills up fast enough, systemd-oomd will even decide to kill GNOME.
Same problem here. The Fedora defaults are too aggressive; they make systemd-oomd very trigger-happy (ManagedOOMMemoryPressureLimit=10% for 10 seconds). With 4 GB and zram you can barely use anything, and with 2 GB it is a task massacre all the time. The defaults suggested in the manual (60% and 30s) still prevent excessive thrashing while behaving far more predictably.
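For anyone wanting to try the manual's suggested values before a distro update lands, something like the following pair of drop-ins should work. The file names are hypothetical; the option names match what Fedora itself ships (see the configs quoted later in this bug), but it's worth confirming the result with `oomctl`. Note that the per-unit ManagedOOMMemoryPressureLimit= overrides the daemon-wide default, so raising only oomd.conf is not enough:

```ini
# /etc/systemd/oomd.conf.d/50-relaxed.conf  (hypothetical name)
# Daemon-wide: how long pressure must stay above the limit.
[OOM]
DefaultMemoryPressureDurationSec=30s

# /etc/systemd/system/user@.service.d/50-relaxed.conf  (hypothetical name)
# Per user session: the limit Fedora sets, raised to the manual's 60%.
[Service]
ManagedOOMMemoryPressureLimit=60%
```

Run `sudo systemctl daemon-reload` and restart systemd-oomd afterwards.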
I've noticed this too while testing Fedora Workstation 34 this week. I'll leave Netbeans or Brave running to go do something else, with maybe 4 GB of free RAM out of 8 GB total at that point. Then, some time later, I'll try to go back to Netbeans or Brave, or whatever it is, only to find that it's been killed. I've never had anything like this happen before.
I will work on updating the pressure defaults now that the test week results have come in. I agree that the defaults are a bit aggressive, but that's exactly what the test week and beta were meant to iron out.
I've submitted https://src.fedoraproject.org/rpms/systemd/pull-request/58 to bump the pressure defaults to 50% for 20s. Hopefully these more conservative values will perform better for most people.
FEDORA-2021-8595b30af3 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-8595b30af3
FEDORA-2021-8595b30af3 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-8595b30af3` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-8595b30af3 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-8595b30af3 has been pushed to the Fedora 34 stable repository. If the problem still persists, please make note of it in this bug report.
Created attachment 1769904 [details]
htop

Today oomd again killed my container.

$ cat /usr/lib/systemd/oomd.conf.d/10-oomd-defaults.conf
[OOM]
DefaultMemoryPressureDurationSec=20s

$ cat /usr/lib/systemd/system/user@.service.d/10-oomd-user-service-defaults.conf
[Service]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
Created attachment 1769916 [details]
system log

Created attachment 1769918 [details]
system log
@Mikhail Was the system responsive and performing well at 54% pressure on the user service cgroup? Also, can you try stopping systemd-oomd (sudo systemctl stop systemd-oomd) and recording the highest tolerable pressure value from `/sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure` while your container is running? We can't control for all workloads, but it's worthwhile to see what pressure is or isn't tolerable.
(In reply to Anita Zhang from comment #12)
> @Mikhail Was the system responsive and performing well at 54% pressure on
> the user service cgroup? Also can you try stopping systemd-oomd (sudo
> systemctl stop systemd-oomd) and recording what the highest tolerable
> pressure value was from
> `/sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure`
> while your container is running? We can't control for all workloads but
> it's worthwhile to see what pressure is tolerable or not.

$ cat /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure
some avg10=0.00 avg60=0.00 avg300=0.13 total=1698253169
full avg10=0.00 avg60=0.00 avg300=0.11 total=1515028054

$ journalctl -b -u systemd-oomd --no-pager
-- Journal begins at Thu 2021-07-29 17:02:00 +05, ends at Wed 2021-09-08 00:51:09 +05. --
Sep 04 03:16:03 primary-ws systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
Sep 04 03:16:03 primary-ws systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
Sep 08 00:23:23 primary-ws systemd-oomd[1552]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-887e6f17-fa6d-44cd-aa80-798d5c0c71ce.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 52.46% > 50.00% for > 20s with reclaim activity
^^^ This is F36, and systemd-oomd is still killing my terminal tabs.
(In reply to Mikhail from comment #13)
> $ journalctl -b -u systemd-oomd --no-pager
> -- Journal begins at Thu 2021-07-29 17:02:00 +05, ends at Wed 2021-09-08 00:51:09 +05. --
> Sep 04 03:16:03 primary-ws systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
> Sep 04 03:16:03 primary-ws systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
> Sep 08 00:23:23 primary-ws systemd-oomd[1552]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-887e6f17-fa6d-44cd-aa80-798d5c0c71ce.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 52.46% > 50.00% for > 20s with reclaim activity

You're pretty close to the default limits set up for Fedora, so if you're fine with the added pressure you may want to try bumping them for your system with an override like so:

$ cat /etc/systemd/system/user@.service.d/99-oomd-override.conf
[Service]
ManagedOOMMemoryPressureLimit=65%
$ sudo systemctl daemon-reload
$ oomctl # check if the new limit was applied

The default values will likely be reworked once https://github.com/systemd/systemd/pull/20690 is merged. This will allow setting more tuned pressure values on slices within a user session rather than relying on one value for all of user@UID.service.
(In reply to Anita Zhang from comment #15)
> You're pretty close to the default limits set up for Fedora so if you're
> fine with the added pressure you may want to try bumping them for your
> system with an override like so:
>
> $ cat /etc/systemd/system/user@.service.d/99-oomd-override.conf

The directory `user@.service.d` is absent on my system:

$ ls /etc/systemd/system/user@.service.d
ls: cannot access '/etc/systemd/system/user@.service.d': No such file or directory

> [Service]
> ManagedOOMMemoryPressureLimit=65%
> $ sudo systemctl daemon-reload
> $ oomctl # check if new limit was applied
>
> The default values will likely be reworked once
> https://github.com/systemd/systemd/pull/20690 is merged. This will allow
> setting more tuned pressure values on slices within a user session rather
> than relying on one value for all of user@UID.service.

As I understand it, by default the pressure limit should be 50%:

$ cat /usr/lib/systemd/oomd.conf.d/10-oomd-defaults.conf
[OOM]
DefaultMemoryPressureDurationSec=20s

$ cat /usr/lib/systemd/system/user@.service.d/10-oomd-user-service-defaults.conf
[Service]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%

But oomctl shows 60%. Why?

$ oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 20s
System Context:
        Memory: Used: 55.3G Total: 62.6G
        Swap: Used: 104.5M Total: 63.9G
Swap Monitored CGroups:
        Path: /
                Swap Usage: (see System Context)
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-1000.slice/user@1000.service
                Memory Pressure Limit: 50.00%
                Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 14s
                Current Memory Usage: 49.6G
                Memory Min: 250.0M
                Memory Low: 0B
                Pgscan: 85039860
                Last Pgscan: 85039860
(In reply to Mikhail from comment #16)
> Directory `user@.service.d` is absent on my system.

You need to make it. Directories under /etc/systemd/system are managed by the system administrator.

> As I am understand by default PressureLimit should be 50%
>
> But oomctl show 60%, why?

The default memory pressure limit is 60%, meaning that if a unit doesn't override it, it will use 60%. But since we ship a config for user@.service, the memory pressure limit for that unit is overridden to 50% (you can see it in your oomctl output, just above the "Pressure" line).
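Since the drop-in directory doesn't exist by default, it has to be created first. A sketch of the mechanics, run here against a scratch directory standing in for /etc/systemd/system (the real steps need root, plus `sudo systemctl daemon-reload` at the end):

```shell
# Scratch dir standing in for /etc/systemd/system; mkdir -p creates
# the user@.service.d drop-in directory that is absent by default.
root=$(mktemp -d)
mkdir -p "$root/user@.service.d"
printf '[Service]\nManagedOOMMemoryPressureLimit=65%%\n' \
  > "$root/user@.service.d/99-oomd-override.conf"

# Show what landed on disk, then clean up the scratch dir.
content=$(cat "$root/user@.service.d/99-oomd-override.conf")
echo "$content"
rm -rf "$root"
```

For the real thing, replace `$root` with /etc/systemd/system and verify with `oomctl` afterwards.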
> You're pretty close to the default limits set up for Fedora so if you're
> fine with the added pressure you may want to try bumping them for your
> system with an override like so

ManagedOOMMemoryPressureLimit=65% didn't help :(

$ journalctl -b -u systemd-oomd --no-pager
-- Journal begins at Thu 2021-10-07 03:47:38 +05, ends at Fri 2021-11-12 19:54:14 +05. --
Nov 12 14:42:28 primary-ws systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
Nov 12 14:42:28 primary-ws systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
Nov 12 17:50:48 primary-ws systemd-oomd[1172]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-f92b7041-15da-41fb-8076-8221774567da.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 65.70% > 65.00% for > 20s with reclaim activity

$ cat /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure
some avg10=3.68 avg60=31.49 avg300=25.59 total=424288160
full avg10=3.59 avg60=29.41 avg300=23.58 total=390639367

$ oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 20s
System Context:
        Memory: Used: 60.4G Total: 62.6G
        Swap: Used: 16.5G Total: 71.9G
Swap Monitored CGroups:
        Path: /
                Swap Usage: (see System Context)
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-1000.slice/user@1000.service
                Memory Pressure Limit: 65.00%
                Pressure: Avg10: 2.11 Avg60: 26.64 Avg300: 23.10 Total: 6min 30s
                Current Memory Usage: 25.5G
                Memory Min: 250.0M
                Memory Low: 0B
                Pgscan: 36702397
                Last Pgscan: 36690917
How do I disable this completely? It's constantly killing apps despite plenty of RAM and swap space being available. I run `sudo systemctl disable --now systemd-oomd.service`, but it comes back when I restart. I don't want to use any userspace OOM killer.
(In reply to Mohamed Akram from comment #19)
> How do I disable this completely?

You can disable it completely with "systemctl mask systemd-oomd". (Masked services won't start even if you try to launch them manually.)
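For what it's worth, masking survives reboots because it is just a symlink from the unit name to /dev/null under /etc/systemd/system, which takes precedence over the vendor unit. A minimal sketch of that mechanism, demonstrated in a scratch directory rather than the real path:

```shell
# What `systemctl mask systemd-oomd` does under the hood: symlink the
# unit name to /dev/null so systemd loads an empty, unstartable unit.
# Done here in a scratch dir so nothing on the system is touched.
dir=$(mktemp -d)
ln -s /dev/null "$dir/systemd-oomd.service"
target=$(readlink "$dir/systemd-oomd.service")
echo "$target"   # /dev/null
rm -rf "$dir"
```

The real commands are `sudo systemctl mask --now systemd-oomd.service` (the `--now` also stops the running instance) and `sudo systemctl unmask systemd-oomd.service` to undo it.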
(In reply to Mohamed Akram from comment #19) > It's constantly killing apps despite plenty of RAM and swap space available. Hey this sounds like a legit bug? Do you still have the logs from this event? They should be visible in the journal by doing `journalctl -u systemd-oomd -g Killed`
Just wanted to chime in that systemd-oomd has killed my GNOME Shell four times in the past two months on my current install of F35, leading to a pretty jarring experience. I'm a fairly lay Linux user; I'll try to attach logs. Let me know if anything else would be helpful.
Created attachment 1885395 [details]
journalctl -- oomd kills
> Mar 25 19:00:46 fedora systemd-oomd[1612]: Killed /user.slice/user-1000.slice/user@1000.service/session.slice/org.gnome.Shell@wayland.service due to memory used (16360116224) / total (16541884416) and swap used (7752622080) / total (8589930496) being more than 90.00%

That something gets killed off at 90% swap usage makes sense, but not GNOME Shell. That's exchanging one big problem for another; I'm not sure we can ever consider killing the desktop itself a solution to the swap performance problem.

It makes me wonder whether the only thing we can do is ensure resource control constrains everything well enough that the user retains the ability to choose which program needs to be killed off, rather than having it done for them. That is, maybe oomd should only kill the most obvious candidates, and otherwise permit the less obvious ones while still constraining the resources they can use, to 90% or whatever keeps the shell and terminal responsive enough (i.e. not perfect) that the user doesn't reach for the power button, but instead reaches for top or systemd-cgtop to find out what's hogging resources and decides whether to clobber it or not.
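One possible shape for that constraint, sketched as a hypothetical per-user drop-in (the path, file name, and percentage are all illustrative, not something Fedora ships). MemoryHigh= is the systemd.resource-control(5) throttling knob: above it, the slice's processes are slowed and reclaimed from aggressively rather than killed, which is closer to the "constrain, don't clobber" idea:

```ini
# ~/.config/systemd/user/app.slice.d/50-memoryhigh.conf  (hypothetical)
# Throttle the user's apps before they can starve the shell/terminal.
[Slice]
MemoryHigh=90%
```

A `systemctl --user daemon-reload` would be needed to pick it up, and the right percentage is very much workload-dependent.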
I agree. I might add (and this is my first time on Linux forums, so I'm not sure how this is generally addressed) that this problem can easily affect non-technical end users as well, who would likely not be comfortable with shell commands such as top or systemd-cgtop. In case it is helpful, I'll attach a screenshot of how macOS solves this problem: a GUI for force-quitting that shows current memory usage for each application. I think this is a nice way of empowering the user to make the decision; however, I could not find an existing Linux/GNOME GUI that does something similar. In the meantime, perhaps improving the heuristics of the resource manager so that core services like GNOME Shell are not killed would help. And as long as auto-killing is the active solution, it would be nice for any app killed due to memory constraints to trigger a system alert informing the user of that decision, since the event otherwise seems quite anomalous.
Created attachment 1885855 [details] macos force quit GUI screenshot
Created attachment 1886712 [details]
journalctl -u systemd-oomd -g Killed | grep -v Boot

My daughter is using Fedora 35 with the Cinnamon desktop and has been complaining about randomly being logged out. We'll see how things go with systemd-oomd disabled/masked.