Bug 2133829

Summary: systemd-oomd killing anaconda, plasma shell, Firefox in KDE Rawhide
Product: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: urgent
Priority: unspecified
Reporter: Adam Williamson <awilliam>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: acaringi, adscvr, agurenko, airlied, alciregi, bskeggs, bugzilla, bugzilla.redhat.com, fedora, grgoffe, hdegoede, hpa, jarodwilson, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, mikhail.v.gavrilov, ptalbert, steved, the.anitazha, zbyszek
Whiteboard: openqa
Type: Bug
Last Closed: 2022-11-21 17:01:21 UTC

Description Adam Williamson 2022-10-11 14:25:54 UTC
Since Fedora-Rawhide-20221010.n.0 landed (just today; I assume the s390x outage somehow delayed the compose), I'm seeing a lot of openQA KDE test failures which seem to be caused by things getting OOM killed.

We run the openQA tests with 3G of RAM, and haven't had issues with this since the bug earlier in the cycle where memory usage was being calculated completely wrong. But now it's hitting the desktop_browser test quite frequently. Either Firefox itself gets OOM killed, or the test gets past that point, but later on, while it's doing stuff at a console, some key KDE component gets OOM killed in the background and auto-restarted, which makes the system switch to the graphical VT and breaks the test.

I also grabbed the KDE live image from the compose and tried installing it in a local VM with 2G of RAM; the installer got OOM killed near the end, then after that, there was a cycle of KDE components constantly getting OOM killed with the memory pressure logged as nearly 100%, even though `free -h` reported 1.0G 'available'.
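For reference, the pressure figures systemd-oomd acts on come from the kernel's PSI (pressure stall information) interface, which can be read directly alongside `free -h`. A minimal sketch for inspecting it by hand; the paths assume cgroup v2 (the Fedora default) and a PSI-enabled kernel:

    # System-wide memory pressure averages (10s/60s/300s windows):
    cat /proc/pressure/memory
    # Pressure for a specific cgroup, e.g. the user slice:
    cat /sys/fs/cgroup/user.slice/memory.pressure
    # What systemd-oomd itself is monitoring, with its thresholds:
    oomctl dump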

I initially figured systemd or KDE changes would be to blame here, but neither systemd nor any KDE component seems to have changed in the 20221010.n.0 compose. The only vaguely relevant change seems to be the kernel, from kernel-6.1.0-0.rc0.20221007git4c86114194e6.5.fc38 to kernel-6.1.0-0.rc0.20221009gita6afa4199d3d.7.fc38. Was there any change to the memory pressure calculations or anything else in that span?

Comment 1 Adam Williamson 2022-10-13 08:04:54 UTC
I'm even seeing a few cases where it's apparently killing a console session with DNF running in it(!), e.g. here:
https://openqa.fedoraproject.org/tests/1505604#
If you watch the video of that test - https://openqa.fedoraproject.org/tests/1505604/video?filename=video.ogv - around the 18 second mark, it's in the middle of a `dnf install` command when suddenly it drops back to the login prompt. In the journal we see:
Oct 12 13:28:28 fedora systemd-oomd[543]: Killed /user.slice/user-0.slice/session-4.scope due to memory pressure for /user.slice/user-0.slice being 94.16% > 50.00% for > 20s with reclaim activity
You can download a tarball of /var/log at https://openqa.fedoraproject.org/tests/1505604/file/_advisory_update-var_log.tar.gz . Note systemd in Rawhide has a new journal format, so you can't read the journal files from F37 or earlier with `journalctl --file`; you have to do it in a Rawhide mock chroot or something. The test boots a clean KDE install, leaves it sitting at the login prompt, then switches to a text console to do some preparation; the OOM kill and test failure happened during this phase. It was run with 3G of RAM.
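For anyone trying to read those journal files from an older release, a sketch using mock (assumes the mock package is installed and that the tarball was extracted to ./var_log; paths are illustrative):

    mock -r fedora-rawhide-x86_64 --init
    mock -r fedora-rawhide-x86_64 --install systemd
    mock -r fedora-rawhide-x86_64 --copyin var_log/journal /tmp/journal
    mock -r fedora-rawhide-x86_64 --shell 'journalctl --directory=/tmp/journal --no-pager'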

Comment 2 Chris Murphy 2022-10-27 01:35:14 UTC
Is it still happening?

Comment 3 Adam Williamson 2022-10-27 03:51:54 UTC
It was definitely still a big issue up to the 1014 kernel. I think the 1019 kernel may have made things better, but I want to watch it a bit longer.

Comment 4 Chris Murphy 2022-10-29 02:56:50 UTC
Possible dup bug 2138369

Comment 5 George R. Goffe 2022-10-29 17:49:18 UTC
Chris,

After reading your report, I find myself agreeing that Bug 2138369 is a dup of this bug.

I'm running the "latest" ISO from dl.fedoraproject.org (Fedora-Everything-netinst-x86_64-Rawhide-20221029.n.0.iso) in a rescue environment. So far dnf seems to be working... No OOM yet. :-)

Thank you for your VERY FINE work with this.

Best regards,

George...

Comment 6 amatej 2022-10-31 13:15:34 UTC
*** Bug 2138369 has been marked as a duplicate of this bug. ***

Comment 7 Adam Williamson 2022-10-31 17:52:52 UTC
This does seem to still be a problem in current Rawhide. e.g. the KDE live install tests for recent Rawhide composes usually fail, like this one today:
https://openqa.fedoraproject.org/tests/1555308#
that pretty clearly failed because first anaconda and then KDE got OOM-killed during the install process. That was with kernel-6.1.0-0.rc2.20221028git23758867219c.24.fc38 .

Tomorrow's compose with the non-debug rc3 kernel will likely work better, but there's clearly still a problem here.

Comment 8 Adam Williamson 2022-10-31 17:53:42 UTC
Note, that test ran with 3G of RAM. I guess I can bump the kde-live-iso compose tests to 4G of RAM, which might help mitigate this. But up until this 6.1 kernel cycle, this test ran fine in 2G of RAM, even on debug kernels.

Comment 9 Chris Murphy 2022-10-31 17:57:27 UTC
This smells like bug 2119518

Comment 10 Adam Williamson 2022-10-31 18:25:38 UTC
No, it's not the same as that. The numbers aren't wildly insane like they were in that case. We don't see multiple millions of percent of memory pressure; we see, like, 65% or 70%.

It feels more like the calculation is being performed as intended, but it's just too aggressive. I don't know what the *change* was - whether the calculation has been made intentionally more aggressive, or there just really *is* more memory "pressure" happening for some reason, or what. But it fairly clearly seems to be killing stuff in situations where it doesn't really need to.

To be honest, I'm worried our current approach seems a bit too aggressive even on F37 with kernel 5.19, on my day-to-day laptop. I recently switched to a laptop with 8G of RAM (instead of 16G), and only 6.5G of that is usable (1.5G used by the GPU, I think). I got a 6.5G ZRAM swap partition (this is a default install of Silverblue 37). Just the other day, my entire desktop got OOM killed because I was dumb enough to open a new browser tab while Sound Converter was converting some audio. If the system had waited 15 seconds for it to finish, things would probably have been fine. I run Evolution, Firefox, gedit, some terminals, and occasionally an extra app like Sound Converter. Evo uses quite a lot of memory because I have a lot of mail, but still, 8G of RAM isn't a small amount, and it's weird that I run into OOM kills semi-often. More than once I've opened a new tab in Firefox, things ground to a halt for a bit, and then I realized Evo had disappeared because it got OOM killed...

I can just go get a laptop with 16G of RAM again, and probably will, one way or another. But I'm not convinced this OOM killing is really giving me a better experience than I'd have if I just left the kernel and swap partition to figure things out. I might turn it off for a while and see how things go...
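For anyone wanting to try the same experiment, turning the daemon off is a one-liner (a sketch; it can be re-enabled the same way with `enable`/`unmask`):

    sudo systemctl disable --now systemd-oomd.service
    # or, to prevent anything from pulling it back in:
    sudo systemctl mask --now systemd-oomd.service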

Comment 11 Chris Murphy 2022-11-02 02:04:05 UTC
I tried this in a VM and couldn't reproduce it with Fedora-KDE-Live-x86_64-Rawhide-20221101.n.0.iso and kernel 6.1.0-0.rc3.27.fc38.x86_64

However, https://openqa.fedoraproject.org/tests/1561371 used 6.1.0-0.rc3.27.fc38.x86_64 and failed with this problem.

The log is in _do_install_and_reboot-var_log.tar.gz:

Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: systemd-oomd killed some process(es) in this unit.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2372 (anaconda) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2384 (dbus-daemon) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2387 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2398 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2404 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2419 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2420 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2422 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2423 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2424 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 2427 (python3) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 3118 (rsync) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 3120 (rsync) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Killing process 3121 (rsync) with signal SIGKILL.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Failed to kill control group /user.slice/user-1000.slice/user/app.slice/app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope, ignoring: Operation not permitted
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Failed with result 'oom-kill'.
Nov  1 19:38:02 fedora systemd[1303]: app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope: Consumed 1min 470ms CPU time.
Nov  1 19:38:02 fedora systemd[1303]: selinux: avc:  op=setenforce lsm=selinux enforcing=0 res=1
Nov  1 19:38:02 fedora systemd-oomd[980]: Considered 16 cgroups for killing, top candidates were:
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 93.01 Avg60: 36.96 Avg300: 9.09 Total: 28s
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 1.9G
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 2726119
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 2726119
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/app-org.kde.kalendarac
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 403.1M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 19614
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 19614
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/dbus-:1.2-com.redhat.imsettings
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 43.0M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 1811
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 1811
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/xdg-desktop-portal-gnome.service
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 28.0M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 9887
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 9887
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/app-org.kde.kdeconnect.daemon
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 21.2M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 1077
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 1077
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/app-kaccess
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 17.2M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 187
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 187
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/xdg-desktop-portal-gtk.service
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 14.1M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 382
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 382
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/app-org.freedesktop.problems.applet
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 5.7M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 1360
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 1360
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/ssh-agent.service
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 1.0M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 13
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 13
Nov  1 19:38:02 fedora systemd-oomd[980]:     Path: /user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.freedesktop.secrets
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Pressure Limit: 0.00%
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Current Memory Usage: 1.0M
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Min: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Memory Low: 0B
Nov  1 19:38:02 fedora systemd-oomd[980]:         Pgscan: 0
Nov  1 19:38:02 fedora systemd-oomd[980]:         Last Pgscan: 0
Nov  1 19:38:02 fedora systemd-oomd[980]: Killed /user.slice/user-1000.slice/user/app.slice/app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope due to memory pressure for /user.slice/user-1000.slice/user/app.slice being 92.71% > 50.00% for > 20s with reclaim activity


On my slow VM, I see plasmashell using 22% of memory by itself for quite a while, but then it calms down to about 5%. I wonder if my slow VM allows enough time for things to settle, whereas openQA runs on much beefier hardware, possibly letting it slam right into this high memory usage.

Anyway, maybe we need to let more time go by? Or bump both the % pressure and time threshold? Or look at the amount of reclaim to see if that's affecting IO in some way that translates into not making adequate forward progress? In my case, it definitely eventually finishes the installation.
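If we wanted to experiment with bumping the thresholds, a sketch of a drop-in using the documented oomd.conf options (the file name is arbitrary; note that per-unit ManagedOOMMemoryPressureLimit= settings shipped in Fedora's unit drop-ins would still override the default for those units):

    # /etc/systemd/oomd.conf.d/10-relaxed.conf
    [OOM]
    DefaultMemoryPressureLimit=80%
    DefaultMemoryPressureDurationSec=60s

    # then apply it:
    sudo systemctl restart systemd-oomd.service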

Comment 12 Thorsten Leemhuis 2022-11-02 09:32:51 UTC
I wonder if Bug 2134971 is a duplicate of this or related. And to quote from Bug 2134971 Comment 4:

> The latest kernel that I cannot reproduce the problem on is kernel-6.1.0-0.rc0.20221007git4c86114194e6.5.fc38.
> The next one, kernel-6.1.0-0.rc0.20221009gita6afa4199d3d.7.fc38, inevitably leads to systemd-oomd starting to
> kill random applications and eventually gnome-session after some time.

Comment 13 Anita Zhang 2022-11-03 08:36:22 UTC
The logs only list up to 10 candidates, but I think when one leaf cgroup has avg10 > 90, it drives up the overall pressure for its parent cgroup. So in Chris's log, /user.slice/user-1000.slice/user/app.slice/app-liveinst-da6d8311ca2146f4acd49e74909e2873.scope with avg10 93.01 is driving /user.slice/user-1000.slice/user/app.slice to 92.71, which is over the threshold of 50. This seems to hold true for https://bugzilla.redhat.com/show_bug.cgi?id=2134971 too, but since we only dump 10 of many candidates, a lot of the kill situations show candidates with avg10 0.00 but a parent cgroup with avg10 > 90. I suspect there is a candidate outside of the 10 we dump that has avg10 > 90. systemd-oomd uses the "full" metric for memory pressure, so I don't think one cgroup should be driving the parent's pressure up that much.
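The "some" vs "full" distinction is visible directly in each cgroup's memory.pressure file: "some" means at least one task was stalled on memory, "full" means all runnable tasks were stalled at once. A sketch for checking the parent slice from comment 11 by hand (adjust the path to match the cgroup layout shown in the logs):

    cat /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/memory.pressure
    # output format:
    # some avg10=... avg60=... avg300=... total=...
    # full avg10=... avg60=... avg300=... total=...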

Doing a `git log 4c86114194e6..a6afa4199d3d` on the linux repo, https://lore.kernel.org/all/20220915094200.139713-1-hch@lst.de/ seems the most suspicious as it's moving/modifying the PSI accounting code. If it's easy to test maybe try before and after patchset? I'm going to try and pull in one of our resctl developers to help.

Comment 14 Thorsten Leemhuis 2022-11-03 08:58:44 UTC
I tried bisecting and got pretty far, but the results make no sense to me, so I assume something went sideways.

0a78a376ef3c is fine; c6ea70604249 (which is basically the next commit) is not. But reverting that commit doesn't solve the problem, which IOW means that the merge base (v6.0-rc2) is broken, too. That's odd.
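For reference, the bisect workflow between those two points looks like this (a sketch; note that if the merge base is itself bad, as suspected here, bisect can converge on a misleading commit):

    git bisect start
    git bisect bad c6ea70604249
    git bisect good 0a78a376ef3c
    # build and boot each candidate kernel, run the reproducer, then mark it:
    git bisect good    # or: git bisect bad
    git bisect reset   # when done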

Comment 15 Thorsten Leemhuis 2022-11-03 10:05:48 UTC
To quote myself from Bug 2134971 Comment 10:

> I can reproduce the problem easily on master as of today (8e5423e991e8), but it looks like (tests are running for ~15 minutes now, normally they triggered the problem in a minute or two) the problem vanishes by reverting 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads") on top of master.
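A sketch of repeating that revert test locally (assumes a checkout of the upstream kernel tree):

    git checkout master
    git revert 4088a47e78f9   # "btrfs: add manual PSI accounting for compressed reads"
    # rebuild and boot the kernel, then re-run the reproducer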

Comment 16 Chris Murphy 2022-11-04 02:17:44 UTC
Patch posted we could test in Rawhide if desired...
https://lore.kernel.org/regressions/Y2Q+y8t9PV5nrjud@cmpxchg.org/
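A sketch of pulling that patch into a local tree with b4 (assumes b4 is installed; the message ID comes from the lore URL above):

    b4 am Y2Q+y8t9PV5nrjud@cmpxchg.org   # fetches the thread as a git-am-ready mbox
    git am ./*.mbx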

Comment 17 Thorsten Leemhuis 2022-11-04 07:38:18 UTC
*** Bug 2134971 has been marked as a duplicate of this bug. ***

Comment 18 Thorsten Leemhuis 2022-11-04 07:39:14 UTC
The proposed patch fixes the issue for me: https://lore.kernel.org/regressions/5f7bac77-c088-6fb7-ccb5-bef9267f7186@leemhuis.info/

Comment 19 David Tardon 2022-11-08 09:11:00 UTC
*** Bug 2140664 has been marked as a duplicate of this bug. ***

Comment 20 Adam Williamson 2022-11-17 21:10:08 UTC
So, Justin did backport the fix for this last week. I had already worked around the issue in openQA (by disabling systemd-oomd), but some manual testing does suggest the fix probably works. I'll try removing the workaround on openQA staging and check that we don't get problems again.

Comment 21 Adam Williamson 2022-11-21 17:01:21 UTC
Bugs here didn't come back when I removed the workarounds, so I'm pretty sure the fix is good. Thanks!

Comment 22 George R. Goffe 2022-12-09 13:53:11 UTC
Adam,

I just finished installing a fresh spin (Fedora-Everything-netinst-x86_64-Rawhide-20221129.n.0.iso) in a VirtualBox VM, then pulled in about half of the package groups. I was continually getting OOM-killed. I stopped systemd-oomd.service and had no further problems.

It looks like the oom-killer bug is still with us?

Best regards,

George...

Comment 23 Adam Williamson 2022-12-09 16:18:42 UTC
Not this specific bug, no. If this bug was back, openQA tests would have started blowing up.

If you can reproduce it killing things in a situation where it definitely should not, file a new bug with details - the relevant log messages, information on how much RAM is actually being used, and so on.
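When filing that, a sketch of the standard commands that gather the details such a report needs:

    free -h                                    # overall RAM/swap usage
    swapon --show                              # swap devices (e.g. zram) and sizes
    oomctl dump                                # cgroups systemd-oomd monitors, with thresholds
    journalctl -b -u systemd-oomd --no-pager   # its kill decisions from this boot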