Bug 2140664

Summary: oomd kills too often with kernel-6.1
Product: [Fedora] Fedora
Reporter: ojab <bugzilla.redhat.com>
Component: systemd
Assignee: systemd-maint
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 37
CC: dtardon, fedoraproject, filbranden, flepied, lnykryn, msekleta, ryncsn, ssahani, s, systemd-maint, yuwatana, zbyszek
Hardware: x86_64   
OS: Linux   
Last Closed: 2022-11-08 09:11:00 UTC
Type: Bug

Description ojab 2022-11-07 14:43:18 UTC
Description of problem:

I installed kernel-6.1 to test the laptop's ACPI backlight and found that systemd-oomd often kills Firefox and other apps for no apparent reason, while with kernel <= 6.0 it works just fine. In the journal it looks like:
Killed /system.slice/packagekit.service due to memory pressure for /system.slice being 84.61% > 50.00% for > 20s with reclaim activity

And PSI (pressure stall information) is abnormally high, even on an otherwise idle system:
/proc/pressure/memory:some avg10=87.29 avg60=87.07 avg300=85.25 total=26956843013
/proc/pressure/memory:full avg10=80.50 avg60=78.87 avg300=75.88 total=24901061065
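For anyone checking these numbers programmatically, here is a minimal Python sketch (not part of the original report; `parse_psi` and `above_limit` are hypothetical helper names) that parses PSI lines like the two above. In each line, `avg10`/`avg60`/`avg300` are the percentages of time stalled over 10/60/300 s windows and `total` is cumulative stall time in microseconds. cgroup v2 `memory.pressure` files use the same format, and systemd-oomd monitors those (here for `/system.slice`). `above_limit` is a deliberately crude assumption: systemd-oomd additionally requires the pressure to persist (20s in the journal line above) and to coincide with reclaim activity.

```python
"""Minimal parser for Linux PSI (pressure stall information) output."""

# Two lines copied from this report; the "/proc/pressure/memory:" part is a
# file-name prefix that parse_psi() strips.
SAMPLE = """\
/proc/pressure/memory:some avg10=87.29 avg60=87.07 avg300=85.25 total=26956843013
/proc/pressure/memory:full avg10=80.50 avg60=78.87 avg300=75.88 total=24901061065
"""


def parse_psi(text):
    """Return {"some": {...}, "full": {...}} parsed from PSI file contents.

    "some": share of time at least one task was stalled on memory.
    "full": share of time all non-idle tasks were stalled at once.
    avg10/avg60/avg300 are percentages over 10/60/300 s windows;
    total is cumulative stall time in microseconds.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, *fields = line.split()
        kind = kind.rsplit(":", 1)[-1]  # drop an optional "path:" prefix
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def above_limit(psi, kind, limit_pct):
    """Hypothetical, crude check: is the 10 s average above limit_pct?

    Unlike systemd-oomd, this ignores how long the pressure lasted and
    whether reclaim was happening.
    """
    return psi[kind]["avg10"] > limit_pct


if __name__ == "__main__":
    psi = parse_psi(SAMPLE)
    print(psi["some"]["avg10"], psi["full"]["avg10"])  # 87.29 80.5
    print(above_limit(psi, "full", 50.0))  # True
```

On a live system the same parser can be fed the contents of `/proc/pressure/memory` or a cgroup's `memory.pressure` file instead of `SAMPLE`.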

Version-Release number of selected component (if applicable):
kernel-6.1.0-0.rc3.20221102git8f71a2b3f435.29.fc38.x86_64

How reproducible:
Intermittently, but far too often

Steps to Reproduce:
1. Run Firefox

Actual results:
Firefox is killed from time to time

Expected results:
It should be killed far less often

Additional info:
I _guess_ MGLRU (multi-gen LRU) may be producing incorrect/skewed PSI information, and oomd acts on it.

Comment 1 ojab 2022-11-07 14:51:11 UTC
OTOH disabling MGLRU
$ cat /sys/kernel/mm/lru_gen/enabled
0x0000
(I _guess_ that more or less disables it) doesn't help. After 5 minutes or so:
/proc/pressure/memory:some avg10=87.37 avg60=84.65 avg300=84.53 total=27381680608
/proc/pressure/memory:full avg10=79.59 avg60=73.70 avg300=73.74 total=25270369033

Comment 2 ojab 2022-11-07 14:57:02 UTC
After rebooting into kernel-6.0.6-300.fc37.x86_64 with the same usage pattern:
/proc/pressure/memory:some avg10=0.00 avg60=0.00 avg300=0.00 total=0
/proc/pressure/memory:full avg10=0.00 avg60=0.00 avg300=0.00 total=0

and the system overall is much more responsive

Comment 3 David Tardon 2022-11-08 09:11:00 UTC

*** This bug has been marked as a duplicate of bug 2133829 ***