Bug 1829942
Summary: | Multiple OOMKs with 5.6.7. Never OOMed on 5.x, x < 6. Regression? | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Arcadiy Ivanov <arcadiy>
Component: | kernel | Assignee: | systemd-maint
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | urgent | Priority: | unspecified
Version: | 31 | CC: | airlied, bskeggs, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, lnykryn, masami256, mchehab, mjg59, msekleta, ssahani, s, steved, systemd-maint, y9t7sypezp, zbyszek
Hardware: | x86_64 | OS: | Linux
Last Closed: | 2020-11-24 18:47:34 UTC | Type: | Bug

Description
Arcadiy Ivanov
2020-04-30 15:50:16 UTC
Thanks for your report. There is usually a list of kernel modules at the end of a Call Trace.

Specifically, why is the kernel reporting that it is tainted?

[235793.944852] CPU: 3 PID: 1 Comm: systemd Tainted: G U OE 5.6.7-200.fc31.x86_64 #1
                                            ^^^^^^^

Can you reproduce the problem without any non-Fedora kernel modules?

It looks like systemd-journald triggered the oom-killer, but with an oom_score_adj of -250, another process was selected for termination.

Do you have an ABRT problem report for it?

[235795.348134] systemd[1]: systemd-journald.service: State 'stop-watchdog' timed out. Terminating.
[235805.287209] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
                                                                           ^^^^^^^^^^^

(In reply to Steve from comment #1)
> Thanks for your report. There is usually a list of kernel modules at the end
> of a Call Trace.
>
> Specifically, why is the kernel reporting that it is tainted?
>
> [235793.944852] CPU: 3 PID: 1 Comm: systemd Tainted: G U OE
> 5.6.7-200.fc31.x86_64 #1
> ^^^^^^^
>
> Can you reproduce the problem without any non-Fedora kernel modules?

This is simply the NVIDIA module, and I'll see if I can.

(In reply to Steve from comment #2)
> It looks like systemd-journald triggered the oom-killer, but with an
> oom_score_adj of -250, another process was selected for termination.
>
> Do you have an ABRT problem report for it?
>
> [235795.348134] systemd[1]: systemd-journald.service: State 'stop-watchdog'
> timed out. Terminating.
> [235805.287209] systemd[1]: systemd-journald.service: Main process exited,
> code=dumped, status=6/ABRT
>
> ^^^^^^^^^^^

No, I can't, sorry. It's not available anywhere, probably because abrt crashed as well.

*** Bug 1831380 has been marked as a duplicate of this bug. ***

Looking at the process list you are running a lot of big programs, but with 64GB of RAM that should not be a problem.

Looking at the logs you provided, the clue seems to be the first line:

[235793.944847] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

So it is not the kernel itself which is deciding it needs to run the oom-killer, it is systemd.

I hope the systemd maintainers will have a better idea of when / why systemd is doing this, so let's move this over to systemd.

(In reply to Hans de Goede from comment #6)
> I hope the systemd maintainers will have a better idea of when / why systemd
> is doing this, so let's move this over to systemd.

I don't think it's systemd. I was just able to trigger it in the most bizarre way imaginable.

Here is my swappiness:

$ sysctl vm.swappiness
vm.swappiness = 1

After I encountered the above bug, I added 32GB of swap in addition to my 64GB of RAM. Minutes ago I was building a customer's code and noticed the swap bar in my KDE System Load Viewer sitting high. I scrolled over, and 8GB of swap had been consumed building docker images, with at most 20GB of RAM being utilized.

I go "wth?!" and run `sudo swapoff -a` while the build is ongoing. Swapoff freezes. I'm looking at the graph of swap utilization and it's going down, with swap `total` following `used` as `used` drops. But `used` drops very slowly, as if there is huge pressure to keep things in swap. It takes tens of seconds for used swap to shed 2GB of the 8GB used.

And then the OOM killer kills the `swapoff` (!!!) and a whole bunch of browser tabs, and browser plugins, and Slack... And the RAM utilization never exceeded 20GB, if that.

I'm attaching the journal showing all the things killed, and this time it wasn't systemd.

Created attachment 1686110
Swapoff OOMKs half the processes with 40GB of RAM to spare
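
The `invoked oom-killer` lines quoted in this report share one layout: a process name, a `gfp_mask` with its symbolic name, an allocation `order`, and an `oom_score_adj`. The leading name is the task whose memory allocation triggered the OOM killer, not necessarily the task that gets killed. Below is a minimal Python sketch for pulling those fields out of a journal or dmesg line. It assumes the exact layout seen in this report; `parse_oom_line` and the regular expression are illustrative names, not part of any existing tool.

```python
import re

# Layout assumed from the log lines in this report, e.g.
# "systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0"
OOM_RE = re.compile(
    r"(?P<comm>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp_mask>0x[0-9a-fA-F]+)\((?P<gfp_names>[^)]*)\), "
    r"order=(?P<order>\d+), oom_score_adj=(?P<adj>-?\d+)"
)


def parse_oom_line(line):
    """Return the fields of an 'invoked oom-killer' line, or None if absent."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m["comm"],                        # task whose allocation triggered the OOM killer
        "gfp_mask": int(m["gfp_mask"], 16),       # raw allocation flags
        "gfp_names": m["gfp_names"].split("|"),   # symbolic flag names as printed
        "order": int(m["order"]),                 # 0 means a single page was requested
        "oom_score_adj": int(m["adj"]),
    }


if __name__ == "__main__":
    sample = (
        "May 07 05:24:05 ai-karellen-lap kernel: docker-squash invoked oom-killer: "
        "gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0"
    )
    print(parse_oom_line(sample))
```

Feeding it the output of `journalctl -k` line by line and counting the `comm` values gives a quick view of which tasks hit the failing allocations.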
```
$ journalctl -xe | grep "invoked oom"
May 07 05:24:05 ai-karellen-lap kernel: docker-squash invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
May 07 05:24:05 ai-karellen-lap kernel: Telegram invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
May 07 05:24:05 ai-karellen-lap kernel: slack invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
```

Furthermore, I'm fairly certain that Telegram and Slack don't really "decide" to "invoke" OOMK.

This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
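
Comment 1 asks why the kernel reports itself as tainted. As a rough reading aid, the Python sketch below extracts the flag letters from a `Tainted:` field like the one in the Call Trace header quoted above and maps them to short descriptions. The descriptions paraphrase the upstream kernel's tainted-kernels documentation, the table is deliberately partial, and `decode_taint` is a hypothetical helper name. It also assumes the flags are followed by the kernel version string, as they are in this report.

```python
import re

# Partial table of taint flag letters (paraphrased from the kernel's
# tainted-kernels documentation); letters not listed are reported as unknown.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "G": "no proprietary module was loaded (shown in place of P)",
    "F": "module was force loaded",
    "U": "taint requested by a userspace application",
    "D": "kernel died recently (OOPS or BUG)",
    "W": "kernel issued a warning",
    "C": "staging driver was loaded",
    "O": "externally built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded",
    "L": "soft lockup occurred",
    "K": "kernel has been live patched",
}


def decode_taint(line):
    """Return (flag, description) pairs for the 'Tainted:' field of a trace header."""
    # Assumption: flag letters are whitespace-separated groups that end
    # where the kernel version string (starting with a digit) begins.
    m = re.search(r"Tainted:\s+((?:[A-Z]+\s+)+)", line)
    if m is None:
        return []  # e.g. a "Not tainted" header
    letters = "".join(m.group(1).split())
    return [(c, TAINT_FLAGS.get(c, "unknown flag")) for c in letters]


if __name__ == "__main__":
    header = ("[235793.944852] CPU: 3 PID: 1 Comm: systemd "
              "Tainted: G U OE 5.6.7-200.fc31.x86_64 #1")
    for flag, meaning in decode_taint(header):
        print(flag, "-", meaning)
```

For the header in this report the flags of interest are `O` and `E`, which is consistent with the reporter's note that an out-of-tree NVIDIA module was loaded.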