Bug 1941340
Summary: systemd-oomd kills wrong application
Product: [Fedora] Fedora
Component: systemd
Version: 34
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Vasiliy Glazov <vascom2>
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bberg, bugzilla, davide, ego.cordatus, fedoraproject, filbranden, flepied, hakavlad, kasong, lnykryn, mcatanza, mikhail.v.gavrilov, msekleta, sergiobelkin, ssahani, s, systemd-maint, the.anitazha, yuwatana, zbyszek
Fixed In Version: systemd-248.2-1.fc34
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2021-05-11 02:08:04 UTC
Description
Vasiliy Glazov
2021-03-21 17:56:24 UTC
Could you attach a complete 'journalctl -b -o short-monotonic --no-hostname' to the bug?

Termination by systemd-oomd looks like this:

Mar 17 16:27:03 fmac.local systemd[1590]: vte-spawn-8733851b-87fc-4a37-a4f8-14a2b062fc1d.scope: A process of this unit has been killed by the OOM killer.

Termination by the kernel OOM killer looks like this:

Mar 17 16:39:13 fmac.local kernel: threaded-ml invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

Created attachment 1765136 [details]
Complete requested log

If it was not killed by systemd-oomd, then why?
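For what it's worth, the two signatures above are easy to tell apart mechanically when scanning a journal dump. A quick illustrative sketch (the sample file path and grep patterns are my own, based on the log lines quoted in this bug; they are not an official interface):

```shell
# Write a couple of sample journal lines (taken from this bug) to a file,
# then separate kernel-OOM kills from systemd-oomd kills by their signatures.
cat > /tmp/oom-sample.log <<'EOF'
[92304.696684] kernel: InputThread invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[92920.988956] systemd[1540]: app-org.kde.konsole-946621aa8ede4246a97ab327297b2018.scope: systemd-oomd killed 4 process(es) in this unit.
EOF

# Kernel OOM killer events:
grep 'invoked oom-killer' /tmp/oom-sample.log

# systemd-oomd events:
grep 'systemd-oomd killed' /tmp/oom-sample.log
```

On a live system, the same patterns can be applied to the output of 'journalctl -b -o short-monotonic --no-hostname'.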
Sorry, I got confused myself, which sort of demonstrates that it's a bit confusing generally: systemd will report both kernel OOM killer events and systemd-oomd kill events.

From your journal, these are contiguous lines in the log:

[92304.697105] kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user/app.slice/app-org.kde.konsole-946621aa8ede4246a97ab327297b2018.scope,task=tail,pid=196255,uid=1000
[92304.697120] kernel: Out of memory: Killed process 196255 (tail) total-vm:36708724kB, anon-rss:29702452kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:71464kB oom_score_adj:0
[92920.942473] systemd[1540]: app-org.kde.konsole-946621aa8ede4246a97ab327297b2018.scope: A process of this unit has been killed by the OOM killer.
[92920.985128] systemd[1]: user: A process of this unit has been killed by the OOM killer.
[92920.988629] systemd-oomd[1538]: Swap used (8270032896) / total (8489267200) is more than 90.00%
[92920.988956] systemd[1540]: app-org.kde.konsole-946621aa8ede4246a97ab327297b2018.scope: systemd-oomd killed 4 process(es) in this unit.

The first two are the kernel OOM killer reporting that it killed tail. The next four lines are systemd-oomd killing four more processes in that same scope. What I'm a bit confused about is the 3rd entry: why systemd takes ~620 seconds to report the first two lines. It's not a problem per se, I'm just wondering why the delay. But which processes systemd-oomd killed, we don't know; only the scope is reported.

[98738.133138] systemd[1540]: app-firefox-f1d1b036af0d4676920c9f7ef1aa816a.scope: systemd-oomd killed 113 process(es) in this unit.

This is a known issue, bug 1931934.

Except for the firefox one, they're all app-org.kde.konsole...scope units, suggesting they're all Konsole-initiated processes. No other scopes were killed by systemd-oomd. The rest were killed by the kernel oom killer.
[92304.696684] kernel: InputThread invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[97542.177706] kernel: G1 Young RemSet invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[98066.861569] kernel: minidlnad invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[99115.688344] kernel: tail invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0

The first three, I think, are examples of what we want to avoid. The kernel oom killer just isn't reliable, and my thought is that systemd-oomd is possibly being too conservative in those cases and didn't kill something else soon enough, such that the kernel oom killer got involved. If the stress-ng test was involved in these cases, it might expose the limits of that particular synthetic test.

Anita, please correct anything I've explained incorrectly; gotta learn somehow :)

So systemd-oomd didn't kill anything? That is not the expected behaviour either.

Maybe systemd-oomd checks memory with a minimum interval of 1 second and therefore cannot respond quickly. earlyoom checks memory at an interval of 0.1-1 s and can easily handle such stresses thanks to the low interval, so there were no such problems with earlyoom.

> why systemd takes ~620 seconds to report the first two lines

Seems like stdout.flush() was not performed by systemd-oomd after the corrective action. I think this is a separate bug, if so. This problem is easy to fix.
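As a sanity check of the threshold message quoted earlier ("Swap used (8270032896) / total (8489267200) is more than 90.00%"), the check systemd-oomd logs is plain arithmetic; a sketch recomputing it with those exact values:

```shell
# Recompute systemd-oomd's swap-usage percentage from the values it logged.
used=8270032896
total=8489267200
pct=$(awk -v u="$used" -v t="$total" 'BEGIN { printf "%.2f", u / t * 100 }')
echo "swap used: ${pct}%"   # well above the 90.00% threshold
```

So at the moment of that log line, swap was about 97% full, comfortably past the 90% trigger.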
> The kernel oom killer just isn't reliable

This is only true for the vanilla kernel. There are patches[1] that completely solve the problem (sorry for the off-topic).

[1] https://github.com/hakavlad/le9-patch

Also, I see in htop that systemd-oomd always uses 0.7-1.3% of one CPU core (0.5% according to ps), and I have a fast AMD Ryzen 7 Pro 4750G.

(In reply to Chris Murphy from comment #3)

Your description of the situation looks right, Chris. And it is strange that systemd was unresponsive until 600 seconds later; I wonder if the system hung due to the kernel OOM kill? On my laptop, when the kernel OOM killer kills `tail`, systemd will report it within a second.

The way it's set up now, systemd-oomd is quite conservative; if there are multiple tail processes in different cgroups trying to eat up memory, systemd-oomd will not kill them all immediately.
There is currently a ~15 second delay between each systemd-oomd kill, to prevent kill storms and to give the system time to recover. In addition to lowering the systemd-oomd polling interval from 1 second, I will discuss with our resource control team removing the post-kill delay. These two things should improve the experience a bit. The other thing I have on my list is improving some of the logging around a systemd-oomd kill.

(In reply to Vasiliy Glazov from comment #4)
> So systemd-oomd didn't kill anything? That is not the expected behaviour either.

After the kernel OOM killer killed `tail`, swap usage remained high for those 600 seconds, and systemd-oomd did kill 4 processes under app-org.kde.konsole-946621aa8ede4246a97ab327297b2018.scope. So it's not ideal, but it is doing something. The response can definitely be improved as I mention above.

(In reply to Vasiliy Glazov from comment #8)
> Also I see in htop that systemd-oomd always use 0.7-1.3% of one CPU core.

This is related to polling pgscan for all the cgroup nodes under the monitored cgroup (in this case, all the cgroup nodes under user@.service). I plan to optimize this by only polling all the nodes when the limit is about to be hit, but since no one had actually brought this up as an issue, I didn't fix it immediately. Sorry!

My PR to address some of the perf issues and generally make swap monitoring more responsive was merged: https://github.com/systemd/systemd/pull/19126

FEDORA-2021-775e2373a1 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-775e2373a1

FEDORA-2021-775e2373a1 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:

`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-775e2373a1`

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-775e2373a1

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-775e2373a1 has been pushed to the Fedora 34 stable repository. If the problem still persists, please make note of it in this bug report.
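For anyone tuning this behaviour on their own system: the 90% swap threshold that triggered the kills discussed in this bug is configurable. A minimal sketch, assuming the systemd 248 option names documented in oomd.conf(5) (90% is my understanding of the shipped default; check the man page on your version before relying on this):

```ini
# /etc/systemd/oomd.conf (sketch; SwapUsedLimit defaults to 90%)
[OOM]
SwapUsedLimit=90%
```

Note that swap-based killing only applies to units where `ManagedOOMSwap=kill` is set (see systemd.resource-control(5)); lowering the limit alone does not opt additional units in.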