Bug 1977610 - systemd-oomd kills all terminal tabs when one process misbehaves
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: konsole5
Version: 37
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Rex Dieter
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-30 07:34 UTC by Tomáš Trnka
Modified: 2023-12-05 21:01 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-12-05 21:01:22 UTC
Type: Bug
Embargoed:


Links
System: KDE Software Compilation
ID: 439805
Private: 0
Priority: NOR
Status: UNCONFIRMED
Summary: RFE: move individual tabbed/windowed processes into their own system scope
Last Updated: 2021-07-13 14:49:48 UTC

Description Tomáš Trnka 2021-06-30 07:34:58 UTC
systemd-oomd is unhelpfully overzealous when it comes to shell-based workflows. If a process launched from a shell in a graphical terminal causes an OOM condition, systemd-oomd will pull the plug not just on that process, but also on the parent interactive shell and on unrelated shells and processes running in other tabs.

Just as an example, this is a typical Konsole window of mine:
├─app-org.kde.konsole-35f4d4d460414ab18dff107b3812647f.scope 
│ ├─ 4321 /usr/bin/konsole -session 10d9dec663000162221503500000038400042_1624953255_33552
│ ├─ 4455 /bin/bash
│ ├─ 4459 /bin/bash
│ ├─ 4465 /bin/bash
│ ├─ 4474 /bin/bash
│ ├─ 4482 /bin/bash
│ └─10756 wish /usr/bin/gitk -- master
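
To check which cgroup a given shell or child process has landed in, one can read `/proc/<pid>/cgroup`. The following is a minimal sketch, assuming a cgroups v2 (unified hierarchy) system, where the relevant entry has the form `0::<path>`. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional, Union


def unified_cgroup_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path found in the contents of /proc/<pid>/cgroup."""
    for line in proc_cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy, controllers, path = parts
        # The unified (v2) hierarchy is listed as ID 0 with no controllers.
        if hierarchy == "0" and controllers == "":
            return path
    return None


def cgroup_of(pid: Union[int, str] = "self") -> Optional[str]:
    """Look up the cgroup v2 path of a running process (default: this one)."""
    return unified_cgroup_path(Path(f"/proc/{pid}/cgroup").read_text())
```

Calling `cgroup_of(4455)` and `cgroup_of(10756)` for the PIDs in the tree above would be expected to return the same `.scope` path, which is why they are all treated as one unit.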

I am a developer, so I occasionally happen to run something that eats up all the RAM. That's just a fact of life. It would be great if that offending process got killed before my machine grinds to a halt, so using PSI to respond to such situations sounds like a great idea. I would thus prefer improving systemd-oomd instead of uninstalling it completely (which is the only solution I see right now).

Note that while it would theoretically be possible to create one cgroup per Konsole tab and thus somewhat limit the mayhem caused by systemd-oomd, this still doesn't protect the parent bash at all. Prefixing every single shell command with "systemd-run" is not a sustainable solution.
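
For reference, the "prefix every command with systemd-run" workaround mentioned above could look roughly like this. It is only a sketch: it assumes `systemd-run` is on `PATH` and a per-user systemd instance is running, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def scoped_argv(command: Sequence[str]) -> List[str]:
    """Wrap a command so systemd-run starts it in its own transient scope unit."""
    if not command:
        raise ValueError("command must not be empty")
    # --user:  talk to the per-user service manager
    # --scope: run the command directly as a transient .scope unit
    # --:      everything after this is the command itself
    return ["systemd-run", "--user", "--scope", "--", *command]


def run_scoped(command: Sequence[str]) -> int:
    """Run the wrapped command in the foreground and return its exit status."""
    return subprocess.run(scoped_argv(command)).returncode
```

With this, `run_scoped(["make", "-j8"])` would place the build in a cgroup separate from the shell that launched it, which illustrates both the mechanism and why typing it for every command is impractical.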

Can we perhaps teach systemd-oomd to optionally just kill the offending process instead of a whole cgroup?

Comment 1 Chris Murphy 2021-07-13 14:29:54 UTC
Since oomd relies on cgroups v2 accounting to detect resource shortages, it cannot know that there is an offending process. It only knows whether a cgroup is producing the pressure, and it kills everything in that cgroup. What's needed is an enhancement to the program, Konsole in this case, to split out its processes into separate cgroups.
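
To illustrate the per-cgroup granularity: the kernel exposes pressure stall information (PSI) for each cgroup in a `memory.pressure` file inside the cgroup's directory under `/sys/fs/cgroup`. The sketch below only parses that file format. It makes no claim about which thresholds or durations systemd-oomd applies, and the function name is hypothetical.

```python
from typing import Dict


def parse_pressure(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of a PSI file such as memory.pressure.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]  # kind is "some" or "full"
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result
```

The figures describe the cgroup as a whole. Nothing in the file attributes the pressure to an individual PID, which matches the limitation described above.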

Comment 2 Tomáš Trnka 2021-07-13 14:44:21 UTC
Unfortunately, no amount of enhancing Konsole is going to solve the issue, because that would still mean the interactive shell would be in the same cgroup as its children. One could theoretically modify Bash to run every single command in a separate cgroup, but I'm not sure how practical that approach is. (Such a modification would also need to be made to every major shell, which sounds like quite a lot of work just to work around the peculiarities of systemd-oomd.)

oomd can still keep using cgroups for the accounting, but it IMHO shouldn't be too hard to teach it not to kill the whole cgroup at once, but just kill one of its member processes at a time (perhaps the one with the highest RSS or so).
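
As a rough illustration of the selection step proposed here (not of how systemd-oomd actually behaves, which per comment 1 is to kill the whole cgroup), one could rank the members of a cgroup by resident set size. This sketch assumes cgroups v2, reads `cgroup.procs` and `/proc/<pid>/status`, and only picks a candidate without killing anything. All function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def rss_kib(status_text: str) -> int:
    """Extract VmRSS (in kB) from /proc/<pid>/status; 0 if the field is absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def cgroup_rss(cgroup_dir: str) -> Dict[int, int]:
    """Collect the RSS of every PID listed in <cgroup_dir>/cgroup.procs."""
    usage: Dict[int, int] = {}
    for pid in Path(cgroup_dir, "cgroup.procs").read_text().split():
        try:
            usage[int(pid)] = rss_kib(Path("/proc", pid, "status").read_text())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
    return usage


def largest_rss_pid(rss_by_pid: Dict[int, int]) -> Optional[int]:
    """Return the PID with the highest RSS, or None for an empty cgroup."""
    if not rss_by_pid:
        return None
    return max(rss_by_pid, key=rss_by_pid.get)
```

RSS is only one possible ranking criterion. It ignores swapped-out and shared memory, so a real policy would likely need something more careful.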

Comment 3 Chris Murphy 2021-07-13 14:54:52 UTC
It works as I've described in GNOME Terminal, and I think Konsole should do something similar so I opened an upstream RFE, and referenced the code used in GNOME.
https://gitlab.gnome.org/GNOME/vte/-/blob/master/src/systemd.cc

Comment 4 Chris Murphy 2021-07-13 15:04:30 UTC
Also, while oomd is one particular example of depending on cgroups and systemd slice/scope organization, that's not the only reason for this organization. Perhaps more important is the ability to limit the resources a process uses via its cgroup assignment. It can have memory, IO, and CPU usage restricted, which may postpone or avoid the need to kill it.

Comment 5 Fabrice Salvaire 2022-01-29 22:13:51 UTC
Nowadays, this Linux behaviour is really great when you launch a program, such as a scientific computation, that will eat memory...!!!

An enhancement would be to crash a whole datacenter just because a student wrote a faulty program...

This feature is WRONG:

* if it doesn't target the faulty process

* Firefox is unable to garbage-collect its GB of RAM and do an emergency exit properly

Of course, it is less shameful than pressing the on/off button just because the Linux kernel is overloaded.

Comment 6 Dr M C Nelson 2022-04-21 00:11:17 UTC
I second Fabrice Salvaire's comment of 2022-01-29.

The oomd behavior is simply wrong. This effectively disables my use of my computer for processing large data sets.

We MUST have a simple way to exclude a particular command from being killed by OOMD, on a "no matter what" basis.

To repeat, this zealous, uncontrollable behavior of OOMD absolutely, unequivocally ruins the OS for a large swath of practical use cases for scientific users.

PLEASE FIX IT!!!

Comment 7 Dr M C Nelson 2022-04-21 00:28:43 UTC
After poking around, I found that there is a control for eligibility. A convenient way to set it for a specific process would help, as would making it easier to find in the documentation. Otherwise, the only solution seems to be to turn oomd off.
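
The comment does not name the control. One candidate, and this is an assumption, is the `ManagedOOMPreference=` setting described in systemd.resource-control(5), which accepts `none`, `avoid`, or `omit`. A minimal sketch of launching a single command with that property set on its own transient scope might look as follows. The helper name is hypothetical, and whether systemd-oomd honors the preference for a user-owned unit depends on the systemd version and on cgroup ownership, so the man page should be checked first.

```python
from typing import List, Sequence

# Values accepted by ManagedOOMPreference= per systemd.resource-control(5).
VALID_PREFERENCES = ("none", "avoid", "omit")


def oomd_preference_argv(command: Sequence[str], preference: str = "avoid") -> List[str]:
    """Build a systemd-run command line that sets ManagedOOMPreference= on a scope."""
    if not command:
        raise ValueError("command must not be empty")
    if preference not in VALID_PREFERENCES:
        raise ValueError(f"preference must be one of {VALID_PREFERENCES}")
    return [
        "systemd-run", "--user", "--scope",
        "-p", f"ManagedOOMPreference={preference}",  # assumed control, see note above
        "--", *command,
    ]
```

For example, `oomd_preference_argv(["python3", "crunch.py"], "omit")` produces an argument list that can be passed to `subprocess.run`; `crunch.py` is a placeholder for the long-running job.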

Comment 8 Ben Cotton 2022-05-12 16:25:58 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 9 Ben Cotton 2022-06-08 00:42:20 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 10 Chris Murphy 2022-06-08 13:38:43 UTC
Still open upstream, reopening here and changing to Rawhide.

Comment 11 Ben Cotton 2022-08-09 13:11:49 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle.
Changing version to 37.

Comment 12 Aoife Moloney 2023-11-23 00:05:45 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 13 Aoife Moloney 2023-12-05 21:01:22 UTC
Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

