Bug 2177722

Summary: systemd-oomd extremely aggressive in killing non-DE logins (sshd or console) and any commands running in them
Product: Fedora
Component: systemd
Version: 38
Hardware: Unspecified
OS: Linux
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Andre Robatino <robatino>
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agurenko, awilliam, davide, fedoraproject, filbranden, gmarr, lnykryn, michel, msekleta, ngompa13, redhat, ryncsn, systemd-maint, the.anitazha, yuwatana, zbyszek
Keywords: Reopened
Whiteboard: AcceptedFreezeException
Fixed In Version: systemd-253.2-1.fc39 systemd-253.2-1.fc38 systemd-251.14-2.fc37
Type: Bug
Last Closed: 2023-04-01 00:17:14 UTC
Bug Blocks: 2083913
Attachments:
  output of "journalctl -b" on the 2GB F38 VM
  output of "journalctl -b" on 4GB F37 bare metal

Description Andre Robatino 2023-03-13 13:58:34 UTC
Description of problem:
When using an ssh or console login on a machine with low RAM, systemd-oomd will often kill the login together with anything running in it, especially if the command is resource-intensive. When I perform the same activity while logged into GNOME, this doesn't happen. Examples are dnf transactions and rsync of very large files. See https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org/thread/MRDIDUKOELOY5KBYNPFTFTY6QVBB7EOD/ . There I mainly describe how it affects an F37 bare-metal machine with 4GB RAM, but I've seen it on an F38 VM with 2GB RAM as well. Removing the systemd-oomd-defaults package and rebooting, so that no cgroups are monitored, eliminates the problem.

Nominating this as a freeze exception (FE) since the Beta and Final release criteria all appear to involve behavior while logged into a release-blocking DE (GNOME or KDE). This basically makes non-DE logins unusable, since one can't do anything significant in them without risking it being aborted.
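For reference, the workaround described above amounts to something like this (a sketch using the package and tool names from this report; oomctl with no arguments dumps systemd-oomd's current state):

  oomctl                                  # lists the cgroups systemd-oomd is monitoring
  sudo dnf remove systemd-oomd-defaults   # removes the shipped oomd policy
  sudo reboot                             # afterwards, oomctl should show nothing monitored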

Version-Release number of selected component (if applicable):
systemd-253-1.fc38.x86_64

How reproducible:
sometimes

Comment 1 Andre Robatino 2023-03-13 16:07:17 UTC
The perverse thing about this behavior is that it's MORE likely when one is trying to minimize resource use by avoiding a DE. It happens just as easily when running in the multi-user target. No sane OOM killer should behave like that.

Comment 2 Geoffrey Marr 2023-03-13 19:30:49 UTC
Discussed during the 2023-03-13 blocker review meeting: [0]

The decision was made to delay classifying this as a blocker bug, as it's hard to decide without details of a proposed fix. We will leave this until there's actually a decision to make (i.e., a proposed fix during the freeze period).

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2023-03-13/f38-blocker-review.2023-03-13-16.00.txt

Comment 3 Zbigniew Jędrzejewski-Szmek 2023-03-14 15:27:22 UTC
As a test, I tried a dnf upgrade in a VM with 4 GB RAM (3923876 kB), logged in via ssh, while monitoring memory usage with dstat. Memory used never seemed to go above ~520 MB.
If you see a failure on a machine with 4GB, some other factor must be involved.
Your report is very sparse on details, so it's really non-actionable.

Based on the mail thread, you seem to be disabling zram on the machine.
Why? It is something designed to help with such scenarios.

dnf5 would presumably also help by loading less metadata.
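(For anyone trying to reproduce this, the monitoring above can be done with something like the following; the exact dstat columns are an assumption, not necessarily the invocation I used:

  dstat -cms 5          # cpu, memory, swap, sampled every 5 seconds
  watch -n 5 free -h    # a rougher alternative without dstat
)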

Comment 4 Andre Robatino 2023-03-14 15:40:16 UTC
My F37 bare-metal machine has 4GB. My F38 VirtualBox VM has 2GB. I don't know why the problem is more likely in the first case. In any case, I'm certainly not disabling zram deliberately; I don't even know how to do that. Both are clean installs with default partitioning and swap (except for preserving /home on the F37 box). Since the free command shows I'm never using more than a small fraction of either memory or swap while these commands run, I concluded that swap is not an issue in my case. In any event, after removing systemd-oomd-defaults and rebooting (so systemd-oomd isn't monitoring anything), the issue is completely gone in both cases, so I know that's the cause. I suggest you try 2GB instead, since I was able to see processes killed in multi-user mode that were not killed when run in GNOME (although it's still not as frequent as in the F37 case). I also didn't see any problems when I gave the F38 VM 4GB.

Comment 5 Zbigniew Jędrzejewski-Szmek 2023-03-14 16:19:29 UTC
'rpm -q zram-generator-defaults' ?
'free -h' ?

How was the machine installed, via the usual installer? Maybe we're not installing zram-generator-defaults in some scenarios.

> I suggest you try 2GB instead

No change; it still works here without any issue.
(I used 'sudo dnf clean all && sudo dnf distro-sync --disablerepo=updates-testing --disablerepo=updates-testing-modular -y' to force a non-trivial transaction.)

Comment 6 Andre Robatino 2023-03-14 16:55:34 UTC
[root@fedora ~]# rpm -q zram-generator-defaults
zram-generator-defaults-1.1.2-4.fc38.noarch
[root@fedora ~]# free -h
               total        used        free      shared  buff/cache   available
Mem:           1.9Gi       215Mi       1.0Gi       1.0Mi       690Mi       1.5Gi
Swap:          1.9Gi          0B       1.9Gi
[root@fedora ~]#

The F38 VM was installed from an early F38 Live Workstation image, using 2GB RAM. I install a large number of package groups and individual packages after installation, so my dnf update transactions are larger than they would normally be. I currently have 4108 packages installed. However, the same type of dnf transactions always work reliably when run in GNOME, and the free command shows I'm never using more than a small fraction of either memory or swap.
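(Both figures above are easy to reproduce with standard commands:

  rpm -qa | wc -l   # total installed packages (4108 here)
  zramctl           # shows the zram device backing the swap line in free
)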

Comment 7 Andre Robatino 2023-03-14 17:24:41 UTC
Another thing that fails for me on the F37 machine is an rsync of a single very large (~20 GB) file from another machine. It never got more than ~85% done before being killed, unless I removed systemd-oomd-defaults and rebooted. It also worked with systemd-oomd-defaults installed and monitoring if I initiated the rsync from the F37 machine in a GNOME session. I haven't tried this on the VM.

Comment 8 Zbigniew Jędrzejewski-Szmek 2023-03-14 17:37:17 UTC
Can you attach 'journalctl -b' from a failed run?

Comment 9 Andre Robatino 2023-03-14 17:46:20 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #8)
> Can you attach 'journalctl -b' from a failed run?

Just so I understand, you mean run "journalctl -b" on the F37 machine after a failed attempt to do the rsync to that machine? (I'm guessing that whatever is causing this is the same on F37 and F38, so F37 testing should be enough.)

Comment 10 Zbigniew Jędrzejewski-Szmek 2023-03-14 17:54:45 UTC
Yes.

(FWIW, I can't get oomd to kill any processes in this small VM. Instead, the kernel oom killer is quicker. I'd like to know what exactly is happening.)
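(After a failed run, the two killers can be told apart in the journal, assuming the standard message patterns:

  journalctl -b -u systemd-oomd                  # oomd's candidate dumps and "Killed ..." lines
  journalctl -b -k | grep -i 'out of memory'     # kernel oom-killer invocations
)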

Comment 11 Andre Robatino 2023-03-14 18:36:33 UTC
Well, I reinstalled systemd-oomd-defaults and rebooted (in multi-user mode, to use even less resources) and was able to do both the rsync and a relatively small dnf update without seeing them killed. Normally I boot into graphical but only access via ssh. I update the large file every day, so I'll leave systemd-oomd enabled and monitoring, check every day, and attach the output of "journalctl -b" as soon as I see either the rsync or dnf killed. It was happening frequently, though it only started within the last few weeks, so it shouldn't take long to reappear.

Comment 12 Andre Robatino 2023-03-14 18:42:03 UTC
BTW, the F37 machine was also a Live Workstation install, and also has zram-generator-defaults installed. Default partitioning and swap, except for preserving the contents of /home using the instructions in https://fedoraproject.org/wiki/QA:Testcase_partitioning_custom_btrfs_preserve_home .

Comment 13 Andre Robatino 2023-03-14 19:30:44 UTC
Created attachment 1950747 [details]
output of "journalctl -b" on the 2GB F38 VM

I booted the F38 VM in multi-user mode, logged in as root, and ran "dnf --allowerasing install systemd-oomd-defaults". I was logged out; I believe that was the last command I ran and I never got the prompt back (I'm uncertain because I also lost the command history from that login). Anyway, I immediately saved the output of "journalctl -b". I then confirmed that systemd-oomd-defaults is installed, that "dnf check" shows no problems, and that oomctl shows systemd-oomd is monitoring.

Comment 14 Andre Robatino 2023-03-14 19:40:35 UTC
Just to be clear, I believe the last thing I saw before being automatically logged out was the dnf transaction in progress (it also had to remove some MATE packages that are incompatible with systemd-oomd-defaults). I immediately logged back in and saved the output of "journalctl -b".

Comment 15 Zbigniew Jędrzejewski-Szmek 2023-03-14 22:10:32 UTC
The dnf upgrade seems to restart a bunch of services, and packagekit, NetworkManager, firewalld, and setroubleshoot are active, so there's a lot of memory pressure.

Mar 14 15:05:45 fedora kernel: Adding 1999868k swap on /dev/zram0.  Priority:100 extents:1 across:1999868k SSDscFS
...
Mar 14 15:14:13 fedora systemd-oomd[555]: Considered 5 cgroups for killing, top candidates were:
Mar 14 15:14:13 fedora audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=getty@tty1 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 14 15:14:13 fedora systemd-oomd[555]:         Path: /user.slice/user-0.slice/session-1.scope
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Pressure Limit: 0.00%
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pressure: Avg10: 63.76 Avg60: 35.93 Avg300: 11.40 Total: 39s
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Current Memory Usage: 993.8M
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Min: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Low: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pgscan: 95168
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Last Pgscan: 95168
Mar 14 15:14:13 fedora systemd-oomd[555]:         Path: /user.slice/user-0.slice/user/init.scope
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Pressure Limit: 0.00%
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Current Memory Usage: 3.6M
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Min: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Low: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pgscan: 0
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Last Pgscan: 0
Mar 14 15:14:13 fedora systemd-oomd[555]:         Path: /user.slice/user-0.slice/user/session.slice/dbus-broker.service
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Pressure Limit: 0.00%
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Current Memory Usage: 2.3M
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Min: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Low: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pgscan: 128
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Last Pgscan: 128
Mar 14 15:14:13 fedora systemd-oomd[555]:         Path: /user.slice/user-0.slice/user/background.slice
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Pressure Limit: 0.00%
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Current Memory Usage: 212.0K
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Min: 0B
Mar 14 15:14:13 fedora systemd[1]: getty: Deactivated successfully.
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Low: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pgscan: 67
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Last Pgscan: 67
Mar 14 15:14:13 fedora systemd-oomd[555]:         Path: /user.slice/user-0.slice/user/app.slice/dbus.socket
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Pressure Limit: 0.00%
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Current Memory Usage: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Min: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Memory Low: 0B
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Pgscan: 0
Mar 14 15:14:13 fedora systemd-oomd[555]:                 Last Pgscan: 0
Mar 14 15:14:13 fedora systemd-oomd[555]: Killed /user.slice/user-0.slice/session-1.scope due to memory pressure for /user.slice/user-0.slice being 67.40% > 50.00% for > 30s with reclaim activity

Hmm.
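The numbers above are the kernel's PSI memory-pressure averages for each cgroup; oomd kills when the monitored cgroup's pressure stays above the configured limit for the configured duration (here 50% for more than 30s). The raw inputs can be read directly under cgroup v2, e.g. for the slice named in the kill message:

  cat /sys/fs/cgroup/user.slice/user-0.slice/memory.pressure
  # some avg10=... avg60=... avg300=... total=...
  # full avg10=... avg60=... avg300=... total=...
  oomctl   # shows the same candidates, limits, and durations systemd-oomd is using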

Comment 16 Andre Robatino 2023-03-14 23:27:14 UTC
Actually, the first command I ran after logging in was "dnf remove earlyoom", which finished successfully. Only THEN did I run "dnf --allowerasing install systemd-oomd-defaults", and I was logged out before getting the prompt back. So disregard any earlyoom messages in the log; earlyoom was removed before the trouble happened. I had it installed because I want both the GNOME and MATE package groups installed simultaneously, but can't, because some of the packages are incompatible. After removing systemd-oomd-defaults I had reinstalled the incompatible MATE packages. I hope it eventually becomes possible to have a single OOM killer that's compatible with all DEs.

Comment 17 Andre Robatino 2023-03-16 00:02:32 UTC
I should have mentioned that I only started seeing this problem in the last few weeks; it never happened before. Sorry I can't be more specific.

Comment 18 Adam Williamson 2023-03-21 00:04:50 UTC
Discussed at the 2023-03-20 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2023-03-20/f38-blocker-review.2023-03-20-16.00.html . This was accepted as a freeze exception issue on the basis that it could well affect the first update after install, so it cannot be fully addressed with an update.

Comment 19 Andre Robatino 2023-03-21 18:00:14 UTC
Created attachment 1952472 [details]
output of "journalctl -b" on  4GB F37 bare metal

I was unable to reproduce the logout on F37 when rsyncing a very large file, but I got it six times while doing a dnf update one letter at a time. First, I updated systemd from 251.13-5 to 251.13-6 (which succeeded) but did not reboot, so I'm not sure which version was actually tested. Then I did the rest of the dnf update one letter of the alphabet at a time (using RPMs copied from another machine) and got six logouts in total, which all look much the same in the log. They usually happen just after the cleanup steps but before the verify steps; "dnf check" had no output after any of them, though there could be unfinished scriptlets.

Comment 20 Zbigniew Jędrzejewski-Szmek 2023-03-21 19:14:32 UTC
We discussed this issue in our upstream meeting. Unfortunately we didn't have Anita who knows the most about oomd.
I'll try to figure out what is going on. https://github.com/systemd/systemd/issues/26922 is a part of the puzzle.

Comment 21 Anita Zhang 2023-03-24 05:46:42 UTC
In F37 and earlier, the config killed based on swap use and pressure on user-*.slice/user@.service. In https://src.fedoraproject.org/fork/anitazha/rpms/systemd/c/7665e1796f915dedbf8e014f0a78f4f576d609bb this was changed to pressure only, on system.slice and all slices under user.slice. The relevant point here is that this change now includes user-*.slice/session-*.scope, which holds the critical session bits you're seeing killed here.

That session scope should be omitted. I'll put up another PR to update the config. What I intended with the initial PR was for all slices under user.slice/user-*.slice/user@.service to be monitored, not all slices under user.slice. Sorry for the mistake here.
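A minimal sketch of the intended shape, as a drop-in on user@.service (the ManagedOOM* settings are from systemd.resource-control(5); the file name and the 50% limit below are illustrative assumptions, not necessarily the shipped systemd-oomd-defaults contents):

  $ cat /usr/lib/systemd/system/user@.service.d/10-oomd-user-service-defaults.conf
  [Service]
  ManagedOOMMemoryPressure=kill
  ManagedOOMMemoryPressureLimit=50%

Scoped this way, session-*.scope (the ssh or console login itself) is no longer a kill candidate, while the units inside the user's service manager still are.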

Comment 22 Fedora Update System 2023-03-29 20:52:41 UTC
FEDORA-2023-d6cccb2d11 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-d6cccb2d11

Comment 23 Fedora Update System 2023-03-29 20:55:12 UTC
FEDORA-2023-d6cccb2d11 has been pushed to the Fedora 39 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 24 Fedora Update System 2023-03-29 20:57:35 UTC
FEDORA-2023-466bb274db has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-466bb274db

Comment 25 Adam Williamson 2023-03-29 22:42:08 UTC
So, it's great to tweak this for F38, but a lot of the recent testing/feedback Andre did was on F37, which doesn't track with the problem only existing in F38 and being introduced by a change between F37 and F38.

Can you take a closer look at Andre's feedback and logs from F37 test scenarios and see if there is anything we can do on F37 as well?

Comment 26 Andre Robatino 2023-03-29 22:58:07 UTC
As I recall, I first noticed it a few weeks ago on F37, where it happens more often; not long after, I noticed it in my F38 VM, though less often, despite the F38 VM having less RAM. I know F37 and F38 have different systemd versions, but it should be easy to check when Anita's change went into each version. I have the new version in my F38 VM and will give karma after running at least one or two large dnf transactions. I'll also check F37 when there's a build for that.
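(Checking when a given change landed in each installed build is straightforward with rpm, e.g.:

  rpm -q --changelog systemd | head -40   # recent changelog entries, newest first
)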

Comment 27 Adam Williamson 2023-03-29 23:09:21 UTC
Ahh. I was reading Anita's comment as meaning the change she mentioned only went to F38+, but it looks like it *did* also go to F37 as an update.

Best I can tell, it landed in systemd-251.5-607.fc37, which was https://bodhi.fedoraproject.org/updates/FEDORA-2022-7cf1869073 , which went stable last October, quite some time ago...

Comment 28 Andre Robatino 2023-03-29 23:16:07 UTC
I can't explain that. I suddenly started seeing it happen frequently in F37, a lot more recently than last October.

Comment 29 Fedora Update System 2023-03-30 02:20:36 UTC
FEDORA-2023-466bb274db has been pushed to the Fedora 38 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-466bb274db

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 30 Zbigniew Jędrzejewski-Szmek 2023-03-30 07:25:28 UTC
Yes, the change was also backported to F37.
If the latest update helps in F38, I'll copy the changes over to F37 too.

Comment 31 Fedora Update System 2023-03-31 17:53:14 UTC
FEDORA-2023-fa5bdca92d has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-fa5bdca92d

Comment 32 Fedora Update System 2023-04-01 00:17:14 UTC
FEDORA-2023-466bb274db has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 33 Fedora Update System 2023-04-01 02:16:59 UTC
FEDORA-2023-fa5bdca92d has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-fa5bdca92d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-fa5bdca92d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 34 Fedora Update System 2023-04-02 02:00:56 UTC
FEDORA-2023-fa5bdca92d has been pushed to the Fedora 37 stable repository.
If the problem still persists, please make a note of it in this bug report.