Bug 1929856

Summary: systemd-oomd kills anaconda in the middle of system install when installing from KDE live image with 2GB RAM
Product: [Fedora] Fedora
Version: 34
Component: systemd
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Reporter: Adam Williamson <awilliam>
Assignee: Michel Lind <michel>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bugzilla, fedoraproject, filbranden, flepied, geraldo.simiao.kutz, kasong, lnykryn, michel, msekleta, robatino, ssahani, s, systemd-maint, the.anitazha, yuwatana, zbyszek, z
Whiteboard: openqa
Fixed In Version: systemd-oomd-defaults-247.3-3
Last Closed: 2021-02-19 08:01:40 UTC
Type: Bug
Bug Blocks: 1829022, 1913794
Attachments:
journalctl (flags: none)
oomctl data when invoking liveinst (flags: none)

Description Adam Williamson 2021-02-17 19:15:01 UTC
All openQA install tests on the KDE live image failed in today's Branched. In all cases, the logs show it's because anaconda was killed by systemd-oomd.

One detail about how anaconda runs on the live images, in case it matters: it uses consolehelper, so the process starts with `liveinst` running as a *regular user*, and consolehelper then ends up launching anaconda as *root*.

The system journal shows this:

Feb 17 09:27:25 localhost-live systemd-oomd[915]: Memory pressure for /user.slice/user-1000.slice/user is greater than 4 for more than 10 seconds and there was reclaim activity

The user journal shows this:

Feb 17 09:27:25 localhost-live systemd[1271]: app-liveinst-675e1db00e51458ea9b5b7462b78d4fb.scope: systemd-oomd killed 74 process(es) in this unit.
Feb 17 09:27:26 localhost-live systemd[1271]: app-liveinst-675e1db00e51458ea9b5b7462b78d4fb.scope: Succeeded.
Feb 17 09:27:26 localhost-live systemd[1271]: app-liveinst-675e1db00e51458ea9b5b7462b78d4fb.scope: Consumed 20.409s CPU time.

The test runs with 2GB of RAM. Granted that's a bit low, but it's not unusual for VMs, and the install has not failed due to memory pressure before (except for the issue with debug kernels and KASAN last month). I can't actually find our 'official' system requirements anywhere any more, but AFAIK last time I checked, 2GB of RAM was mentioned. I don't think "installer suddenly killed by oomd" is a reasonable outcome for a 2GB install attempt.
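
For anyone triaging similar openQA failures, here is a minimal sketch that pulls the oomd events out of journal text. It assumes the message wording matches the lines quoted above (other systemd versions may phrase it differently); the function name `parse_oomd_events` and the regexes are illustrative, not part of systemd or anaconda.

```python
import re

# Assumption: message wording copied from the journal lines quoted in this
# report; other systemd versions may word these messages differently.
PRESSURE_RE = re.compile(
    r"systemd-oomd\[\d+\]: Memory pressure for (?P<cgroup>\S+) "
    r"is greater than (?P<limit>\d+) for more than (?P<seconds>\d+) seconds"
)
KILL_RE = re.compile(
    r"(?P<unit>\S+): systemd-oomd killed (?P<count>\d+) process\(es\) in this unit"
)

# Journal lines as quoted in the description (system and user journal).
SAMPLE_JOURNAL = [
    "Feb 17 09:27:25 localhost-live systemd-oomd[915]: Memory pressure for "
    "/user.slice/user-1000.slice/user is greater than 4 for more than 10 "
    "seconds and there was reclaim activity",
    "Feb 17 09:27:25 localhost-live systemd[1271]: "
    "app-liveinst-675e1db00e51458ea9b5b7462b78d4fb.scope: systemd-oomd killed "
    "74 process(es) in this unit.",
    "Feb 17 09:27:26 localhost-live systemd[1271]: "
    "app-liveinst-675e1db00e51458ea9b5b7462b78d4fb.scope: Succeeded.",
]


def parse_oomd_events(lines):
    """Return one dict per oomd pressure or kill message; skip other lines."""
    events = []
    for line in lines:
        m = PRESSURE_RE.search(line)
        if m:
            events.append({
                "kind": "pressure",
                "cgroup": m.group("cgroup"),
                "limit": int(m.group("limit")),
                "seconds": int(m.group("seconds")),
            })
            continue
        m = KILL_RE.search(line)
        if m:
            events.append({
                "kind": "kill",
                "unit": m.group("unit"),
                "count": int(m.group("count")),
            })
    return events


if __name__ == "__main__":
    for event in parse_oomd_events(SAMPLE_JOURNAL):
        print(event)
```

Feeding it the combined system and user journal makes it easy to see which unit was killed right after each pressure message.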

Comment 1 Adam Williamson 2021-02-17 19:16:04 UTC
Proposing as a Beta blocker as a conditional violation of "The installer must be able to complete an installation using any supported locally connected storage interface" (and all other criteria that imply successful installation), when attempted with 2GB of RAM or less from the KDE live.

Comment 2 Chris Murphy 2021-02-17 20:44:57 UTC
https://docs.fedoraproject.org/en-US/fedora/rawhide/release-notes/welcome/Hardware_Overview/#hardware_overview-specs
Minimum System Configuration
1GHz or faster processor
2GB System Memory
10GB unallocated drive space

However, there's a 'low memory installations' section that proposes both "less than 768MB of system memory" and "less than 1GB of memory".

https://getfedora.org/en/workstation/download/
* Fedora requires a minimum of 20GB disk, 2GB RAM, to install and run successfully. Double those amounts is recommended.

I think that'd be consistent with KDE too. 

If it's not possible then (a) oomd should disable itself with a suitable message, and/or (b) we need to increase the minimum memory requirements.

Comment 3 Geraldo Simião 2021-02-17 21:07:51 UTC
It installed normally with 4GB of RAM and 4 CPUs in a KVM/qemu install via virt-manager.

Comment 4 Chris Murphy 2021-02-17 21:55:32 UTC
Created attachment 1757662
journalctl

$ journalctl --no-hostname -o short-monotonic --no-pager

This is in a qemu-kvm VM with 2GB of RAM. The swap (on zram) is ~20% full at the time of the kill, so it's strictly a memory pressure kill.

There are two kill events:

[  325.044689] systemd-oomd[993]: Memory pressure for /user.slice/user-1000.slice/user is greater than 4 for more than 10 seconds and there was reclaim activity
[  325.057782] systemd[1313]: plasma-plasmashell.service: systemd-oomd killed 69 process(es) in this unit.
[  340.059781] systemd-oomd[993]: Memory pressure for /user.slice/user-1000.slice/user is greater than 4 for more than 10 seconds and there was reclaim activity
[  340.108661] systemd[1313]: app-org.kde.korgac-autostart.service: systemd-oomd killed 65 process(es) in this unit.
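
To put the two kill events on a timeline, a rough sketch along these lines can be run over `-o short-monotonic` output. It assumes the `[ seconds.micros] message` layout shown above; the helper names `kill_timeline` and `gaps` are made up for illustration.

```python
import re

# Assumption: lines look like "[  325.057782] message", as in the
# short-monotonic output quoted above.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")
KILL_RE = re.compile(
    r"(?P<unit>\S+): systemd-oomd killed (?P<count>\d+) process\(es\)"
)

# The two kill lines quoted in this comment.
SAMPLE = [
    "[  325.057782] systemd[1313]: plasma-plasmashell.service: "
    "systemd-oomd killed 69 process(es) in this unit.",
    "[  340.108661] systemd[1313]: app-org.kde.korgac-autostart.service: "
    "systemd-oomd killed 65 process(es) in this unit.",
]


def kill_timeline(lines):
    """Return (monotonic_seconds, unit, process_count) for each kill line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        k = KILL_RE.search(m.group("msg"))
        if k:
            events.append(
                (float(m.group("ts")), k.group("unit"), int(k.group("count")))
            )
    return events


def gaps(events):
    """Seconds elapsed between consecutive kill events."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    timeline = kill_timeline(SAMPLE)
    for ts, unit, count in timeline:
        print(f"{ts:10.3f}s  {unit}  ({count} processes)")
    print("gaps between kills (s):", gaps(timeline))
```

On the lines above, the second kill lands roughly 15 seconds after the first.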

Comment 5 Michel Lind 2021-02-18 00:24:11 UTC
Created attachment 1757672
oomctl data when invoking liveinst

cmurf provided this data, which we're using to tweak oomd's settings.

Comment 6 Michel Lind 2021-02-18 00:25:18 UTC
So the data shows memory pressure peaking at just under 9% right before oomd kills anaconda. We verified that bumping the threshold to 10% allows anaconda to complete the installation.

Pull request with this change: https://src.fedoraproject.org/rpms/systemd/pull-request/51
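
As a rough illustration of why the higher threshold helps, the sketch below parses kernel PSI text (the format of `/proc/pressure/memory` and of a cgroup v2 `memory.pressure` file) and compares the 10-second average against a limit. Several things here are assumptions: that oomd compares the `full` line's `avg10`, that the limit of 4 in the journal is a percentage, and the sample values, which are hypothetical numbers standing in for the "just under 9%" peak rather than the attached oomctl data. It also ignores the "for more than 10 seconds" duration condition.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {"avg10": ...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def over_limit(psi, limit_percent, kind="full"):
    """True if the 10-second average pressure exceeds the limit.

    Assumption: oomd looks at the "full" line; pass kind="some" to compare
    against the other PSI line instead.
    """
    return psi[kind]["avg10"] > limit_percent


# Hypothetical sample, NOT taken from the attachment: a "full" avg10 of 8.90
# stands in for the "just under 9%" peak mentioned above.
HYPOTHETICAL_PSI = """\
some avg10=12.40 avg60=6.10 avg300=1.90 total=4123456
full avg10=8.90 avg60=4.20 avg300=1.30 total=2987654
"""

if __name__ == "__main__":
    psi = parse_psi(HYPOTHETICAL_PSI)
    for limit in (4, 10):
        print(f"limit {limit}%: over limit = {over_limit(psi, limit)}")
```

With a peak just under 9%, the limit of 4 is exceeded while a limit of 10 is not, which matches the behaviour we saw after bumping the threshold.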

Comment 7 Michel Lind 2021-02-19 08:01:40 UTC
Should be fixed in https://bodhi.fedoraproject.org/updates/FEDORA-2021-dadbaaac54