Bug 1907030 - "dnf update" runs out of memory on swapless machines with 1G or less of RAM [NEEDINFO]
Summary: "dnf update" runs out of memory on swapless machines with 1G or less of RAM
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: distribution
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Matthew Miller
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker https://ask.fedorapro...
Depends On:
Blocks: ARMTracker
Reported: 2020-12-12 13:14 UTC by David Tonhofer
Modified: 2024-03-27 12:34 UTC
55 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:
bcotton: fedora_prioritized_bug-
pbrobinson: needinfo? (jmracek)
awilliam: needinfo? (davdunc)


Attachments
/var/log/messages for oom killing dnf (20.59 KB, text/plain)
2022-04-02 15:56 UTC, Yves Dorfsman
no flags Details
repodiff between good and bad composes (30.47 KB, text/plain)
2022-08-03 03:32 UTC, Daniel Alley
no flags Details

Description David Tonhofer 2020-12-12 13:14:14 UTC
Description of problem:

On a "Fedora 33 Cloud" instance on a:

Bezosville "t4g.nano" machine (ARM Graviton, 0.5 GiB of RAM) (https://aws.amazon.com/ec2/instance-types/)

Immediately after the image has come up, run "dnf update"

dnf gets killed by oom-kill due to high memory usage (remember when we had workstations with 64 MiB of RAM 😂). "free" actually lists only 423 MiB of total memory.

==
[ 1203.399002] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=dnf,pid=1022,uid=0

[ 1203.401135] Out of memory: Killed process 1022 (dnf) total-vm:371088kB, anon-rss:287848kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:736kB oom_score_adj:0

[ 1203.426932] oom_reaper: reaped process 1022 (dnf), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
==

On the command line, we see:

- dnf starts
- contacts the update servers
- waits a few seconds
- then gets killed

The solution is to add a swap disk:

---
dd if=/dev/zero of=/mnt/2GB.swap count=2048 bs=1024K
mkswap /mnt/2GB.swap
chmod 600 /mnt/2GB.swap
swapon /mnt/2GB.swap
---

Then "dnf update" works nicely.

So I don't know what dnf's RAM requirement is, but this seems like a lot nevertheless.
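For reference, the same 2 GB swap file can be preallocated almost instantly with fallocate instead of being written out with dd (a sketch, not part of the original report; assumes the filesystem supports fallocate, as ext4 and xfs do, and swapon requires root):

```shell
# Equivalent workaround to the dd commands above, without writing 2 GiB of zeros
fallocate -l 2G /mnt/2GB.swap   # preallocate the file
chmod 600 /mnt/2GB.swap         # swap files must not be world-readable
mkswap /mnt/2GB.swap            # write the swap signature
swapon /mnt/2GB.swap            # enable it (requires root)
```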

Comment 1 Viktor Ashirov 2021-05-11 13:06:59 UTC
FWIW, there was a similar issue in F26-27: https://bugzilla.redhat.com/show_bug.cgi?id=1432219
It was fixed in F28 with dnf-2.7.5 and libdnf-0.11.1.

But in F29 it started again:
May 11 12:40:27 dnf-f29 kernel: Out of memory: Kill process 689 (dnf) score 748 or sacrifice child
May 11 12:40:27 dnf-f29 kernel: Killed process 689 (dnf) total-vm:657152kB, anon-rss:367228kB, file-rss:0kB, shmem-rss:0kB

# rpm -q dnf libdnf 
dnf-4.0.4-1.fc29.noarch
libdnf-0.22.0-6.fc29.x86_64

And since then it doesn't work on a VM with 512MB RAM (I've tried all releases F26-F34).
It's not aarch64-only; it happens on x86_64 too.

Comment 2 Ben Cotton 2021-11-04 14:02:47 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 3 Ben Cotton 2021-11-04 14:31:59 UTC
(duplicate of comment 2)

Comment 4 Ben Cotton 2021-11-04 15:29:41 UTC
(duplicate of comment 2)

Comment 5 Ben Cotton 2022-02-08 21:34:22 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 6 Yves Dorfsman 2022-04-02 15:56:23 UTC
Created attachment 1870204 [details]
/var/log/messages for oom killing dnf

Comment 7 Yves Dorfsman 2022-04-02 15:59:13 UTC
Comment on attachment 1870204 [details]
/var/log/messages for oom killing dnf

Same issue with Fedora 35 on a brand new 512 MB VM with just the OS; the OOM killer kills both dnf and the shell (you end up back at the login prompt).

Comment 8 willcoe 2022-07-24 13:35:20 UTC
It seems it is the updates repo that causes it to OOM, specifically generating the solvx:

==
strace -e trace=open,openat,connect,accept dnf info time

... snip

openat(AT_FDCWD, "/var/cache/dnf/updates-updateinfo.solvx", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/var/cache/dnf/updates-7eea87b22825bc0d/repodata/4df02c81ae2b639369ac317d5d6ab3ae85a62078fe2b4161cb75071452eb6c07-updateinfo.xml.zck", O_RDONLY) = 8
+++ killed by SIGKILL +++
Killed
==

With zchunk metadata disabled, it fails in the same place:

==
openat(AT_FDCWD, "/var/cache/dnf/updates-updateinfo.solvx", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/var/cache/dnf/updates-7eea87b22825bc0d/repodata/9028d5cb3daf3d84a04189690e97356dc0b0bb75c4392bbbd03d8b026d0cdaf4-updateinfo.xml.xz", O_RDONLY) = 8
+++ killed by SIGKILL +++
Killed
==

dmesg
==
[ 1729.518235] Out of memory: Killed process 1665 (dnf) total-vm:1272448kB, anon-rss:635436kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1448kB oom_score_adj:0
==

On a larger machine it takes around 800MB of RAM to complete.
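Given the ~800 MB figure, the failure can be reproduced on a machine with plenty of RAM by capping the address space with a plain ulimit (a sketch, not from the original comment; under this cap dnf dies with allocation errors rather than an oom-kill, but at roughly the same point):

```shell
# Run dnf in a subshell whose virtual address space is limited to 512 MiB,
# mimicking the t4g.nano / 512 MB VM cases described above.
(
    ulimit -v $((512 * 1024))   # ulimit -v takes kilobytes
    dnf makecache
)
```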

Comment 9 Dusty Mabe 2022-08-01 11:41:26 UTC
We are hitting this in the Fedora CoreOS CI. Fedora releng recently pushed an update to the f36 image [1], and now our tests that run on VMs with 1G of RAM don't make it. If I just run `dnf update` on these machines it gets OOM killed.

The command I run is something like:

```
podman run -it registry.fedoraproject.org/fedora:36 dnf update -y
```


The journal shows us:

```
Aug 01 11:27:50 cosa-devsh reverent_wescoff[1893]: [65B blob data]
Aug 01 11:27:54 cosa-devsh reverent_wescoff[1893]: [2.0K blob data]
Aug 01 11:28:07 cosa-devsh reverent_wescoff[1893]: [945B blob data]
Aug 01 11:28:09 cosa-devsh reverent_wescoff[1893]: [1.1K blob data]
Aug 01 11:28:10 cosa-devsh reverent_wescoff[1893]: [945B blob data]
Aug 01 11:28:15 cosa-devsh kernel: dnf invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Aug 01 11:28:15 cosa-devsh kernel: CPU: 1 PID: 1911 Comm: dnf Not tainted 5.18.13-200.fc36.x86_64 #1
Aug 01 11:28:15 cosa-devsh kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Aug 01 11:28:15 cosa-devsh kernel: Call Trace:
Aug 01 11:28:15 cosa-devsh kernel:  <TASK>
Aug 01 11:28:15 cosa-devsh kernel:  dump_stack_lvl+0x44/0x5c
Aug 01 11:28:15 cosa-devsh kernel:  dump_header+0x4a/0x1ff
Aug 01 11:28:15 cosa-devsh kernel:  oom_kill_process.cold+0xb/0x10
Aug 01 11:28:15 cosa-devsh kernel:  out_of_memory+0x1be/0x4f0
Aug 01 11:28:15 cosa-devsh kernel:  __alloc_pages_slowpath.constprop.0+0xc3c/0xcf0
Aug 01 11:28:15 cosa-devsh kernel:  __alloc_pages+0x1e7/0x210
Aug 01 11:28:15 cosa-devsh kernel:  folio_alloc+0x17/0x50
Aug 01 11:28:15 cosa-devsh kernel:  __filemap_get_folio+0x175/0x420
Aug 01 11:28:15 cosa-devsh kernel:  filemap_fault+0x151/0x980
Aug 01 11:28:15 cosa-devsh kernel:  __do_fault+0x36/0x130
Aug 01 11:28:15 cosa-devsh kernel:  __handle_mm_fault+0xdaf/0x1400
Aug 01 11:28:15 cosa-devsh kernel:  ? __pv_queued_spin_lock_slowpath+0x156/0x2b0
Aug 01 11:28:15 cosa-devsh kernel:  handle_mm_fault+0xae/0x280
Aug 01 11:28:15 cosa-devsh kernel:  do_user_addr_fault+0x1c5/0x670
Aug 01 11:28:15 cosa-devsh kernel:  ? kvm_read_and_reset_apf_flags+0x3f/0x60
Aug 01 11:28:15 cosa-devsh kernel:  exc_page_fault+0x70/0x170
Aug 01 11:28:15 cosa-devsh kernel:  asm_exc_page_fault+0x21/0x30
Aug 01 11:28:15 cosa-devsh kernel: RIP: 0033:0x7f078dc68470
Aug 01 11:28:15 cosa-devsh kernel: Code: Unable to access opcode bytes at RIP 0x7f078dc68446.
Aug 01 11:28:15 cosa-devsh kernel: RSP: 002b:00007ffd90059c18 EFLAGS: 00010202
Aug 01 11:28:15 cosa-devsh kernel: RAX: 000000000054a7ff RBX: 0000557acf1273c0 RCX: 0000557acbcf80a0
Aug 01 11:28:15 cosa-devsh kernel: RDX: 0000000000000034 RSI: 0000557acbcf80a0 RDI: 0000557ad0e8435d
Aug 01 11:28:15 cosa-devsh kernel: RBP: 0000000000000034 R08: 000000000054a6d1 R09: 0000557ac98b0470
Aug 01 11:28:15 cosa-devsh kernel: R10: 0000000000000000 R11: eae92cfea8765b52 R12: 0000557acbcf80a0
Aug 01 11:28:15 cosa-devsh kernel: R13: 00000000fffe5dc3 R14: 0000000000000034 R15: 0000557ac98b0470
Aug 01 11:28:15 cosa-devsh kernel:  </TASK>
Aug 01 11:28:15 cosa-devsh kernel: Mem-Info:
Aug 01 11:28:15 cosa-devsh kernel: active_anon:401 inactive_anon:199213 isolated_anon:0
                                    active_file:12 inactive_file:3 isolated_file:0
                                    unevictable:0 dirty:0 writeback:0
                                    slab_reclaimable:9166 slab_unreclaimable:16852
                                    mapped:325 shmem:830 pagetables:1152 bounce:0
                                    kernel_misc_reclaimable:0
                                    free:11864 free_pcp:121 free_cma:0
Aug 01 11:28:15 cosa-devsh kernel: Node 0 active_anon:1604kB inactive_anon:796852kB active_file:48kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1300kB dirty:0kB writeback:0kB shmem:3320kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4272kB pagetables:4608kB all_unreclaimable? yes
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA free:4176kB boost:0kB min:760kB low:948kB high:1136kB reserved_highatomic:0KB active_anon:0kB inactive_anon:11000kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 01 11:28:15 cosa-devsh kernel: lowmem_reserve[]: 0 854 854 854 854
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA32 free:43280kB boost:0kB min:43480kB low:54348kB high:65216kB reserved_highatomic:0KB active_anon:1604kB inactive_anon:785696kB active_file:400kB inactive_file:136kB unevictable:0kB writepending:0kB present:1031988kB managed:967732kB mlocked:0kB bounce:0kB free_pcp:484kB local_pcp:196kB free_cma:0kB
Aug 01 11:28:15 cosa-devsh kernel: lowmem_reserve[]: 0 0 0 0 0
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 2*16kB (UM) 1*32kB (M) 2*64kB (UM) 1*128kB (M) 1*256kB (U) 1*512kB (U) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 4184kB
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA32: 1320*4kB (UME) 770*8kB (UME) 405*16kB (UME) 314*32kB (UME) 151*64kB (UME) 50*128kB (UME) 2*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44544kB
Aug 01 11:28:15 cosa-devsh kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Aug 01 11:28:15 cosa-devsh kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 01 11:28:15 cosa-devsh kernel: 905 total pagecache pages
Aug 01 11:28:15 cosa-devsh kernel: 0 pages in swap cache
Aug 01 11:28:15 cosa-devsh kernel: Swap cache stats: add 0, delete 0, find 0/0
Aug 01 11:28:15 cosa-devsh kernel: Free swap  = 0kB
Aug 01 11:28:15 cosa-devsh kernel: Total swap = 0kB
Aug 01 11:28:15 cosa-devsh kernel: 261995 pages RAM
Aug 01 11:28:15 cosa-devsh kernel: 0 pages HighMem/MovableOnly
Aug 01 11:28:15 cosa-devsh kernel: 16222 pages reserved
Aug 01 11:28:15 cosa-devsh kernel: 0 pages cma reserved
Aug 01 11:28:15 cosa-devsh kernel: 0 pages hwpoisoned
Aug 01 11:28:15 cosa-devsh kernel: Tasks state (memory values in pages):
Aug 01 11:28:15 cosa-devsh kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Aug 01 11:28:15 cosa-devsh kernel: [   1287]     0  1287    10667      313   102400        0          -250 systemd-journal
Aug 01 11:28:15 cosa-devsh kernel: [   1301]     0  1301     8698      936    94208        0         -1000 systemd-udevd
Aug 01 11:28:15 cosa-devsh kernel: [   1368]   990  1368     5913      883    94208        0             0 systemd-resolve
Aug 01 11:28:15 cosa-devsh kernel: [   1370]     0  1370     4341      229    69632        0             0 systemd-userdbd
Aug 01 11:28:15 cosa-devsh kernel: [   1371]     0  1371     4505      240    73728        0             0 systemd-userwor
Aug 01 11:28:15 cosa-devsh kernel: [   1372]     0  1372     4505      239    73728        0             0 systemd-userwor
Aug 01 11:28:15 cosa-devsh kernel: [   1373]     0  1373     4505      241    77824        0             0 systemd-userwor
Aug 01 11:28:15 cosa-devsh kernel: [   1376]     0  1376    63604      609   126976        0             0 NetworkManager
Aug 01 11:28:15 cosa-devsh kernel: [   1387]     0  1387    19811       64    57344        0             0 irqbalance
Aug 01 11:28:15 cosa-devsh kernel: [   1390]     0  1390     7119      338    90112        0             0 journalctl
Aug 01 11:28:15 cosa-devsh kernel: [   1421]     0  1421     4457      257    77824        0             0 systemd-homed
Aug 01 11:28:15 cosa-devsh kernel: [   1426]     0  1426     4978      574    81920        0             0 systemd-logind
Aug 01 11:28:15 cosa-devsh kernel: [   1428]   994  1428    21285      148    65536        0             0 chronyd
Aug 01 11:28:15 cosa-devsh kernel: [   1444]    81  1444     2700      196    61440        0          -900 dbus-broker-lau
Aug 01 11:28:15 cosa-devsh kernel: [   1445]    81  1445     1353      157    49152        0          -900 dbus-broker
Aug 01 11:28:15 cosa-devsh kernel: [   1448]   981  1448   240262      290   184320        0             0 zincati
Aug 01 11:28:15 cosa-devsh kernel: [   1466]     0  1466    84275      815   167936        0             0 rpm-ostree
Aug 01 11:28:15 cosa-devsh kernel: [   1494]   999  1494   748324      727   253952        0             0 polkitd
Aug 01 11:28:15 cosa-devsh kernel: [   1731]     0  1731     3909      318    69632        0         -1000 sshd
Aug 01 11:28:15 cosa-devsh kernel: [   1748]     0  1748     3309      312    61440        0             0 login
Aug 01 11:28:15 cosa-devsh kernel: [   1749]     0  1749     3393      313    65536        0             0 login
Aug 01 11:28:15 cosa-devsh kernel: [   1754]  1000  1754     5566      820    86016        0           100 systemd
Aug 01 11:28:15 cosa-devsh kernel: [   1756]  1000  1756     7080     1526    94208        0           100 (sd-pam)
Aug 01 11:28:15 cosa-devsh kernel: [   1763]     0  1763     4489      471    81920        0             0 sshd
Aug 01 11:28:15 cosa-devsh kernel: [   1765]  1000  1765     1403      361    53248        0             0 bash
Aug 01 11:28:15 cosa-devsh kernel: [   1769]  1000  1769     1325      356    49152        0             0 bash
Aug 01 11:28:15 cosa-devsh kernel: [   1807]  1000  1807     4489      471    81920        0             0 sshd
Aug 01 11:28:15 cosa-devsh kernel: [   1808]  1000  1808     1323      368    45056        0             0 bash
Aug 01 11:28:15 cosa-devsh kernel: [   1829]  1000  1829   371414     3770   286720        0             0 podman
Aug 01 11:28:15 cosa-devsh kernel: [   1838]  1000  1838   464186    12220   417792        0             0 podman
Aug 01 11:28:15 cosa-devsh kernel: [   1842]  1000  1842      273        1    28672        0             0 catatonit
Aug 01 11:28:15 cosa-devsh kernel: [   1863]  1000  1863     2597       85    57344        0           200 dbus-broker-lau
Aug 01 11:28:15 cosa-devsh kernel: [   1864]  1000  1864     1221       37    49152        0           200 dbus-broker
Aug 01 11:28:15 cosa-devsh kernel: [   1887]  1000  1887     1907      490    49152        0             0 slirp4netns
Aug 01 11:28:15 cosa-devsh kernel: [   1893]  1000  1893    20441       83    61440        0             0 conmon
Aug 01 11:28:15 cosa-devsh kernel: [   1896]  1000  1896     1205      140    45056        0             0 bash
Aug 01 11:28:15 cosa-devsh kernel: [   1911]  1000  1911   239442   168578  1531904        0             0 dnf
Aug 01 11:28:15 cosa-devsh kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user/user.slice/libpod-6731617db3bbd19034611224dfb9e5d3b6d5a16c800ee137f616e932b529b3af.scope/container,task=dnf,pid=1911,uid=1000
Aug 01 11:28:15 cosa-devsh kernel: Out of memory: Killed process 1911 (dnf) total-vm:957768kB, anon-rss:674312kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:1496kB oom_score_adj:0
Aug 01 11:28:15 cosa-devsh systemd[1]: user: A process of this unit has been killed by the OOM killer.
Aug 01 11:28:15 cosa-devsh systemd[1754]: libpod-6731617db3bbd19034611224dfb9e5d3b6d5a16c800ee137f616e932b529b3af.scope: A process of this unit has been killed by the OOM killer.
Aug 01 11:28:15 cosa-devsh reverent_wescoff[1893]: Killed
```

This container has dnf-4.13.0-1.fc36.noarch


[1] https://pagure.io/releng/issue/10935

Comment 10 Fedora Blocker Bugs Application 2022-08-01 12:25:59 UTC
Proposed as a Blocker for 37-beta by Fedora user pbrobinson using the blocker tracking app because:

 This is causing issues with composes, with a bunch of popular ARM devices, and with memory-constrained cloud instances.

Comment 11 Dusty Mabe 2022-08-01 13:27:03 UTC
OK. I did a little more investigation. It appears to be a combination of things. I went back to the old Fedora 36 container from before the most recent update [1] (I pushed this container to `quay.io/dustymabe/fedora:36` if anyone is interested in testing with it).

This container has:

```
[root@ebff6c88c346 /]# rpm -q dnf
dnf-4.11.1-2.fc36.noarch
```

but I still can't `dnf update -y`. I still get an OOM. I almost wonder if this is an issue with the updates repo metadata having changed in some way that DNF can't handle gracefully.

Comment 12 Dusty Mabe 2022-08-01 13:27:56 UTC
[1] https://pagure.io/releng/issue/10935

Comment 13 Dusty Mabe 2022-08-01 13:47:05 UTC
OK, I think I'm right. This works fine:

```
podman run -it registry.fedoraproject.org/fedora:36

dnf update -y --disablerepo=updates --repofrompath=pungi0730,https://kojipkgs.fedoraproject.org/compose/updates/Fedora-36-updates-20220730.0/compose/Everything/x86_64/os
```

This gets an OOM:

```
podman run -it registry.fedoraproject.org/fedora:36

dnf update -y --disablerepo=updates --repofrompath=pungi0731,https://kojipkgs.fedoraproject.org/compose/updates/Fedora-36-updates-20220731.0/compose/Everything/x86_64/os
```

Is there something up with the repos starting with the 0731 compose?

Comment 14 Vratislav Podzimek 2022-08-02 12:41:43 UTC
Another reproducer:

```
$ vagrant init fedora/36-cloud-base
$ vagrant up
$ vagrant ssh -c "sudo dnf install openssl-devel"
Fedora 36 - x86_64                                                                                                                                                                 3.3 MB/s |  81 MB     00:24    
Fedora 36 openh264 (From Cisco) - x86_64                                                                                                                                           710  B/s | 2.5 kB     00:03    
Fedora Modular 36 - x86_64                                                                                                                                                         2.2 MB/s | 2.4 MB     00:01    
Fedora 36 - x86_64 - Updates                                                                                                                                                       3.6 MB/s |  24 MB     00:06    
Connection to 192.168.121.178 closed by remote host.
Connection to 192.168.121.178 closed.
```

IOW, this also affects the Fedora 36 Cloud Base image provided as a Vagrant box. And it is not aarch64 specific.

Comment 15 Daniel Alley 2022-08-03 03:32:53 UTC
Created attachment 1903167 [details]
repodiff between good and bad composes

Comment 16 Jaroslav Mracek 2022-08-08 13:01:50 UTC
I am sorry but we cannot do much here. The memory requirement is related to the metadata size for the particular repository. DNF downloads the repository metadata, converts the data to its internal format (libsolv) in RAM, stores the processed data to disk, and frees the memory. If the repository is big, it requires more RAM than a small one.

How can the problem be resolved?
1. Use only small repositories - Distribution can resolve the issue
2. Make loading file list optional - we will resolve it in the next generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+
3. Change libsolv to use RAM more efficiently - nice dream because it handles a lot of data
4. Deliver fewer files, provides, and requires in RPMs > smaller repositories, faster resolve of dependencies, requires less RAM to process. Right now it is permanently growing
5. Use compression of metadata that requires less RAM to decompress - Distribution can resolve the issue
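Option 5 is concrete: the RAM a client needs to decompress repodata is fixed by the compression settings the distribution chooses. A small illustration with the xz CLI (not from the original comment; the dictionary sizes come from the xz man page's preset table, and the file here is a stand-in, not real repodata):

```shell
# xz decompression memory is roughly the dictionary size picked at compression
# time: preset -9 implies a 64 MiB dictionary, preset -6 an 8 MiB one, so
# metadata compressed at -9 costs every client ~64 MiB just to decode.
seq 1 100000 > /tmp/meta.xml                 # stand-in for repository metadata
xz -6 -k -c /tmp/meta.xml > /tmp/meta-6.xz   # decoder needs ~8 MiB
xz -9 -k -c /tmp/meta.xml > /tmp/meta-9.xz   # decoder needs ~64 MiB
```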

Is it an issue?
The minimal requirement for Fedora is 2 GB of RAM.

Comment 17 Dusty Mabe 2022-08-08 13:39:39 UTC
(In reply to Jaroslav Mracek from comment #16)
>
> Is it an issue?
> The minimal requirement for Fedora is 2 GB of RAM.

I think that depends on who you ask. For example, our default Fedora container ships dnf. I think it would be sad to require 2G per container if the container wants to (for whatever reason) use dnf.

Comment 18 Peter Robinson 2022-08-08 15:42:13 UTC
(In reply to Jaroslav Mracek from comment #16)
> I am sorry but we cannot do much here. The memory requirement is related
> to the metadata size for the particular repository. DNF downloads the
> repository metadata, converts the data to its internal format (libsolv) in
> RAM, stores the processed data to disk, and frees the memory. If the
> repository is big, it requires more RAM than a small one.

Can we just ship the libsolv database as part of the repo metadata so each machine doesn't have to process this locally?

> > How can the problem be resolved?
> 1. Use only small repositories - Distribution can resolve the issue

How do you suggest the "distribution" does this?

> 2. Make loading file list optional - we will resolve it in the next
> generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+

The original yum only loaded this data when doing operations that required it, rather than all the time, which makes sense. You say "f-38+" - so is it 38, or some time in the future?

> 3. Change libsolv to use RAM more efficiently - nice dream because it
> handles a lot of data
> > 4. Deliver fewer files, provides, and requires in RPMs > smaller
> repositories, faster resolve of dependencies, requires less RAM to process.
> Right now it is permanently growing

I don't think this is a reasonable request for a core tool used by distributions.

> 5. Use compression of metadata that requires less RAM to decompress -
> Distribution can resolve the issue

Please provide details of this

> The minimal requirement for Fedora is 2 GB of RAM.

Where is that documented? The installer requires 1.5 GB, but it's not the only way to deploy Fedora.

Comment 19 Daniel Mach 2022-08-09 20:14:40 UTC
(In reply to Peter Robinson from comment #18)
> (In reply to Jaroslav Mracek from comment #16)
> > I am sorry but we cannot do much here. The memory requirement is related
> > to the metadata size for the particular repository. DNF downloads the
> > repository metadata, converts the data to its internal format (libsolv) in
> > RAM, stores the processed data to disk, and frees the memory. If the
> > repository is big, it requires more RAM than a small one.
> 
> Can we just ship the libsolv database as part of the repo metadata so each
> machine doesn't have to process this locally?

I don't think this is a good idea.
The solv files are an internal cache.
They may even suffer from problems with endianness.

> 
> > How can the problem be resolved?
> > 1. Use only small repositories - Distribution can resolve the issue
> 
> How do you suggest the "distribution" does this?

The distribution can influence (by packaging policy) what goes into
Provides/Requires/Conflicts/Obsoletes and weak deps, and encourage maintainers
to drop records that are no longer needed because they e.g. obsolete packages
from a version of the distro that is no longer relevant.
Also, trimming changelogs helps in some cases (already done in Fedora).
But all of this doesn't improve the situation much.
Repodata is usually big and we all need to live with it
(although it's still important to write good code and keep repos reasonably sized).

> 
> > 2. Make loading file list optional - we will resolve it in the next
> > generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+
> 
> The original yum used to only load this data when doing operations that
> required it rather than all the time, this makes sense, you say "f-38+" so
> is it 38 or is it some time in the future?

YUM used sqlite3 repodata, lazily loading data on demand.
Libsolv works differently and the underlying data structures are designed according to that.
Moving back to the old sqlite3 format is not realistic.
I can imagine designing a new database backend that would work with libsolv,
but it would come with downsides - first of all the required disk space, because
it must remain uncompressed to achieve reasonable speed.

> 
> > 3. Change libsolv to use RAM more efficiently - nice dream because it
> > handles a lot of data
I measured the memory used while waiting for transaction confirmation in a fedora:36 container,
on a 2nd run, when the solv files were already on disk:
microdnf: 100M
dnf5: 145M
dnf5 from master (no filelists loaded by default): 110M
dnf4: 147M
dnf4 with filelist loading disabled (a HACK): 138M
zypper: 164M

I find these numbers quite reasonable, but there is still some room for improvement.
I'm more worried about the spikes that happen during repo loading, which may be the problem this bug is about.
The problem seems to be in how libsolv is used across the various dnf implementations rather than in libsolv itself:

microdnf: 465M
dnf5: 700M
dnf5 from master (no filelists loaded by default): 195M
dnf4: 680M
dnf4 with filelist loading disabled (a HACK): 620M
zypper: 225M
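One way peak numbers like these can be gathered (an assumed method; the comment does not say which tool was used) is to sample the kernel's per-process peak-RSS counter while the command runs:

```shell
# VmHWM in /proc/<pid>/status is the peak resident set size the kernel has
# recorded for a process; sampling it while dnf runs approximates peak RAM use.
# (1-second sampling can miss very short-lived spikes.)
dnf makecache &
pid=$!
peak=0
while kill -0 "$pid" 2>/dev/null; do
    kb=$(awk '/VmHWM/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    if [ -n "$kb" ]; then peak=$kb; fi
    sleep 1
done
echo "peak RSS: ${peak} kB"
```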

> > 4. Deliver fewer files, provides, and requires in RPMs > smaller
> > repositories, faster resolve of dependencies, requires less RAM to process.
> > Right now it is permanently growing
> 
> I don't think this is a reasonable request for a core tool used by
> distributions.
> 
> > 5. Use compression of metadata that requires less RAM to decompress -
> > Distribution can resolve the issue
> 
> Please provide details of this
> 
Distributions can tweak compression parameters, but createrepo should come with reasonable defaults
which are a good compromise between speed, compression ratio and decompression memory requirements.

> > The minimal requirement for Fedora is 2 GB of RAM.
> 
> Where is that documented? The installer required 1.5Gb, but it's not the
> only way to deploy Fedora.

Requiring more than the installer makes no sense to me, because the installer == DNF, GUI and much more.
OK, users can add more repos after installation, but still...

Comment 20 Kamil Páral 2022-08-15 15:40:03 UTC
There's a blocker discussion ticket here:
https://pagure.io/fedora-qa/blocker-review/issue/841

Affected teams' representatives can describe the impact and vote on a release blocker status in there.

Comment 21 Geoffrey Marr 2022-08-22 21:19:37 UTC
Discussed during the 2022-08-22 blocker review meeting: [0]

The decision to delay the classification of this as a blocker bug was made because this is a difficult call: it really depends on a subjective evaluation of how much RAM we're comfortable requiring for basic packaging operations in a minimal Fedora environment. We will solicit input from various teams and re-consider this at a later time.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-08-22/f37-blocker-review.2022-08-22-16.01.txt

Comment 22 Adam Williamson 2022-08-23 00:36:03 UTC
On further consideration, as people are hitting this on F36, it seems odd to me to suggest blocking F37 on it. It's not an F37 bug, and blocking the F37 release on it wouldn't really help anyone, because if we don't ship F37 the only alternative we're offering is "use F36 instead", and the bug affects F36.

So, I'm gonna nominate this as a prioritized bug instead, and suggest at the next blocker review that we drop it as a proposed blocker. It certainly seems like potentially a major problem that we can't do dnf operations with 1G or less of RAM, but it doesn't feel like the release blocker process is the way to handle it.

Comment 23 Harald Reindl 2022-08-23 15:38:44 UTC
i had this on a F35 VM with way more than 0.5 GB RAM and updates-testing enabled

> YUM used sqlite3 repodata, lazily loading data on demand.
> Libsolv works differently and the underlying data structures are designed according to that

another regression - don't get me wrong, but when the update manager needs 2-4 times more RAM than the whole production load, something is terribly wrong - and no, you don't assign unused memory to virtual machines, because it makes a ton of operations like live migration slower than they could be (besides wasting resources)

Comment 24 Sandro 2022-08-23 16:08:14 UTC
(In reply to Jaroslav Mracek from comment #16)
> Is it an issue?
> The minimal requirement for Fedora is 2 GB of RAM.

Besides cloud machines, that would make a lot of ARM devices unsupported. I experienced this issue on a 1G RPi 3 and had to resort to microdnf to work around it.

Comment 25 Adam Williamson 2022-08-23 16:39:17 UTC
Note also, "The minimal requirement for Fedora is 2 GB of RAM" is not really true. The story we tell is more complicated than that:

https://docs.fedoraproject.org/en-US/fedora/latest/release-notes/welcome/Hardware_Overview/

It says 2G for the "default installation", which is referring to a default Workstation install. But below that there is this boxout:

"Low memory installations

Fedora 36 can be installed and used on systems with limited resources for some applications. Text, VNC, or kickstart installations are advised over graphical installation for systems with very low memory. Larger package sets require more memory during installation, so users with less than 768MB of system memory may have better results performing a minimal install and adding to it afterward.

For best results on systems with less than 1GB of memory, use the DVD installation image."

which overall certainly gives the impression that minimal installs should work on lower-memory systems.

Comment 26 Peter Robinson 2022-08-23 17:53:23 UTC
(In reply to Adam Williamson from comment #25)
> Note also, "The minimal requirement for Fedora is 2 GB of RAM" is not really
> true. The story we tell is more complicated than that:

I believe the original 2 GB on Workstation was set because, when running from a RAM disk during installation, installing and processing the selinux-policy package required ~1.5 GB of RAM. Someone back in the day documented this analysis, probably in a blog post.

Comment 27 Adam Williamson 2022-08-23 18:11:27 UTC
However it was set, it seems like a reasonable number for a graphical install. But it's clearly not the whole story with all the different ways you can deploy/use Fedora these days.

Comment 28 Colin Walters 2022-08-23 21:43:29 UTC
Note that rpm-ostree based systems by default don't load the repodata; in combination with flatpak/podman for applications one can use Fedora systems without being affected by this.

(But, the moment one engages client-side layering e.g. `rpm-ostree install` or `dnf install` inside a container client side, that's gone)

> 1. Use only small repositories - Distribution can resolve the issue

This I agree with!  Fedora could split out separate repositories like core/extras/buildroot for RPMs.  Splitting off build-only packages alone would cut out all of the e.g. rust- packages, which were something like 20% of the repodata last I looked.

> 2. Make loading file list optional - we will resolve it in the next generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+

That's been discussed over and over - having optional filelists doesn't help because there are important packages that use `Requires: /usr/bin/foo` and so the filelist will very commonly be needed anyways.

Comment 29 Peter Robinson 2022-08-23 21:56:00 UTC
> > 2. Make loading file list optional - we will resolve it in the next generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+
> 
> That's been discussed over and over - having optional filelists doesn't help
> because there are important packages that use `Requires: /usr/bin/foo` and
> so the filelist will very commonly be needed anyways.

The way the original yum dealt with that was by keeping a subset of bin/sbin and other key paths, where most of the requires live, in a separate smaller cache. That way it wasn't loading the hundreds of libs/include/docs/etc. files that far outnumber the ones in the path.

Comment 30 Ben Cotton 2022-08-24 15:34:29 UTC
In today's Prioritized Bugs meeting, we agreed to reject this as a Prioritized Bug as it requires large-scope changes to the distribution. mattdm will coordinate or delegate an effort to develop an F38 Change proposal to address the issues: https://meetbot.fedoraproject.org/fedora-meeting-1/2022-08-24/fedora_prioritized_bugs_and_issues.2022-08-24-14.00.log.html#l-93

Comment 31 Zbigniew Jędrzejewski-Szmek 2022-08-29 07:09:20 UTC
>> That's been discussed over and over - having optional filelists doesn't help because there are important packages
>> that use `Requires: /usr/bin/foo` and so the filelist will very commonly be needed anyways.
> The way the original yum dealt with that was by having a subset of bin/sbin and other key ones where most of
> the requires were in a separate smaller cache that way it's not loading the 100s of libs/include/docs/etc files that
> far out number the ones in the path

Yeah. Separating out the non-primary-filepath-data (i.e. filepaths except for the small primary subset)
would be great. Our packaging guidelines actually still reflect this [1]:
only dependencies on /etc, /usr/bin, /usr/sbin are allowed. Rpmlint warns about non-conforming dependencies,
so the majority of packages conform. The actual implementation is different, and for various historical
reasons the check is done intentionally sloppily [2].

If we do this, we might also significantly reduce the initial 80 MB download on every update:
IIRC, roughly 60 MB of that is non-essential filepath data. This is a major source of pain
on slow and metered networks. 

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/#_file_and_directory_dependencies
[2] https://github.com/rpm-software-management/createrepo_c/blob/master/src/misc.c#L179
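As a rough illustration of those size figures: repomd.xml records the download size of each metadata type, so the share taken by filelists can be read straight out of it. The sample file and its numbers below are invented for this sketch; a real repomd.xml from a Fedora mirror uses the same `<data type=...>` / `<size>` shape but nests the elements over several lines.

```shell
# Write a simplified, hand-made repomd.xml sample (sizes are made up).
cat > repomd-sample.xml <<'EOF'
<repomd>
  <data type="primary"><size>21000000</size></data>
  <data type="filelists"><size>58000000</size></data>
</repomd>
EOF

# Extract the filelists size - in this sample, most of the metadata download.
sed -n 's:.*type="filelists"><size>\([0-9]*\)</size>.*:\1:p' repomd-sample.xml
```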

Comment 32 Hans de Goede 2022-08-29 08:53:19 UTC
As I also mentioned on the fedora-devel list, having ZRAM enabled would really help to counter this.

Linux' memory-management subsystem is known to pretty much always need at least some swap to work correctly. So for any installs where no swap is created we really should enable ZRAM by default (as we currently do for Workstation installs). I believe that enabling ZRAM will be an effective workaround for this problem.

Comment 33 Peter Robinson 2022-08-29 10:54:19 UTC
(In reply to Hans de Goede from comment #32)
> As I also mentioned on the fedora-devel list, having ZRAM enabled would
> really help to counter this.
> 
> Linux' memory-management subsystem is known to pretty much always need at
> least some swap to work correctly. So for any installs where no swap is
> created we really should enable ZRAM by default (as we currently do for
> Workstation installs). I believe that enabling ZRAM will be an effective
> workaround for this problem.

We do this already on arm since F-29:

https://fedoraproject.org/wiki/Changes/ZRAMforARMimages

And it's been used elsewhere since F-33:
https://fedoraproject.org/wiki/Changes/SwapOnZRAM
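For reference, a minimal sketch of the configuration behind those changes, assuming the zram-generator tool they use. On a real system the file lives at /etc/systemd/zram-generator.conf; here it is written to the current directory so the shape is visible without root. The key names follow zram-generator's documented format; adjust sizes for your machine.

```shell
# Sketch of a zram-generator config sizing the zram swap device to half of
# RAM, capped at 4 GiB (the Fedora default expression), with zstd compression.
cat > zram-generator.conf <<'EOF'
[zram0]
zram-size = min(ram / 2, 4096)
compression-algorithm = zstd
EOF

# On a real system: copy the file to /etc/systemd/, then
#   systemctl daemon-reload && systemctl start systemd-zram-setup@zram0.service
```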

Comment 34 Vratislav Podzimek 2022-08-29 18:36:44 UTC
And it doesn't help much in this case. I tried enabling swap on ZRAM in the vagrant machine and it wasn't enough.

Comment 35 Geoffrey Marr 2022-08-29 21:30:37 UTC
Discussed during the 2022-08-29 blocker review meeting: [0]

The decision to classify this bug as a "RejectedBlocker (Beta)" was made on the grounds that it already affects F36, so blocking F37 on it doesn't achieve much. We also note that no simple fix has been identified; fixing this may require a significant overhaul of DNF (which is already in the works as DNF 5). It may also be desirable to update the system requirements doc and to include microdnf in installs likely to be used on low-memory systems.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-08-29/f37-blocker-review.2022-08-29-16.01.txt

Comment 36 Hubert Kario 2022-12-02 17:38:45 UTC
Looks like even with 3 GB of swap present, the recent Fedora 35 and 36 repos are getting a bit too large to work with on machines that have just 1 GB of RAM (my dnf install commands are constantly killed by the OOM killer).

When on a machine with more RAM I run:
dnf clean all
/usr/bin/time dnf update --refresh

On Fedora 35 I get:

Beaker Client - Fedora35                                                                                                                                                    41 kB/s | 7.6 kB     00:00    
Beaker harness                                                                                                                                                             510 kB/s |  64 kB     00:00    
Fedora 35 - x86_64                                                                                                                                                         2.6 MB/s |  79 MB     00:29    
Fedora 35 openh264 (From Cisco) - x86_64                                                                                                                                   3.8 kB/s | 2.5 kB     00:00    
Fedora Modular 35 - x86_64                                                                                                                                                 3.0 MB/s | 3.3 MB     00:01    
Fedora 35 - x86_64 - Updates                                                                                                                                                14 MB/s |  34 MB     00:02    
Fedora Modular 35 - x86_64 - Updates                                                                                                                                       5.1 MB/s | 3.9 MB     00:00    
Dependencies resolved.
Nothing to do.
Complete!
39.73user 3.11system 1:13.63elapsed 58%CPU (0avgtext+0avgdata 862884maxresident)k
4984inputs+648248outputs (8major+469876minor)pagefaults 0swaps

On Fedora 36 I get:

Beaker Client - Fedora36                                                                                                                                                    80 kB/s | 7.2 kB     00:00    
Beaker harness                                                                                                                                                             513 kB/s |  63 kB     00:00    
Fedora 36 - x86_64                                                                                                                                                          32 MB/s |  81 MB     00:02    
Fedora 36 openh264 (From Cisco) - x86_64                                                                                                                                   3.3 kB/s | 2.5 kB     00:00    
Fedora Modular 36 - x86_64                                                                                                                                                 2.2 MB/s | 2.4 MB     00:01    
Fedora 36 - x86_64 - Updates                                                                                                                                                17 MB/s |  30 MB     00:01    
Fedora Modular 36 - x86_64 - Updates                                                                                                                                       5.6 MB/s | 2.9 MB     00:00    
Dependencies resolved.
Nothing to do.
Complete!
32.00user 1.51system 0:37.43elapsed 89%CPU (0avgtext+0avgdata 946888maxresident)k
0inputs+625576outputs (0major+488552minor)pagefaults 0swaps

While on Fedora 37 I get:

Beaker Client - Fedora37                                                                                                                                                    77 kB/s | 6.9 kB     00:00    
Beaker harness                                                                                                                                                             508 kB/s |  63 kB     00:00    
Fedora 37 - x86_64                                                                                                                                                          22 MB/s |  64 MB     00:02    
Fedora 37 openh264 (From Cisco) - x86_64                                                                                                                                   4.3 kB/s | 2.5 kB     00:00    
Fedora Modular 37 - x86_64                                                                                                                                                 3.8 MB/s | 3.0 MB     00:00    
Fedora 37 - x86_64 - Updates                                                                                                                                                10 MB/s |  16 MB     00:01    
Fedora Modular 37 - x86_64 - Updates                                                                                                                                       1.8 MB/s | 911 kB     00:00    
Dependencies resolved.
Nothing to do.
Complete!
33.68user 2.32system 0:39.92elapsed 90%CPU (0avgtext+0avgdata 472512maxresident)k
96inputs+493304outputs (0major+337451minor)pagefaults 0swaps

Comment 37 David Baron 2023-01-03 21:52:37 UTC
What worked for me, to fix DNF running out of memory on my EC2 instance, was the advice from https://ask.fedoraproject.org/t/prune-dnf-history-database/14633: remove (or, in my case, rename, in case I later want what's in them) the files /var/lib/dnf/history.sqlite, /var/lib/dnf/history.sqlite-shm and /var/lib/dnf/history.sqlite-wal.
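That workaround can be sketched as a small helper that renames (rather than deletes) the history database files so they can be restored later. The file names and the default /var/lib/dnf path are taken from the comment above; passing a different directory lets you dry-run it safely.

```shell
# Rename DNF's history database files out of the way, keeping .bak copies.
backup_dnf_history() {
    dir="${1:-/var/lib/dnf}"
    for f in history.sqlite history.sqlite-shm history.sqlite-wal; do
        if [ -e "$dir/$f" ]; then
            mv "$dir/$f" "$dir/$f.bak"
        fi
    done
}
```

On the affected machine, run `backup_dnf_history` as root and retry `dnf update`; DNF recreates the history database on its next run.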

Comment 38 Harald Reindl 2023-01-03 21:56:08 UTC
I have pruned my /var/lib/dnf/* regularly for years - currently the only way to avoid this is to run your own caching repos with only the subset of packages used on your machines - the moment you turn on the official repos on machines that intentionally have low RAM, DNF stops working - that's ridiculous given that 512 MB should be enough for most production loads on dedicated servers

Comment 39 Leonardo Amaral 2023-01-06 17:23:37 UTC
I'm affected by this issue on a Raspberry Pi, trying to upgrade from Fedora 34 to Fedora 35 or 36. Even following @h.reindl's suggestion and setting zram to factor 0.9 with the lz4 compressor, dnf crashes everything - including the ssh shell - in the middle of the package update:

[root@srvad02 ~]# dnf system-upgrade download --refresh --releasever=36 --allowerasing
Before you continue ensure that your system is fully upgraded by running "dnf --refresh upgrade". Do you want to continue [y/N]: y
Fedora 36 - aarch64                                                                                                           9.5 kB/s |  36 kB     00:03    
Fedora 36 openh264 (From Cisco) - aarch64                                                                                     6.5 kB/s | 990  B     00:00    
Fedora Modular 36 - aarch64                                                                                                   208 kB/s |  36 kB     00:00    
Fedora 36 - aarch64 - Updates                                                                                                 175 kB/s |  31 kB     00:00    
Connection to 192.168.88.4 closed by remote host.
Connection to 192.168.88.4 closed.


I see no way to upgrade this Fedora installation using system-upgrade.

Comment 40 Harald Reindl 2023-01-06 17:30:58 UTC
I didn't say anything about zram because it's nonsense here - the repodata can't be swapped out, and zram is nothing other than swap.
You need to disable as many repos as you can (modular, for example) and terminate every process not needed for your ssh session and a running internet connection.

Comment 41 Hubert Kario 2023-01-06 18:45:36 UTC
It may be possible to workaround it by setting the kernel memory overcommit behaviour to 1, as root:

echo 1 > /proc/sys/vm/overcommit_memory

And having enough free swap available (likely a few GB).

See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting.

I haven't tried it as it was easier for my use case to increase the memory of individual VMs.

Comment 42 David Baron 2023-01-06 18:51:33 UTC
For what it's worth, my previous workaround only lasted for a day or two (on a VM with 1GB of RAM and 1GB of swap), then it started happening again.  I narrowed the problem down (using the --disablerepo) option to the "updates" repository (for Fedora 36).

I decided to just upgrade the VM to Fedora 37, since I'd been meaning to do so anyway, and that seems to have fixed the problem.  I think there may be something strange about the Fedora 36 updates repo that's triggering this.

Comment 43 Hubert Kario 2023-01-06 18:56:14 UTC
There's nothing strange about Fedora 36, it simply has half a year of extra packages in the updates repo. It will start happening in Fedora 37 when it gets similarly large updates repo. See my Comment #36: half the size of the Updates repo, half the size of Max resident.

Comment 44 ulissesf 2023-01-28 21:25:07 UTC
Hi,

I started having this issue with a DigitalOcean droplet with 1 GB RAM and 1 GB swap on Fedora 36. I never had this problem before, even when my droplet had only 512 MB RAM and was running previous versions of Fedora. Below is the output of a few commands, with the OOM killer always killing dnf and my SSH connection.

$ free
               total        used        free      shared  buff/cache   available
Mem:          986388      140712      585748        3084      259928      702376
Swap:         986108           0      986108

# dnf -v update
Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, groups-manager, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync
DNF version: 4.14.0
cachedir: /var/cache/dnf
User-Agent: constructed: 'libdnf (Fedora Linux 36; cloud; Linux.x86_64)'
repo: using cache for: fedora
fedora: using metadata from Wed 04 May 2022 09:16:11 PM UTC.
repo: using cache for: fedora-cisco-openh264
fedora-cisco-openh264: using metadata from Thu 06 Oct 2022 11:02:51 AM UTC.
repo: using cache for: fedora-modular
fedora-modular: using metadata from Wed 04 May 2022 09:12:01 PM UTC.
repo: using cache for: updates
Connection to <server> closed by remote host.
Connection to <server> closed.

From dmesg:
[  412.552818] Out of memory: Killed process 1064 (dnf) total-vm:1007084kB, anon-rss:24676kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:2028kB oom_score_adj:0

Any ideas on how to solve this so I can update my droplet? Thanks.

-- Ulisses

Comment 45 Paul E. Jones 2023-01-28 21:55:26 UTC
(In reply to ulissesf from comment #44)
> Any ideas on how to solve this so I can update my droplet? Thanks.

Ulisses,

The one suggestion above that worked for the RAM-constrained machine I manage was to use "microdnf".  To get it installed, I created several GB of swap space and installed only that tool.  Once it was installed, I removed the extra swap space and used microdnf to update the other packages.

Paul

Comment 46 Hubert Kario 2023-01-30 16:32:01 UTC
You should also be able to install individual packages by calling rpm directly.

Comment 47 Daniel Mach 2023-01-30 19:16:27 UTC
> Any ideas on how to solve this so I can update my droplet? Thanks.
> 

Could you run dnf makecache and then your command? It should improve dnf memory consumption. See my comment #19.

Also using microdnf instead of dnf should help. It might be tricky to get it installed due to your memory limitation. In that case try disabling all but the main repo: dnf --repoid=fedora makecache && dnf --repoid=fedora install microdnf

Comment 48 ulissesf 2023-01-31 01:51:18 UTC
Paul, Hubert and Daniel, thanks for the suggestions. I installed microdnf by first installing libpeas from the fedora repo with dnf (disabling all other repos), then installing the microdnf rpm directly, since dnf couldn't find microdnf in the fedora repo. microdnf was then able to update my system without any OOM events. Thanks!

Now this seems to be a big regression with dnf on Fedora. I used to have only 512 MB of RAM and everything worked. Now I have 1 GB RAM and 1 GB swap and it runs out of memory. I didn't change anything in terms of what I do with this system; I only updated it to F36 and was trying to keep it up to date.

Comment 49 Harald Reindl 2023-01-31 07:57:03 UTC
512 MB of RAM is enough for dozens of workloads like nameservers, VoIP machines, routers and firewalls, and works pretty well as long as the bloated Fedora repo metadata is avoided via local caches on the network - for the sake of god: is it too much to ask that dnf developers run a virtual machine with 512 MB of RAM and check whether their software survives a "dnf upgrade" there?

Comment 50 LAMurakami 2023-04-15 09:22:24 UTC
I have been running a LAMP stack on AWS t3.nano instances with only 0.5 GiB of memory since 2018.  I have Cloud Init files for Amazon Linux 2 and Ubuntu.  Both of these are able to install a large number of packages during initialization with the "packages:" section, which runs early in the initialization sequence before the instance has a swap file.  Amazon Linux 2 uses yum, and Ubuntu uses apt, for package management.

Any packages specified in the "packages:" section for Amazon Linux 2023 cause the Cloud Init to hang on a t3.nano instance.  The issue does not appear when a t2.micro with 1 GiB of memory is launched.  I was able to get around this on a t3.micro instance by using "dnf -y install" in the "runcmd:" section, with the install broken up into smaller groups of packages after a 768M swap file was created and enabled.  Although I was able to work around the effect of this bug, initialization on the Amazon Linux 2023 t3.nano instance takes about 10 minutes, which is twice as long as for either the Ubuntu or Amazon Linux 2 instances.  This is without etckeeper, glances and a few other things that are not yet available for Amazon Linux 2023, which doesn't provide EPEL support and has only been available for a month or so.

I am surprised that the dnf developers are so O.K. with their package manager being so inferior to yum or apt in this type of case.  This is my first experience with dnf, and it is disappointing that this is a known bug that has been around for years with no apparent path to a fix.

Comment 51 Adam Williamson 2023-04-15 16:16:38 UTC
dnf is the successor to yum and works very similarly to it; I don't know for sure, but I'd expect yum theoretically has the same problem. I'd guess the difference is more likely just that there are more packages in al2023 and therefore more metadata; the problem is caused by yum/dnf trying to parse all the metadata.

I do wonder if AL could switch to microdnf/dnf5, or at least include it and provide some kind of option to use it at deployment time for small instance cases like this? David, have you looked into it? See the rest of this bug for context.

Comment 52 Harald Reindl 2023-04-15 16:21:54 UTC
this is a combination of different things: ever more metadata, and dnf having always used more memory than yum

even now dnf still isn't on par with yum when it comes to simple things like reporting the cause of a depsolve failure
https://bugzilla.redhat.com/show_bug.cgi?id=2186544 is a perfect example where yum 15 years ago would simply have said where the problem is

Comment 53 LAMurakami 2023-04-17 04:28:31 UTC
Any packages specified in the "packages:" section of a Cloud Init specification for Amazon Linux 2023 cause the Cloud Init to fail on a t3.nano instance.

This does not happen for Amazon Linux 2 on a t3.nano AWS instance.  Amazon Linux 2 uses yum.

This does not happen for Ubuntu 24.04 on a t3.nano AWS instance.  Ubuntu 24.04 uses apt.

This does not happen on a t2.micro instance.  A t2.micro has 1 GiB of memory and a t3.nano has 0.5 GiB of memory.

I posted a comment on this bug earlier but this comment has some more detail.

boot message for oom killing dnf launched by Cloud Init on t3.nano AWS instance.

$ sudo dmesg | tail -2
[   54.149669] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cloud-final.service,task=dnf,pid=1528,uid=0
[   54.153706] Out of memory: Killed process 1528 (dnf) total-vm:512092kB, anon-rss:80396kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:328kB oom_score_adj:0
[19:55:02 Sunday 04/16/2023] ec2-user@ip-172-31-24-125 ~

The Cloud Init process terminates when dnf is killed on a t3.nano AWS instance.

$ sudo tail -3 /var/log/cloud-init-output.log
Cloud-init v. 22.2.2 running 'modules:config' at Mon, 17 Apr 2023 03:24:18 +0000. Up 10.79 seconds.
Cloud-init v. 22.2.2 running 'modules:final' at Mon, 17 Apr 2023 03:24:19 +0000. Up 11.99 seconds.
Amazon Linux 2023 repository                     18 MB/s |  12 MB     00:00    
[19:57:12 Sunday 04/16/2023] ec2-user@ip-172-31-24-125 ~

A single package specified in the packages section causes this failure on a t3.nano AWS instance.

$ sudo head -16 /var/lib/cloud/instances/i-0fc2abba3e8539af5/user-data.txt | tail -4

packages:
 - mlocate

[19:59:34 Sunday 04/16/2023] ec2-user@ip-172-31-24-125 2 files, 60K ~

Info on the version of dnf, and another error message, on a t3.nano AWS instance.

$ dnf info dnf
Amazon Linux 2023 Kernel Livepatch repository                                  811 kB/s | 155 kB     00:00    
Installed Packages
History database cannot be created, using in-memory database instead: SQLite error on "/var/lib/dnf/history.sqlite": Open failed: unable to open database file
Name         : dnf
Version      : 4.12.0
Release      : 2.amzn2023.0.4
Architecture : noarch
Size         : 2.2 M
Source       : dnf-4.12.0-2.amzn2023.0.4.src.rpm
Repository   : @System
Summary      : Package manager
URL          : https://github.com/rpm-software-management/dnf
License      : GPLv2+
Description  : Utility that allows users to manage packages on their systems.
             : It supports RPMs, modules and comps groups & environments.

[20:04:17 Sunday 04/16/2023] ec2-user@ip-172-31-24-125 ~

Comment 54 Joe Spencer 2023-06-02 08:51:04 UTC
I'm affected by this with my Raspberry Pi.

[root@localhost systemd]# uname -a
Linux localhost.localdomain 6.0.12-100.fc35.armv7hl #1 SMP Thu Dec 8 19:53:54 UTC 2022 armv7l armv7l armv7l GNU/Linux

It would be nice to upgrade to 38, but I was unable to do so.  36 is the only version I can upgrade to and during the upgrade process my session runs out of memory.

Comment 55 Peter Robinson 2023-06-02 09:45:24 UTC
> It would be nice to upgrade to 38, but I was unable to do so.  36 is the
> only version I can upgrade to and during the upgrade process my session runs
> out of memory.

microdnf works OK for this use case. You can do: "microdnf --releasever=38 distro-sync"

Comment 56 Dominik 'Rathann' Mierzejewski 2023-06-02 10:29:10 UTC
(In reply to Peter Robinson from comment #55)
> > It would be nice to upgrade to 38, but I was unable to do so.  36 is the
> > only version I can upgrade to and during the upgrade process my session runs
> > out of memory.
> 
> microdnf works OK for this use case. You can do: "microdnf --releasever=38
> distro-sync"

Except it won't work to upgrade to F38, because Joe is on armv7hl which is not supported in F37+.

Comment 57 Peter Robinson 2023-06-02 11:12:30 UTC
> Linux localhost.localdomain 6.0.12-100.fc35.armv7hl #1 SMP Thu Dec 8
> 19:53:54 UTC 2022 armv7l armv7l armv7l GNU/Linux
>
> It would be nice to upgrade to 38, but I was unable to do so.  36 is the
> only version I can upgrade to and during the upgrade process my session runs
> out of memory.

We retired armv7 as of F-37, so you can't go to anything newer than F-36 (https://fedoraproject.org/wiki/Changes/RetireARMv7). To get to F-38 you will need to reinstall with an aarch64 image. To get to F-36 you can use microdnf: "microdnf --releasever=36 distro-sync"

Comment 58 Mohamed Akram 2023-06-07 22:19:53 UTC
Why can't `dnf update` do the equivalent of (dnf makecache && dnf update) itself?

Comment 59 Harald Reindl 2023-06-07 22:44:49 UTC
> Why can't `dnf update` do the equivalent of (dnf makecache && dnf update) itself?

what makes you think it doesn't, and what does this have to do with the memory usage that has been unacceptable for years?

Comment 60 Mohamed Akram 2023-06-07 22:49:39 UTC
Per comment #47, it's a workaround that resolved the issue for me.

Comment 61 Peter Robinson 2023-06-13 08:28:55 UTC
For reference, there's a similar issue in dnf5, which has now replaced microdnf and was supposed to fix this issue. I've filed the following against it: https://bugzilla.redhat.com/show_bug.cgi?id=2214520

Comment 62 Benjamin Herrenschmidt 2023-08-03 23:35:27 UTC
So I read here and there about the need to "read the metadata and convert it to libsolv in RAM"... there's actually a good way to make this resilient to low-memory conditions without having swap globally enabled, which is to replace malloc() or equivalent with mmap'ing a large file. In effect, the process then swaps within that region. Of course, it doesn't help if the file is in `/tmp`, since that's usually tmpfs (in RAM) these days... I haven't dived into libdnf etc., but the same goes for opening large files in general: mmap'ing them rather than malloc'ing and reading makes things much more resilient to low-memory conditions in the absence of swap. On most cloud systems, disks are networked, so swap will suck (this can be alleviated with a zswap front-end, but on overcommitted low-memory systems I wonder how that really behaves).

Comment 63 Colin Walters 2023-08-04 01:31:18 UTC
Just reposting here, because it's surprising how many people don't understand:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/FMM4SKQ4TERRVPSWFK2UUKRTBW4KBLHM/

Everyone needs to internalize:  This has nothing to do with DNF (or microdnf, etc), really.

It's about the *size of the repository metadata*.

Every single time someone adds a new package to the single "Fedora" rpm-md
repo, clients going OOM becomes a bit more likely.  Every single time there's a new
package, it becomes more likely that a client will hit a timeout downloading the repodata
and fail to get that critical kernel security update.

Even doing the obvious thing and splitting off e.g. a `fedora-devel` rpm-md repo (as is
kind of done in RHEL with CRB) that had all the -devel and Go/Rust etc. packages
would likely create huge savings.

Ultimately, "package all the world's software into a single giant repository and dynamically solve dependencies across all of it" will *never* scale (at least, not to smaller machines).

Shrinking the metadata (mainly, not loading all the file lists) would gain a lot of headroom.  But, the endless growth of software in the world versus the shrinking of general purpose computers and lower bandwidth connections will always have a high tension, and those small/low bandwidth systems just aren't well suited to packages - they're best suited to receiving pre-built images (which may contain packages, or might not).

Comment 64 Gustavo Sverzut Barbieri 2023-08-04 01:38:48 UTC
Sorry Colin, but it really is a DNF problem: how it deals with the metadata and memory usage.

Since the metadata is mostly read-only, the on-disk format should be laid out in a way that allows scanning without bringing everything into RAM, be it with indexes, pages of indexes, etc.

We can't control how many packages will exist, but we can control the memory consumption of code and not assume infinite memory is available.

Comment 65 Harald Reindl 2023-08-04 03:59:42 UTC
> It's about the *size of the repository metadata*

no - it's a design problem when you load the entire metadata of "all the world's software" completely into RAM

you were able to store gigabytes of data in mysql-myisam and operate a machine with 192 MB RAM and 466 MHz (running the OS, a desktop and a ton of applications besides the database server) 20 years ago, with smart indexes - and in 2023 you tell us you need 1 GB of RAM for something as trivial as package metadata

you don't need the whole metadata in memory all the time unless there is a giant design problem

Comment 66 Peter Robinson 2023-08-04 07:02:37 UTC
I also don't understand why createrepo can't generate the libsolv DBs so that dnf just consumes them, rather than generating them on every single computer from the XML files. That would save a lot of local CPU cycles by doing the work just once.

Comment 67 Adam Williamson 2023-08-04 07:47:59 UTC
wouldn't that be because of inter-repository dependencies? the depsolving will differ depending on which repos you have enabled, right?

Comment 68 Peter Robinson 2023-08-04 08:09:50 UTC
(In reply to Adam Williamson from comment #67)
> wouldn't that be because of inter-repository dependencies? the depsolving
> will differ depending on which repos you have enabled, right?

I believe it generates a libsolv db per repo.

Comment 69 Marek Blaha 2023-08-04 08:10:47 UTC
Each repository has its own *.solv file (or files, as comps, filelists, and modules are stored separately), so inter-repository deps should not be a problem.
Potential problems that come to my mind:
- endianness - *.solv is a binary format copied directly into memory, and it can be used across different architectures
- versioning - the *.solv format changes over time, and there is no guarantee that every libsolv version can handle every *.solv file version.

Comment 70 Benjamin Herrenschmidt 2023-08-05 06:00:54 UTC
So even keeping the generation of .solv files local, we can probably improve things tremendously by making use of mmap; mmap basically gives you swap in an environment where swap isn't enabled. I haven't yet had a chance to dive deep enough into dnf/libsolv to really pinpoint the steps that are killing us on small systems (I work for AWS, so I'm concerned about things like "nano" instances with 0.5 GB of RAM; I'm mostly concerned about Amazon Linux, but if we fix this we'll try to fix it upstream). However, I did notice that libsolv has its own allocator abstraction.

So one experiment I might do is replace the allocators in there with some simple off-the-shelf malloc implementation that operates on a big mmap'ed file in /var/tmp and see where that takes us.

That said, it would be nice if the solv file were just mmap'ed for reading... maybe things like strings could then point into the mmap'ed file rather than being copied, I don't know. It seems like libsolv would need quite a lot of work to operate directly on an mmap'ed file for all read functionality rather than on its in-memory representation, but I have only given it a cursory glance at this point.

Comment 71 Kamil Páral 2024-03-27 12:34:49 UTC
Fedora 40 Beta now includes this change:
https://fedoraproject.org/wiki/Changes/DNFConditionalFilelists

Can anyone confirm whether it resolves this problem on machines with low amounts of memory?

