Description of problem:
On a "Fedora 33 Cloud" instance on a Bezosville "t4g.nano" machine (ARM Graviton, 0.5 GiB of RAM) (https://aws.amazon.com/ec2/instance-types/), immediately after the image has come up, run "dnf update": dnf gets killed by oom-kill due to high memory usage (remember when we had workstations with 64 MiB of RAM 😂). "free" actually lists only 423 MiB of total memory.

==
[ 1203.399002] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=dnf,pid=1022,uid=0
[ 1203.401135] Out of memory: Killed process 1022 (dnf) total-vm:371088kB, anon-rss:287848kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:736kB oom_score_adj:0
[ 1203.426932] oom_reaper: reaped process 1022 (dnf), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
==

On the command line, we see:
- dnf starts
- contacts the update servers
- waits a few seconds
- then gets killed

The solution is to add a swap disk:

---
dd if=/dev/zero of=/mnt/2GB.swap count=2048 bs=1024K
mkswap /mnt/2GB.swap
chmod 600 /mnt/2GB.swap
swapon /mnt/2GB.swap
---

Then "dnf update" works nicely. So I don't know what dnf's RAM requirement is, but this seems like a lot nevertheless.
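To make the workaround survive reboots, the swap file can also be added to /etc/fstab; a minimal sketch, assuming the same /mnt/2GB.swap path as the commands above:

```
/mnt/2GB.swap none swap sw 0 0
```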
FWIW, there was a similar issue in F26-27: https://bugzilla.redhat.com/show_bug.cgi?id=1432219
It was fixed in F28 with dnf-2.7.5 and libdnf-0.11.1. But in F29 it started again:

May 11 12:40:27 dnf-f29 kernel: Out of memory: Kill process 689 (dnf) score 748 or sacrifice child
May 11 12:40:27 dnf-f29 kernel: Killed process 689 (dnf) total-vm:657152kB, anon-rss:367228kB, file-rss:0kB, shmem-rss:0kB

# rpm -q dnf libdnf
dnf-4.0.4-1.fc29.noarch
libdnf-0.22.0-6.fc29.x86_64

And since then it hasn't worked on a VM with 512MB RAM (I've tried all releases F26-F34). It's not aarch64-only; it happens on x86_64 too.
This message is a reminder that Fedora 33 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 33 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle. Changing version to 36.
Created attachment 1870204 [details]
/var/log/messages for oom killing dnf
Comment on attachment 1870204 [details]
/var/log/messages for oom killing dnf

Same issue with Fedora 35 on a brand new 512 MB VM with just the OS; the OOM killer kills both dnf and the shell (you end up back at the login prompt).
It seems it is the updates repo that causes the OOM, specifically generating the solvx:

==
strace -e trace=open,openat,connect,accept dnf info time
... snip
openat(AT_FDCWD, "/var/cache/dnf/updates-updateinfo.solvx", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/var/cache/dnf/updates-7eea87b22825bc0d/repodata/4df02c81ae2b639369ac317d5d6ab3ae85a62078fe2b4161cb75071452eb6c07-updateinfo.xml.zck", O_RDONLY) = 8
+++ killed by SIGKILL +++
Killed
==

With zchunk metadata disabled, it fails in the same place:

==
openat(AT_FDCWD, "/var/cache/dnf/updates-updateinfo.solvx", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/var/cache/dnf/updates-7eea87b22825bc0d/repodata/9028d5cb3daf3d84a04189690e97356dc0b0bb75c4392bbbd03d8b026d0cdaf4-updateinfo.xml.xz", O_RDONLY) = 8
+++ killed by SIGKILL +++
Killed
==

dmesg:

==
[ 1729.518235] Out of memory: Killed process 1665 (dnf) total-vm:1272448kB, anon-rss:635436kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1448kB oom_score_adj:0
==

On a larger machine it takes around 800 MB of RAM to complete.
We are hitting this in the Fedora CoreOS CI. Fedora releng recently pushed an update to the f36 image [1] and now our tests that run on VMs with 1G of RAM don't make it. If I just run `dnf update` on these machines it gets OOM-killed. The command I run is something like:

```
podman run -it registry.fedoraproject.org/fedora:36 dnf update -y
```

The journal shows us:

```
Aug 01 11:27:50 cosa-devsh reverent_wescoff[1893]: [65B blob data]
Aug 01 11:27:54 cosa-devsh reverent_wescoff[1893]: [2.0K blob data]
Aug 01 11:28:07 cosa-devsh reverent_wescoff[1893]: [945B blob data]
Aug 01 11:28:09 cosa-devsh reverent_wescoff[1893]: [1.1K blob data]
Aug 01 11:28:10 cosa-devsh reverent_wescoff[1893]: [945B blob data]
Aug 01 11:28:15 cosa-devsh kernel: dnf invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Aug 01 11:28:15 cosa-devsh kernel: CPU: 1 PID: 1911 Comm: dnf Not tainted 5.18.13-200.fc36.x86_64 #1
Aug 01 11:28:15 cosa-devsh kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Aug 01 11:28:15 cosa-devsh kernel: Call Trace:
Aug 01 11:28:15 cosa-devsh kernel:  <TASK>
Aug 01 11:28:15 cosa-devsh kernel:  dump_stack_lvl+0x44/0x5c
Aug 01 11:28:15 cosa-devsh kernel:  dump_header+0x4a/0x1ff
Aug 01 11:28:15 cosa-devsh kernel:  oom_kill_process.cold+0xb/0x10
Aug 01 11:28:15 cosa-devsh kernel:  out_of_memory+0x1be/0x4f0
Aug 01 11:28:15 cosa-devsh kernel:  __alloc_pages_slowpath.constprop.0+0xc3c/0xcf0
Aug 01 11:28:15 cosa-devsh kernel:  __alloc_pages+0x1e7/0x210
Aug 01 11:28:15 cosa-devsh kernel:  folio_alloc+0x17/0x50
Aug 01 11:28:15 cosa-devsh kernel:  __filemap_get_folio+0x175/0x420
Aug 01 11:28:15 cosa-devsh kernel:  filemap_fault+0x151/0x980
Aug 01 11:28:15 cosa-devsh kernel:  __do_fault+0x36/0x130
Aug 01 11:28:15 cosa-devsh kernel:  __handle_mm_fault+0xdaf/0x1400
Aug 01 11:28:15 cosa-devsh kernel:  ? __pv_queued_spin_lock_slowpath+0x156/0x2b0
Aug 01 11:28:15 cosa-devsh kernel:  handle_mm_fault+0xae/0x280
Aug 01 11:28:15 cosa-devsh kernel:  do_user_addr_fault+0x1c5/0x670
Aug 01 11:28:15 cosa-devsh kernel:  ? kvm_read_and_reset_apf_flags+0x3f/0x60
Aug 01 11:28:15 cosa-devsh kernel:  exc_page_fault+0x70/0x170
Aug 01 11:28:15 cosa-devsh kernel:  asm_exc_page_fault+0x21/0x30
Aug 01 11:28:15 cosa-devsh kernel: RIP: 0033:0x7f078dc68470
Aug 01 11:28:15 cosa-devsh kernel: Code: Unable to access opcode bytes at RIP 0x7f078dc68446.
Aug 01 11:28:15 cosa-devsh kernel: RSP: 002b:00007ffd90059c18 EFLAGS: 00010202
Aug 01 11:28:15 cosa-devsh kernel: RAX: 000000000054a7ff RBX: 0000557acf1273c0 RCX: 0000557acbcf80a0
Aug 01 11:28:15 cosa-devsh kernel: RDX: 0000000000000034 RSI: 0000557acbcf80a0 RDI: 0000557ad0e8435d
Aug 01 11:28:15 cosa-devsh kernel: RBP: 0000000000000034 R08: 000000000054a6d1 R09: 0000557ac98b0470
Aug 01 11:28:15 cosa-devsh kernel: R10: 0000000000000000 R11: eae92cfea8765b52 R12: 0000557acbcf80a0
Aug 01 11:28:15 cosa-devsh kernel: R13: 00000000fffe5dc3 R14: 0000000000000034 R15: 0000557ac98b0470
Aug 01 11:28:15 cosa-devsh kernel:  </TASK>
Aug 01 11:28:15 cosa-devsh kernel: Mem-Info:
Aug 01 11:28:15 cosa-devsh kernel: active_anon:401 inactive_anon:199213 isolated_anon:0 active_file:12 inactive_file:3 isolated_file:0 unevictable:0 dirty:0 writeback:0 slab_reclaimable:9166 slab_unreclaimable:16852 mapped:325 shmem:830 pagetables:1152 bounce:0 kernel_misc_reclaimable:0 free:11864 free_pcp:121 free_cma:0
Aug 01 11:28:15 cosa-devsh kernel: Node 0 active_anon:1604kB inactive_anon:796852kB active_file:48kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1300kB dirty:0kB writeback:0kB shmem:3320kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4272kB pagetables:4608kB all_unreclaimable? yes
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA free:4176kB boost:0kB min:760kB low:948kB high:1136kB reserved_highatomic:0KB active_anon:0kB inactive_anon:11000kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 01 11:28:15 cosa-devsh kernel: lowmem_reserve[]: 0 854 854 854 854
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA32 free:43280kB boost:0kB min:43480kB low:54348kB high:65216kB reserved_highatomic:0KB active_anon:1604kB inactive_anon:785696kB active_file:400kB inactive_file:136kB unevictable:0kB writepending:0kB present:1031988kB managed:967732kB mlocked:0kB bounce:0kB free_pcp:484kB local_pcp:196kB free_cma:0kB
Aug 01 11:28:15 cosa-devsh kernel: lowmem_reserve[]: 0 0 0 0 0
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 2*16kB (UM) 1*32kB (M) 2*64kB (UM) 1*128kB (M) 1*256kB (U) 1*512kB (U) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 4184kB
Aug 01 11:28:15 cosa-devsh kernel: Node 0 DMA32: 1320*4kB (UME) 770*8kB (UME) 405*16kB (UME) 314*32kB (UME) 151*64kB (UME) 50*128kB (UME) 2*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44544kB
Aug 01 11:28:15 cosa-devsh kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Aug 01 11:28:15 cosa-devsh kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 01 11:28:15 cosa-devsh kernel: 905 total pagecache pages
Aug 01 11:28:15 cosa-devsh kernel: 0 pages in swap cache
Aug 01 11:28:15 cosa-devsh kernel: Swap cache stats: add 0, delete 0, find 0/0
Aug 01 11:28:15 cosa-devsh kernel: Free swap = 0kB
Aug 01 11:28:15 cosa-devsh kernel: Total swap = 0kB
Aug 01 11:28:15 cosa-devsh kernel: 261995 pages RAM
Aug 01 11:28:15 cosa-devsh kernel: 0 pages HighMem/MovableOnly
Aug 01 11:28:15 cosa-devsh kernel: 16222 pages reserved
Aug 01 11:28:15 cosa-devsh kernel: 0 pages cma reserved
Aug 01 11:28:15 cosa-devsh kernel: 0 pages hwpoisoned
Aug 01 11:28:15 cosa-devsh kernel: Tasks state (memory values in pages):
Aug 01 11:28:15 cosa-devsh kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Aug 01 11:28:15 cosa-devsh kernel: [ 1287] 0 1287 10667 313 102400 0 -250 systemd-journal
Aug 01 11:28:15 cosa-devsh kernel: [ 1301] 0 1301 8698 936 94208 0 -1000 systemd-udevd
Aug 01 11:28:15 cosa-devsh kernel: [ 1368] 990 1368 5913 883 94208 0 0 systemd-resolve
Aug 01 11:28:15 cosa-devsh kernel: [ 1370] 0 1370 4341 229 69632 0 0 systemd-userdbd
Aug 01 11:28:15 cosa-devsh kernel: [ 1371] 0 1371 4505 240 73728 0 0 systemd-userwor
Aug 01 11:28:15 cosa-devsh kernel: [ 1372] 0 1372 4505 239 73728 0 0 systemd-userwor
Aug 01 11:28:15 cosa-devsh kernel: [ 1373] 0 1373 4505 241 77824 0 0 systemd-userwor
Aug 01 11:28:15 cosa-devsh kernel: [ 1376] 0 1376 63604 609 126976 0 0 NetworkManager
Aug 01 11:28:15 cosa-devsh kernel: [ 1387] 0 1387 19811 64 57344 0 0 irqbalance
Aug 01 11:28:15 cosa-devsh kernel: [ 1390] 0 1390 7119 338 90112 0 0 journalctl
Aug 01 11:28:15 cosa-devsh kernel: [ 1421] 0 1421 4457 257 77824 0 0 systemd-homed
Aug 01 11:28:15 cosa-devsh kernel: [ 1426] 0 1426 4978 574 81920 0 0 systemd-logind
Aug 01 11:28:15 cosa-devsh kernel: [ 1428] 994 1428 21285 148 65536 0 0 chronyd
Aug 01 11:28:15 cosa-devsh kernel: [ 1444] 81 1444 2700 196 61440 0 -900 dbus-broker-lau
Aug 01 11:28:15 cosa-devsh kernel: [ 1445] 81 1445 1353 157 49152 0 -900 dbus-broker
Aug 01 11:28:15 cosa-devsh kernel: [ 1448] 981 1448 240262 290 184320 0 0 zincati
Aug 01 11:28:15 cosa-devsh kernel: [ 1466] 0 1466 84275 815 167936 0 0 rpm-ostree
Aug 01 11:28:15 cosa-devsh kernel: [ 1494] 999 1494 748324 727 253952 0 0 polkitd
Aug 01 11:28:15 cosa-devsh kernel: [ 1731] 0 1731 3909 318 69632 0 -1000 sshd
Aug 01 11:28:15 cosa-devsh kernel: [ 1748] 0 1748 3309 312 61440 0 0 login
Aug 01 11:28:15 cosa-devsh kernel: [ 1749] 0 1749 3393 313 65536 0 0 login
Aug 01 11:28:15 cosa-devsh kernel: [ 1754] 1000 1754 5566 820 86016 0 100 systemd
Aug 01 11:28:15 cosa-devsh kernel: [ 1756] 1000 1756 7080 1526 94208 0 100 (sd-pam)
Aug 01 11:28:15 cosa-devsh kernel: [ 1763] 0 1763 4489 471 81920 0 0 sshd
Aug 01 11:28:15 cosa-devsh kernel: [ 1765] 1000 1765 1403 361 53248 0 0 bash
Aug 01 11:28:15 cosa-devsh kernel: [ 1769] 1000 1769 1325 356 49152 0 0 bash
Aug 01 11:28:15 cosa-devsh kernel: [ 1807] 1000 1807 4489 471 81920 0 0 sshd
Aug 01 11:28:15 cosa-devsh kernel: [ 1808] 1000 1808 1323 368 45056 0 0 bash
Aug 01 11:28:15 cosa-devsh kernel: [ 1829] 1000 1829 371414 3770 286720 0 0 podman
Aug 01 11:28:15 cosa-devsh kernel: [ 1838] 1000 1838 464186 12220 417792 0 0 podman
Aug 01 11:28:15 cosa-devsh kernel: [ 1842] 1000 1842 273 1 28672 0 0 catatonit
Aug 01 11:28:15 cosa-devsh kernel: [ 1863] 1000 1863 2597 85 57344 0 200 dbus-broker-lau
Aug 01 11:28:15 cosa-devsh kernel: [ 1864] 1000 1864 1221 37 49152 0 200 dbus-broker
Aug 01 11:28:15 cosa-devsh kernel: [ 1887] 1000 1887 1907 490 49152 0 0 slirp4netns
Aug 01 11:28:15 cosa-devsh kernel: [ 1893] 1000 1893 20441 83 61440 0 0 conmon
Aug 01 11:28:15 cosa-devsh kernel: [ 1896] 1000 1896 1205 140 45056 0 0 bash
Aug 01 11:28:15 cosa-devsh kernel: [ 1911] 1000 1911 239442 168578 1531904 0 0 dnf
Aug 01 11:28:15 cosa-devsh kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user/user.slice/libpod-6731617db3bbd19034611224dfb9e5d3b6d5a16c800ee137f616e932b529b3af.scope/container,task=dnf,pid=1911,uid=1000
Aug 01 11:28:15 cosa-devsh kernel: Out of memory: Killed process 1911 (dnf) total-vm:957768kB, anon-rss:674312kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:1496kB oom_score_adj:0
Aug 01 11:28:15 cosa-devsh systemd[1]: user: A process of this unit has been killed by the OOM killer.
Aug 01 11:28:15 cosa-devsh systemd[1754]: libpod-6731617db3bbd19034611224dfb9e5d3b6d5a16c800ee137f616e932b529b3af.scope: A process of this unit has been killed by the OOM killer.
Aug 01 11:28:15 cosa-devsh reverent_wescoff[1893]: Killed
```

This container has dnf-4.13.0-1.fc36.noarch

[1] https://pagure.io/releng/issue/10935
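As an aside for anyone triaging these reports in bulk, the kernel's oom-kill summary line can be picked apart mechanically. A minimal sketch with POSIX shell tools (the sample line is copied from the journal above):

```shell
# Extract pid, command name and anonymous RSS from a kernel
# "Out of memory: Killed process ..." line, then report the RSS in MiB.
line='Out of memory: Killed process 1911 (dnf) total-vm:957768kB, anon-rss:674312kB, file-rss:0kB, shmem-rss:0kB, UID:1000'

pid=$(printf '%s\n' "$line" | sed -n 's/.*Killed process \([0-9]*\).*/\1/p')
comm=$(printf '%s\n' "$line" | sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p')
anon_rss_kb=$(printf '%s\n' "$line" | sed -n 's/.*anon-rss:\([0-9]*\)kB.*/\1/p')

echo "$comm (pid $pid) was using $((anon_rss_kb / 1024)) MiB of anonymous RSS"
```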
Proposed as a Blocker for 37-beta by Fedora user pbrobinson using the blocker tracking app because:

This is causing issues with composes and a bunch of popular ARM devices, as well as memory-constrained cloud instances.
OK. I did a little more investigation. It appears to be a combination of things. I went back to the old Fedora 36 container from before the most recent update [1] (I pushed this container to `quay.io/dustymabe/fedora:36` if anyone is interested in testing with it). This container has:

```
[root@ebff6c88c346 /]# rpm -q dnf
dnf-4.11.1-2.fc36.noarch
```

but I still can't `dnf update -y`; I still get an OOM. I wonder if this is somehow an issue with the updates repo metadata being changed in some way that DNF can't handle gracefully.
[1] https://pagure.io/releng/issue/10935
OK, I think I'm right. This works fine:

```
podman run -it registry.fedoraproject.org/fedora:36 dnf update -y --disablerepo=updates --repofrompath=pungi0730,https://kojipkgs.fedoraproject.org/compose/updates/Fedora-36-updates-20220730.0/compose/Everything/x86_64/os
```

This gets an OOM:

```
podman run -it registry.fedoraproject.org/fedora:36 dnf update -y --disablerepo=updates --repofrompath=pungi0731,https://kojipkgs.fedoraproject.org/compose/updates/Fedora-36-updates-20220731.0/compose/Everything/x86_64/os
```

Is there something up with the repos starting with the 0731 compose?
Another reproducer:

```
$ vagrant init fedora/36-cloud-base
$ vagrant up
$ vagrant ssh -c "sudo dnf install openssl-devel"
Fedora 36 - x86_64                              3.3 MB/s |  81 MB     00:24
Fedora 36 openh264 (From Cisco) - x86_64         710 B/s | 2.5 kB     00:03
Fedora Modular 36 - x86_64                      2.2 MB/s | 2.4 MB     00:01
Fedora 36 - x86_64 - Updates                    3.6 MB/s |  24 MB     00:06
Connection to 192.168.121.178 closed by remote host.
Connection to 192.168.121.178 closed.
```

IOW, this also affects the Fedora 36 Cloud Base image provided as a Vagrant box. And it is not aarch64-specific.
Created attachment 1903167 [details] repodiff between good and bad composes
I am sorry, but we cannot do much here. The memory requirement is related to the metadata size of the particular repository. DNF downloads the repository, converts the data to its internal format (libsolv) in RAM, then stores the processed data to disk and frees the memory. If a repository is big, it requires more RAM than a small one.

How can the problem be resolved?
1. Use only small repositories - the distribution can resolve the issue
2. Make loading file lists optional - we will resolve it in the next generation of software management tools - DNF5/LIBDNF5 - RFE for Fedora 38+
3. Change libsolv to use RAM more efficiently - a nice dream, because it handles a lot of data
4. Deliver fewer files, provides, and requires in RPMs -> smaller repositories, faster dependency resolution, less RAM needed to process. Right now it is permanently growing
5. Use compression of metadata that requires less RAM to decompress - the distribution can resolve the issue

Is it an issue? The minimal requirement for Fedora is 2 GB of RAM.
(In reply to Jaroslav Mracek from comment #16) > > Is it an issue? > The minimal requirement for Fedora is 2 GB of RAM. I think that depends on who you ask. For example our Fedora containers (the default one) ships dnf. I think it would be sad to require 2G per container if the container wants to (for whatever reason) use dnf.
(In reply to Jaroslav Mracek from comment #16)
> I am sorry but we cannot do much here. The requirements for memory is
> related to a metadata size for the particular repository. DNF downloads
> particular repository then it converts data to internal format (libsolv) (in
> RAM) and then it stores processed data to the disk and frees the memory. If
> repository if big it requires more RAM then for the small one.

Can we just ship the libsolv database as part of the repo metadata so each machine doesn't have to process this locally?

> How the problem can be resolved?
> 1. Use only small repositories - Distribution can resolve the issue

How do you suggest the "distribution" does this?

> 2. Make loading file list optional - we will resolve it in the next
> generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+

The original yum used to load this data only when doing operations that required it, rather than all the time, which makes sense. You say "F38+", so is it 38 or some time in the future?

> 3. Change libsolv to use RAM more efficiently - nice dream because it
> handles a lot of data
> 4. Delivery less files, provides, and requires in RPMs > smaller
> repositories, faster resolve of dependencies, requires less RAM to process.
> Right now it is permanently growing

I don't think this is a reasonable request for a core tool used by distributions.

> 5. Use compression of metadata that requires less RAM to decompress -
> Distribution can resolve the issue

Please provide details of this.

> The minimal requirement for Fedora is 2 GB of RAM.

Where is that documented? The installer requires 1.5 GB, but it's not the only way to deploy Fedora.
(In reply to Peter Robinson from comment #18)
> Can we just ship the libsolv database as part of the repo metadata so each
> machine doesn't have to process this locally?

I don't think this is a good idea. The solv files are an internal cache. They may even suffer from problems with endianness.

> > How the problem can be resolved?
> > 1. Use only small repositories - Distribution can resolve the issue
>
> How do you suggest the "distribution" does this?

The distribution can influence (by packaging policy) what goes into Provides/Requires/Conflicts/Obsoletes and weak deps, and encourage maintainers to drop records that are no longer needed because they e.g. obsolete packages from a version of the distro that is no longer relevant. Also, trimming changelogs helps in some cases (already done in Fedora). But all this doesn't improve the situation too much. Repodata is usually big and we all need to live with it (although it's still important to write good code and keep repos reasonably sized).

> > 2. Make loading file list optional - we will resolve it in the next
> > generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+
>
> The original yum used to only load this data when doing operations that
> required it rather than all the time

YUM used sqlite3 repodata, lazily loading data on demand. Libsolv works differently and the underlying data structures are designed accordingly. Moving back to the old sqlite3 format is not realistic. I can imagine designing a new database backend that would work with libsolv, but it would come with downsides - first of all the required disk space, because it must remain uncompressed to achieve reasonable speed.

> > 3. Change libsolv to use RAM more efficiently - nice dream because it
> > handles a lot of data

I measured memory used while waiting for transaction confirmation in a fedora:36 container, on a 2nd run when the solv files were already on disk:

microdnf: 100M
dnf5: 145M
dnf5 from master (no filelists loaded by default): 110M
dnf4: 147M
dnf4 with filelist loading disabled (a HACK): 138M
zypper: 164M

I find these numbers quite reasonable, though there is still some space for improvement. I'm more worried about the spikes that happen during repo loading, which may be the problem this bug is about. The problem seems to be in how libsolv is used across the various dnf implementations rather than in libsolv itself:

microdnf: 465M
dnf5: 700M
dnf5 from master (no filelists loaded by default): 195M
dnf4: 680M
dnf4 with filelist loading disabled (a HACK): 620M
zypper: 225M

> > 5. Use compression of metadata that requires less RAM to decompress -
> > Distribution can resolve the issue
>
> Please provide details of this

Distributions can tweak compression parameters, but createrepo should come with reasonable defaults which are a good compromise between speed, compression ratio and decompression memory requirements.

> > The minimal requirement for Fedora is 2 GB of RAM.
>
> Where is that documented? The installer required 1.5Gb, but it's not the
> only way to deploy Fedora.

Requiring more than the installer makes no sense to me, because installer == DNF, GUI and much more. OK, users can add more repos after installation, but still...
There's a blocker discussion ticket here: https://pagure.io/fedora-qa/blocker-review/issue/841 Affected teams' representatives can describe the impact and vote on a release blocker status in there.
Discussed during the 2022-08-22 blocker review meeting: [0] The decision to delay the classification of this as a blocker bug was made as this is a difficult call as it really depends on a subjective evaluation of how much RAM we're comfortable with requiring for basic packaging operations on a minimal Fedora environment. We will solicit input from various teams and re-consider this at a later time. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-08-22/f37-blocker-review.2022-08-22-16.01.txt
On further consideration, as people are hitting this on F36, it seems odd to me to suggest blocking F37 on it. It's not an F37 bug, and blocking the F37 release on it wouldn't really help anyone, because if we don't ship F37 the only alternative we're offering is "use F36 instead", and the bug affects F36. So, I'm gonna nominate this as a prioritized bug instead, and suggest at the next blocker review that we drop it as a proposed blocker. It certainly seems like potentially a major problem that we can't do dnf operations with 1G or less of RAM, but it doesn't feel like the release blocker process is the way to handle it.
I had this on an F35 VM with way more than 0.5 GB RAM and updates-testing enabled.

> YUM used sqlite3 repodata, lazily loading data on demand.
> Libsolv works differently and the underlying data structures are designed according to that

Another regression. Don't get me wrong, but when the update manager needs 2-4 times more RAM than the whole production load, something is terribly wrong. And no, you don't assign memory you don't need to virtual machines, because it makes a ton of operations like live migration slower than they could be (besides the resource waste).
(In reply to Jaroslav Mracek from comment #16)
> Is it an issue?
> The minimal requirement for Fedora is 2 GB of RAM.

Besides cloud machines, that would make a lot of ARM devices unsupported. I experienced this issue on a 1G RPi 3 and had to resort to microdnf to work around it.
Note also, "The minimal requirement for Fedora is 2 GB of RAM" is not really true. The story we tell is more complicated than that:

https://docs.fedoraproject.org/en-US/fedora/latest/release-notes/welcome/Hardware_Overview/

It says 2G for the "default installation", which refers to a default Workstation install. But below that there is this boxout:

"Low memory installations
Fedora 36 can be installed and used on systems with limited resources for some applications. Text, VNC, or kickstart installations are advised over graphical installation for systems with very low memory. Larger package sets require more memory during installation, so users with less than 768MB of system memory may have better results performing a minimal install and adding to it afterward. For best results on systems with less than 1GB of memory, use the DVD installation image."

which overall certainly gives the impression that minimal installs should work on lower-memory systems.
(In reply to Adam Williamson from comment #25)
> Note also, "The minimal requirement for Fedora is 2 GB of RAM" is not really
> true. The story we tell is more complicated than that:

I believe the original 2 GB on Workstation was set because, when running the installer from a RAM disk, installing the selinux-policy package and dealing with that required ~1.5 GB of RAM. There was someone back in the day who documented this analysis, probably via a blog post.
However it was set, it seems like a reasonable number for a graphical install. But it's clearly not the whole story with all the different ways you can deploy/use Fedora these days.
Note that rpm-ostree based systems by default don't load the repodata; in combination with flatpak/podman for applications one can use Fedora systems without being affected by this. (But, the moment one engages client-side layering e.g. `rpm-ostree install` or `dnf install` inside a container client side, that's gone) > 1. Use only small repositories - Distribution can resolve the issue This I agree with! Fedora could split separate repositories like core/extras/buildroot for RPMs. Splitting off build-only packages alone would cut out all of the e.g. rust- packages which are something like 20% of the repodata last I looked. > 2. Make loading file list optional - we will resolve it in the next generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+ That's been discussed over and over - having optional filelists doesn't help because there are important packages that use `Requires: /usr/bin/foo` and so the filelist will very commonly be needed anyways.
> > 2. Make loading file list optional - we will resolve it in the next generation of software management tool - DNF5/LIBDNF5 - RFE for Fedora 38+
>
> That's been discussed over and over - having optional filelists doesn't help
> because there are important packages that use `Requires: /usr/bin/foo` and
> so the filelist will very commonly be needed anyways.

The way the original yum dealt with that was by keeping a subset of bin/sbin and other key paths, where most of the requires live, in a separate smaller cache. That way it's not loading the hundreds of lib/include/doc/etc. files that far outnumber the ones in the path.
In today's Prioritized Bugs meeting, we agreed to reject this as a Prioritized Bug as it requires large-scope changes to the distribution. mattdm will coordinate or delegate an effort to develop an F38 Change proposal to address the issues: https://meetbot.fedoraproject.org/fedora-meeting-1/2022-08-24/fedora_prioritized_bugs_and_issues.2022-08-24-14.00.log.html#l-93
>> That's been discussed over and over - having optional filelists doesn't help because there are important packages
>> that use `Requires: /usr/bin/foo` and so the filelist will very commonly be needed anyways.

> The way the original yum dealt with that was by having a subset of bin/sbin and other key ones where most of
> the requires were in a separate smaller cache that way it's not loading the 100s of libs/include/docs/etc files that
> far out number the ones in the path

Yeah. Separating out the non-primary-filepath data (i.e. filepaths except for the small primary subset) would be great. Our packaging guidelines actually still reflect this [1]: only dependencies on /etc, /usr/bin, /usr/sbin are allowed. Rpmlint warns about non-conforming dependencies, so the majority of packages conform. The actual implementation is different, and for various historical reasons the check is done intentionally sloppily [2].

If we do this, we might also significantly reduce the initial 80 MB download on every update: IIRC, around 60 MB of that is non-essential filepath data. This is a major source of pain on slow and metered networks.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/#_file_and_directory_dependencies
[2] https://github.com/rpm-software-management/createrepo_c/blob/master/src/misc.c#L179
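To illustrate what such a split would key on, the "primary" subset from the guidelines above can be sketched as a simple path filter. This is only a sketch of the guideline's rule; the actual createrepo_c check referenced in [2] is intentionally looser (e.g. it matches any path containing "bin/"):

```shell
# Sketch: the "primary" file subset per the packaging guidelines above --
# only paths under /etc, /usr/bin and /usr/sbin may be used in dependencies.
is_primary_path() {
    case "$1" in
        /etc/*|/usr/bin/*|/usr/sbin/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Of a package's file list, only the primary paths would need to stay in
# the always-loaded metadata; the rest could move to an optional stream.
for f in /usr/bin/python3 /usr/lib64/libssl.so.3 /etc/dnf/dnf.conf /usr/share/doc/dnf/README; do
    if is_primary_path "$f"; then
        echo "primary: $f"
    fi
done
```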
As I also mentioned on the fedora-devel list, having ZRAM enabled would really help to counter this. Linux' memory-management subsystem is known to pretty much always need at least some swap to work correctly. So for any installs where no swap is created we really should enable ZRAM by default (as we currently do for Workstation installs). I believe that enabling ZRAM will be an effective workaround for this problem.
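For reference, on current Fedora swap-on-ZRAM is configured through zram-generator. A minimal sketch of such a default (file path and option name as used by the zram-generator project; the size expression is an assumption matching the common half-of-RAM-capped-at-4G default):

```
# /etc/systemd/zram-generator.conf
[zram0]
zram-size = min(ram / 2, 4096)
```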
(In reply to Hans de Goede from comment #32) > As I also mentioned on the fedora-devel list, having ZRAM enabled would > really help to counter this. > > Linux' memory-management subsystem is known to pretty much always need at > least some swap to work correctly. So for any installs where no swap is > created we really should enable ZRAM by default (as we currently do for > Workstation installs). I believe that enabling ZRAM will be an effective > workaround for this problem. We do this already on arm since F-29: https://fedoraproject.org/wiki/Changes/ZRAMforARMimages And it's been used elsewhere since F-33: https://fedoraproject.org/wiki/Changes/SwapOnZRAM
And it doesn't help much in this case. I tried enabling swap on ZRAM in the vagrant machine and it wasn't enough.
Discussed during the 2022-08-29 blocker review meeting: [0]

The decision to classify this bug as a "RejectedBlocker (Beta)" was made on the grounds that it already affects F36, so blocking F37 on it doesn't achieve much. We also note no simple fix has been identified; fixing this may require a significant overhaul of DNF (which is already in the works as DNF 5). We note that it may be desirable to have the system requirements doc updated, and possibly to include microdnf in installs likely to be used on low-memory systems.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-08-29/f37-blocker-review.2022-08-29-16.01.txt
Looks like even with 3GB of swap present, the recent Fedora 35 and 36 repos are getting a bit too large to work with on machines that have just 1GB of RAM (I constantly have dnf install commands killed by the OOM killer). When on a machine with more RAM I run:

dnf clean all
/usr/bin/time dnf update --refresh

On Fedora 35 I get:

Beaker Client - Fedora35                     41 kB/s | 7.6 kB     00:00
Beaker harness                              510 kB/s |  64 kB     00:00
Fedora 35 - x86_64                          2.6 MB/s |  79 MB     00:29
Fedora 35 openh264 (From Cisco) - x86_64    3.8 kB/s | 2.5 kB     00:00
Fedora Modular 35 - x86_64                  3.0 MB/s | 3.3 MB     00:01
Fedora 35 - x86_64 - Updates                 14 MB/s |  34 MB     00:02
Fedora Modular 35 - x86_64 - Updates        5.1 MB/s | 3.9 MB     00:00
Dependencies resolved.
Nothing to do.
Complete!
39.73user 3.11system 1:13.63elapsed 58%CPU (0avgtext+0avgdata 862884maxresident)k
4984inputs+648248outputs (8major+469876minor)pagefaults 0swaps

On Fedora 36 I get:

Beaker Client - Fedora36                     80 kB/s | 7.2 kB     00:00
Beaker harness                              513 kB/s |  63 kB     00:00
Fedora 36 - x86_64                           32 MB/s |  81 MB     00:02
Fedora 36 openh264 (From Cisco) - x86_64    3.3 kB/s | 2.5 kB     00:00
Fedora Modular 36 - x86_64                  2.2 MB/s | 2.4 MB     00:01
Fedora 36 - x86_64 - Updates                 17 MB/s |  30 MB     00:01
Fedora Modular 36 - x86_64 - Updates        5.6 MB/s | 2.9 MB     00:00
Dependencies resolved.
Nothing to do.
Complete!
32.00user 1.51system 0:37.43elapsed 89%CPU (0avgtext+0avgdata 946888maxresident)k
0inputs+625576outputs (0major+488552minor)pagefaults 0swaps

While on Fedora 37 I get:

Beaker Client - Fedora37                     77 kB/s | 6.9 kB     00:00
Beaker harness                              508 kB/s |  63 kB     00:00
Fedora 37 - x86_64                           22 MB/s |  64 MB     00:02
Fedora 37 openh264 (From Cisco) - x86_64    4.3 kB/s | 2.5 kB     00:00
Fedora Modular 37 - x86_64                  3.8 MB/s | 3.0 MB     00:00
Fedora 37 - x86_64 - Updates                 10 MB/s |  16 MB     00:01
Fedora Modular 37 - x86_64 - Updates        1.8 MB/s | 911 kB     00:00
Dependencies resolved.
Nothing to do.
Complete!
33.68user 2.32system 0:39.92elapsed 90%CPU (0avgtext+0avgdata 472512maxresident)k
96inputs+493304outputs (0major+337451minor)pagefaults 0swaps
What worked for me to fix DNF running out of memory on my EC2 instance was the advice from https://ask.fedoraproject.org/t/prune-dnf-history-database/14633: remove (or, in my case, rename, in case I later want what's in them) the files /var/lib/dnf/history.sqlite, /var/lib/dnf/history.sqlite-shm and /var/lib/dnf/history.sqlite-wal.
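A hedged sketch of that workaround (the helper name and the ".bak" suffix are my own invention; the real files live in /var/lib/dnf and this must run as root there):

```python
import glob
import os

def stash_dnf_history(dnf_dir="/var/lib/dnf", suffix=".bak"):
    """Rename dnf's history database files out of the way (reversible).

    Sketch of the workaround described above, not an official dnf tool.
    Renaming rather than deleting keeps the history recoverable; dnf can
    run without these files (it falls back to a fresh/in-memory history
    database when they are absent).
    """
    renamed = []
    # Matches history.sqlite plus its -shm and -wal companion files.
    for path in sorted(glob.glob(os.path.join(dnf_dir, "history.sqlite*"))):
        os.rename(path, path + suffix)
        renamed.append(path)
    return renamed
```

To undo it, rename the ".bak" files back before the next dnf run.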
I have been pruning my /var/lib/dnf/* regularly for years. Currently the only way to avoid this is to have your own caching repos with only the subset of packages used on your machines. From the moment I turn on the official repos on machines that intentionally have low RAM, DNF stops working. That's ridiculous, given that 512 MB should be enough for most production loads on dedicated servers.
I'm affected by this issue on a Raspberry Pi running Fedora 34, when upgrading to Fedora 35 or 36. Even following @h.reindl's suggestion, and with zram set to factor 0.9 and the lz4 compressor, dnf crashes everything - including the ssh shell - in the middle of the package update:

[root@srvad02 ~]# dnf system-upgrade download --refresh --releasever=36 --allowerasing
Before you continue ensure that your system is fully upgraded by running "dnf --refresh upgrade".
Do you want to continue [y/N]: y
Fedora 36 - aarch64                         9.5 kB/s |  36 kB     00:03
Fedora 36 openh264 (From Cisco) - aarch64   6.5 kB/s | 990  B     00:00
Fedora Modular 36 - aarch64                 208 kB/s |  36 kB     00:00
Fedora 36 - aarch64 - Updates               175 kB/s |  31 kB     00:00
Connection to 192.168.88.4 closed by remote host.
Connection to 192.168.88.4 closed.

I see no way to upgrade this Fedora installation using system-upgrade.
I didn't say anything about zram, because it's nonsense here - the repodata can't be swapped out, and zram is nothing other than swap. You need to disable as many repos as you can (modular, for example) and terminate every process not needed for your ssh session and a running internet connection.
It may be possible to work around it by setting the kernel memory overcommit behaviour to 1, as root:

echo 1 > /proc/sys/vm/overcommit_memory

And having enough free swap available (likely a few GB). See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting. I haven't tried it, as it was easier for my use case to increase the memory of the individual VMs.
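The sysctl write above can be wrapped in a few lines. This is a hedged sketch (my own helper, not part of dnf or the kernel tooling); `proc_file` is parameterised only so the logic can be exercised against an ordinary file, and writing the real path requires root:

```python
def set_overcommit(mode, proc_file="/proc/sys/vm/overcommit_memory"):
    """Set vm.overcommit_memory, equivalent to `echo N > /proc/sys/vm/...`.

    Kernel-documented modes: 0 = heuristic overcommit (default),
    1 = always overcommit, 2 = don't overcommit (strict accounting).
    Returns the value read back from the file.
    """
    if mode not in (0, 1, 2):
        raise ValueError("overcommit mode must be 0, 1 or 2")
    with open(proc_file, "w") as f:
        f.write(str(mode))
    with open(proc_file) as f:
        return int(f.read().strip())
```

Note this change is not persistent across reboots; for that, the usual route is a sysctl.d drop-in.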
For what it's worth, my previous workaround only lasted for a day or two (on a VM with 1GB of RAM and 1GB of swap), then it started happening again. I narrowed the problem down (using the --disablerepo option) to the "updates" repository (for Fedora 36). I decided to just upgrade the VM to Fedora 37, since I'd been meaning to do so anyway, and that seems to have fixed the problem. I think there may be something strange about the Fedora 36 updates repo that's triggering this.
There's nothing strange about Fedora 36; it simply has half a year of extra packages in its updates repo. It will start happening in Fedora 37 when it gets a similarly large updates repo. See my comment #36: half the size of the updates repo, half the max resident memory.
Hi,

I started having this issue with a DigitalOcean droplet with 1G RAM, but also 1G of swap, on Fedora 36. I never had this problem before, even when my droplet had only 512MB RAM and was running previous versions of Fedora. Below is the output of a few commands, with the OOM killer always killing dnf and my SSH connection.

$ free
              total        used        free      shared  buff/cache   available
Mem:         986388      140712      585748        3084      259928      702376
Swap:        986108           0      986108

# dnf -v update
Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, groups-manager, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync
DNF version: 4.14.0
cachedir: /var/cache/dnf
User-Agent: constructed: 'libdnf (Fedora Linux 36; cloud; Linux.x86_64)'
repo: using cache for: fedora
fedora: using metadata from Wed 04 May 2022 09:16:11 PM UTC.
repo: using cache for: fedora-cisco-openh264
fedora-cisco-openh264: using metadata from Thu 06 Oct 2022 11:02:51 AM UTC.
repo: using cache for: fedora-modular
fedora-modular: using metadata from Wed 04 May 2022 09:12:01 PM UTC.
repo: using cache for: updates
Connection to <server> closed by remote host.
Connection to <server> closed.

From dmesg:
[  412.552818] Out of memory: Killed process 1064 (dnf) total-vm:1007084kB, anon-rss:24676kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:2028kB oom_score_adj:0

Any ideas on how to solve this so I can update my droplet? Thanks.

-- Ulisses
(In reply to ulissesf from comment #44)
> Any ideas on how to solve this so I can update my droplet? Thanks.

Ulisses,

The one suggestion I saw above that worked for the one RAM-constrained machine I manage was to use "microdnf". To get it installed, I created several GB of swap space and installed only that tool. Once it was installed, I removed the superfluous swap space and used that tool to update the other packages.

Paul
You should also be able to install individual packages by calling rpm directly.
> Any ideas on how to solve this so I can update my droplet? Thanks.

Could you run `dnf makecache` and then your command? It should improve dnf's memory consumption. See my comment #19.

Also, using microdnf instead of dnf should help. It might be tricky to get it installed due to your memory limitation. In that case, try disabling all but the main repo:

dnf --repoid=fedora makecache && dnf --repoid=fedora install microdnf
Paul, Hubert and Daniel, thanks for the suggestions. I installed microdnf by installing libpeas from the fedora repo with dnf (disabling all other repos but the fedora one) and then installing the microdnf rpm directly, since dnf couldn't find microdnf in the fedora repo. microdnf was then able to update my system without any OOM events. Thanks!

Now, this seems to be a big regression with dnf on Fedora. I used to have only 512MB of RAM and everything worked. Now I have 1GB RAM and 1GB swap and it runs out of memory. I didn't change anything in terms of what I do with this system; I only updated it to F36 and was trying to keep it up to date.
512 MB RAM is enough for dozens of workloads like nameservers, VoIP machines, routers and firewalls, and works pretty well as long as the bloated Fedora repo metadata is avoided via local caches on the network. For heaven's sake: is it asking too much that the dnf developers run a virtual machine with 512 MB RAM and check whether their software survives a "dnf upgrade" there?
I have been running a LAMP stack on AWS t3.nano instances with only 0.5 GiB of memory since 2018. I have cloud-init files for Amazon Linux 2 and Ubuntu. Both of these are able to install a large number of packages during initialization via the "packages:" section, which runs early in the initialization sequence, before the instance has a swap file. Amazon Linux 2 uses yum and Ubuntu uses apt for package management.

Any package specified in the "packages:" section for Amazon Linux 2023 causes cloud-init to hang on a t3.nano instance. The issue does not appear when a t2.micro with 1 GiB of memory is launched. I was able to get around this on a t3.nano instance by using "dnf -y install" in the "runcmd:" section, with the install broken up into smaller groups of packages, after a 768M swap file was created and enabled.

Although I was able to get around the effect of this bug, the initialization of the Amazon Linux 2023 t3.nano instance takes about 10 minutes, which is twice as long as for either the Ubuntu or Amazon Linux 2 instances. This is without etckeeper, glances and a few other things that are not yet available for Amazon Linux 2023, which doesn't provide EPEL support and has only been available for a month or so.

I am surprised that the dnf developers are so O.K. with their package manager being so inferior to yum or apt in this type of case. This is my first experience with dnf, and it is disappointing that this is a known bug that has been around for years with no apparent path to a fix.
dnf is the successor to yum and works very similarly to it; I don't know for sure, but I'd expect yum theoretically has the same problem. I'd guess the difference is more likely just that there are more packages in al2023, and therefore more metadata; the problem is caused by yum/dnf trying to parse all the metadata.

I do wonder if AL could switch to microdnf/dnf5, or at least include it and provide some kind of option to use it at deployment time for small-instance cases like this? David, have you looked into it? See the rest of this bug for context.
This is a combination of different things, like more and more metadata, plus dnf always having used more memory than yum. Even now dnf still isn't on par with yum when it comes to simple things like reporting what the depsolve problem is - https://bugzilla.redhat.com/show_bug.cgi?id=2186544 is a perfect example, where yum 15 years ago would simply have said where the problem is.
Any package specified in the "packages:" section of a cloud-init specification for Amazon Linux 2023 causes cloud-init to fail on a t3.nano instance.

This does not happen for Amazon Linux 2 on a t3.nano AWS instance. Amazon Linux 2 uses yum.
This does not happen for Ubuntu 24.04 on a t3.nano AWS instance. Ubuntu 24.04 uses apt.
This does not happen on a t2.micro instance. A t2.micro has 1 GiB of memory and a t3.nano has 0.5 GiB of memory.

I posted a comment on this bug earlier, but this comment has some more detail.

Boot message for the oom-kill of dnf launched by cloud-init on a t3.nano AWS instance:

$ sudo dmesg | tail -2
[   54.149669] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cloud-final.service,task=dnf,pid=1528,uid=0
[   54.153706] Out of memory: Killed process 1528 (dnf) total-vm:512092kB, anon-rss:80396kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:328kB oom_score_adj:0

The cloud-init process terminates when dnf is killed on a t3.nano AWS instance:

$ sudo tail -3 /var/log/cloud-init-output.log
Cloud-init v. 22.2.2 running 'modules:config' at Mon, 17 Apr 2023 03:24:18 +0000. Up 10.79 seconds.
Cloud-init v. 22.2.2 running 'modules:final' at Mon, 17 Apr 2023 03:24:19 +0000. Up 11.99 seconds.
Amazon Linux 2023 repository                 18 MB/s |  12 MB     00:00

A single package specified in the packages section causes this failure on a t3.nano AWS instance:

$ sudo head -16 /var/lib/cloud/instances/i-0fc2abba3e8539af5/user-data.txt | tail -4
packages:
- mlocate

Info on the version of dnf, and another error message, on a t3.nano AWS instance:
$ dnf info dnf
Amazon Linux 2023 Kernel Livepatch repository  811 kB/s | 155 kB     00:00
Installed Packages
History database cannot be created, using in-memory database instead: SQLite error on "/var/lib/dnf/history.sqlite": Open failed: unable to open database file
Name         : dnf
Version      : 4.12.0
Release      : 2.amzn2023.0.4
Architecture : noarch
Size         : 2.2 M
Source       : dnf-4.12.0-2.amzn2023.0.4.src.rpm
Repository   : @System
Summary      : Package manager
URL          : https://github.com/rpm-software-management/dnf
License      : GPLv2+
Description  : Utility that allows users to manage packages on their systems.
             : It supports RPMs, modules and comps groups & environments.
I'm affected by this on my Raspberry Pi.

[root@localhost systemd]# uname -a
Linux localhost.localdomain 6.0.12-100.fc35.armv7hl #1 SMP Thu Dec 8 19:53:54 UTC 2022 armv7l armv7l armv7l GNU/Linux

It would be nice to upgrade to 38, but I was unable to do so. 36 is the only version I can upgrade to, and during the upgrade process my session runs out of memory.
> It would be nice to upgrade to 38, but I was unable to do so. 36 is the > only version I can upgrade to and during the upgrade process my session runs > out of memory. microdnf works OK for this use case. You can do: "microdnf --releasever=38 distro-sync"
(In reply to Peter Robinson from comment #55) > > It would be nice to upgrade to 38, but I was unable to do so. 36 is the > > only version I can upgrade to and during the upgrade process my session runs > > out of memory. > > microdnf works OK for this use case. You can do: "microdnf --releasever=38 > distro-sync" Except it won't work to upgrade to F38, because Joe is on armv7hl which is not supported in F37+.
> Linux localhost.localdomain 6.0.12-100.fc35.armv7hl #1 SMP Thu Dec 8
> 19:53:54 UTC 2022 armv7l armv7l armv7l GNU/Linux
>
> It would be nice to upgrade to 38, but I was unable to do so. 36 is the
> only version I can upgrade to and during the upgrade process my session runs
> out of memory.

We retired armv7 as of F-36, so you can't go to anything newer than F-36 (https://fedoraproject.org/wiki/Changes/RetireARMv7). To get to F-38 you will need to reinstall with an aarch64 image. To go to F-36 you can use microdnf: "microdnf --releasever=36 distro-sync"
Why can't `dnf update` do the equivalent of (dnf makecache && dnf update) itself?
> Why can't `dnf update` do the equivalent of (dnf makecache && dnf update) itself?

What makes you think it doesn't, and what does this have to do with the memory usage that has been unacceptable for years?
Per comment #47, it's a workaround that resolved the issue for me.
For reference, there's a similar issue in dnf5, which has now replaced microdnf, which was supposed to fix this issue. I've filed the following against it: https://bugzilla.redhat.com/show_bug.cgi?id=2214520
So I read here or there about the need to "read the metadata and convert it to libsolv in RAM"... there's actually a good way to make this resilient to low-memory conditions without having swap globally enabled, which is to replace malloc() or equivalent with mmap'ing a large file. In effect, this means the process will swap within that region. Of course it doesn't help if the file is in `/tmp`, since that's usually tmpfs (in RAM) these days...

I haven't dived into libdnf etc., but the same goes for opening large files in general: mmap'ing them rather than malloc'ing and reading them in will make things much more resilient to low-memory conditions in the absence of swap. On most cloud systems, disks are networked, so swap will suck (this can be alleviated with a zswap front-end, but in overcommitted low-memory systems I wonder how that really behaves).
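As a rough illustration of the idea (a sketch only, not libdnf/libsolv code), a large working buffer can be backed by a file via mmap, so that under memory pressure the kernel can write those pages back to the file instead of OOM-killing the process, even with no swap configured:

```python
import mmap
import os
import tempfile

def file_backed_buffer(size, directory=None):
    """Create a file-backed mmap of `size` bytes; returns (mmap, path).

    Pages of this buffer are backed by a disk file, so under memory
    pressure the kernel can evict or write them back instead of keeping
    them as unswappable anonymous RAM. As noted above, a real
    implementation would pass directory='/var/tmp' or similar, since
    /tmp is usually tmpfs (RAM-backed); the default here is only for
    the demo's portability.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)        # reserve the file extent
        buf = mmap.mmap(fd, size)     # shared, writable, file-backed mapping
    finally:
        os.close(fd)                  # the mapping keeps the file open
    return buf, path

if __name__ == "__main__":
    buf, path = file_backed_buffer(8 * 1024 * 1024)
    buf[0:13] = b"repo metadata"      # behaves like ordinary memory
    buf.flush()                       # msync: push dirty pages to the file
    print(bytes(buf[0:4]))            # b'repo'
    buf.close()
    os.unlink(path)
```

The same mechanism is what makes read-only mmap'ing of a .solv file attractive: clean pages can simply be dropped and re-read from disk on demand.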
Just reposting here, because it's surprising how many people don't understand: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/FMM4SKQ4TERRVPSWFK2UUKRTBW4KBLHM/

Everyone needs to internalize: this has nothing to do with DNF (or microdnf, etc.), really. It's about the *size of the repository metadata*. Every single time someone adds a new package into the single "Fedora" rpm-md repo, that makes clients going OOM a bit more likely. Every single time there's a new package, it makes it more likely that a client will hit a timeout downloading the repodata and fail to get that critical kernel security update.

Even doing the obvious thing and splitting off e.g. a `fedora-devel` rpm-md repo (as is kind of done in RHEL with CRB) that had all the -devel and Go/Rust etc. packages would likely create a huge amount of savings.

Ultimately, "package all the world's software into a single giant repository and dynamically solve dependencies across all of it" will *never* scale (at least, not to smaller machines). Shrinking the metadata (mainly, not loading all the file lists) would gain a lot of headroom. But the endless growth of software in the world versus the shrinking of general-purpose computers and lower-bandwidth connections will always be in high tension, and those small/low-bandwidth systems just aren't well suited to packages - they're best suited to receiving pre-built images (which may contain packages, or might not).
Sorry Colin, but it really is a DNF problem in how it deals with metadata and memory usage. Since the metadata is mostly read-only, the on-disk format should be laid out in a way that allows scanning without bringing everything into RAM, be it with indexes, pages of indexes, etc. We can't control how many packages will exist, but we can control the memory consumption of the code and not assume infinite memory is available.
> It's about the *size of the repository metadata*

No - it's a design problem when you load the entire metadata of "all the world's software" completely into RAM. Twenty years ago you were able to store gigabytes of data in MySQL MyISAM and operate on a machine with 192 MB RAM and 466 MHz (running the OS, a desktop and a ton of applications besides the database server), thanks to smart indexes. And in 2023 you tell us you need 1 GB of RAM for something trivial like package metadata? You don't need the whole metadata in memory all the time unless there is a giant design problem.
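The indexed-lookup argument above can be sketched with a small SQLite example. The schema is invented purely for illustration - it is not dnf's real metadata format - but it shows the principle: with an index on disk, a dependency lookup touches only the matching rows instead of loading the whole table into RAM.

```python
import sqlite3

# Hypothetical "provides" table mapping capabilities to packages.
# Use a file path instead of ":memory:" on a real system so the data
# stays on disk and is paged in on demand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provides (capability TEXT, package TEXT)")
conn.execute("CREATE INDEX idx_cap ON provides (capability)")
rows = [
    ("/usr/bin/python3", "python3"),
    ("libssl.so.3", "openssl-libs"),
    ("webserver", "nginx"),
    ("webserver", "httpd"),
]
conn.executemany("INSERT INTO provides VALUES (?, ?)", rows)

# A point query walks the B-tree index; SQLite reads only the pages it
# needs, never the full table.
hits = conn.execute(
    "SELECT package FROM provides WHERE capability = ?", ("webserver",)
).fetchall()
print(sorted(p for (p,) in hits))  # ['httpd', 'nginx']
```

`EXPLAIN QUERY PLAN` on that SELECT confirms SQLite uses idx_cap rather than a full scan, which is exactly the bounded-memory behaviour being argued for.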
I also don't understand why createrepo can't generate the libsolv DBs so that dnf just consumes them, rather than generating them on every single computer from the XML files. That would save a lot of local CPU cycles by doing the work just once.
Wouldn't that be because of inter-repository dependencies? The depsolving will differ depending on which repos you have enabled, right?
(In reply to Adam Williamson from comment #67) > wouldn't that be because of inter-repository dependencies? the depsolving > will differ depending on which repos you have enabled, right? I believe it generates a libsolv db per repo.
Each repository has its own *.solv file (or files, as comps, filelists and modules are stored separately), so inter-repository deps should not be a problem. What comes to my mind as potential problems:

- endianness - *.solv is a binary format copied directly into memory, and a pre-generated file would have to be usable across architectures with different endianness
- versioning - the *.solv format changes over time, and there is no guarantee that every libsolv version can handle every *.solv file version.
So even keeping the generation of .solv files local, we can probably improve things tremendously by making use of mmap. mmap basically gives you swap in an environment where swap isn't enabled.

I haven't had a chance yet to dive deep enough into dnf/libsolv to really pinpoint the steps that are killing us on small system sizes (I work for AWS, so I'm concerned about things like "nano" instances with 0.5GB of RAM; I'm mostly concerned about Amazon Linux, but if we fix this we'll try to fix it upstream). However, I did notice that libsolv has its own allocator abstraction. So one experiment I might do is replace the allocators in there with a simple off-the-shelf malloc implementation that operates on a big mmap'ed file in /var/tmp and see where that takes us.

That said, it would be nice if the solv file was just mmap'ed for reading... maybe things like strings could then point into the mmap'ed file rather than be copied, I don't know. It seems like libsolv would need quite a lot of work to operate directly on an mmap'ed file for all read functionality rather than its in-memory representation, but I have only given it a cursory glance at this point.
Fedora 40 Beta now includes this change: https://fedoraproject.org/wiki/Changes/DNFConditionalFilelists Can anyone confirm whether it resolves this problem on machines with low amount of memory?