When running "microdnf upgrade" on F-38 or F-39 the process crashes. dnf5 was supposed to fix the issues experienced in dn4 (Bug 1907030) but it appears dnf5 has similar issues but we've now replaced microdnf which worked. Reproducible: Always Steps to Reproduce: 1. microdnf upgrade 2. 3. Actual Results: crash Expected Results: updated packages
# microdnf upgrade Updating and loading repositories: Fedora 38 - aarch64 - Updates 100% | 23.3 KiB/s | 11.7 KiB | 00m01s Fedora 38 - aarch64 - Test Updates 100% | 25.3 KiB/s | 12.3 KiB | 00m00s Fedora 38 - aarch64 - Updates 100% | 933.2 KiB/s | 3.1 MiB | 00m03s Fedora 38 - aarch64 - Test Updates 100% | 809.9 KiB/s | 3.0 MiB | 00m04s Fedora 38 - aarch64 100% [==================] | 10.2 KiB/s | 13.2 KiB | -00m00s Killed
Happens on both F-38/F-39, current version is dnf5-5.0.13-2.fc38
Proposed as a Blocker for 39-beta by Fedora user pbrobinson using the blocker tracking app because: This makes extremely popular devices like the Raspberry Pi Zero2W unusable. This is covered under section 2.6.3 "Installing, removing and updating software" of the "Basic Release Criteria"
Could you please share with us a kernel log (from dmesg, or systemd journal) where the out-of-memory killing message is visible? It could shed a light on how much memory was actually available to DNF.
Created attachment 1971568 [details] microdnf/dnf5 crashing on RPi Zero2W dmesg output
If I read your dmesg correctly, then a memory available to the operating system on boot was 329 MB. Then you set up swapping into zram (419 MB). However, because zram is backed by RAM, the effective available memory is smaller than 329 + 419 MB sum (a rough guess with 1:2 compression ratio is 539 MB of swapable memory). Then at the time of crash microdnf process requested 690 MB, but only accessed 9 MB. What surprise me are ridiculously small numbers in RSS column. Maybe I don't interpret the log right. E.g. on my x86_64 system sshd occupies 10 MB RSS, while your has 0.3 MB. Does Fedora on aarch64 use 4KB pages? When I try "dnf5 upgrade --assumeno" on a fully upgraded x86_64 F38 with 1346 installed packages using dnf5-5.0.14-20230619010028.8.g1c005e32.fc38.x86_64, RSS peeks 148 MB and virtual memory 158 MB. That's what top and time reports. However, using ulimit I observe allocation failures at 178 MB for virtual memory. ulimit -m seems having no effect. How many packages do you have installed? Could you post a list of them (rpm -qa)? Can you work around it by upgrading a few package a time (e.g. "dnf5 upgrade $(rpm -qa | grep glibc)" etc.
I limited operating system to 512 MB (mem=512M memmap=512M kernel paramaters), disabled swap, I left 335 MB free, "dnf5 upgrade --assumeno" was killed by OOM at 324 MB RSS and 597 MB VMM. It died on when downloading the one but not last repository. I then boot with unlimited memory, rerun the upgrade, it downloaded all repository metadata, reported nothing to do. Then I again booted with 512MB limit and this time the command was not killed. I conclude the dependency solver is not a bottleneck here. The problem is probably somewhere between downloading a YUM metadata and converting them into solver cache files. The output looks like this: # dnf5 makecache --assumeno Updating and loading repositories: Fedora 38 - x86_64 - Updates 100% | 10.8 MiB/s | 6.4 MiB | 00m01s Fedora Modular 38 - x86_64 100% | 2.0 MiB/s | 1.2 MiB | 00m01s Fedora Modular 38 - x86_64 - Test Update 100% [==================] | 0.0 B/s | 85.3 KiB | -00m00s Killed or: # dnf5 makecache --assumeno Updating and loading repositories: Fedora 38 - x86_64 100% [==================] | 0.0 B/s | 18.6 KiB | -00m00s Killed My full repolist: # dnf5 repolist --enabled repo id repo name copr:copr.fedorainfracloud.org:rpmsoftwaremanagement:dnf5-unstable Copr repo for dnf5-unstable owned by rpmsoftwareman fedora Fedora 38 - x86_64 fedora-modular Fedora Modular 38 - x86_64 systemd systemd updates Fedora 38 - x86_64 - Updates updates-modular Fedora Modular 38 - x86_64 - Updates updates-testing Fedora 38 - x86_64 - Test Updates updates-testing-modular Fedora Modular 38 - x86_64 - Test Updates
> Does Fedora on aarch64 use 4KB pages? Yes. > How many packages do you have installed? Could you post a list of them (rpm > -qa)? It's the fedora arm minimal image, which is around 500 packages (I don't have access to the system right now). > Can you work around it by upgrading a few package a time (e.g. "dnf5 upgrade > $(rpm -qa | grep glibc)" etc. No, as per comment #1 it crashes processing the metadata before it gets to actual package updates.
> I conclude the dependency solver is not a bottleneck here. The problem is > probably somewhere between downloading a YUM metadata and converting them > into solver cache files. The output looks like this: Is the solver the same between dnf4/dnf5/microdnf? Purely looking at when it's dying in the visual process I suspect that makes sense but it's probably good to have someone that knows the code/process to verify.
DNF5 uses the same solver like dnf and microdnf (libsolv). It also uses the same library for download repositories and packages - librepo.
I reduced the reproducer to: # dnf5 makecache --disablerepo=\* --enablerepo=updates Updating and loading repositories: Killed /var/log/dnf5.log ends with: 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] https://creeperhost.mm.fcix.net/fedora/linux/updates/38/Everything/x86_64/repodata/repomd.xml 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] rsync://creeperhost.mm.fcix.net/fedora-enchilada/linux/updates/38/Everything/x86_64/repodata/repomd.xml 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_handle_prepare_metalink: Metalink parsed 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_handle_prepare_internal_mirrorlist: Finalizing internal mirrorlist 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_handle_prepare_internal_mirrorlist: Finalizing mirrors reported via LRI_MIRRORS 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_handle_perform: Downloading/Locating yum repo 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_use_local: Locating repo.. 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_use_local_load_base: Found local metalink: /var/cache/libdnf5/updates-b7ba662710b98f1a/metalink.xml 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_use_local_load_base: Parsing repomd.xml 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_use_local_load_base: Repomd revision: 1687220659 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] Found primary_zck so using instead of primary 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] Found updateinfo_zck so using instead of updateinfo 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_use_local: Repository was successfully located 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checking checksum of /var/cache/libdnf5/updates-b7ba662710b98f1a/repodata/ea2ed03b63189a34ac792e4f78ea95a7d82f3a41db960c7b9d3fac2e01b0015d-primary.xml.zck (expected: 9e7f621b197465d692755bdcbcf7abfce850afa316486927789cc2d96bfe5109 [sha256]) 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checksum check - Passed 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checking checksum of /var/cache/libdnf5/updates-b7ba662710b98f1a/repodata/9495d10433b9fdaacce9014b0276773a153b315f7f53fae2d0b8795c7c862921-comps-Everything.x86_64.xml.gz (expected: 9495d10433b9fdaacce9014b0276773a153b315f7f53fae2d0b8795c7c862921 [sha256]) 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_checksum_fd_compare: Using mtime cached in xattr: [user.Librepo.checksum.mtime] 1687258243181000000 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_checksum_fd_compare: Using checksum cached in xattr: [user.Librepo.checksum.sha256] 9495d10433b9fdaacce9014b0276773a153b315f7f53fae2d0b8795c7c862921 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checksum check - Passed 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checking checksum of /var/cache/libdnf5/updates-b7ba662710b98f1a/repodata/defb8d572e55661bf6cc04050182b20e01e630c616176cf0e6a65a2c9f884503-updateinfo.xml.zck (expected: 51481fe35e3882326bc4d12908f921d6bdda4632343906a4b71ee3035c143219 [sha256]) 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checksum check - Passed 2023-06-20T13:05:46+0200 [1009] DEBUG [librepo] lr_handle_perform: Restoring an old SIGINT handler 2023-06-20T13:05:46+0200 [1009] DEBUG Using cache for repo "updates" 2023-06-20T13:05:46+0200 [1009] DEBUG Loading solv cache file: "/var/cache/libdnf5/updates-b7ba662710b98f1a/solv/updates.solv" 2023-06-20T13:05:46+0200 [1009] TRACE Cache file "/var/cache/libdnf5/updates-b7ba662710b98f1a/solv/updates-updateinfo.solvx" not found 2023-06-20T13:05:46+0200 [1009] DEBUG Loading updateinfo extension for repo "updates" from "/var/cache/libdnf5/updates-b7ba662710b98f1a/repodata/defb8d572e55661bf6cc04050182b20e01e630c616176cf0e6a65a2c9f884503-updateinfo.xml.zck"
With 1 GB, I got an out-of-memory error and a coredump. But the coredump was unsuable as gdb complained that it was unable to load some list and a backtrace was empty. When I increased memory to 2 GB, the command passed and log continued: 2023-06-20T13:53:44+0200 [1194] TRACE Writing updateinfo extension cache for repo "updates" to "/var/cache/libdnf5/updates-b7ba662710b98f1a/solv/updates-updateinfo.solvx.PBnJHT" 2023-06-20T13:53:44+0200 [1194] TRACE Cache file "/var/cache/libdnf5/updates-b7ba662710b98f1a/solv/updates-group.solvx" not found 2023-06-20T13:53:44+0200 [1194] DEBUG Loading group extension for repo "updates" from "/var/cache/libdnf5/updates-b7ba662710b98f1a/repodata/9495d10433b9fdaacce9014b0276773a153b315f7f53fae2d0b8795c7c862921-comps-Everything.x86_64.xml.gz" 2023-06-20T13:53:44+0200 [1194] TRACE Writing group extension cache for repo "updates" to "/var/cache/libdnf5/updates-b7ba662710b98f1a/solv/updates-group.solvx.a1y5se" 2023-06-20T13:53:44+0200 [1194] DEBUG No modules metadata available for repo updates 2023-06-20T13:53:44+0200 [1194] INFO DNF5 end Comparing the messages in the source code, it dies in SolvRepo::load_repo_ext() between repo_add_updateinfoxml() and write_ext() calls.
With 1500 MB, the best backtrace I could get was: __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0; (gdb) bt #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007ffff76b08b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 #2 0x00007ffff765fabe in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #3 0x00007ffff764887f in __GI_abort () at abort.c:79 #4 0x00007ffff7486d08 in solv_oom (num=num@entry=0, len=len@entry=48234496) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/src/util.c:31 #5 0x00007ffff7486df0 in solv_realloc (old=0x0, len=48234496) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/src/util.c:60 #6 0x00007ffff7482416 in solv_extend (block=63, size=4, nmemb=<optimized out>, len=<optimized out>, buf=<optimized out>) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/src/util.h:58 #7 repodata_add_array (data=data@entry=0x7ffff00302a0, handle=-21762, keyname=keyname@entry=107, keytype=keytype@entry=53, entrysize=entrysize@entry=1) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/src/repodata.c:2618 #8 0x00007ffff7482b11 in repodata_add_flexarray (data=0x7ffff00302a0, solvid=<optimized out>, keyname=keyname@entry=107, handle=-23262) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/src/repodata.c:2979 #9 0x00007ffff7421c7b in endElement (xmlp=<optimized out>, state=<optimized out>, content=<optimized out>) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/ext/repo_updateinfoxml.c:445 #10 0x00007ffff742bea1 in end_element (name=<optimized out>, userData=0x7ffff5b916c0) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/ext/solv_xmlparser.c:167 #11 end_element (userData=0x7ffff5b916c0, name=<optimized out>) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/ext/solv_xmlparser.c:150 #12 0x00007ffff72ea029 in xmlParseEndTag1.constprop.0 (ctxt=0x7ffff0032a50, line=0) at /usr/src/debug/libxml2-2.10.4-1.fc38.x86_64/parser.c:8717 #13 0x00007ffff72ef1d4 in xmlParseTryOrFinish.isra.0 (ctxt=0x7ffff0032a50, terminate=0) at /usr/src/debug/libxml2-2.10.4-1.fc38.x86_64/parser.c:11647 #14 0x00007ffff7224b7a in xmlParseChunk (ctxt=0x7ffff0032a50, chunk=chunk@entry=0x7ffff5b8f620 "4.1-1.fc38.s390x.rpm</filename>\n <reboot_suggested>True</reboot_suggested>\n </package>\n <package name=\"kdevelop-debugsource\" version=\"23.04.1\" release=\"1.fc38\" epoch=\"9\" arch=\"s"..., size=<optimized out>, size@entry=8192, terminate=terminate@entry=0) at /usr/src/debug/libxml2-2.10.4-1.fc38.x86_64/parser.c:12351 #15 0x00007ffff7430bcf in parse_block (l=8192, buf=0x7ffff5b8f620 "4.1-1.fc38.s390x.rpm</filename>\n <reboot_suggested>True</reboot_suggested>\n </package>\n <package name=\"kdevelop-debugsource\" version=\"23.04.1\" release=\"1.fc38\" epoch=\"9\" arch=\"s"..., xmlp=0x7ffff5b916c0) at /usr/src/debug/libsolv-0.7.22-4.fc38.x86_64/ext/solv_xmlparser.c:280 #16 solv_xmlparser_parse ( ../../gdb/utils.c:717: internal-error: virtual memory exhausted: can't allocate 4064 bytes. A problem internal to GDB has been detected, Be ware that it's the last allocation. The memory can be ineffectively used before. By the way, the out of memory alloacation error looks like this: # /usr/bin/dnf5 makecache --disablerepo=\* --enablerepo=updates Updating and loading repositories: Out of memory allocating 469762048 bytes! Aborted (core dumped) It attempts to allocate a chunk of 448 MB. Before starting the command, there were 1151 MB free.
I made a PR that should help: https://github.com/rpm-software-management/dnf5/pull/630 It would be very helpful if you could try it out. There are builds for x84_64 here: https://copr.fedorainfracloud.org/coprs/rpmsoftwaremanagement/CI-libdnf5-pr630/build/6098125/ We were unnecessarily loading the updateinfo metadata which surprisingly require quite a lot of memory. I will investigate further why it uses so much but when they are not needed my patch should help.
> There are builds for x84_64 here: > https://copr.fedorainfracloud.org/coprs/rpmsoftwaremanagement/CI-libdnf5- > pr630/build/6098125/ Could you turn on aarch64 for that copr (and any of the dnf CI pieces)? > We were unnecessarily loading the updateinfo metadata which surprisingly > require quite a lot of memory. I will investigate further why it uses so > much but when they are not needed my patch should help. Thanks for actually looking into the problem :)
I did separate builds here: https://copr.fedorainfracloud.org/coprs/amatej/dnf5/build/6098755/ this includes aarch64. In case you need newer librepo or libsolv they should be in https://copr.fedorainfracloud.org/coprs/rpmsoftwaremanagement/dnf-nightly/
I upgraded to dnf5-5.0.14-1.git.2429.a4a5664 (inc libdnf5/libdnf5-cli) only and tried and it ran to completion \o/ # microdnf upgrade Updating and loading repositories: Fedora 38 - aarch64 - Test Updates 100% | 19.0 KiB/s | 12.9 KiB | 00m01s Fedora 38 openh264 (From Cisco) - aarch64 100% | 1.8 KiB/s | 990.0 B | 00m01s Fedora 38 - aarch64 - Updates 100% | 20.9 KiB/s | 13.8 KiB | 00m01s Fedora 38 - aarch64 - Test Updates 100% | 107.2 KiB/s | 302.8 KiB | 00m03s Fedora 38 - aarch64 100% | 343.1 KiB/s | 4.4 MiB | 00m13s Fedora 38 - aarch64 - Updates 100% | 217.4 KiB/s | 857.0 KiB | 00m04s Repositories loaded. <snip> Transaction Summary: Installing: 5 packages Upgrading: 54 packages Replacing: 54 packages Removing: 4 packages Total size of inbound packages is 158 MiB. Need to download 158 MiB. After this operation 2 MiB will be used (install 739 MiB, remove 737 MiB). Is this ok [y/N]: y <snip> I will do some more testing over the next few days when I have a bit more time but first test looks a lot better.
The issue is related to updateinfo metadata processing. There is a workaround - `dnf5 upgrade --setopt=optional_metadata_types=,comps`.
(In reply to Jaroslav Mracek from comment #18) > The issue is related to updateinfo metadata processing. There is a > workaround - `dnf5 upgrade --setopt=optional_metadata_types=,comps`. I think that will not help because of the bug in checking whether updateinfo is needed. It is always added with runtime priority.
I am sorry, I tested it with makecache command and it work. It does not work with upgrade.
After installing dnf5-5.0.14-20230621050101.16.g01fa4c03.fc38.x86_64 both "dnf5 makecache" and "dnf5 upgrade" passes with 1500 MB RAM. I observe that now no *.solvx files are created in the cache. However with 512 MB limit of the operating system, 336 MB free before invoking dnf, "dnf makecache" reports it cannot resolve a host name because a thread for a resolver cannot be created and then it dies with an out-of-memory reporting a slightly above 1000 MB allocation failure. The resolver error comes from curl library. DNF probably starts a download thread and in parallel preallocates a memory for the to-be-downloaded data and there is no memory left for the asynchronous DNS resolver libcurl uses. All that happens when I remove all cached files. If I prepopulate the cache and remove only *.solv files, and try again with 512 MB limit, it does not attempt to download anything, but reports an out-of-memory error like: # /usr/bin/dnf5 makecache --disablerepo=\* --enablerepo=updates Updating and loading repositories: Out of memory allocating 188416 bytes! Aborted (core dumped) /var/log/dnf5.log ends: 2023-06-21T14:05:29+0200 [940] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checking checksum of /var/cache/libdnf5/updates-b7ba662710b98f1a/repodata/ea2ed03b63189a34ac792e4f78ea95a7d82f3a41db960c7b9d3c) 2023-06-21T14:05:29+0200 [940] DEBUG [librepo] lr_yum_check_checksum_of_md_record: Checksum check - Passed 2023-06-21T14:05:29+0200 [940] DEBUG [librepo] lr_handle_perform: Restoring an old SIGINT handler 2023-06-21T14:05:29+0200 [940] DEBUG Using cache for repo "updates" 2023-06-21T14:05:29+0200 [940] TRACE Cache file "/var/cache/libdnf5/updates-b7ba662710b98f1a/solv/updates.solv" not found 2023-06-21T14:05:29+0200 [940] DEBUG Loading repomd and primary for repo "updates" RSS peeks 33 MB. Here the numbers are really small. So for my x86_64 machine 512 MB is not enough.
Conditional download of updateinfo: https://github.com/rpm-software-management/dnf5/pull/630 It enables workaround - `dnf5 upgrade --setopt=optional_metadata_types=,comps` Solution: Ales Matej have created a patch to resolve the issue in libsolv - https://github.com/openSUSE/libsolv/pull/533 (Great job Ales).
FEDORA-2023-0abadf4915 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-0abadf4915
FEDORA-2023-0abadf4915 has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.