On a fresh install of Fedora 40, running "dnf update" gets stuck regenerating the initramfs in the kernel-core package's postinstall scriptlet. The offending module seems to be drm.

Output of "ps -ef --forest" while "dnf update" is stuck:

root 16751 16673 0 22:54 pts/0 00:00:00 | | \_ sudo dnf update
root 16775 16751 0 22:54 pts/1 00:00:00 | | \_ sudo dnf update
root 16776 16775 5 22:54 pts/1 00:01:05 | | \_ /usr/bin/python3 /usr/bin/dnf update
root 20860 16776 0 23:01 pts/1 00:00:00 | | \_ /bin/sh /var/tmp/rpm-tmp.QVwxlT 2
root 20862 20860 0 23:01 pts/1 00:00:00 | | \_ /bin/kernel-install add 6.8.7-300.fc40.x86_64 /lib/modules/6.8.7-300.fc40.x86_64/vmlinuz
root 20863 20862 0 23:01 pts/1 00:00:00 | | \_ (sd-exec-strv)
root 20903 20863 0 23:01 pts/1 00:00:00 | | \_ /usr/bin/bash /usr/lib/kernel/install.d/50-dracut.install add 6.8.7-300.fc40.x86_64 /boot/c2e65e4adf394e37814d3c0b07410524/6.8.7-300.fc40.x86_64 /lib/modules/6.8.7-300.fc40.x86_64/vmlinuz
root 20906 20903 0 23:01 pts/1 00:00:02 | | \_ /usr/bin/bash -p /bin/dracut -f --kernel-image /lib/modules/6.8.7-300.fc40.x86_64/vmlinuz --kver 6.8.7-300.fc40.x86_64 /boot/initramfs-6.8.7-300.fc40.x86_64.img
root 20932 20906 0 23:01 pts/1 00:00:00 | | \_ /bin/cat
root 26019 20906 99 23:14 pts/1 00:00:00 | | \_ /usr/lib/dracut/dracut-install -D /var/tmp/dracut.KKmzVY/initramfs --kerneldir /lib/modules/6.8.7-300.fc40.x86_64/ -m --silent -s drm_crtc_init|drm_dev_register|drm_encoder_init -S iw_handler_get_spy platform:GHES

Running "dracut -vvv -f --regenerate-all" gets stuck as well, trying to include the drm module. Running "dracut -vvv -f --regenerate-all -o drm" to omit the drm module from the initramfs executes without issue.

Reproducible: Always

Steps to Reproduce:
1. Install Fedora 40 Workstation from Live ISO
2. Run "dnf update"
3. Gets stuck in the dracut-install subcommand

Actual Results:
Process gets stuck trying to execute the "/usr/lib/dracut/dracut-install -D /var/tmp/dracut.KKmzVY/initramfs --kerneldir /lib/modules/6.8.7-300.fc40.x86_64/ -m --silent -s drm_crtc_init|drm_dev_register|drm_encoder_init -S iw_handler_get_spy platform:GHES" subcommand

Expected Results:
Initramfs should be built successfully.
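For anyone who needs a stopgap, the "-o drm" workaround from the report can be made persistent with a dracut configuration drop-in, so that the kernel-core scriptlet picks it up as well. This is only a minimal sketch: the file name 99-omit-drm.conf is an arbitrary choice and not something from this report, and omitting drm may affect graphics during early boot, so it is probably best removed once a fixed dracut is installed.

```
# /etc/dracut.conf.d/99-omit-drm.conf  (file name is an assumption)
# Same effect as passing "-o drm" on the dracut command line.
omit_dracutmodules+=" drm "
```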
The filesystem in use is Btrfs. The graphics cards are an Intel Arc A380 and an Nvidia RTX 4080.
So the initramfs generation does not hang indefinitely. "time sudo dracut -vvv -f --kver=KERNELVERSION" finished after a total of about 55 minutes. I will try to further diagnose where the issue comes from, but I do not have many ideas. Help would be appreciated.
*** Bug 2280321 has been marked as a duplicate of this bug. ***
We're getting lots of user reports of unbootable systems since this started happening, because if the initramfs generation gets interrupted for any reason (machine sleep, running out of battery, user impatience, etc.) then the default kernel ends up missing its initramfs and boot fails. This is a major UX issue. If kernel installation and initramfs generation were at least atomic to some extent (the kernel is installed to /boot and the GRUB menu only *after* the initramfs is successfully generated) it wouldn't be that bad, but these two things together end up causing a lot of user pain.
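The "publish only after generation succeeds" ordering suggested above can be sketched in a few lines of shell. This is purely an illustration of the principle, not how kernel-install or dracut actually work: atomic_write, good_generator and bad_generator are hypothetical names, and the "image" is just a text file in a scratch directory.

```shell
#!/bin/sh
# atomic_write TARGET COMMAND...
# Runs COMMAND with a temporary path appended as its last argument and
# renames the result over TARGET only if COMMAND succeeded.
atomic_write() {
    target=$1
    shift
    # Temp file lives next to the target so the final mv is a rename.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Hypothetical generators standing in for a long-running initramfs build.
good_generator() { echo "new image" > "$1"; }
bad_generator() { echo "partial" > "$1"; return 1; }

workdir=$(mktemp -d)
img="$workdir/initramfs-demo.img"
echo "old image" > "$img"

# An interrupted or failed build leaves the previous image untouched.
atomic_write "$img" bad_generator || echo "generation failed, old image kept"
after_failure=$(cat "$img")

# A successful build replaces it in one step.
atomic_write "$img" good_generator
after_success=$(cat "$img")

leftovers=$(ls "$workdir" | wc -l | tr -d ' ')
echo "after failure: $after_failure; after success: $after_success"
rm -rf "$workdir"
```

The same idea would apply to the boot entry: if it were only written once the image is in place, an interrupted update would leave the previous kernel as the default.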
OK, I found the culprit: /usr/lib/dracut/modules.d/50drm/module-setup.sh

```
for i in /sys/bus/{pci/devices,platform/devices,virtio/devices,soc/devices/soc?,vmbus/devices}/*/modalias; do
    [[ -e $i ]] || continue
    [[ -n $(< "$i") ]] || continue
    # shellcheck disable=SC2046
    if hostonly="" dracut_instmods --silent -s "drm_crtc_init|drm_dev_register|drm_encoder_init" -S "iw_handler_get_spy" $(< "$i"); then
        if strstr "$(modinfo -F filename $(< "$i") 2> /dev/null)" radeon.ko; then
            hostonly='' instmods amdkfd
        fi
    fi
done
```

This bit of code goes through *every single device on the system* (288 on my system) and calls dracut_instmods for each, even for duplicate modaliases. That call takes seconds, because it *itself* goes through every modalias and, for each device, reads /lib/modules/$KVER/modules.*.bin. There is a ridiculous O(n^2) behavior here with a horrible constant factor. No wonder this takes anywhere from minutes to hours depending on your particular system.
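The duplicate-modalias part of that cost can be illustrated with a small self-contained sketch. It is not the upstream fix and does not call dracut at all: expensive_lookup is a hypothetical stand-in for dracut_instmods that only counts invocations, the sysfs tree is a fake one in a temp directory, and "pci:example-gpu" is a made-up modalias (only "platform:GHES" comes from this report).

```shell
#!/bin/sh
# Stand-in for the slow per-modalias call; it only counts invocations.
calls=0
expensive_lookup() {
    calls=$((calls + 1))
}

# Fake device tree: five devices, two distinct modaliases, one empty file.
demo=$(mktemp -d)
for dev in dev0 dev1 dev2 dev3 dev4; do
    mkdir -p "$demo/$dev"
done
echo "pci:example-gpu" > "$demo/dev0/modalias"
echo "pci:example-gpu" > "$demo/dev1/modalias"
echo "platform:GHES" > "$demo/dev2/modalias"
echo "platform:GHES" > "$demo/dev3/modalias"
: > "$demo/dev4/modalias"

# Collect the non-empty modaliases first, then deduplicate them.
aliases="$demo/aliases"
for f in "$demo"/*/modalias; do
    m=$(cat "$f")
    if [ -n "$m" ]; then
        printf '%s\n' "$m"
    fi
done | sort -u > "$aliases"

# One expensive call per distinct modalias instead of one per device.
while IFS= read -r alias; do
    expensive_lookup "$alias"
done < "$aliases"

unique=$(wc -l < "$aliases" | tr -d ' ')
echo "distinct modaliases: $unique, lookups: $calls"
rm -rf "$demo"
```

Here four populated devices collapse into two lookups. On a real system the saving depends on how many devices share a modalias, and deduplication alone would not remove the per-call cost described above.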
This module's code hasn't changed since 2021... https://github.com/dracut-ng/dracut-ng/commits/main/modules.d/50drm/module-setup.sh
Err, 2022 (in released dracut versions).
It looks like this commit may improve things: https://github.com/dracut-ng/dracut-ng/commit/80f2caf4f5ee47a708b5e4bd65c28e3f8ff1b9c8
It does. We should get that released ASAP given how painful this is. Re the module not changing recently, it's possible that the second n factor was introduced by dracut_instmods more recently, and the module always had one n factor.
I've created https://github.com/redhat-plumbers/dracut-fedora/pull/26 with a backport for rawhide / f40. Was this an issue in f39 as well? Timing-wise, I started to see this around the time I started updating to f40, but I can't say whether I only noticed it after updating to f40.
As far as I remember even the initial release of Fedora 40 did not have this issue. Only after installing from the Live-ISO and then doing an update the problem occurred.
Just adding a report of my experience: I upgraded an F38 host to F40 using dnf system-upgrade today. After rebooting, it made it to the kernel-core script and stopped. I waited an hour and a half before terminating it with a control-alt-delete, as there appeared to be nothing else I could do. It came up with an initramfs failure.

I booted into the prior kernel and attempted a dnf reinstall of the F40 kernel, which also 'hung'... but since I was on a working system I was able to debug and got enough information to bring me to this bug (e.g. that it was dracut-install that was spinning).

I manually applied https://github.com/dracut-ng/dracut-ng/commit/80f2caf4f5ee47a708b5e4bd65c28e3f8ff1b9c8 (the commit linked in this thread) and the reinstall was able to complete successfully. There was still a pretty obvious delay on the scripts step, but it wasn't one that would have caused me any concern during the install. I'm now running the fc40 kernel.

On the system in question, a 128 core epyc host with a lot of storage on it:

# for i in /sys/bus/{pci/devices,platform/devices,virtio/devices,soc/devices/soc?,vmbus/devices}/*/modalias; do [[ -e $i ]] && [[ -n $(< "$i") ]] && echo $i; done | wc -l
14540
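A variant of that count that also reports how many of the modaliases are distinct may help others gauge how much duplicated work their system does. This is a rough sketch: count_modaliases is a hypothetical helper, it walks the same five bus paths as the loop above, and the demo tree with "pci:example" is made up so the snippet runs anywhere.

```shell
#!/bin/sh
# count_modaliases [ROOT]
# Prints "TOTAL DISTINCT" for the non-empty modalias files under ROOT,
# which defaults to /sys/bus.
count_modaliases() {
    root=${1:-/sys/bus}
    list=$(mktemp) || return 1
    for f in "$root"/pci/devices/*/modalias \
             "$root"/platform/devices/*/modalias \
             "$root"/virtio/devices/*/modalias \
             "$root"/soc/devices/soc?/*/modalias \
             "$root"/vmbus/devices/*/modalias; do
        # Unmatched globs stay literal, so skip paths that do not exist.
        [ -e "$f" ] || continue
        m=$(cat "$f" 2>/dev/null)
        if [ -n "$m" ]; then
            printf '%s\n' "$m"
        fi
    done > "$list"
    total=$(wc -l < "$list" | tr -d ' ')
    distinct=$(sort -u "$list" | wc -l | tr -d ' ')
    rm -f "$list"
    echo "$total $distinct"
}

# Demo on a fake tree: three devices, two distinct modaliases.
demo=$(mktemp -d)
mkdir -p "$demo/pci/devices/a" "$demo/pci/devices/b" "$demo/platform/devices/c"
echo "pci:example" > "$demo/pci/devices/a/modalias"
echo "pci:example" > "$demo/pci/devices/b/modalias"
echo "platform:GHES" > "$demo/platform/devices/c/modalias"
result=$(count_modaliases "$demo")
echo "$result"
rm -rf "$demo"
```

Calling count_modaliases with no argument on the affected host would scan the real /sys/bus.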
https://bodhi.fedoraproject.org/updates/FEDORA-2024-18215bc41f (dracut-102) brings the initramfs generation time back to acceptable levels. It is still slower than dracut-059, but no longer brokenly slow (I saw 28 minutes). On two tested Apple silicon machines it took 50 and 120 seconds.
I think this can be closed now. Further optimizations are on the way upstream, but this is no longer a major UX issue.