Bug 2278534 - dracut-install triggered by kernel-core scriptlet or dracut regenerate hangs for an unlimited amount of time
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 40
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2280321
Depends On:
Blocks:
 
Reported: 2024-05-01 21:49 UTC by Persona non grata
Modified: 2024-08-11 10:36 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-08-11 10:36:00 UTC
Type: ---
Embargoed:



Description Persona non grata 2024-05-01 21:49:31 UTC
On a fresh install of Fedora 40, running "dnf update" after installation gets stuck regenerating the initramfs in the kernel-core package's post-install scriptlet. The offending dracut module seems to be drm.

Output of "ps -ef --forest" during stuck execution of "dnf update" yields:
root       16751   16673  0 22:54 pts/0    00:00:00  |   |   \_ sudo dnf update
root       16775   16751  0 22:54 pts/1    00:00:00  |   |       \_ sudo dnf update
root       16776   16775  5 22:54 pts/1    00:01:05  |   |           \_ /usr/bin/python3 /usr/bin/dnf update
root       20860   16776  0 23:01 pts/1    00:00:00  |   |               \_ /bin/sh /var/tmp/rpm-tmp.QVwxlT 2
root       20862   20860  0 23:01 pts/1    00:00:00  |   |                   \_ /bin/kernel-install add 6.8.7-300.fc40.x86_64 /lib/modules/6.8.7-300.fc40.x86_64/vmlinuz
root       20863   20862  0 23:01 pts/1    00:00:00  |   |                       \_ (sd-exec-strv)
root       20903   20863  0 23:01 pts/1    00:00:00  |   |                           \_ /usr/bin/bash /usr/lib/kernel/install.d/50-dracut.install add 6.8.7-300.fc40.x86_64 /boot/c2e65e4adf394e37814d3c0b07410524/6.8.7-300.fc40.x86_64 /lib/modules/6.8.7-300.fc40.x86_64/vmlinuz
root       20906   20903  0 23:01 pts/1    00:00:02  |   |                               \_ /usr/bin/bash -p /bin/dracut -f --kernel-image /lib/modules/6.8.7-300.fc40.x86_64/vmlinuz  --kver 6.8.7-300.fc40.x86_64 /boot/initramfs-6.8.7-300.fc40.x86_64.img
root       20932   20906  0 23:01 pts/1    00:00:00  |   |                                   \_ /bin/cat
root       26019   20906 99 23:14 pts/1    00:00:00  |   |                                   \_ /usr/lib/dracut/dracut-install -D /var/tmp/dracut.KKmzVY/initramfs --kerneldir /lib/modules/6.8.7-300.fc40.x86_64/ -m --silent -s drm_crtc_init|drm_dev_register|drm_encoder_init -S iw_handler_get_spy platform:GHES

Running "dracut -vvv -f --regenerate-all" gets stuck as well trying to include the drm module. Running "dracut -vvv -f --regenerate-all -o drm" to omit the drm module from initramfs executes without issue.
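As a stopgap, the "-o drm" workaround above can probably be made persistent through dracut's configuration, so that the kernel-core scriptlet picks it up too. A minimal sketch, assuming the standard "omit_dracutmodules" setting described in dracut.conf(5); the file name is made up for illustration, and omitting drm may leave early boot without a native graphics driver:

```shell
# /etc/dracut.conf.d/99-omit-drm.conf  (hypothetical file name)
# Assumption: dracut sources this file as shell, so += appends to the list.
# The surrounding spaces keep the entry separated from other list items.
omit_dracutmodules+=" drm "
```

The file should be removed again once a fixed dracut is installed, otherwise the drm module stays out of every later initramfs.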

Reproducible: Always

Steps to Reproduce:
1. Install Fedora 40 Workstation from Live ISO
2. Run "dnf update"
3. The update gets stuck in the dracut-install subcommand
Actual Results:  
The process gets stuck trying to execute the "/usr/lib/dracut/dracut-install -D /var/tmp/dracut.KKmzVY/initramfs --kerneldir /lib/modules/6.8.7-300.fc40.x86_64/ -m --silent -s drm_crtc_init|drm_dev_register|drm_encoder_init -S iw_handler_get_spy platform:GHES" subcommand

Expected Results:  
The initramfs should be built successfully.


Comment 1 Persona non grata 2024-05-01 22:06:11 UTC
The filesystem is Btrfs. The graphics cards are an Intel Arc A380 and an Nvidia RTX 4080.

Comment 2 Persona non grata 2024-05-04 12:01:31 UTC
So the initramfs generation does not hang indefinitely: "time sudo dracut -vvv -f --kver=KERNELVERSION" finished after about 55 minutes. I will try to diagnose further where the issue comes from, but I do not have many ideas; help would be appreciated.

Comment 3 Jan Bušta 2024-05-15 04:35:41 UTC
*** Bug 2280321 has been marked as a duplicate of this bug. ***

Comment 4 Hector Martin 2024-05-19 01:19:59 UTC
We're getting lots of user reports of unbootable systems since this started happening, because if the initramfs generation gets interrupted for any reason (machine sleep, running out of battery, user impatience, etc.) then the default kernel ends up missing its initramfs and boot fails. This is a major UX issue. If kernel installation and initramfs generation were at least atomic to some extent (the kernel is installed to /boot and the GRUB menu only *after* the initramfs is successfully generated) it wouldn't be that bad, but these two things together end up causing a lot of user pain.
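The "only publish after success" idea can be illustrated with a generic write-then-rename pattern. This is a sketch of the general technique, not how kernel-install or dracut currently behave; "atomic_generate" is a hypothetical helper name, and it assumes the generator command accepts its output path as the last argument:

```shell
# atomic_generate TARGET CMD [ARGS...]
# Hypothetical helper: runs CMD with a temporary output path appended as its
# last argument, and replaces TARGET only if CMD exits successfully.
atomic_generate() {
    local target=$1 tmp
    shift
    # Create the temp file next to TARGET so the final mv is a same-filesystem
    # rename and TARGET is never seen half-written.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" "$tmp"; then
        mv -f "$tmp" "$target"
    else
        # Interrupted or failed generation: discard the partial file and
        # leave any existing TARGET untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

With something like this, an interrupted run would leave the previous image (or no boot entry yet) rather than a kernel that is listed in the boot menu without its initramfs.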

Comment 5 Hector Martin 2024-05-19 01:35:35 UTC
OK, I found the culprit:

/usr/lib/dracut/modules.d/50drm/module-setup.sh

```
        for i in /sys/bus/{pci/devices,platform/devices,virtio/devices,soc/devices/soc?,vmbus/devices}/*/modalias; do
            [[ -e $i ]] || continue
            [[ -n $(< "$i") ]] || continue
            # shellcheck disable=SC2046
            if hostonly="" dracut_instmods --silent -s "drm_crtc_init|drm_dev_register|drm_encoder_init" -S "iw_handler_get_spy" $(< "$i"); then
                if strstr "$(modinfo -F filename $(< "$i") 2> /dev/null)" radeon.ko; then
                    hostonly='' instmods amdkfd
                fi
            fi
        done
```

This bit of code goes through *every single device on the system* (288 on my system) and calls dracut_instmods for each, even for duplicate modaliases. That call takes seconds, because it *itself* goes through every modalias and, for each device, reads /lib/modules/$KVER/modules.*.bin.

There is a ridiculous O(n^2) behavior here with a horrible constant factor. No wonder this takes anywhere from minutes to hours depending on your particular system.
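One way to remove the outer factor is to collect each distinct modalias once before doing any expensive work. The following is only a sketch of that idea, not necessarily what any upstream fix does; "unique_modaliases" is a hypothetical helper, and the optional root argument exists only so it can be pointed at a fake tree:

```shell
# unique_modaliases [ROOT]
# Hypothetical helper: prints every distinct, non-empty modalias found under
# the same bus directories the drm module scans. ROOT defaults to /sys.
unique_modaliases() {
    local root=${1:-/sys} f ma
    for f in "$root"/bus/{pci/devices,platform/devices,virtio/devices,soc/devices/soc?,vmbus/devices}/*/modalias; do
        [[ -e $f ]] || continue      # unmatched globs stay literal; skip them
        ma=$(< "$f")
        [[ -n $ma ]] || continue     # some devices expose an empty modalias
        printf '%s\n' "$ma"
    done | sort -u                   # duplicates collapse to a single line
}

# Illustrative use: run the expensive step once per unique alias.
# while IFS= read -r ma; do echo "would process: $ma"; done < <(unique_modaliases)
```

On a machine where many devices share a modalias, this alone should shrink the number of expensive calls considerably, although each remaining call still pays the per-call cost described above.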

Comment 6 Neal Gompa 2024-05-19 01:38:14 UTC
This module's code hasn't changed since 2021... https://github.com/dracut-ng/dracut-ng/commits/main/modules.d/50drm/module-setup.sh

Comment 7 Neal Gompa 2024-05-19 01:38:54 UTC
Err 2022 (in released dracut versions)

Comment 8 Neal Gompa 2024-05-19 01:39:55 UTC
It looks like this commit may improve things: https://github.com/dracut-ng/dracut-ng/commit/80f2caf4f5ee47a708b5e4bd65c28e3f8ff1b9c8

Comment 9 Hector Martin 2024-05-19 01:45:15 UTC
It does. We should get that released ASAP given how painful this is.

Re the module not changing recently, it's possible that the second n factor was introduced by dracut_instmods more recently, and the module always had one n factor.

Comment 10 Janne Grunau 2024-05-20 10:52:19 UTC
I've created https://github.com/redhat-plumbers/dracut-fedora/pull/26 with a backport for rawhide / f40

Was this an issue in f39 as well? Timing-wise, I started to see this around the time I started updating to f40, but I can't say whether I only noticed it after updating to f40.

Comment 11 Persona non grata 2024-05-20 11:50:30 UTC
As far as I remember even the initial release of Fedora 40 did not have this issue. Only after installing from the Live-ISO and then doing an update the problem occurred.

Comment 12 Gregory Maxwell 2024-05-28 02:30:20 UTC
Just adding a report of my experience:  I upgraded a F38 host to F40 using dnf system-upgrade today.  After rebooting it made it to the kernel-core script and stopped.  I waited an hour and a half before terminating it with a control-alt-delete as there appeared to be nothing else I could do.  Came up with an initramfs failure.   I booted into the prior kernel and attempted a dnf reinstall of the F40 kernel, which also 'hung'... but since I was on a working system I was able to debug and got enough information to bring me to this bug (e.g. that it was dracut-install that was spinning).

I manually applied https://github.com/dracut-ng/dracut-ng/commit/80f2caf4f5ee47a708b5e4bd65c28e3f8ff1b9c8 (the commit linked in this thread), and the reinstall completed successfully. There was still a pretty obvious delay at the scripts step, but not one that would have caused me any concern during the install. I'm now running the fc40 kernel.

On the system in question, a 128 core epyc host with a lot of storage on it:

# for i in /sys/bus/{pci/devices,platform/devices,virtio/devices,soc/devices/soc?,vmbus/devices}/*/modalias; do [[ -e $i ]] && [[ -n $(< "$i") ]]  && echo $i; done | wc -l
14540

Comment 13 Janne Grunau 2024-07-17 20:36:06 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2024-18215bc41f (dracut-102) brings the initramfs generation time back to acceptable levels. It is still slower than dracut-059 but no longer brokenly slow (I saw 28 minutes). On 2 tested Apple silicon machines it took 50 / 120 seconds.

Comment 14 Hector Martin 2024-08-11 10:36:00 UTC
I think this can be closed now. Further optimizations are on the way upstream, but this is no longer a major UX issue.

