Description of problem:
In an F31 VM, when installing an F33 kernel, depmod runs for a very long time and consumes nearly 100% CPU.

Version-Release number of selected component (if applicable):
kmod-26-4.fc31.x86_64
kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64

How reproducible:
Tested once.

Steps to Reproduce:
# dnf update --nogpg kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64 --releasever=33

Actual results:
Top shows CPU usage by depmod approaching 100% at times:

$ top -bc -n 1 -u root | head -9
top - 10:31:33 up 28 min,  1 user,  load average: 1.20, 1.11, 0.99
Tasks: 186 total,   2 running, 184 sleeping,   0 stopped,   0 zombie
%Cpu(s): 94.1 us,  5.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3928.0 total,   2000.5 free,    734.9 used,   1192.6 buff/cache
MiB Swap:    512.0 total,    512.0 free,      0.0 used.   2942.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  38906 root      20   0   41960  39528   4220 R  81.2   1.0   0:03.82 /sbin/depmod -C /tmp/weak-modules.Os10HS/depmod.conf +
      1 root      20   0  172764  16432   9752 S   0.0   0.4   0:02.78 /usr/lib/systemd/systemd --switched-root --system --d+

Expected results:
Kernel install completes in a normal time.

Additional info:
This bug appears to be reporting the same issue:
Bug 1825940 - kernel-core-5.7.0-0.rc1.20200416git9786cab67457.1 took very long time to install
This problem could have something to do with the ARK project: kernel-ark https://gitlab.com/cki-project/kernel-ark
Most probably duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1817581 (https://bugzilla.redhat.com/show_bug.cgi?id=1814422)
I attached strace for only a few seconds and had messages like these scrolling by:

# strace -f -p 2214 -o strace-1.txt
strace: Process 2214 attached
strace: Process 174476 attached
strace: Process 174477 attached
...

These statistics might suggest a reason:

# grep exec strace-1.txt | wc -l
212
# grep readlink strace-1.txt | wc -l
2502

Note the huge PIDs here:

# ps -g
    PID TTY      STAT   TIME COMMAND
   2073 pts/0    S      0:00 sudo su
   2080 pts/0    S      0:00 su
   2083 pts/0    S      0:00 bash
   2206 pts/0    S+     0:11 /usr/bin/python3 /usr/bin/dnf update --nogpg kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.
   2213 pts/0    S+     0:00 /bin/sh /var/tmp/rpm-tmp.Cf9m41 5
   2214 pts/0    S+     0:39 /usr/bin/bash /usr/sbin/weak-modules --add-kernel 5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_
 222990 pts/1    S      0:00 sudo su
 222997 pts/1    S      0:00 su
 223000 pts/1    S      0:00 bash
 223313 pts/0    R+     0:01 /sbin/depmod -C /tmp/weak-modules.Os10HS/depmod.conf -naeE /tmp/weak-modules.Os10HS/symvers-5.7.0
 223314 pts/0    S+     0:00 grep /tmp/weak-modules.Os10HS/5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64/weak-updates
 223316 pts/1    R+     0:00 ps -g
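A per-syscall tally gives a slightly sharper picture than the two grep counts. The following is a minimal sketch, assuming the log was written with "strace -f -o", so that each line starts with a PID followed by the syscall name. The function name count_syscalls is made up for illustration. Because grep matches the pattern anywhere on a line, the numbers will not necessarily equal the grep counts.

```shell
# Tally syscalls in a log produced by "strace -f -o FILE".
# Assumes lines of the form: PID syscall(args...) = result
# Signal and exit lines ("---", "+++") and "<... resumed>" lines are skipped.
count_syscalls() {
    awk '$2 ~ /^[a-z_0-9]+\(/ {
             sub(/\(.*/, "", $2)      # keep only the syscall name
             n[$2]++
         }
         END { for (s in n) print n[s], s }' "$1" | sort -rn
}

# Usage (file name taken from the comment above):
#   count_syscalls strace-1.txt | head
```

A large readlink count next to a comparatively small execve count would point at path resolution inside a few long-lived processes rather than at the number of processes alone.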
(In reply to Yauheni Kaliuta from comment #2)
> Most probably duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=1817581 (https://bugzilla.redhat.com/show_bug.cgi?id=1814422)

Related, perhaps, but depmod is obviously failing to handle invalid input, so there really is a kmod bug.
(In reply to Steve from comment #4)
> (In reply to Yauheni Kaliuta from comment #2)
> > Most probably duplicate of
> > https://bugzilla.redhat.com/show_bug.cgi?id=1817581 (https://bugzilla.redhat.com/show_bug.cgi?id=1814422)
>
> Related, perhaps, but depmod is obviously failing to handle invalid input,
> so there really is a kmod bug.

What do you mean?
(In reply to Yauheni Kaliuta from comment #5)
> (In reply to Steve from comment #4)
> > (In reply to Yauheni Kaliuta from comment #2)
> > > Most probably duplicate of
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1817581 (https://bugzilla.redhat.com/show_bug.cgi?id=1814422)
> >
> > Related, perhaps, but depmod is obviously failing to handle invalid input,
> > so there really is a kmod bug.
>
> What do you mean?

There is a process storm. I don't know why, but that should not happen.

With all those readlinks in the strace log, my first guess would be a loop in the directory structure.

I've carefully documented the reproducer, so it should be possible to investigate.
This is a known issue with old kernels installing modules-extra in the wrong place. It's fixed with current kernel packages, but old versions of modules-extra still exist.
(In reply to Jeremy Cline from comment #7)
> This is a known issue with old kernels installing modules-extra in the wrong
> place. It's fixed with current kernel packages, but old versions of
> modules-extra still exist.

OK, but depmod should fail with an error, not set my system on fire.
(In reply to Jeremy Cline from comment #7)
> ... old versions of modules-extra still exist.

This is not an "old version":
kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64
Hi Steve, depmod is doing what it's supposed to be doing; it's just getting wrong (but valid) input from the kernel package. The issue is old kernel-modules-extra packages combined with the *new* kernel post-install script. I'm assuming "rpm -q kernel-modules-extra" lists more than just the most recent kernel-modules-extra.
(In reply to Steve from comment #8)
> OK, but depmod should fail with an error, not set my system on fire.

But there is no error in depmod, just too much work to do. I can suggest temporarily removing the old kernel-modules-extra packages while the new kernel is being installed, if that is possible.
(In reply to Yauheni Kaliuta from comment #11)
> (In reply to Steve from comment #8)
> > OK, but depmod should fail with an error, not set my system on fire.
>
> But there is no error in depmod, just too much work to do. I can suggest
> temporary remove old kernel-modules-extra packages for the new kernel
> installation time, if it is possible.

Thanks. That sounds like a good solution. Presumably, the install would fail, with an error message, and the problem could then be reported and fixed.
Here is the invalid data. The 5.7 directory has links to 5.3.7 modules under it. If those version numbers don't match, depmod should exit with an error message:

# ll /tmp/weak-modules.9BrwAr/5.7.0-0.rc3.1.fc33.x86_64/weak-updates/drivers/input/joystick/a3d.ko.xz
lrwxrwxrwx. 1 root root 73 Apr 27 16:58 /tmp/weak-modules.9BrwAr/5.7.0-0.rc3.1.fc33.x86_64/weak-updates/drivers/input/joystick/a3d.ko.xz ->
                                                                 ^^^
/lib/modules/5.3.7-301.fc31.x86_64/extra/drivers/input/joystick/a3d.ko.xz
             ^^^^^

Here are the modules directories in my F31 VM:

$ ls /lib/modules/
5.3.7-301.fc31.x86_64   5.5.11-200.fc31.x86_64  5.5.8-200.fc31.x86_64  5.6.6-200.fc31.x86_64  5.7.0-0.rc3.1.fc33.x86_64
5.5.10-200.fc31.x86_64  5.5.16-200.fc31.x86_64  5.5.9-200.fc31.x86_64  5.6.7-200.fc31.x86_64

NB: I added newlines for readability.

Reopening.
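To see at a glance which kernel versions a weak-updates tree links into, the symlink targets can be grouped by version. This is a minimal sketch, assuming the targets have the form /lib/modules/<version>/... as in the listing above. The function name weak_update_targets is made up for illustration, and the script only reports the links; it does not decide whether they are valid.

```shell
# Count the symlinks under a weak-updates directory by the kernel
# version their targets live in.
# $1: weak-updates directory to inspect
# Assumes targets of the form /lib/modules/<version>/...
weak_update_targets() {
    find "$1" -type l -exec readlink {} \; |
        awk -F/ '$2 == "lib" && $3 == "modules" { n[$4]++ }
                 END { for (v in n) print n[v], v }' |
        sort -k2
}

# Usage (path taken from the comment above):
#   weak_update_targets /tmp/weak-modules.9BrwAr/5.7.0-0.rc3.1.fc33.x86_64/weak-updates
```

Each output line is a link count followed by the kernel version that owns the link targets.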
No. This is the whole idea of weak-updates.
But couldn't weak-modules run depmod only once for all modules?
*** Bug 1886734 has been marked as a duplicate of this bug. ***
Can we remove weak-modules from kmod? kmod runs depmod on empty directories, which is a bug.
As I wrote in a previous report, weak-modules can only be applied across kernels with the same KABI, i.e. when the kernel is the same apart from a small security update, which never happens in Fedora. So bringing weak-modules into Fedora is wrong, and it leads to problems with dkms and depmod involving a wrong "version magic" [1].

Please revert all the weak-modules changes.

[1] https://github.com/tomaspinho/rtl8821ce/issues/171
We certainly don't want to remove weak-modules from Fedora.ELN as weak-modules are critical for enterprise distributions.

https://fedoraproject.org/wiki/Changes/ELN_Buildroot_and_Compose

But, perhaps we could drop it from standard Fedora builds?

Also, I'm not sure why empty directories are increasing the time needed to run weak-modules. It should be looking only into installed modules and kernels. Any thoughts on that one, Yauheni?
OK, this bug is not about empty directories.

I have 5.8.x kernels on my system [1] and I want to test the network driver rtl8821ce, which is only available on 5.9.0-0.rc8.28.fc34.x86_64. Command [2] from script [3] takes about 30 minutes because it adds 155 .ko files to weak-updates [4], and it does so 3 times: for kernel 5.8.11 [5], after a while for kernel 5.8.12 [6], and then for kernel 5.8.13 [7]. You can see from the timestamps that it started at 15:21 and ended at 16:03.

So we have several bugs here, not just one.

[1]
rpm -q kernel
kernel-5.8.11-100.fc31.x86_64
kernel-5.8.12-100.fc31.x86_64
kernel-5.8.13-100.fc31.x86_64

[2]
/usr/bin/bash /usr/sbin/weak-modules --add-kernel 5.9.0-0.rc8.28.fc34.x86_64

[3]
cat /var/tmp/rpm-tmp.IyoF7o
if [ -x /usr/sbin/weak-modules ]
then
    /usr/sbin/weak-modules --add-kernel 5.9.0-0.rc8.28.fc34.x86_64 || exit $?
fi
/bin/kernel-install add 5.9.0-0.rc8.28.fc34.x86_64 /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/vmlinuz || exit $?

[4]
find /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/ | wc -l
155

[5] for example:
ll /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/drivers/input/joystick/
lrwxrwxrwx 1 root root 74 Oct  9 15:21 adi.ko.xz -> /lib/modules/5.8.11-100.fc31.x86_64/extra/drivers/input/joystick/adi.ko.xz
lrwxrwxrwx 1 root root 78 Oct  9 15:21 joydump.ko.xz -> /lib/modules/5.8.11-100.fc31.x86_64/extra/drivers/input/joystick/joydump.ko.xz
lrwxrwxrwx 1 root root 75 Oct  9 15:21 xpad.ko.xz -> /lib/modules/5.8.11-100.fc31.x86_64/extra/drivers/input/joystick/xpad.ko.xz

[6] for example:
ll /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/drivers/input/joystick/
lrwxrwxrwx 1 root root 74 Oct  9 15:51 a3d.ko.xz -> /lib/modules/5.8.12-100.fc31.x86_64/extra/drivers/input/joystick/a3d.ko.xz
lrwxrwxrwx 1 root root 74 Oct  9 15:51 adi.ko.xz -> /lib/modules/5.8.12-100.fc31.x86_64/extra/drivers/input/joystick/adi.ko.xz

[7]
ll /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/drivers/input/joystick/
lrwxrwxrwx 1 root root 74 Oct  9 16:03 a3d.ko.xz -> /lib/modules/5.8.13-100.fc31.x86_64/extra/drivers/input/joystick/a3d.ko.xz
lrwxrwxrwx 1 root root 74 Oct  9 16:03 adi.ko.xz -> /lib/modules/5.8.13-100.fc31.x86_64/extra/drivers/input/joystick/adi.ko.xz
(In reply to Sergio Basto from comment #20)
> I have 5.8.x kernel on my system [1] and I want test network drive rtl8821ce
> which is only available on 5.9.0-0.rc8.28.fc34.x86_64. Command [2] from
> script [3] take about 30 minutes because add 155 .ko to weak-updates [4] but
> and do it 3 times kernel 5.8.11 [5] after a while kernel 5.8.12 [6] and [7],
> you can see the timestamps, it started 15:21 and end at 16:03 .
>

So it ran depmod 155 * 3 = 465 times.
All the slowness comes from the fact it runs depmod for each module, instead of just once at the end.
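The arithmetic can be made concrete with a small counting sketch. It is based only on the behaviour described in this comment, not on the weak-modules source: run_depmod is a stand-in that increments a counter and never calls depmod, and per_module and once_at_end are hypothetical names for the two strategies. Whether a single run at the end would preserve the per-module compatibility checks is not established here.

```shell
# Count how often the indexing step would run under two strategies.
# run_depmod is a stand-in that only counts calls; it does NOT run depmod.
depmod_calls=0
run_depmod() { depmod_calls=$((depmod_calls + 1)); }

# Strategy described in the comment: one run per module per older kernel.
# $1: number of older kernels, $2: modules per kernel
per_module() {
    k=0
    while [ "$k" -lt "$1" ]; do
        m=0
        while [ "$m" -lt "$2" ]; do
            run_depmod
            m=$((m + 1))
        done
        k=$((k + 1))
    done
}

# Suggested alternative: handle all modules first, then index once.
# Takes the same arguments so the two can be compared side by side.
once_at_end() {
    run_depmod
}

# Usage:
#   depmod_calls=0; per_module 3 155;  echo "$depmod_calls"
#   depmod_calls=0; once_at_end 3 155; echo "$depmod_calls"
```

With 3 older kernels and 155 modules each, the first strategy performs 3 * 155 runs, while the second performs one regardless of the counts.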
(In reply to Loïc Yhuel from comment #21)
> So it ran depmod 155 * 3 = 465 times.
> All the slowness comes from the fact it runs depmod for each module, instead
> of just once at the end.

Thank you for the information. Yes, maybe the issue is that depmod runs 465 times.
I think I gave you a simple way to reproduce the problem; I believe you can also reproduce it on F32.
Also, I see that /usr/sbin/weak-modules is a Bash script, so I will try to look at it.

Have a nice weekend!
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Any progress? weak-modules behaves very badly when we switch to a new major version.
(In reply to Sergio Basto from comment #24)
> any progress ? weak-modules behaves very badly when we switch to a major
> version

no
Really, we should just back weak-modules out of Fedora, as we do not support KABI and keep CONFIG_MODVERSIONS disabled. Let me see what can be done here.
Hmm, actually, weak-modules should not be running on stable Fedora. I was looking to back it out today with the 5.9.13 kernel updates and noticed I already did that on 8/20:

%{expand:%%posttrans %{?1:%{1}-}core}\
%if 0%{!?fedora:1}\
if [ -x %{_sbindir}/weak-modules ]\
then\
    %{_sbindir}/weak-modules --add-kernel %{KVERREL}%{?1:+%{1}} || exit $?\
fi\
%endif\

Were you seeing this on the 5.9 rebases in F33/32?
(In reply to Justin M. Forbes from comment #27)
> Were you seeing this on the 5.9 rebases in F33/32 ?

No. But I still see it when I use fedora-rawhide-kernel-nodebug [1], [2].

[1]
dnf --enablerepo=fedora-rawhide-kernel-nodebug update

[2]
rpm -q kernel-core-5.10.0-0.rc6.90.fc34.x86_64 --scripts
posttrans scriptlet (using /bin/sh):
if [ -x /usr/sbin/weak-modules ]
then
    /usr/sbin/weak-modules --add-kernel 5.10.0-0.rc6.90.fc34.x86_64 || exit $?
fi
/bin/kernel-install add 5.10.0-0.rc6.90.fc34.x86_64 /lib/modules/5.10.0-0.rc6.90.fc34.x86_64/vmlinuz || exit $?
Confirmed for me with 5.10.....fc34 last night: the installation took around 7 minutes to complete.
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle. Changing version to 34.
I think this bug should be closed. Please re-open if you think it's not the case.