Bug 1828455 - depmod does not detect inconsistent kernel version numbers under /tmp/weak-modules.9BrwAr/ (was: running for a long time when installing F33 kernel)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kmod
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1886734
Depends On:
Blocks:
Reported: 2020-04-27 18:03 UTC by Steve
Modified: 2022-03-15 12:49 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-03-15 12:49:55 UTC
Type: Bug
Embargoed:


Description Steve 2020-04-27 18:03:21 UTC
Description of problem:

In an F31 VM, when installing an F33 kernel, depmod runs for a very long time and consumes nearly 100% CPU.

Version-Release number of selected component (if applicable):

kmod-26-4.fc31.x86_64
kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64

How reproducible:
Tested once.

Steps to Reproduce:

# dnf update --nogpg kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64 --releasever=33

Actual results:

Top shows CPU usage by depmod approaching 100% at times:

$ top -bc -n 1 -u root | head -9
top - 10:31:33 up 28 min,  1 user,  load average: 1.20, 1.11, 0.99
Tasks: 186 total,   2 running, 184 sleeping,   0 stopped,   0 zombie
%Cpu(s): 94.1 us,  5.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3928.0 total,   2000.5 free,    734.9 used,   1192.6 buff/cache
MiB Swap:    512.0 total,    512.0 free,      0.0 used.   2942.6 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  38906 root      20   0   41960  39528   4220 R  81.2   1.0   0:03.82 /sbin/depmod -C /tmp/weak-modules.Os10HS/depmod.conf +
      1 root      20   0  172764  16432   9752 S   0.0   0.4   0:02.78 /usr/lib/systemd/systemd --switched-root --system --d+

Expected results:

Kernel install completes in a normal time.

Additional info:

This bug appears to be reporting the same issue:

Bug 1825940 - kernel-core-5.7.0-0.rc1.20200416git9786cab67457.1 took very long time to install

Comment 1 Steve 2020-04-27 18:10:30 UTC
This problem could have something to do with the ARK project:

kernel-ark
https://gitlab.com/cki-project/kernel-ark

Comment 3 Steve 2020-04-27 18:36:01 UTC
I attached strace for only a few seconds and had messages like these scrolling by:

# strace -f -p 2214 -o strace-1.txt
strace: Process 2214 attached
strace: Process 174476 attached
strace: Process 174477 attached
...

These statistics might suggest a reason:

# grep exec strace-1.txt | wc -l
212

# grep readlink strace-1.txt | wc -l
2502
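
The two grep counts above can be generalized to a per-syscall tally. The following is a minimal sketch, assuming the log was written with `strace -f -o FILE` so that each line starts with a PID followed by the call; the function name count_syscalls is hypothetical.

```shell
# Tally syscalls in a log written by `strace -f -o FILE`.
# Assumption: field 1 is the PID and field 2 starts with `name(`.
count_syscalls() {
    awk '
        # Keep only lines whose second field looks like a call, e.g. readlink("/path",
        $2 ~ /^[a-z_][a-z0-9_]*\(/ {
            name = $2
            sub(/\(.*/, "", name)   # strip everything from the opening parenthesis
            count[name]++
        }
        END { for (name in count) printf "%d %s\n", count[name], name }
    ' "$1" | sort -k1,1nr -k2,2     # most frequent call first
}

# Usage: count_syscalls strace-1.txt | head
```

Unlike a plain grep for a substring, this counts only lines where the pattern is the syscall name itself, so a path that happens to contain "exec" is not counted.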

Note the huge PIDs here:

# ps -g
    PID TTY      STAT   TIME COMMAND
   2073 pts/0    S      0:00 sudo su
   2080 pts/0    S      0:00 su
   2083 pts/0    S      0:00 bash
   2206 pts/0    S+     0:11 /usr/bin/python3 /usr/bin/dnf update --nogpg kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.
   2213 pts/0    S+     0:00 /bin/sh /var/tmp/rpm-tmp.Cf9m41 5
   2214 pts/0    S+     0:39 /usr/bin/bash /usr/sbin/weak-modules --add-kernel 5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_
 222990 pts/1    S      0:00 sudo su
 222997 pts/1    S      0:00 su
 223000 pts/1    S      0:00 bash
 223313 pts/0    R+     0:01 /sbin/depmod -C /tmp/weak-modules.Os10HS/depmod.conf -naeE /tmp/weak-modules.Os10HS/symvers-5.7.0
 223314 pts/0    S+     0:00 grep /tmp/weak-modules.Os10HS/5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64/weak-updates
 223316 pts/1    R+     0:00 ps -g

Comment 4 Steve 2020-04-27 18:43:14 UTC
(In reply to Yauheni Kaliuta from comment #2)
> Most probably duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=1817581 (https://bugzilla.redhat.com/show_bug.cgi?id=1814422)

Related, perhaps, but depmod is obviously failing to handle invalid input, so there really is a kmod bug.

Comment 5 Yauheni Kaliuta 2020-04-27 18:45:20 UTC
(In reply to Steve from comment #4)
> (In reply to Yauheni Kaliuta from comment #2)
> > Most probably duplicate of
> > https://bugzilla.redhat.com/show_bug.cgi?id=1817581 (https://bugzilla.redhat.com/show_bug.cgi?id=1814422)
> 
> Related, perhaps, but depmod is obviously failing to handle invalid input,
> so there really is a kmod bug.

What do you mean?

Comment 6 Steve 2020-04-27 19:00:46 UTC
(In reply to Yauheni Kaliuta from comment #5)
> (In reply to Steve from comment #4)
> > (In reply to Yauheni Kaliuta from comment #2)
> > > Most probably duplicate of
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1817581 (https://bugzilla.redhat.com/show_bug.cgi?id=1814422)
> > 
> > Related, perhaps, but depmod is obviously failing to handle invalid input,
> > so there really is a kmod bug.
> 
> What do you mean?

There is a process storm. I don't know why, but that should not happen.

With all those readlinks in the strace log, my first guess would be a loop in the directory structure.

I've carefully documented the reproducer, so it should be possible to investigate.

Comment 7 Jeremy Cline 2020-04-27 19:05:26 UTC
This is a known issue with old kernels installing modules-extra in the wrong place. It's fixed with current kernel packages, but old versions of modules-extra still exist.

Comment 8 Steve 2020-04-27 19:10:55 UTC
(In reply to Jeremy Cline from comment #7)
> This is a known issue with old kernels installing modules-extra in the wrong
> place. It's fixed with current kernel packages, but old versions of
> modules-extra still exist.

OK, but depmod should fail with an error, not set my system on fire.

Comment 9 Steve 2020-04-27 19:13:04 UTC
(In reply to Jeremy Cline from comment #7)
> ... old versions of modules-extra still exist.

This is not an "old version":

kernel-0:5.7.0-0.rc2.20200422git18bf34080c4c.1.fc33.x86_64

Comment 10 Jeremy Cline 2020-04-27 19:17:37 UTC
Hi Steve,

depmod is doing what it's supposed to be doing, it's just getting wrong (but valid) input from the kernel package.

The issue is old kernel-modules-extra packages with the *new* kernel post-install script. I'm assuming "rpm -q kernel-modules-extra" lists more than the most recent kernel-modules-extra.

Comment 11 Yauheni Kaliuta 2020-04-27 19:19:55 UTC
(In reply to Steve from comment #8)
> OK, but depmod should fail with an error, not set my system on fire.

But there is no error in depmod, just too much work to do. I suggest temporarily removing the old kernel-modules-extra packages while the new kernel is being installed, if that is possible.
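
One way to find candidates for that temporary removal is sketched below. It assumes that the package to keep is the one matching a given kernel version (for example the running kernel), which is a choice made for illustration and not something stated in this report; the function name old_modules_extra is hypothetical.

```shell
# Read package names on stdin (as printed by `rpm -q kernel-modules-extra`)
# and print every kernel-modules-extra package except the one for version $1.
old_modules_extra() {
    keep="kernel-modules-extra-$1"
    grep '^kernel-modules-extra-' | grep -vxF -- "$keep"
}

# Usage (review the list before removing anything):
#   rpm -q kernel-modules-extra | old_modules_extra "$(uname -r)"
```

The function only prints names; removing the listed packages is left as a separate, deliberate step.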

Comment 12 Steve 2020-04-27 19:33:22 UTC
(In reply to Yauheni Kaliuta from comment #11)
> (In reply to Steve from comment #8)
> > OK, but depmod should fail with an error, not set my system on fire.
> 
> But there is no error in depmod, just too much work to do. I suggest
> temporarily removing the old kernel-modules-extra packages while the new
> kernel is being installed, if that is possible.

Thanks. That sounds like a good solution. Presumably, the install would fail, with an error message, and the problem could then be reported and fixed.

Comment 13 Steve 2020-04-28 00:12:51 UTC
Here is the invalid data. The 5.7 directory has links to 5.3.7 modules under it. If those version numbers don't match, depmod should exit with an error message:

# ll /tmp/weak-modules.9BrwAr/5.7.0-0.rc3.1.fc33.x86_64/weak-updates/drivers/input/joystick/a3d.ko.xz 

lrwxrwxrwx. 1 root root 73 Apr 27 16:58 

/tmp/weak-modules.9BrwAr/5.7.0-0.rc3.1.fc33.x86_64/weak-updates/drivers/input/joystick/a3d.ko.xz -> 
                         ^^^

/lib/modules/5.3.7-301.fc31.x86_64/extra/drivers/input/joystick/a3d.ko.xz
             ^^^^^

Here are the modules directories in my F31 VM:

$ ls /lib/modules/
5.3.7-301.fc31.x86_64   5.5.11-200.fc31.x86_64  5.5.8-200.fc31.x86_64  5.6.6-200.fc31.x86_64  5.7.0-0.rc3.1.fc33.x86_64
5.5.10-200.fc31.x86_64  5.5.16-200.fc31.x86_64  5.5.9-200.fc31.x86_64  5.6.7-200.fc31.x86_64

NB: I added newlines for readability.

Reopening.
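
To see at a glance which kernel trees the weak-updates links point into, something like the following could be used. It is a sketch that assumes link targets start with /lib/modules/<version>/, as in the listing above; the function name weak_update_targets is hypothetical.

```shell
# For each symlink below DIR/weak-updates, print the kernel version taken
# from the link target (the path component after /lib/modules/) and the link.
weak_update_targets() {
    find "$1/weak-updates" -type l | while IFS= read -r link; do
        target=$(readlink "$link")          # raw link text, not resolved
        version=${target#/lib/modules/}     # drop the leading /lib/modules/
        version=${version%%/*}              # keep only the version component
        printf '%s %s\n' "$version" "$link"
    done
}

# Usage: count links per source kernel
#   weak_update_targets /lib/modules/5.7.0-0.rc3.1.fc33.x86_64 | cut -d' ' -f1 | sort | uniq -c
```

The output is diagnostic only; it shows where links point and does not judge whether a given link is valid.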

Comment 14 Yauheni Kaliuta 2020-04-28 00:54:29 UTC
No. This is the whole idea of weak-updates.

Comment 15 Loïc Yhuel 2020-05-08 15:51:27 UTC
But couldn't weak-modules run depmod only once for all modules?

Comment 16 Yauheni Kaliuta 2020-10-09 12:06:02 UTC
*** Bug 1886734 has been marked as a duplicate of this bug. ***

Comment 17 Sergio Basto 2020-10-09 13:34:38 UTC
Can we remove weak-modules from kmod?

kmod runs depmod on empty directories, which is a bug.

Comment 18 Sergio Basto 2020-10-09 13:54:08 UTC
And as I wrote in the previous report, weak-modules can only be applied across kernels with the same KABI, i.e. when the kernel is the same apart from a small security update, which never happens in Fedora. So bringing weak-modules into Fedora is wrong, and it leads to problems with dkms and depmod and the wrong "version magic" [1].

Please revert all weak-modules stuff.  

[1] 
https://github.com/tomaspinho/rtl8821ce/issues/171

Comment 19 Stanislav Kozina 2020-10-09 14:05:19 UTC
We certainly don't want to remove weak-modules from Fedora.ELN as weak-modules are critical for enterprise distributions.
https://fedoraproject.org/wiki/Changes/ELN_Buildroot_and_Compose
But, perhaps we could drop it from standard Fedora builds?
Also, I'm not sure why empty directories are increasing the time needed to run weak-modules. It should be looking only into installed modules and kernels. Any thoughts on that one Yauheni?

Comment 20 Sergio Basto 2020-10-09 15:21:07 UTC
OK, this bug is not about empty directories.

I have 5.8.x kernels on my system [1] and I want to test the network driver rtl8821ce, which is only available on 5.9.0-0.rc8.28.fc34.x86_64. Command [2] from script [3] takes about 30 minutes because it adds 155 .ko files to weak-updates [4], and it does so 3 times: first for kernel 5.8.11 [5], then after a while for kernel 5.8.12 [6], and then for 5.8.13 [7]. You can see from the timestamps that it started at 15:21 and ended at 16:03.

So we have several bugs here, not just one.


[1]
rpm -q kernel
kernel-5.8.11-100.fc31.x86_64
kernel-5.8.12-100.fc31.x86_64
kernel-5.8.13-100.fc31.x86_64

[2]
/usr/bin/bash /usr/sbin/weak-modules --add-kernel 5.9.0-0.rc8.28.fc34.x86_64

[3]
cat /var/tmp/rpm-tmp.IyoF7o
if [ -x /usr/sbin/weak-modules ]
then
    /usr/sbin/weak-modules --add-kernel 5.9.0-0.rc8.28.fc34.x86_64 || exit $?
fi
/bin/kernel-install add 5.9.0-0.rc8.28.fc34.x86_64 /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/vmlinuz || exit $?

[4]
find  /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/  | wc -l 
155

[5]
for example:

ll /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/drivers/input/joystick/
lrwxrwxrwx 1 root root 74 Oct  9 15:21 adi.ko.xz -> /lib/modules/5.8.11-100.fc31.x86_64/extra/drivers/input/joystick/adi.ko.xz
lrwxrwxrwx 1 root root 78 Oct  9 15:21 joydump.ko.xz -> /lib/modules/5.8.11-100.fc31.x86_64/extra/drivers/input/joystick/joydump.ko.xz
lrwxrwxrwx 1 root root 75 Oct  9 15:21 xpad.ko.xz -> /lib/modules/5.8.11-100.fc31.x86_64/extra/drivers/input/joystick/xpad.ko.xz


[6]
for example:
ll /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/drivers/input/joystick/

lrwxrwxrwx 1 root root 74 Oct  9 15:51 a3d.ko.xz -> /lib/modules/5.8.12-100.fc31.x86_64/extra/drivers/input/joystick/a3d.ko.xz
lrwxrwxrwx 1 root root 74 Oct  9 15:51 adi.ko.xz -> /lib/modules/5.8.12-100.fc31.x86_64/extra/drivers/input/joystick/adi.ko.xz

[7]
ll /lib/modules/5.9.0-0.rc8.28.fc34.x86_64/weak-updates/drivers/input/joystick/

lrwxrwxrwx 1 root root 74 Oct  9 16:03 a3d.ko.xz -> /lib/modules/5.8.13-100.fc31.x86_64/extra/drivers/input/joystick/a3d.ko.xz
lrwxrwxrwx 1 root root 74 Oct  9 16:03 adi.ko.xz -> /lib/modules/5.8.13-100.fc31.x86_64/extra/drivers/input/joystick/adi.ko.xz

Comment 21 Loïc Yhuel 2020-10-09 20:32:19 UTC
(In reply to Sergio Basto from comment #20)
> I have 5.8.x kernels on my system [1] and I want to test the network driver
> rtl8821ce, which is only available on 5.9.0-0.rc8.28.fc34.x86_64. Command [2]
> from script [3] takes about 30 minutes because it adds 155 .ko files to
> weak-updates [4], and it does so 3 times: first for kernel 5.8.11 [5], then
> after a while for kernel 5.8.12 [6], and then for 5.8.13 [7]. You can see
> from the timestamps that it started at 15:21 and ended at 16:03.
> 
So it ran depmod 155 * 3 = 465 times.
All the slowness comes from the fact it runs depmod for each module, instead of just once at the end.
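
The difference in invocation counts can be illustrated with a stub. This is only a counting sketch: run_depmod is a hypothetical stand-in that increments a counter and never calls depmod, and the loops are not taken from the weak-modules script. Whether a single run at the end would preserve the per-module checks that weak-modules performs is not established in this report.

```shell
# Stand-in for depmod: only counts how often it would be invoked.
calls=0
run_depmod() { calls=$((calls + 1)); }

# One run per module for each older kernel ($1 kernels, $2 modules each).
per_module_runs() {
    calls=0
    kernel=0
    while [ "$kernel" -lt "$1" ]; do
        module=0
        while [ "$module" -lt "$2" ]; do
            run_depmod
            module=$((module + 1))
        done
        kernel=$((kernel + 1))
    done
    echo "$calls"
}

# A single run after all links are in place.
once_at_end() {
    calls=0
    run_depmod
    echo "$calls"
}

per_module_runs 3 155   # prints 465
once_at_end             # prints 1
```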

Comment 22 Sergio Basto 2020-10-09 22:32:59 UTC
(In reply to Loïc Yhuel from comment #21)

> So it ran depmod 155 * 3 = 465 times.
> All the slowness comes from the fact it runs depmod for each module, instead
> of just once at the end.

Thank you for the information. Yes, maybe the issue is that depmod runs 465 times.
I think I gave you a simple way to reproduce the problem; I believe you can also reproduce it on F32.
Also, I see that /usr/sbin/weak-modules is a Bash script; I will try to look into it.

Have a nice weekend!

Comment 23 Ben Cotton 2020-11-03 16:39:40 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 31 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Sergio Basto 2020-12-01 05:29:38 UTC
Any progress? weak-modules behaves very badly when we switch to a new major version.

Comment 25 Yauheni Kaliuta 2020-12-01 07:52:51 UTC
(In reply to Sergio Basto from comment #24)
> Any progress? weak-modules behaves very badly when we switch to a new major
> version.

no

Comment 26 Justin M. Forbes 2020-12-02 16:40:13 UTC
Really, we should just back weak-modules out of Fedora, as we do not support KABI and keep CONFIG_MODVERSIONS disabled. Let me see what can be done here.

Comment 27 Justin M. Forbes 2020-12-08 14:12:37 UTC
Hmm, actually, weak-modules should not be running on stable Fedora. I was looking to back it out today with the 5.9.13 kernel updates and noticed I already did that on 8/20:

%{expand:%%posttrans %{?1:%{1}-}core}\
%if 0%{!?fedora:1}\
if [ -x %{_sbindir}/weak-modules ]\
then\
    %{_sbindir}/weak-modules --add-kernel %{KVERREL}%{?1:+%{1}} || exit $?\
fi\
%endif\

Were you seeing this on the 5.9 rebases in F33/32?
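
The guard in that scriptlet can be checked from a shell. `%{!?fedora:1}` expands to 1 only when the fedora macro is undefined, so `0%{!?fedora:1}` becomes 0 (false) on Fedora builds and 01 (true) elsewhere. The sketch below uses rpm's --define and --eval options; the wrapper name guard_with is hypothetical.

```shell
# Expand the guard expression from the scriptlet with one extra macro defined.
# $1 is passed to rpm --define, for example "fedora 34".
guard_with() {
    rpm --define "$1" --eval '0%{!?fedora:1}'
}

# Usage:
#   guard_with 'fedora 34'   # expands to 0, so the %if block is skipped
# Without a fedora macro (as on a non-Fedora build host) the same expression
# expands to 01, and the weak-modules call is kept in the scriptlet.
```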

Comment 28 Sergio Basto 2020-12-09 15:03:54 UTC
(In reply to Justin M. Forbes from comment #27)
> Were you seeing this on the 5.9 rebases in F33/32?

No.

But I still see it when I use fedora-rawhide-kernel-nodebug [1], [2].



[1] 
dnf --enablerepo=fedora-rawhide-kernel-nodebug update 

[2]
rpm -q kernel-core-5.10.0-0.rc6.90.fc34.x86_64  --scripts
posttrans scriptlet (using /bin/sh):
if [ -x /usr/sbin/weak-modules ]
then
    /usr/sbin/weak-modules --add-kernel 5.10.0-0.rc6.90.fc34.x86_64 || exit $?
fi
/bin/kernel-install add 5.10.0-0.rc6.90.fc34.x86_64 /lib/modules/5.10.0-0.rc6.90.fc34.x86_64/vmlinuz || exit $?
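
The same check can be scripted for any kernel-core package. This is a small sketch that reads scriptlet text on stdin; the function name calls_weak_modules is hypothetical, and the match string is taken from the scriptlet output above.

```shell
# Succeeds if the scriptlet text on stdin invokes weak-modules --add-kernel.
calls_weak_modules() {
    grep -q -- 'weak-modules --add-kernel'
}

# Usage:
#   rpm -q --scripts kernel-core-5.10.0-0.rc6.90.fc34.x86_64 | calls_weak_modules \
#       && echo "scriptlet still runs weak-modules"
```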

Comment 29 Felix Miata 2020-12-27 04:10:36 UTC
Confirmed for me with 5.10.....fc34 last night; installation took around 7 minutes to complete.

Comment 30 Ben Cotton 2021-02-09 15:14:53 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 31 Nicolas Chauvet (kwizart) 2022-03-15 12:49:55 UTC
I think this bug should be closed.
Please re-open if you think it's not the case.
