Bug 2379766 - akmods for installed-but-not-running kernels are built too late
Summary: akmods for installed-but-not-running kernels are built too late
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: akmods
Version: 42
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Nicolas Chauvet (kwizart)
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2025-07-13 15:06 UTC by Avi Kivity
Modified: 2025-10-01 00:41 UTC (History)

Fixed In Version: akmods-0.6.1-2.fc43 akmods-0.6.1-2.fc42 akmods-0.6.1-2.el10_2 akmods-0.6.1-2.el10_1 akmods-0.6.1-2.el10_0
Clone Of:
Environment:
Last Closed: 2025-09-26 00:20:06 UTC
Type: ---
Embargoed:



Description Avi Kivity 2025-07-13 15:06:42 UTC
For example, I'm running kernel-6.15.3-200.fc42.x86_64 and have kernel-6.15.5-200.fc42.x86_64 installed.

When kmod-nvidia is updated to kmod-nvidia-575.64.03-2.fc42.x86_64, the corresponding kmod for the running kernel is updated, but not the kmod for the installed-but-not-running kernel.

When I restart, the kmod for the now-running kernel is built, but it is too late, since the modprobe stuff already happened. I end up having to reboot again.

Running sudo akmods --kernels 6.15.5-200.fc42.x86_64 fixes it, but it's a better experience if we build kmods for installed-but-not-running kernels ahead of time.
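Until kmods are built ahead of time, the manual workaround can be applied to every installed kernel other than the running one. A minimal sketch: `non_running_kernels` is a hypothetical helper name (not part of akmods), and the `rpm -q kernel-devel` query in the usage comment is only an assumption about how one might list the installed versions.

```shell
# Hypothetical helper, not part of akmods: read kernel versions from stdin
# (one per line) and print every version except the running one given as $1.
non_running_kernels() (
    running=$1
    while IFS= read -r kver; do
        if [ -n "$kver" ] && [ "$kver" != "$running" ]; then
            printf '%s\n' "$kver"
        fi
    done
)

# Possible usage (assumption: kernel-devel is installed for each kernel):
#   rpm -q kernel-devel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' |
#       non_running_kernels "$(uname -r)" |
#       while IFS= read -r kver; do sudo akmods --kernels "$kver"; done
```

The helper only filters a list; the akmods invocation is the same one used in the workaround above.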

Note: if someone updates using gnome-update, the problem is likely hidden by the additional reboot step.

Reproducible: Always

Steps to Reproduce:
1. `dnf update` that updates a kernel (but don't reboot)
2. `dnf update` that updates kmod-nvidia
3. `rpm -qa kmod-nvidia*`
4. reboot

Actual Results:
1. kmod-nvidia for the installed-but-not-running kernel points to the older driver
2. gnome-shell loads using xorg instead of wayland
3. during the reboot, a new kmod-nvidia is generated, but too late to take effect

Expected Results:
1. kmod-nvidia updated for the installed-but-not-running kernel immediately
2. rpm -qa lists the new kmod
3. reboot can load gnome-shell on wayland

Comment 1 Nicolas Chauvet (kwizart) 2025-09-16 18:15:57 UTC
Thanks for the report.

It looks similar to this rhbz#2376351 but with the additional context of updating the akmod-nvidia version.

I will check both cases...

Comment 2 Francis Montagnac 2025-09-17 10:14:43 UTC
This is the same problem as rhbz#2376351:

  - rhbz#2376351 applies when using akmods without specifying a kernel
  - this is the case in all the akmod-FOO posttrans scriptlets:
    nohup /usr/sbin/akmods --from-akmod-posttrans --akmod FOO &> /dev/null &

Comment 3 Francis Montagnac 2025-09-17 10:22:57 UTC
In addition, I don't see the point of testing that /usr/sbin/grubby is
a symlink in the check_default_kernel function. This may be the case
when upgrading to F42 if the unification of /usr/sbin is partial:
/usr/sbin/grubby will be a symlink to ../bin/grubby (I think .. from memory).

Testing "command -v grubby >/dev/null 2>&1" should be enough.

Ref: https://docs.fedoraproject.org/en-US/fedora/latest/release-notes/sysadmin/#usr-bin-usr-sbin-unification
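The suggested test can be sketched as follows. This is not the actual check_default_kernel code from akmods; `have_command` is a hypothetical name used only for illustration. The point is that `command -v` succeeds whether the binary is a regular file or a symlink, as long as it is reachable through PATH.

```shell
# Hypothetical helper, not the akmods implementation: succeed if the named
# command can be resolved, regardless of whether it is a file or a symlink.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command grubby; then
    echo "grubby found at: $(command -v grubby)"
else
    echo "grubby not found in PATH"
fi
```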

Comment 4 Nicolas Chauvet (kwizart) 2025-09-17 11:21:48 UTC
(In reply to Francis Montagnac from comment #2)
> This is the same problem as rhbz#2376351:
> 
>   - rhbz#2376351 applies when using akmods without specifying a kernel
>   - this is the case in all the akmod-FOO posttrans scriptlets:
>     nohup /usr/sbin/akmods --from-akmod-posttrans --akmod FOO &> /dev/null &

Right, with the additional point that when running with --from-akmod-posttrans we only need to build for the latest kernel; otherwise it will take a large amount of time if we build for both the latest and the current uname -r kernel.
I expect that we might only build for current uname -r kernel (but it's a regression).

Now when both kernel-devel and akmod-foo updates are present, I need to check the appropriate behavior...


> Testing "command -v grubby >/dev/null 2>&1" should be enough.

This is literally the point I've spotted, indeed.


I've just pushed the commit I had. I will check for other changes as needed and do a build for f43+ and el10 by the end of the week.

Comment 5 Francis Montagnac 2025-09-17 14:30:12 UTC
(In reply to Nicolas Chauvet (kwizart) from comment #4)

> Right, with the additional point that when running with --from-akmod-posttrans we only need to build for the
> latest kernel; otherwise it will take a large amount of time if we build for both the latest and the current uname -r kernel.

I disagree:

  - the latest kernel may not be the default kernel that will be used on next reboot
  - losing time here is not important at all compared to having the kmods available at next boot

IMO, building for the current, default and latest kernel would be better for security.

> I expect that we might only build for current uname -r kernel (but it's a regression).

Why? This would make this BZ irrelevant, solved by answering something like:

  Do not update a kmod-foo if you previously installed a new kernel: reboot first

That would be impracticable.
 
> Now when both kernel-devel and akmod-foo updates are present, I need to check the appropriate behavior...
 
If you mean what to do if a single dnf transaction contains both a new kernel (and kernel-devel) and a new version of an
akmod-foo, IMO the current behavior is fine:

  - akmods will be (asynchronously) called twice, with:
      --from-kernel-posttrans --kernels kver
    and
      --from-akmod-posttrans --akmod foo

  - thanks to the lock taken by akmods (flock in the init function), the builds and installs will be done in sequence,
    resulting in all the kmods built for the new kernel, and kmod-foo built for all the "wanted" kernels.

  - kmod-foo will be built and installed only once for the new kernel
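
The serialization described above can be illustrated with a small, self-contained demonstration. It is not the akmods code: the lock file, the log file and the job bodies are made up for this example, and it assumes the flock utility from util-linux is available. Two background jobs contend for the same lock, so the second one only starts once the first has finished.

```shell
# Demonstration only: two background jobs contend for one lock file, much as
# two posttrans-triggered akmods runs would. All names here are hypothetical.
lock=$(mktemp)
log=$(mktemp)

locked_job() (
    # Open the lock file on fd 9 and block until an exclusive lock is held.
    # The lock is released when this subshell exits and fd 9 is closed.
    exec 9>"$lock"
    flock 9
    echo "start $1" >>"$log"
    sleep 1
    echo "end $1" >>"$log"
)

locked_job kernel-posttrans &
locked_job akmod-posttrans &
wait

# Which job wins the lock first is not deterministic, but the two jobs never
# interleave: each "start" line is directly followed by its own "end" line.
cat "$log"
```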

Comment 6 Avi Kivity 2025-09-17 15:28:56 UTC
We should build kmods for all installed kernels. Otherwise, we're converting a usable kernel to an unusable kernel.

An example scenario is that we reboot to a new kernel, find it broken for some reason, boot to an installed-but-not-current kernel, and now that's unusable too because we didn't build a kmod for it.

The time to build a kmod, even spread across the entire Fedora installed base that uses kmods, isn't worth the frustration of users booting to a kernel and having to figure out what's broken and how to fix it.

Comment 7 Nicolas Chauvet (kwizart) 2025-09-17 16:35:55 UTC
> We should build kmods for all installed kernels.
We used to build only for the uname -r kernel or default_kernel (the next boot kernel as detected by grub or others).
If they do not match, that means two kernels at most (while Fedora by default allows 3 kernels, but that's configurable).

> Otherwise, we're converting a usable kernel to an unusable kernel.
No. Previously working kernels are still usable, as booting to userspace was previously working at the very least. One can still recover from there.

On the contrary, we have no means to check that a non-default, non-uname-r kernel used to work (systemd-boot might).


Building for all kernels will take too much time for not much gain.
I disagree with making this the default.
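
For reference, the "uname -r plus default kernel" selection can be sketched as below. This is not the akmods implementation: `target_kernels` is a hypothetical helper, and the grubby wiring in the comment assumes grubby is installed and prints a path of the form /boot/vmlinuz-<version>.

```shell
# Hypothetical helper, not the akmods implementation: print the kernels to
# build for, i.e. the running kernel plus the default boot kernel when it
# differs. $1 = running kernel, $2 = default kernel (may be empty).
target_kernels() (
    running=$1
    default=$2
    printf '%s\n' "$running"
    if [ -n "$default" ] && [ "$default" != "$running" ]; then
        printf '%s\n' "$default"
    fi
)

# Possible wiring (assumption: grubby is available):
#   default=$(grubby --default-kernel)
#   target_kernels "$(uname -r)" "${default#/boot/vmlinuz-}"
```

This yields one kernel when the running and default kernels match, and two otherwise.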

Comment 8 Francis Montagnac 2025-09-18 06:17:17 UTC
(In reply to Nicolas Chauvet (kwizart) from comment #7)
>> We should build kmods for all installed kernels.
>> Otherwise, we're converting a usable kernel to an unusable kernel.

> No. Previously working kernels are still usable, as booting to userspace was previously working at the very
> least. One can still recover from there.
 
Not quite true. I don't know about Wayland, but for Xorg, having the exact same version for the kmod-nvidia
kernel module and the nvidia_drv Xorg module is absolutely mandatory.

If they do not match, no Xorg, no graphical session. We still have textual consoles, but this is not what we
call a usable system nowadays :-)

> Building for all kernels will take too much time for not much gain.

Looking at the current changelog of akmod-nvidia, there have been only 7 versions from May 27 to Sep 02. On
average, that is a new version every 14 days. I find it quite bearable to compile for all (default 3) installed kernels at
this low frequency.

Comment 9 Avi Kivity 2025-09-18 12:41:37 UTC
It's true for wayland as well. Boot without a built kmod = failed start.

Comment 10 Nicolas Chauvet (kwizart) 2025-09-22 12:56:50 UTC
> kernel module and the nvidia_drv Xorg module is absolutely mandatory.

This is a known constraint that is specific to the nvidia module, where a strict version check between kernel and userspace components is enforced (it's the same on Xorg or Wayland in that respect).
Other DRM modules don't have such constraints, but other akmods might.

When booting to a kernel with no nvidia kmod, there is already a fallback mechanism, but there is a known issue where we need to unload the previous driver if there is a component version mismatch.
See also https://bugzilla.rpmfusion.org/show_bug.cgi?id=4903

That being said, we might have a better way forward: improve akmods.service running --from-init so that it starts earlier, then trigger the reload of the nvidia module (or any other akmods) instead of going with the older ones.

At least, I've checked that the current fixed code (akmods 0.6.1) does restore the build for uname -r and default_kernel if they don't match on any akmod version upgrade...
This was the previous behaviour before the regression occurred.


I'm open to discussion (likely in a separate RFE ticket), but I still don't see the point of building for every kernel. Building all my kmods on my laptop (a rather recent model) takes 2 minutes per kernel, so building for three kernels would take 6 minutes of full CPU load on top of the regular updates.

When there is nothing else to do, time is cheap. But when you are waiting for the laptop to shut down and it becomes too hot to travel with, you don't actually have the time you expected for the upgrade and you force a shutdown; then that's too much time...

Anyway, as far as this bug is concerned, I'm about to close it along with the akmods-0.6.1 update.
Please test...

Comment 11 Fedora Update System 2025-09-22 13:46:19 UTC
FEDORA-2025-b18a85b925 (akmods-0.6.1-2.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-b18a85b925

Comment 12 Fedora Update System 2025-09-22 13:46:19 UTC
FEDORA-EPEL-2025-33ea68b80b (akmods-0.6.1-2.el10_2) has been submitted as an update to Fedora EPEL 10.2.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-33ea68b80b

Comment 13 Fedora Update System 2025-09-22 13:46:20 UTC
FEDORA-EPEL-2025-08996c0ee3 (akmods-0.6.1-2.el10_1) has been submitted as an update to Fedora EPEL 10.1.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-08996c0ee3

Comment 14 Fedora Update System 2025-09-22 13:46:20 UTC
FEDORA-EPEL-2025-b5c526537e (akmods-0.6.1-2.el10_0) has been submitted as an update to Fedora EPEL 10.0.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-b5c526537e

Comment 15 Francis Montagnac 2025-09-22 14:46:23 UTC
Hi.

(In reply to Nicolas Chauvet (kwizart) from comment #10)

> Anyway, as far as this bug is concerned, I'm about to close it along with the akmods-0.6.1 update.

Good.

> Please test...

Test successful on fc42 with:

  akmod-VirtualBox from 7.2.0-1.fc42 to
                        7.2.2-1.fc42
  kernels: 6.16.5-200.fc42 # uname -r
           6.16.7-200.fc42 # default-kernel

The two kmods have been built and installed (not for the 6.15.6 kernel, as expected)

    dnf list --installed kernel kmod-\* | cat
    Installed packages
    kernel.x86_64                                 6.15.6-200.fc42 updates
    kernel.x86_64                                 6.16.5-200.fc42 updates
    kernel.x86_64                                 6.16.7-200.fc42 updates
    kmod-VirtualBox-6.15.6-200.fc42.x86_64.x86_64 7.2.0-1.fc42    @commandline
    kmod-VirtualBox-6.16.5-200.fc42.x86_64.x86_64 7.2.2-1.fc42    @commandline
    kmod-VirtualBox-6.16.7-200.fc42.x86_64.x86_64 7.2.2-1.fc42    @commandline
    kmod-libs.x86_64                              33-3.fc42       anaconda
    root@fmpc 2025-09-22 16:21:12

PS: I first did a "dnf --exclude=\*VirtualBox\* update" that updated ~180 RPMs, including the
    6.16.7-200.fc42 kernel
    This update took ~4 minutes
    akmods failed to install the (correctly) built kmod-VirtualBox RPM:
    
        tail /var/cache/akmods/VirtualBox/7.2.0-1-for-6.16.7-200.fc42.x86_64.failed.log
        Transaction Summary:
         Installing:         1 package

        Total size of inbound packages is 222 KiB. Need to download 0 B.
        After this operation, 211 KiB extra will be used (install 211 KiB, remove 0 B).
        Running transaction
        Transaction failed: Failed to obtain rpm transaction lock. Another transaction is in progress.
        Warning: skipped OpenPGP checks for 1 package from repository: @commandline
        2025/09/22 16:03:40 akmods: Could not install newly built RPMs. You can find them and the logfile in:
        2025/09/22 16:03:40 akmods: /var/cache/akmods/VirtualBox/7.2.0-1-for-6.16.7-200.fc42.x86_64.failed.log
        root@fmpc 2025-09-22 16:19:07

    known problem with dnf5, see rhbz#2358625
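
    For illustration only, one generic way to cope with a transient lock failure like the one above is to retry the failing step after a delay. The sketch below is not part of akmods or dnf and is not the fix tracked in rhbz#2358625; `retry` is a hypothetical helper and the command passed to it is a placeholder.

```shell
# Hypothetical helper, not part of akmods or dnf: run a command up to $1
# times, sleeping $2 seconds between failed attempts.
retry() (
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            # The function body is a subshell, so this only ends the helper
            # with a failure status; it does not exit the calling shell.
            exit 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
)

# Possible usage (placeholder command name):
#   retry 5 30 some-install-command
```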

Comment 16 Fedora Update System 2025-09-23 01:44:40 UTC
FEDORA-2025-d5be7c00c4 has been pushed to the Fedora 43 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-d5be7c00c4`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-d5be7c00c4

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2025-09-23 01:57:44 UTC
FEDORA-EPEL-2025-08996c0ee3 has been pushed to the Fedora EPEL 10.1 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-08996c0ee3

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2025-09-23 02:04:34 UTC
FEDORA-EPEL-2025-33ea68b80b has been pushed to the Fedora EPEL 10.2 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-33ea68b80b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2025-09-23 02:19:32 UTC
FEDORA-EPEL-2025-b5c526537e has been pushed to the Fedora EPEL 10.0 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-b5c526537e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2025-09-23 02:46:04 UTC
FEDORA-2025-b18a85b925 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-b18a85b925`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-b18a85b925

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 21 Fedora Update System 2025-09-26 00:20:06 UTC
FEDORA-2025-d5be7c00c4 (akmods-0.6.1-2.fc43) has been pushed to the Fedora 43 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 22 Fedora Update System 2025-09-26 01:09:56 UTC
FEDORA-2025-b18a85b925 (akmods-0.6.1-2.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 23 Fedora Update System 2025-10-01 00:27:33 UTC
FEDORA-EPEL-2025-33ea68b80b (akmods-0.6.1-2.el10_2) has been pushed to the Fedora EPEL 10.2 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 24 Fedora Update System 2025-10-01 00:30:43 UTC
FEDORA-EPEL-2025-08996c0ee3 (akmods-0.6.1-2.el10_1) has been pushed to the Fedora EPEL 10.1 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 25 Fedora Update System 2025-10-01 00:41:52 UTC
FEDORA-EPEL-2025-b5c526537e (akmods-0.6.1-2.el10_0) has been pushed to the Fedora EPEL 10.0 stable repository.
If problem still persists, please make note of it in this bug report.

