Bug 2011120

Summary: kmod failed to load after upgrade Fedora using dnf system-upgrade
Product: [Fedora] Fedora Reporter: Rodd Clarkson <rodd>
Component: akmods Assignee: Nicolas Chauvet (kwizart) <kwizart>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 37 CC: drepper, Francis.Montagnac, hdegoede, hugh, johnpilk222, kwizart, leigh123linux, nicolas.vieville, russ+bugzilla-redhat, sergio, tchollingsworth, vtq-gnome
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Type: Bug
Attachments:
Description                   Flags
akmods-on-stop                none
95-akmodsposttrans.install    none
akmods command                none

Description Rodd Clarkson 2021-10-06 03:19:40 UTC
Description of problem:


Upgraded to Fedora 35 beta using the command line.  All worked fine.

However, the kmod-nvidia driver wouldn't load, forcing the use of nouveau.


How reproducible:

Every reboot after the install.


Additional info:

I fixed this by removing the kmod-nvidia rpm package:

$ sudo dnf remove kmod-nvidia-[kernelversion]

and then rebuilding with:

$ sudo akmods --force --kernels $(uname -r)


There might be a more graceful way.

I'm guessing that because the kmod was compiled with the Fedora 34 toolchain, it isn't built in a way that's right for 35 beta, but I stress that this is just a guess.

I originally filed this with rpmfusion and they suggested I report it here. ( https://bugzilla.rpmfusion.org/show_bug.cgi?id=6099 )

Comment 1 Nicolas Chauvet (kwizart) 2021-10-06 07:02:22 UTC
Can you report exactly which upgrade process you followed (step by step)?

Comment 2 Rodd Clarkson 2021-10-09 10:25:15 UTC
I used the usual procedure:

$ sudo dnf system-upgrade --releasever=35 download

and then

$ sudo dnf system-upgrade reboot

Comment 3 Nicolas Chauvet (kwizart) 2021-11-04 10:20:09 UTC
I've always used dnf distro-sync --releasever=35 and never had such an issue.

Akmods is scheduled to run "at the very end" of the RPM transaction, so it is expected to build with the newer gcc/userspace.

I will try to reproduce using system-upgrade. Maybe there is a race, and that situation could be detected in order to defer the akmods build to a later step.

Thanks for the report. Hopefully this can be fixed by f36.
Also, if you can have a closer look, any patch will be welcome.

Comment 4 Russell Odom 2022-01-04 19:24:13 UTC
It looks like the same underlying problem may also be the cause of xtables-addons not working correctly after upgrade to F35, which I logged at https://bugzilla.rpmfusion.org/show_bug.cgi?id=6165 initially.

A "more graceful way" to work around it was to run `depmod` - this avoided having to uninstall and reinstall.
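
A minimal sketch of that workaround, assuming the affected kernel version and module name are known. The function name `reindex_and_check` and the `DEPMOD`/`MODINFO` override variables are hypothetical, added only so the steps can be dry-run without root. It rebuilds the dependency index for one kernel and then checks that the module name resolves:

```shell
# Rebuild the module dependency index for one kernel, then confirm that a
# module name resolves. DEPMOD and MODINFO are hypothetical overrides that
# default to the real tools.
reindex_and_check() {
    kernel=$1   # kernel version, e.g. "$(uname -r)"
    module=$2   # module name, e.g. nvidia or wl
    ${DEPMOD:-depmod} -a "$kernel" || return 1
    if ${MODINFO:-modinfo} -n -k "$kernel" "$module" >/dev/null 2>&1; then
        echo "$module: found for $kernel"
    else
        echo "$module: still not found for $kernel"
        return 1
    fi
}
```

Run as root after booting the new kernel, this would be called as `reindex_and_check "$(uname -r)" nvidia`.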

Comment 5 Nicolas Chauvet (kwizart) 2022-01-04 19:40:05 UTC
Thanks for joining the bug.

Could someone try to reproduce the issue by upgrading a VM with a given kmod from f34 to f35, using each of:
- dnf system-upgrade
- dnf distro-sync --releasever=35

and see which method exhibits the issue?

Comment 6 Nicolas Chauvet (kwizart) 2022-03-30 17:00:51 UTC
I've managed to reproduce this with f35->f36 using system-upgrade (where a gcc upgrade occurred).

A few observations:
- kmod-foo was created for the target kernel from the upgrade (tested with VirtualBox)
- no files were created in /var/cache/akmods/ for the kmod built by akmods (uname -r logs or any rpm)
- after booting to the target kernel, the module was not found
- running depmod -ae was enough to get functional module loading
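
One way to confirm the "module installed but not indexed" state described in these observations is to compare the files under extra/ with modules.dep. This is only a sketch: the function name `unindexed_modules` is hypothetical, and it assumes modules.dep lists each module as a path relative to the kernel's module directory followed by a colon, as current kmod versions write it.

```shell
# Print every module file under <moddir>/extra that has no entry in
# <moddir>/modules.dep. An empty result means depmod has indexed them all.
unindexed_modules() {
    moddir=$1   # e.g. /lib/modules/$(uname -r)
    [ -d "$moddir/extra" ] || return 0
    find "$moddir/extra" -name '*.ko*' | while IFS= read -r path; do
        rel=${path#"$moddir"/}
        # Each indexed module starts a modules.dep line as "<relative path>:"
        if ! grep -qF "$rel:" "$moddir/modules.dep" 2>/dev/null; then
            echo "$rel"
        fi
    done
}
```

For example, `unindexed_modules "/lib/modules/$(uname -r)"` would list the kmod files that a depmod run still has to pick up.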

Comment 7 leigh scott 2022-05-09 14:44:05 UTC
*** Bug 2083069 has been marked as a duplicate of this bug. ***

Comment 8 Sergio Basto 2022-05-09 14:58:47 UTC
Hi, can you please post the relevant log file, /var/cache/akmods/nvidia/*log?

Thank you

Comment 9 Sergio Basto 2022-05-09 15:20:54 UTC
(In reply to Nicolas Chauvet (kwizart) from comment #6)
> I've managed to reproduce with f35->f36 using system-upgrade. (where a gcc
> upgrade occurred).
> 
> Few observation:
> - kmod-foo was created for the target kernel from the upgrade (tested with
> VirtualBox)
> - no files where created in /var/cache/akmods/ for the kmod built by akmods
> (uname -r logs or any rpm)
> - After booting to the target kernel the module where not found
> - using depmod -ae was enough to have a functional module loading...

Yes, no log file for me either. Looking at the journalctl output of the upgrade [1], I found this:

Apr 26 22:58:31 ideapad.local systemd[1]: Created slice Slice /system/akmods-keygen.
Apr 26 22:58:31 ideapad.local systemd[1]: Condition check resulted in Akmods Secure boot MOK Key Generation being skipped.
Apr 26 22:58:31 ideapad.local systemd[1]: Created slice Slice /system/akmods.
Apr 26 22:58:31 ideapad.local systemd[1]: Reached target akmods-keygen.target.
Apr 26 22:58:31 ideapad.local systemd[1]: Starting Builds and install new kmods from akmod for a given kernel...
Apr 26 22:58:31 ideapad.local systemd[1]: akmods@5.17.4-200.fc35.x86_64.service: Main process exited, code=killed, status=15/TERM
Apr 26 22:58:31 ideapad.local systemd[1]: akmods@5.17.4-200.fc35.x86_64.service: Failed with result 'signal'.
Apr 26 22:58:31 ideapad.local systemd[1]: Stopped Builds and install new kmods from akmod for a given kernel.
(...)
Apr 26 22:58:31 ideapad.local systemd[1]: Starting Builds and install new kmods from akmod for a given kernel...
Apr 26 22:58:31 ideapad.local dnf[2553]:   Running scriptlet: kernel-core-5.17.4-200.fc35.x86_64             15562/15562
Apr 26 22:58:31 ideapad.local systemd-inhibit[84625]: Checking kmods exist for 5.17.4-200.fc35.x86_64[  OK  ]
Apr 26 22:58:31 ideapad.local dnf[2553]:   Running scriptlet: kernel-modules-5.17.4-200.fc35.x86_64          15562/15562
Apr 26 22:58:31 ideapad.local runuser[84776]: pam_unix(runuser:session): session opened for user akmods(uid=966) by (uid=0)
(...)
Apr 26 22:59:25 ideapad.local systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Apr 26 22:59:45 ideapad.local dnf[2553]: yes: standard output: Broken pipe
Apr 26 23:00:24 ideapad.local runuser[84776]: pam_unix(runuser:session): session closed for user akmods
Apr 26 23:01:03 ideapad.local systemd[1]: man-db-cache-update.service: Deactivated successfully.
Apr 26 23:01:03 ideapad.local systemd[1]: Finished man-db-cache-update.service.
Apr 26 23:01:03 ideapad.local systemd[1]: man-db-cache-update.service: Consumed 2min 33.670s CPU time.
Apr 26 23:01:03 ideapad.local systemd[1]: run-rcd0909b00df94bd7b452f2a3ceacf630.service: Deactivated successfully.
Apr 26 23:01:03 ideapad.local systemd[1]: run-re0c1e9e92342484abd30ecaa1fb58e4d.service: Deactivated successfully.



[1]
dnf system-upgrade log
The following boots appear to contain upgrade logs:
1 / 5587618b3248441f921dfa8d52d24567: 2022-04-26 21:13:23 34→35

journalctl -b 5587618b3248441f921dfa8d52d24567 > system_upgrade_full.log
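
To pull only the akmods kill/failure lines out of such a journal dump, a small filter can help. This is a sketch, with the function name `akmods_failures` made up for illustration; the pattern is based solely on the systemd messages quoted above and may need adjusting for other systemd versions.

```shell
# Read journal text on stdin and keep only the lines where an akmods
# service instance was killed or reported a failed result.
akmods_failures() {
    grep -E 'akmods.*\.service: (Main process exited, code=killed|Failed with result)'
}
```

It can be used on the saved file (`akmods_failures < system_upgrade_full.log`) or directly in a pipe after `journalctl -b <boot-id>`.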

Comment 10 Sergio Basto 2022-05-11 09:57:30 UTC
We got another report here: https://bugzilla.rpmfusion.org/show_bug.cgi?id=6177

The quick fix is to run `depmod -ae` or reinstall the kernel.

Comment 11 Sergio Basto 2022-05-16 22:47:02 UTC
Looking at the latest changes (commits) to system_upgrade ( https://github.com/rpm-software-management/dnf-plugins-extras/commits/master/plugins/system_upgrade.py )

I found that we can consult dnf history [1], where we can see that the Return-Code is a failure [2]. Any guess why?


[1] 
dnf history 2215..2217

ID     | Command line                                                                                           | Date and time    | Action(s)      | Altered
-------------------------------------------------------------------------------------------------------------------------------------------------------------
2217 | -y install --disablerepo=* /tmp/akmods.C15x84YV/results/kmod-VirtualBox-5.17.4-200.fc35.x86_64-6.1.34- | 2022-04-26 23:20 | Install        |    1  <
2216 | -y install --disablerepo=* /tmp/akmods.G8phDm8j/results/kmod-nvidia-5.17.4-200.fc35.x86_64-510.60.02-1 | 2022-04-26 23:20 | Install        |    1 ><
2215 | system-upgrade upgrade                                                                                 | 2022-04-26 21:20 | C, D, E, I, O, | 7952 >E


[2] 
dnf history info 2216
Transaction ID : 2216
Begin time     : Tue 26 Apr 2022 11:20:16 PM WEST
Begin rpmdb    : 6bf75c92264d4a9a636b088015e12b9cf256d57d
End time       : Thu 01 Jan 1970 01:00:00 AM CET (-1651011616 seconds)
End rpmdb      :
User           : System <unset>
Return-Code    : Failure: 1
Releasever     :
Command Line   : -y install --disablerepo=* /tmp/akmods.G8phDm8j/results/kmod-nvidia-5.17.4-200.fc35.x86_64-510.60.02-1.fc35.x86_64.rpm
Comment        :
Packages Altered:
** Install kmod-nvidia-5.17.4-200.fc35.x86_64-3:510.60.02-1.fc35.x86_64 @@commandline
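
Failed transactions like the one above can be spotted without reading each record by hand. The following sketch assumes the `Transaction ID :` and `Return-Code :` line layout shown in this report; the function name `failed_transactions` is hypothetical.

```shell
# Read `dnf history info` output on stdin and print the ID of every
# transaction whose Return-Code line reports a failure.
failed_transactions() {
    awk -F' *: *' '
        /^Transaction ID/                  { id = $2 }   # remember the current record
        /^Return-Code/ && $2 ~ /^Failure/  { print id }  # report it if it failed
    '
}
```

Usage would look like `sudo dnf history info 2216 2217 | failed_transactions`.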

Comment 12 D. Hugh Redelmeier 2022-05-31 19:33:42 UTC
I hit this problem moving to F36.

- once on a system using the proprietary nvidia driver (module) from RPMFusion

- once on a system needing the broadcom-wl driver (module) from RPMFusion

I'd experienced this long ago with broadcom-wl and solved it with "sudo depmod".  I don't remember how I figured that out.

After reading this bz, I tried "sudo depmod -ae" on my system with the nvidia driver but got an error message:
  depmod: WARNING: -e needs -E or -F
So "sudo depmod" (which implies -a) seems better.

(I hate having to use the nvidia driver but the nouveau driver crashes on my system.  Hard.)

Comment 13 Sergio Basto 2022-05-31 22:45:56 UTC
try just `depmod -a`

Comment 14 T.C. Hollingsworth 2022-06-01 13:36:32 UTC
This also happens when just updating Fedora 36 using an offline upgrade that includes a kernel update; I had to run `depmod -a` to get it to work again.

$ sudo dnf history info 155 156
Transaction ID : 156
Begin time     : Sat 28 May 2022 12:56:09 PM MST
Begin rpmdb    : 72071564f8eb6206b52309d9c8775aecc57079ca
End time       : Wed 31 Dec 1969 05:00:00 PM MST (-1653767769 seconds)
End rpmdb      : 
User           : System <unset>
Return-Code    : Failure: 1
Releasever     : 
Command Line   : -y install --disablerepo=* /tmp/akmods.eHY2sgJ5/results/kmod-wl-5.17.11-300.fc36.x86_64-6.30.223.271-41.fc36.x86_64.rpm
Comment        : 
Packages Altered:
 ** Install kmod-wl-5.17.11-300.fc36.x86_64-6.30.223.271-41.fc36.x86_64 @@commandline
-------------------------------------------------------------------------------
Transaction ID : 155
Begin time     : Sat 28 May 2022 12:42:42 PM MST
Begin rpmdb    : 07f9640cf98a319055b089699aa8e0538c353daa
End time       : Sat 28 May 2022 12:56:01 PM MST (13 minutes)
End rpmdb      : 72071564f8eb6206b52309d9c8775aecc57079ca
User           : System <unset>
Return-Code    : Success
Releasever     : 36
Command Line   : system-upgrade upgrade
Comment        : 
Packages Altered:
    Reason Change Box2D-2.4.1-7.fc36.x86_64                                  @fedora
    Install       kernel-5.17.11-300.fc36.x86_64                             @updates
    Install       kernel-core-5.17.11-300.fc36.x86_64                        @updates
    Install       kernel-devel-5.17.11-300.fc36.x86_64                       @updates
    Install       kernel-modules-5.17.11-300.fc36.x86_64                     @updates
    Install       kernel-modules-extra-5.17.11-300.fc36.x86_64               @updates
    Upgrade       btrfs-progs-5.18-1.fc36.x86_64                             @updates
    Upgraded      btrfs-progs-5.16.2-1.fc36.x86_64                           @@System
    Upgrade       kernel-devel-matched-5.17.11-300.fc36.x86_64               @updates
    Upgraded      kernel-devel-matched-5.17.9-300.fc36.x86_64                @@System
    Upgrade       kernel-headers-5.17.11-300.fc36.x86_64                     @updates
    Upgraded      kernel-headers-5.17.6-300.fc36.x86_64                      @@System
    Upgrade       kernel-tools-5.17.11-300.fc36.x86_64                       @updates
    Upgraded      kernel-tools-5.17.6-300.fc36.x86_64                        @@System
    Upgrade       kernel-tools-libs-5.17.11-300.fc36.x86_64                  @updates
    Upgraded      kernel-tools-libs-5.17.6-300.fc36.x86_64                   @@System
    Upgrade       libbytesize-2.7-1.fc36.x86_64                              @updates
    Upgraded      libbytesize-2.6-3.fc36.x86_64                              @@System
    Upgrade       perf-5.17.11-300.fc36.x86_64                               @updates
    Upgraded      perf-5.17.6-300.fc36.x86_64                                @@System
    Upgrade       python3-bytesize-2.7-1.fc36.x86_64                         @updates
    Upgraded      python3-bytesize-2.6-3.fc36.x86_64                         @@System
    Upgrade       systemd-250.6-1.fc36.x86_64                                @updates
    Upgraded      systemd-250.3-8.fc36.x86_64                                @@System
    Upgrade       systemd-container-250.6-1.fc36.x86_64                      @updates
    Upgraded      systemd-container-250.3-8.fc36.x86_64                      @@System
    Upgrade       systemd-libs-250.6-1.fc36.x86_64                           @updates
    Upgraded      systemd-libs-250.3-8.fc36.x86_64                           @@System
    Upgrade       systemd-networkd-250.6-1.fc36.x86_64                       @updates
    Upgraded      systemd-networkd-250.3-8.fc36.x86_64                       @@System
    Upgrade       systemd-oomd-defaults-250.6-1.fc36.noarch                  @updates
    Upgraded      systemd-oomd-defaults-250.3-8.fc36.noarch                  @@System
    Upgrade       systemd-pam-250.6-1.fc36.x86_64                            @updates
    Upgraded      systemd-pam-250.3-8.fc36.x86_64                            @@System
    Upgrade       systemd-resolved-250.6-1.fc36.x86_64                       @updates
    Upgraded      systemd-resolved-250.3-8.fc36.x86_64                       @@System
    Upgrade       systemd-rpm-macros-250.6-1.fc36.noarch                     @updates
    Upgraded      systemd-rpm-macros-250.3-8.fc36.noarch                     @@System
    Upgrade       systemd-udev-250.6-1.fc36.x86_64                           @updates
    Upgraded      systemd-udev-250.3-8.fc36.x86_64                           @@System
    Upgrade       xorg-x11-server-Xwayland-22.1.2-1.fc36.x86_64              @updates
    Upgraded      xorg-x11-server-Xwayland-22.1.1-1.fc36.x86_64              @@System
    Upgrade       bind-libs-32:9.16.29-1.fc36.x86_64                         @updates-testing
    Upgraded      bind-libs-32:9.16.28-1.fc36.x86_64                         @@System
    Upgrade       bind-license-32:9.16.29-1.fc36.noarch                      @updates-testing
    Upgraded      bind-license-32:9.16.28-1.fc36.noarch                      @@System
    Upgrade       bind-utils-32:9.16.29-1.fc36.x86_64                        @updates-testing
    Upgraded      bind-utils-32:9.16.28-1.fc36.x86_64                        @@System
    Upgrade       container-selinux-2:2.187.0-1.fc36.noarch                  @updates-testing
    Upgraded      container-selinux-2:2.183.0-4.fc36.noarch                  @@System
    Upgrade       dnf-plugins-core-4.2.1-1.fc36.noarch                       @updates-testing
    Upgraded      dnf-plugins-core-4.2.0-1.fc36.noarch                       @@System
    Upgrade       dnf-utils-4.2.1-1.fc36.noarch                              @updates-testing
    Upgraded      dnf-utils-4.2.0-1.fc36.noarch                              @@System
    Upgrade       firefox-100.0.2-2.fc36.x86_64                              @updates-testing
    Upgraded      firefox-100.0.2-1.fc36.x86_64                              @@System
    Upgrade       glibc-2.35-10.fc36.x86_64                                  @updates-testing
    Upgraded      glibc-2.35-9.fc36.x86_64                                   @@System
    Upgrade       glibc-all-langpacks-2.35-10.fc36.x86_64                    @updates-testing
    Upgraded      glibc-all-langpacks-2.35-9.fc36.x86_64                     @@System
    Upgrade       glibc-common-2.35-10.fc36.x86_64                           @updates-testing
    Upgraded      glibc-common-2.35-9.fc36.x86_64                            @@System
    Upgrade       glibc-devel-2.35-10.fc36.x86_64                            @updates-testing
    Upgraded      glibc-devel-2.35-9.fc36.x86_64                             @@System
    Upgrade       glibc-doc-2.35-10.fc36.noarch                              @updates-testing
    Upgraded      glibc-doc-2.35-9.fc36.noarch                               @@System
    Upgrade       glibc-gconv-extra-2.35-10.fc36.x86_64                      @updates-testing
    Upgraded      glibc-gconv-extra-2.35-9.fc36.x86_64                       @@System
    Upgrade       glibc-headers-x86-2.35-10.fc36.noarch                      @updates-testing
    Upgraded      glibc-headers-x86-2.35-9.fc36.noarch                       @@System
    Upgrade       glibc-langpack-en-2.35-10.fc36.x86_64                      @updates-testing
    Upgraded      glibc-langpack-en-2.35-9.fc36.x86_64                       @@System
    Upgrade       glibc-minimal-langpack-2.35-10.fc36.x86_64                 @updates-testing
    Upgraded      glibc-minimal-langpack-2.35-9.fc36.x86_64                  @@System
    Upgrade       gnutls-3.7.6-1.fc36.x86_64                                 @updates-testing
    Upgraded      gnutls-3.7.5-1.fc36.x86_64                                 @@System
    Upgrade       gnutls-c++-3.7.6-1.fc36.x86_64                             @updates-testing
    Upgraded      gnutls-c++-3.7.5-1.fc36.x86_64                             @@System
    Upgrade       gnutls-dane-3.7.6-1.fc36.x86_64                            @updates-testing
    Upgraded      gnutls-dane-3.7.5-1.fc36.x86_64                            @@System
    Upgrade       gnutls-devel-3.7.6-1.fc36.x86_64                           @updates-testing
    Upgraded      gnutls-devel-3.7.5-1.fc36.x86_64                           @@System
    Upgrade       gnutls-utils-3.7.6-1.fc36.x86_64                           @updates-testing
    Upgraded      gnutls-utils-3.7.5-1.fc36.x86_64                           @@System
    Upgrade       ibus-1.5.26-7.fc36.x86_64                                  @updates-testing
    Upgraded      ibus-1.5.26-6.fc36.x86_64                                  @@System
    Upgrade       ibus-gtk2-1.5.26-7.fc36.x86_64                             @updates-testing
    Upgraded      ibus-gtk2-1.5.26-6.fc36.x86_64                             @@System
    Upgrade       ibus-gtk3-1.5.26-7.fc36.x86_64                             @updates-testing
    Upgraded      ibus-gtk3-1.5.26-6.fc36.x86_64                             @@System
    Upgrade       ibus-gtk4-1.5.26-7.fc36.x86_64                             @updates-testing
    Upgraded      ibus-gtk4-1.5.26-6.fc36.x86_64                             @@System
    Upgrade       ibus-libs-1.5.26-7.fc36.x86_64                             @updates-testing
    Upgraded      ibus-libs-1.5.26-6.fc36.x86_64                             @@System
    Upgrade       ibus-setup-1.5.26-7.fc36.noarch                            @updates-testing
    Upgraded      ibus-setup-1.5.26-6.fc36.noarch                            @@System
    Upgrade       libdv-1.0.0-36.fc36.x86_64                                 @updates-testing
    Upgraded      libdv-1.0.0-35.fc36.x86_64                                 @@System
    Upgrade       libnotify-0.7.12-1.fc36.x86_64                             @updates-testing
    Upgraded      libnotify-0.7.11-1.fc36.x86_64                             @@System
    Upgrade       logrotate-3.20.1-2.fc36.x86_64                             @updates-testing
    Upgraded      logrotate-3.19.0-2.fc36.x86_64                             @@System
    Upgrade       mariadb-3:10.5.16-1.fc36.x86_64                            @updates-testing
    Upgraded      mariadb-3:10.5.15-1.fc36.x86_64                            @@System
    Upgrade       mariadb-backup-3:10.5.16-1.fc36.x86_64                     @updates-testing
    Upgraded      mariadb-backup-3:10.5.15-1.fc36.x86_64                     @@System
    Upgrade       mariadb-common-3:10.5.16-1.fc36.x86_64                     @updates-testing
    Upgraded      mariadb-common-3:10.5.15-1.fc36.x86_64                     @@System
    Upgrade       mariadb-cracklib-password-check-3:10.5.16-1.fc36.x86_64    @updates-testing
    Upgraded      mariadb-cracklib-password-check-3:10.5.15-1.fc36.x86_64    @@System
    Upgrade       mariadb-embedded-3:10.5.16-1.fc36.x86_64                   @updates-testing
    Upgraded      mariadb-embedded-3:10.5.15-1.fc36.x86_64                   @@System
    Upgrade       mariadb-errmsg-3:10.5.16-1.fc36.x86_64                     @updates-testing
    Upgraded      mariadb-errmsg-3:10.5.15-1.fc36.x86_64                     @@System
    Upgrade       mariadb-gssapi-server-3:10.5.16-1.fc36.x86_64              @updates-testing
    Upgraded      mariadb-gssapi-server-3:10.5.15-1.fc36.x86_64              @@System
    Upgrade       mariadb-server-3:10.5.16-1.fc36.x86_64                     @updates-testing
    Upgraded      mariadb-server-3:10.5.15-1.fc36.x86_64                     @@System
    Upgrade       mariadb-server-utils-3:10.5.16-1.fc36.x86_64               @updates-testing
    Upgraded      mariadb-server-utils-3:10.5.15-1.fc36.x86_64               @@System
    Upgrade       mokutil-2:0.6.0-3.fc36.x86_64                              @updates-testing
    Upgraded      mokutil-2:0.6.0-2.fc36.x86_64                              @@System
    Upgrade       openldap-2.6.2-1.fc36.x86_64                               @updates-testing
    Upgraded      openldap-2.6.1-2.fc36.x86_64                               @@System
    Upgrade       openldap-compat-2.6.2-1.fc36.x86_64                        @updates-testing
    Upgraded      openldap-compat-2.6.1-2.fc36.x86_64                        @@System
    Upgrade       python3-construct-2.10.68-1.fc36.noarch                    @updates-testing
    Upgraded      python3-construct-2.10.67-4.fc36.noarch                    @@System
    Upgrade       python3-dnf-plugins-core-4.2.1-1.fc36.noarch               @updates-testing
    Upgraded      python3-dnf-plugins-core-4.2.0-1.fc36.noarch               @@System
    Upgrade       qt6-qtdeclarative-6.3.0-2.fc36.x86_64                      @updates-testing
    Upgraded      qt6-qtdeclarative-6.3.0-1.fc36.x86_64                      @@System
    Upgrade       qt6-qtsvg-6.3.0-3.fc36.x86_64                              @updates-testing
    Upgraded      qt6-qtsvg-6.3.0-1.fc36.x86_64                              @@System
    Upgrade       valgrind-1:3.19.0-3.fc36.x86_64                            @updates-testing
    Upgraded      valgrind-1:3.19.0-1.fc36.x86_64                            @@System
    Removed       kernel-5.17.8-100.fc34.x86_64                              @@System
    Removed       kernel-core-5.17.8-100.fc34.x86_64                         @@System
    Removed       kernel-devel-5.17.8-100.fc34.x86_64                        @@System
    Removed       kernel-modules-5.17.8-100.fc34.x86_64                      @@System
    Removed       kernel-modules-extra-5.17.8-100.fc34.x86_64                @@System
    Removed       kmod-wl-5.17.8-100.fc34.x86_64-6.30.223.271-41.fc34.x86_64 @@System
Scriptlet output:
   1 '/etc/resolv.conf' -> '../run/systemd/resolve/stub-resolv.conf'

Comment 15 D. Hugh Redelmeier 2022-11-19 02:19:48 UTC
Same problem upgrading F36 => F37 via dnf.
Same cure worked: "sudo depmod"

Comment 16 D. Hugh Redelmeier 2022-11-19 03:42:42 UTC
Further to my last comment (#15):

I was reporting a problem with my desktop's upgrade to F37.

I had no trouble upgrading my notebook F36 => F37, even though it has an nvidia GPU too.
There are two differences that might matter.

For updating, I used DNF on my desktop but used Gnome "Software" on my notebook.

My desktop has an older card (GeForce GTX 650) that requires "legacy" nvidia drivers.
My notebook still uses current drivers.

The desktop's kmods are:
kmod-nvidia-470xx-470.141.03-3.fc37.x86_64
kmod-nvidia-470xx-6.0.8-300.fc37.x86_64-470.141.03-3.fc37.x86_64

The notebook's kmods are:
kmod-nvidia-520.56.06-1.fc37.x86_64
kmod-nvidia-6.0.8-300.fc37.x86_64-520.56.06-1.fc37.x86_64

Comment 17 Sergio Basto 2023-03-04 17:41:10 UTC
(In reply to Sergio Basto from comment #11)
> Looking at latest changes (commits) of system_upgrade (
> https://github.com/rpm-software-management/dnf-plugins-extras/commits/master/
> plugins/system_upgrade.py )
> 
> I found out that we can consult dnf history [1] and we can see that
> Return-Code  is failure [2], any guess why ? 
> 
> 
> [1] 
> dnf history 2215..2217
> 
> ID     | Command line                                                       
> | Date and time    | Action(s)      | Altered
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> ---
> 2217 | -y install --disablerepo=*
> /tmp/akmods.C15x84YV/results/kmod-VirtualBox-5.17.4-200.fc35.x86_64-6.1.34-
> | 2022-04-26 23:20 | Install        |    1  <
> 2216 | -y install --disablerepo=*
> /tmp/akmods.G8phDm8j/results/kmod-nvidia-5.17.4-200.fc35.x86_64-510.60.02-1
> | 2022-04-26 23:20 | Install        |    1 ><
> 2215 | system-upgrade upgrade                                               
> | 2022-04-26 21:20 | C, D, E, I, O, | 7952 >E
> 
> 
> [2] 
> dnf history info 2216
> Transaction ID : 2216
> Begin time     : Tue 26 Apr 2022 11:20:16 PM WEST
> Begin rpmdb    : 6bf75c92264d4a9a636b088015e12b9cf256d57d
> End time       : Thu 01 Jan 1970 01:00:00 AM CET (-1651011616 seconds)
> End rpmdb      :
> User           : System <unset>
> Return-Code    : Failure: 1
> Releasever     :
> Command Line   : -y install --disablerepo=*
> /tmp/akmods.G8phDm8j/results/kmod-nvidia-5.17.4-200.fc35.x86_64-510.60.02-1.
> fc35.x86_64.rpm
> Comment        :
> Packages Altered:
> ** Install kmod-nvidia-5.17.4-200.fc35.x86_64-3:510.60.02-1.fc35.x86_64
> @@commandline

One difference with the new akmods is that it builds kmods for the default kernel ...

After digging, I now think this is a duplicate of bug #1518401.

/usr/lib/kernel/install.d/95-akmodsposttrans.install contains [1]:

[1]
# Exit early if system-update.target is active - rhbz#1518401
/bin/systemctl is-active system-update.target &>/dev/null
RET=$?
[ $RET == 0 ] && exit 0
/bin/systemctl restart akmods@${KERNEL_VERSION}.service --no-block >/dev/null 2>&1


But in my system upgrade I see [2]:
[2]
Dec 24 01:10:08 ideapad.local dracut[74782]: *** Creating initramfs image file '/boot/initramfs-6.0.14-300.fc37.x86_64.img' done ***
Dec 24 01:10:09 ideapad.local systemd[1]: Created slice system-akmods\x2dkeygen.slice - Slice /system/akmods-keygen.
Dec 24 01:10:09 ideapad.local systemd[1]: akmods-keygen - Akmods Secure boot MOK Key Generation was skipped because all trigger condition checks failed.
Dec 24 01:10:09 ideapad.local systemd[1]: Created slice system-akmods.slice - Slice /system/akmods.
Dec 24 01:10:09 ideapad.local systemd[1]: Reached target akmods-keygen.target.
Dec 24 01:10:09 ideapad.local systemd[1]: Starting akmods@6.0.14-300.fc37.x86_64.service - Builds and install new kmods from akmod for a given kernel...
Dec 24 01:10:09 ideapad.local systemd[1]: akmods@6.0.14-300.fc37.x86_64.service: Main process exited, code=killed, status=15/TERM
Dec 24 01:10:09 ideapad.local systemd[1]: akmods@6.0.14-300.fc37.x86_64.service: Failed with result 'signal'.
Dec 24 01:10:09 ideapad.local systemd[1]: Stopped akmods@6.0.14-300.fc37.x86_64.service - Builds and install new kmods from akmod for a given kernel.
Dec 24 01:10:09 ideapad.local systemd[1]: Starting akmods@6.0.14-300.fc37.x86_64.service - Builds and install new kmods from akmod for a given kernel...
Dec 24 01:10:09 ideapad.local dnf[1422]:   Running scriptlet: kernel-core-6.0.14-300.fc37.x86_64             15898/15898
Dec 24 01:10:09 ideapad.local systemd-inhibit[87174]: Checking kmods exist for 6.0.14-300.fc37.x86_64[  OK  ]
Dec 24 01:10:10 ideapad.local runuser[87324]: pam_unix(runuser:session): session opened for user akmods(uid=966) by (uid=0)

Comment 18 John Pilkington 2023-04-23 20:29:00 UTC
I had this problem on f35 > f36.  depmod -ae and a reboot fixed it.  Now it's back again on f36 > f37

The upgraded system is running well with the rpmfusion nvidia470xx driver and the f36 kernel, but fails to boot to the KDE plasma desktop under the f37 kernel.

The contents of /lib/modules/6.2.11-100.fc36.x86_64/extra/nvidia470xx and /lib/modules/6.2.11-200.fc37.x86_64/extra/nvidia470xx are similar but different and have believable creation dates.  

I'm also seeing 'unreportable' system errors apparently relating to nouveau.  Under the f36 kernel the nVidia config graphical display can be activated.

Comment 19 John Pilkington 2023-04-25 12:44:49 UTC
After 'dnf upgrade' including kernel-6.2.12-200.fc37.x86_64 and multiple kf5 5.105.0 packages, the system now boots to full Xorg graphics with the fc37 kernel.

That system also includes Windows 10 and boots via the HP EFI firmware.  A similar single-boot box using BIOS upgraded from fc36 > fc37 without problems.  Both boxes now seem ok.

Comment 20 Francis.Montagnac 2023-06-03 16:32:36 UTC
I have been bitten too often by this (i.e. depmod not being run), even after a simple dnf update and even with akmods-shutdown enabled,
so I dug deeper.

akmods is called during the posttrans scriptlet of the kernel-core RPM, but by two scripts:

  /usr/lib/kernel/install.d/95-akmodsposttrans.install
  /etc/kernel/postinst.d/akmodsposttrans
    indirectly by /usr/lib/kernel/install.d/95-kernel-hooks.install
  
Those two scripts call, in the systemd case:

  /bin/systemctl restart akmods@${1}.service --no-block

This service calls akmods for the given kernel.

It is a restart rather than a start because akmods@.service sets RemainAfterExit=yes.

The restart issued by the second akmodsposttrans script produces the "Main process exited, code=killed" message seen in Comment 17.

The depmod is done in the post-install scriptlet of the kmod RPM, so stopping akmods (as a restart does)
may kill it.

I have seen once this in /var/cache/akmods/nvidia-340xx/.last.log

  <snip>
  Installing:
   kmod-nvidia-340xx-6.2.8-200.fc37.x86_64   x86_64  1:340.108-24.fc37 @commandline  3.4 M
  <snip>
    Running scriptlet: kmod-nvidia-340xx-6.2.8-200.fc37.x86_64-1:340.108-24   1/1 
  warning: %post(kmod-nvidia-340xx-6.2.8-200.fc37.x86_64-1:340.108-24.fc37.x86_64) scriptlet failed, signal 15

I therefore suggest removing the /etc/kernel/postinst.d/akmodsposttrans script.

The scripts in /usr/lib/kernel/install.d/ are preferred, I think (at least on Fedora).

In addition, I don't see the point of setting RemainAfterExit=yes for this service. Setting RemainAfterExit=no and
calling "systemctl start" would work as well and would be safer: "systemctl start" could be called multiple times
while only one instance runs ("systemctl start" of a running service is a no-op).

Finally, improving the akmods command to run depmod if needed would be great. I added an override to the
akmods-shutdown service to do that (for Fedora 30 to 38). It does the following:

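# $PROG is assumed to be set by the enclosing script.
need_depmod () {
    local kernel=$1
    local mod
    local needed=
    # Check every module file under extra/ (for example nvidia.ko.xz)
    for mod in $(find "/lib/modules/$kernel/extra" -name '*.ko*' \
                      -printf "%f\n"
                ); do
        # Strip the suffixes to get the module name: nvidia.ko.xz -> nvidia
        mod=${mod%%.*}
        # modinfo fails when the module is not in the depmod index
        if ! modinfo -n -k "$kernel" "$mod" > /dev/null 2>&1; then
            echo "$PROG: ** $mod not found for kernel $kernel"
            needed=1
            break
        fi
    done
    if [ "$needed" ]; then
        echo "$PROG: calling depmod -a $kernel"
        depmod -a "$kernel"
    fi
}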
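
The module name that need_depmod hands to modinfo is obtained by cutting the file name at its first dot. A minimal sketch of that single step follows; module_name is a hypothetical helper used only for illustration and is not part of akmods:

```shell
# Hypothetical helper (not part of akmods): derive the module name
# from a module file name by removing everything from the first dot.
module_name() {
    printf '%s\n' "${1%%.*}"
}

module_name nvidia-uvm.ko.xz
module_name vboxdrv.ko
```

A side effect of cutting at the first dot is that a module file whose base name itself contains a dot would be truncated, which is presumably rare for the kmods handled here.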

Comment 21 Francis.Montagnac 2023-06-04 14:12:43 UTC
Having only one akmodsposttrans will of course not solve the problem of having
to wait after the dnf transaction for akmods to finish before rebooting.

I think that this can be done as follows:

1. Modify akmods@.service to call akmods on stop (similar to akmods-shutdown):

      [Unit]
      Description=Builds and install new kmods from akmod for a given kernel on stop
      Wants=akmods-keygen.target
      After=akmods-keygen.target

      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/bin/true
      ExecStop=/usr/sbin/akmods --from-kernel-posttrans --kernels %i
      TimeoutStopSec=infinity

    There is no need to use systemd-inhibit: it does not prevent root from rebooting.

    There is no need for an [Install] section: this service will not be enabled.

    infinity is better than 5min as in akmods-shutdown: it will not prevent
    the user from pressing the power button to stop the machine anyway :-(

2. Modify akmodsposttrans:

    Replace:

      /bin/systemctl restart akmods@${1}.service --no-block >/dev/null 2>&1

    by:

      /bin/systemctl start akmods@${1}.service || exit
      /bin/systemctl stop akmods@${1}.service --no-block

Rebooting just after the dnf transaction will no longer kill akmods since,
as for start, stopping a service that is already stopping is a noop.

This still has to be improved for the case where systemctl start/stop is not
possible (e.g. when running in a chroot), but this is a corner case.
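
A possible shape for the modified hook, including the chroot corner case, is sketched below. This is only an illustration: run_akmods_for_kernel and the AKMODS variable are hypothetical names, the test on /run/systemd/system is the usual way to detect a running systemd, and the synchronous fallback is an assumption rather than something taken from the akmods package.

```shell
# Sketch of the hook body; run_akmods_for_kernel and AKMODS are
# illustrative names, not part of the akmods package.
AKMODS=${AKMODS:-/usr/sbin/akmods}

run_akmods_for_kernel() {
    kernel=$1
    if [ -d /run/systemd/system ]; then
        # systemd is running: start the unit, then let its ExecStop
        # do the build in the background
        systemctl start "akmods@${kernel}.service" || return
        systemctl stop "akmods@${kernel}.service" --no-block
    else
        # Assumption: without systemd (e.g. in a chroot), build
        # synchronously in the calling process
        "$AKMODS" --from-kernel-posttrans --kernels "$kernel"
    fi
}
```

The hook would then call run_akmods_for_kernel "$1" with the kernel version it receives.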

Comment 22 Nicolas Chauvet (kwizart) 2023-06-12 13:50:53 UTC
(In reply to Francis.Montagnac from comment #20)
...
> akmods is called during the posttrans scriptlet of the kernel-core RPM but
> by 2 scripts:
> 
>   /usr/lib/kernel/install.d/95-akmodsposttrans.install
>   /etc/kernel/postinst.d/akmodsposttrans
>     indirectly by /usr/lib/kernel/install.d/95-kernel-hooks.install

This is an interesting finding. Not sure if it will fix this particular issue, but it may fix the others you mentioned.
We need to verify when each directory pattern was made available (and used)...


> I have seen once this in /var/cache/akmods/nvidia-340xx/.last.log
...
> %post(kmod-nvidia-340xx-6.2.8-200.fc37.x86_64-1:340.108-24.fc37.x86_64)
> scriptlet failed, signal 15
> 
> I suggest thus to remove the /etc/kernel/postinst.d/akmodsposttrans script.
Why? The kernel post-install shouldn't interfere with the kmod-foo postinstall. Also, the Fedora kmodtool (are you sure that you are using the original, unmodified Fedora version?) expects post/postun errors to be bypassed on purpose (i.e. using "|| :").

> In addition, I don't see the point of setting RemainAfterExit=yes for this
> service. Setting RemainAfterExit=no and
> calling "systemctl start" will work as well and will be safer: allowing to
> call multiple times "systemctl start"
> while running only one instance ("systemctl start" of a running service is a
> noop).
Agreed.



> Finally, improving the akmods command to do the depmod if needed would be
Why? Running depmod on kmod registration is mandatory, so the only appropriate way is to run it in the post/postun of every kmod-foo.
Please remember that akmods is an optional feature: one could rely only on pre-built kmods and not use akmods at all.

I would prefer the original error to be fixed (and I still don't get why an error could be output) instead of spreading the problem everywhere.


It is also worth mentioning that I discourage running akmods-shutdown at all, especially with https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
Using TimeoutStopSec=infinity is a pretty bad "non-solution" in that respect.

I'm more interested in Conflict with shutdown, instead...
Also, in order to fix the original report, I think it would be more interesting for akmods to register with system-update.target
(using Before=shutdown.target system-update.target, as /usr/lib/systemd/system/dnf-system-upgrade.service does, might help).

I have recently landed various changes in rawhide, so I'm going to fix the "agreed" issue before going further with that, so that it can be evaluated on a saner basis.

Comment 23 Francis.Montagnac 2023-06-12 15:11:54 UTC
>>   /usr/lib/kernel/install.d/95-akmodsposttrans.install
>>   /etc/kernel/postinst.d/akmodsposttrans
>>     indirectly by /usr/lib/kernel/install.d/95-kernel-hooks.install

> This is a interesting finding. Not sure if it will fix this particular issue, but it may fix other you mentioned.

> There is a need to verify when was made available (and used) each directory pattern...

Not sure what you are asking here, sorry.

>> I suggest thus to remove the /etc/kernel/postinst.d/akmodsposttrans script.
> Why ? 

Because those scripts do the same thing and are run from the posttrans of kernel-core through:

  /bin/kernel-install add KERNEL_VERSION ...

that calls:

  /usr/lib/kernel/install.d/95-akmodsposttrans.install add KERNEL_VERSION ...
  /usr/lib/kernel/install.d/95-kernel-hooks.install add KERNEL_VERSION ...
    that calls: /etc/kernel/postinst.d/akmodsposttrans KERNEL_VERSION ...

> kernel post-install shouldn't interfere with kmod-foo postinstall,

I think this is unrelated. Those scripts are only run from kernel
post-install (posttrans).

> also fedora kmodtool (are you sure that you are using the original fedora
> version unmodified?), expects post/postun error to be bypassed on purpose
> (aka using "|| :").

I have kmodtool unmodified. (but this is IMO unrelated).

>> Finally, improving the akmods command to do the depmod if needed would be ...

Forget that. See my Comment 21: modify akmods@.service to call akmods on stop (similar to akmods-shutdown).

> using depmod on kmod registration is mandatory, so the only appropriate way
> is to have it made available on post/postun of any kmod-foo

I agree, but one has to prevent depmod from being killed. That may currently happen if
a reboot is done by root just after a kernel update. I have an old laptop on
which depmod takes around 20 seconds.

> I'm more interested in Conflict with shutdown, instead... 

+1

> Also in order to fix the original report, I think it would be more interesting
> for akmod to register as system-update.target

This would not be needed if you modify akmods@.service to call akmods on stop.

> There are already various changes that I've landed in rawhide recently, so
> I'm going to fix the "agreed" issue before going further with that. So that
> can be evaluated on a saner basis.

I understand, but I think instead that one should definitely fix the problem
that the akmods run in the kernel posttrans can be killed by a reboot. Modifying
akmods@.service to call akmods on stop will fix that.

I just noticed also that akmod-foo uses only nohup, and not systemctl.
(Example: rpm -q --scripts akmod-nvidia-470xx <snip>
  posttrans scriptlet (using /bin/sh):
  nohup /usr/sbin/akmods --from-akmod-posttrans --akmod nvidia-470xx &> /dev/null &
)
It seems that another generic service is needed for that, for example: akmods-akmod@.service.

Comment 24 Nicolas Chauvet (kwizart) 2023-06-12 16:06:03 UTC
(In reply to Francis.Montagnac from comment #23)
..
> >> I suggest thus to remove the /etc/kernel/postinst.d/akmodsposttrans script.
> > Why ? 
> 
> Because those scripts do the same thing and are run from the posttrans of
> kernel-core through:
...
I got the duplicate reasoning perfectly fine in the first step, but here you cut the context of the failing kmod-nvidia-340xx.
Why?

> > kernel post-install shouldn't interfere with kmod-foo postinstall,
> 
> I think this is unrelated. Those scripts are only run from kernel
> post-install (posttrans).

This is exactly what I mean: this is unrelated. So why would this be a solution for kmod-nvidia-340xx failing?


> > also fedora kmodtool (are you sure that you are using the original fedora
> > version unmodified?), expects post/postun error to be bypassed on purpose
> > (aka using "|| :").
> 
> I have kmodtool unmodified. (but this is IMO unrelated).
Well, knowing why kmod-nvidia-340xx would fail is a big question! It is --logically-- forbidden to fail in post. So there is a bigger error here (rpm is failing for some reason).

Here is an extract of the kmod-nvidia-$(uname -r)-%{evr}.rpm post script:
if [ -f /boot/System.map-6.3.5-100.fc37.x86_64 ] ; then   /usr/sbin/depmod -aeF /boot/System.map-6.3.5-100.fc37.x86_64 6.3.5-100.fc37.x86_64 >/dev/null ; elif [ -f /lib/modules/6.3.5-100.fc37.x86_64/System.map ] ; then   /usr/sbin/depmod -aeF /lib/modules/6.3.5-100.fc37.x86_64/System.map 6.3.5-100.fc37.x86_64 >/dev/null ; else   /usr/sbin/depmod -a >/dev/null ; fi || :

Can you run it on your system and report where it failed (when running as root) ?

> I agree, but one has to prevent depmod from being killed. That may currently
> happen if
> a reboot is done by root just after a kernel update. I have an old laptop on
> which depmod takes around 20 seconds.
If the system is slow, depmod is not the only component that can fail. RPM could also fail to fully extract binaries to the filesystem, or not report the package as installed.
If we ever need to recover from an early-reboot breakage, I would recommend reinstalling rather than fixing all the small in-between steps...

> > I'm more interested in Conflict with shutdown, instead... 
> 
> +1
> 
> > Also in order to fix the original report, I think it would be more interesting
> > for akmod to register as system-update.target
> 
> This would be not needed if you modify akmods@.service to call akmods on
> stop.
Why do you think that moving from start to stop would change anything? (Especially as this won't prevent the timeout on root reboots.)
This is totally unusual and sounds like a bad hack.

> > There are already various changes that I've landed in rawhide recently, so
> > I'm going to fix the "agreed" issue before going further with that. So that
> > can be evaluated on a saner basis.
> 
> I understand, but I think instead that one should definitively fix the problem
> that the akmods done in kernel post-trans can be killed by a reboot.
> Modifying
> akmods@.service to call akmods on stop will fix that.
> 
> I just noticed also that akmod-foo uses only nohup, and not systemctl.
> (Example: rpm -q --scripts akmod-nvidia-470xx <snip>
>   posttrans scriptlet (using /bin/sh):
>   nohup /usr/sbin/akmods --from-akmod-posttrans --akmod nvidia-470xx &>
> /dev/null &
> )

This is unrelated to kernel upgrades. Here it is a raw akmod-foo upgrade (say, moving from akmod-nvidia-525 to akmod-nvidia-535), but this could often appear during a distro upgrade. So this raises the following points:
- Protect this task from unexpected reboots as well as possible (running it as a service as suggested, or using systemd-run instead of nohup).
- Avoid any duplicate tasks when there are both an akmod version upgrade transaction and kernel transactions...

Comment 25 Nicolas Chauvet (kwizart) 2023-06-12 17:07:49 UTC
> - Avoid any duplicate tasks when there are both akmod version upgrade transaction and kernel transactions...
We might need to put a temporary lock in the akmod-foo pre-transaction task, so that any later akmods@.service task becomes a noop.

Thanks for catching all these issues, Francis!

Comment 26 Francis.Montagnac 2023-06-13 06:57:02 UTC
>(In reply to Nicolas Chauvet (kwizart) from comment #24)
> (In reply to Francis.Montagnac from comment #23)

> I got the duplicate reasoning perfectly fine in the first step, but here you
> cut the context of the failing kmod-nvidia-340xx ?
> Why

Because the kmod-nvidia-340xx RPM is built by akmodsposttrans and is not failing
but killed. See below.

>>> kernel post-install shouldn't interfere with kmod-foo postinstall,

>> I think this is unrelated. Those scripts are only run from kernel
>> post-install (posttrans).
 
> This is exactly what I mean, this is unrelated. So why this would be a
> solution for kmod-nvidia-340xx failing ?

As I said in Comment 20, kmod-nvidia-340xx is not failing but killed:

  /var/cache/akmods/nvidia-340xx/.last.log
  warning: %post(kmod-nvidia-340xx-6.2.8-200.fc37.x86_64-1:340.108-24.fc37.x86_64) scriptlet failed, signal 15

This is most probably due to something stopping akmods@.service.

>>> also fedora kmodtool (are you sure that you are using the original fedora
>>> version unmodified?), expects post/postun error to be bypassed on purpose
>>> (aka using "|| :").

>> I have kmodtool unmodified. (but this is IMO unrelated).

> Well, knowing why kmod-nvidia-340xx would fail is a big question! It is
> --logically-- forbidden to fail on post. So there is a bigger error here
> (rpm is failing for some reason).

Not failing but killed.

> Here is a extract of the kmod-nvidia-$(uname -r)-%{evr}.rpm post script:
> if [ -f /boot/System.map-6.3.5-100.fc37.x86_64 ] ; then   /usr/sbin/depmod
> -aeF /boot/System.map-6.3.5-100.fc37.x86_64 6.3.5-100.fc37.x86_64 >/dev/null
> ; elif [ -f /lib/modules/6.3.5-100.fc37.x86_64/System.map ] ; then  
> /usr/sbin/depmod -aeF /lib/modules/6.3.5-100.fc37.x86_64/System.map
> 6.3.5-100.fc37.x86_64 >/dev/null ; else   /usr/sbin/depmod -a >/dev/null ;
> fi || :
 
> Can you run it on your system and report where it failed (when running as
> root) ?

It works of course, taking the first branch: -f /boot/System.map-6.3.5-100.fc37.x86_64

>> I agree, but one has to prevent depmod from being killed. That may currently
>> happen if a reboot is done by root just after a kernel update. I have an old
>> laptop on which depmod takes around 20 seconds.

> If system is slow, depmod is not the only component that can fail.

Yes, except if a proper timeout is set up for them.

A timeout of infinity is appropriate for well-behaved and crucial components like
akmods. I don't see any context where akmods may block indefinitely.

Do you see such cases?

> RPM could also not fully extract binaries on filesystem or report to be
> installed.

This is really unlikely since the kmod-foo RPMs only contain a few files.

> If ever we need to recover from an early reboot breakage, I would recommends
> to consider reinstalling than fixing all in-between small steps...

Right, except that akmods is slightly wrong in how it detects whether a kmod-foo is usable:
it only checks that the kmod-foo RPM is installed, and thus assumes that
depmod has been run.

>>> Also in order to fix the original report, I think it would be more interesting
>>> for akmod to register as system-update.target

>> This would be not needed if you modify akmods@.service to call akmods on
>> stop.

> Why do you think that moving from start to stop ever change anything ?

If one does:

  dnf -y update && reboot

and the update brings in a new kernel:

  - the akmodsposttrans will:

    - start akmods@.service synchronously
    - stop akmods@.service asynchronously (--no-block)

  - now the dnf update is finished

  - the reboot will call stop on akmods@.service, but that will be a noop since
    it is already stopping.
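
While the ExecStop command of the unit is running, systemd reports the unit's ActiveState as "deactivating", so the sequence above can be observed from a shell before rebooting. The sketch below polls that state; wait_for_akmods is a hypothetical helper, and the 5 second interval is arbitrary:

```shell
# Hypothetical helper: block until the akmods unit for a kernel has
# finished stopping, i.e. until its ExecStop command has returned.
wait_for_akmods() {
    unit="akmods@${1}.service"
    # "deactivating" is reported while the stop command is still running
    while [ "$(systemctl show -p ActiveState --value "$unit")" = "deactivating" ]; do
        sleep 5
    done
}
```

The argument is the version of the newly installed kernel (not necessarily the running one). With the on-stop design this wait is not required for correctness, since the reboot itself waits for the stop job; it only makes the progress visible.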

> (specially as this won't prevent timeout on root reboots).

As stated in https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
services may request longer timeouts:

  The short shutdown timer might not be long enough for libvirt to shut down
  VMs. Databases and virtual machines really must not be killed forcibly.

  Service files may already request longer timeouts, but would need to be
  modified to do so.

I therefore suggest setting up akmods@.service with TimeoutStopSec=infinity.

> This is totally unusual and sounds like a bad hack.

This is what akmods-shutdown.service actually does.

One might also think that services of Type=oneshot are a hack :-)

>> I just noticed also that akmod-foo uses only nohup, and not systemctl.
>> (Example: rpm -q --scripts akmod-nvidia-470xx <snip>
>>   posttrans scriptlet (using /bin/sh):
>>   nohup /usr/sbin/akmods --from-akmod-posttrans --akmod nvidia-470xx &>
>> /dev/null &
>> )

> This is unrelated to kernel upgrades. Here it is raw akmod-foo upgrade. (say
> moving to akmod-nvidia-525 to akmod-nvidia-535), but this could appears
> often as distro upgrade. So this raise the following questions:
> - Protect this task from unexpected reboot to the best possible (running as
> a service as suggested or using systemd-run  instead of nohup).
> - Avoid any duplicate tasks when there are both akmod version upgrade
> transaction and kernel transactions...

Right.

Comment 27 Francis.Montagnac 2023-07-09 07:23:51 UTC
Have you tried my proposal ? Thanks.

Comment 28 Francis.Montagnac 2023-07-23 08:48:32 UTC
Created attachment 1977120 [details]
akmods-on-stop

Comment 29 Francis.Montagnac 2023-07-23 08:49:18 UTC
Created attachment 1977121 [details]
95-akmodsposttrans.install

Comment 30 Francis.Montagnac 2023-07-23 08:50:03 UTC
I tried it and it works as expected.

See the akmods@.service and 95-akmodsposttrans.install attached.

Context:

  - a machine (a VM) having akmod-VirtualBox and akmod-nvidia installed and running the 6.2.11 kernel
  - update only the kernel and kernel-devel: will install the 6.3.12 kernel
  - reboot as soon as this update is finished

One can see akmods@KERNEL_VERSION working during the shutdown: it is not killed.
It takes rather long on this machine: almost 4 minutes.

Details:

  1. modify akmods with (see Comment 21 ):

rsync -pit akmods@.service /usr/lib/systemd/system/
rsync -pit 95-akmodsposttrans.install /usr/lib/kernel/install.d/
rm /etc/kernel/postinst.d/akmodsposttrans

  2. launch the update:

dnf -y update kernel kernel-devel \
  && { logger reboot-akmods-on-stop; reboot; }


The kmod-nvidia was indeed compiled and installed before the current boot:

    last reboot | head -1
    reboot   system boot  6.3.12-200.fc38. Sun Jul 23 09:59   still running

    rpm -qi kmod-nvidia-$(uname -r) | grep Date
    Install Date: Sun 23 Jul 2023 09:58:15 AM CEST
    Build Date  : Sun 23 Jul 2023 09:54:51 AM CEST

Excerpt of the journal:
  
  journalctl --since  09:50:10  | grep -E akmods | grep -v -F audit[

  ### akmods@KERNEL_VERSION
  Jul 23 09:54:50 localhost systemd[1]: Created slice system-akmods.slice - Slice /system/akmods.
  Jul 23 09:54:50 localhost systemd[1]: Starting akmods@6.3.12-200.fc38.x86_64.service - Builds and install new kmods from akmod for a given kernel on stop...
  Jul 23 09:54:50 localhost systemd[1]: Finished akmods@6.3.12-200.fc38.x86_64.service - Builds and install new kmods from akmod for a given kernel on stop.
  Jul 23 09:54:50 localhost systemd[1]: Stopping akmods@6.3.12-200.fc38.x86_64.service - Builds and install new kmods from akmod for a given kernel on stop...
  Jul 23 09:54:50 localhost akmods[30951]: Checking kmods exist for 6.3.12-200.fc38.x86_64[  OK  ]
  Jul 23 09:54:51 localhost runuser[31118]: pam_unix(runuser:session): session opened for user akmods(uid=957) by (uid=0)
  Jul 23 09:55:22 localhost root[34758]: reboot-akmods-on-stop

  ### akmods.service
  Jul 23 09:55:23 localhost systemd[1]: Removed slice system-akmods\x2dkeygen.slice - Slice /system/akmods-keygen.
  Jul 23 09:55:23 localhost systemd[1]: akmods.service: Deactivated successfully.
  Jul 23 09:55:23 localhost systemd[1]: Stopped akmods.service - Builds and install new kmods from akmod packages.
  Jul 23 09:55:23 localhost systemd[1]: akmods.service: Consumed 1.559s CPU time.

  ### akmods@KERNEL_VERSION
  Jul 23 09:57:54 localhost runuser[31118]: pam_unix(runuser:session): session closed for user akmods
  Jul 23 09:58:43 localhost akmods[30951]: Building and installing nvidia-kmod[  OK  ]
  Jul 23 09:58:43 localhost runuser[38151]: pam_unix(runuser:session): session opened for user akmods(uid=957) by (uid=0)
  Jul 23 09:58:58 localhost runuser[38151]: pam_unix(runuser:session): session closed for user akmods
  Jul 23 09:59:18 localhost akmods[30951]: Building and installing VirtualBox-kmod[  OK  ]
  Jul 23 09:59:18 localhost akmods[30951]: Checking kmods exist for 6.2.11-300.fc38.x86_64[  OK  ]
  Jul 23 09:59:18 localhost systemd[1]: akmods@6.3.12-200.fc38.x86_64.service: Deactivated successfully.
  Jul 23 09:59:18 localhost systemd[1]: Stopped akmods@6.3.12-200.fc38.x86_64.service - Builds and install new kmods from akmod for a given kernel on stop.
  Jul 23 09:59:18 localhost systemd[1]: akmods@6.3.12-200.fc38.x86_64.service: Consumed 8min 4.472s CPU time.
  Jul 23 09:59:18 localhost systemd[1]: Removed slice system-akmods.slice - Slice /system/akmods.
  Jul 23 09:59:18 localhost systemd[1]: system-akmods.slice: Consumed 8min 4.473s CPU time.
  Jul 23 09:59:18 localhost systemd[1]: Stopped target akmods-keygen.target.
  Jul 23 09:59:44 localhost systemd[1]: Created slice system-akmods\x2dkeygen.slice - Slice /system/akmods-keygen.
  Jul 23 09:59:48 localhost systemd[1]: akmods-keygen - Akmods Secure boot MOK Key Generation was skipped because no trigger condition checks were met.
  Jul 23 09:59:48 localhost systemd[1]: Reached target akmods-keygen.target.
  Jul 23 09:59:48 localhost systemd[1]: Starting akmods.service - Builds and install new kmods from akmod packages...
  Jul 23 09:59:54 localhost akmods[664]: Checking kmods exist for 6.3.12-200.fc38.x86_64[  OK  ]
  Jul 23 09:59:54 localhost systemd[1]: Finished akmods.service - Builds and install new kmods from akmod packages.

Comment 31 Francis.Montagnac 2023-07-23 09:27:56 UTC
(In reply to Francis.Montagnac from comment #30)
>   1. modify akmods with (see Comment 21 ):

> rsync -pit akmods@.service /usr/lib/systemd/system/
> rsync -pit 95-akmodsposttrans.install /usr/lib/kernel/install.d/
> rm /etc/kernel/postinst.d/akmodsposttrans

The permissions were wrong. Use this instead:

 rsync -pit --chmod 644 akmods@.service /usr/lib/systemd/system/
 rsync -pit --chmod 775 95-akmodsposttrans.install /usr/lib/kernel/install.d/
 rm /etc/kernel/postinst.d/akmodsposttrans
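
The same copy with explicit permissions can also be done with install(1) from coreutils. The following is only a sketch: deploy is a hypothetical helper, and the modes are the ones given above.

```shell
# Hypothetical helper: copy a file into a directory with an explicit mode.
deploy() {
    mode=$1 src=$2 destdir=$3
    install -m "$mode" -- "$src" "$destdir/"
}

# Intended use, as root:
#   deploy 644 akmods@.service /usr/lib/systemd/system
#   deploy 775 95-akmodsposttrans.install /usr/lib/kernel/install.d
#   systemctl daemon-reload
```

The daemon-reload at the end makes systemd re-read the changed unit file.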

Comment 32 Francis.Montagnac 2023-07-24 06:32:44 UTC
This could be simplified:

  - remove RemainAfterExit=yes from akmods@.service: it is useless
  - only call systemctl --no-block start in 95-akmodsposttrans.install
     - the ExecStop=/usr/sbin/akmods... will be called

but there is a better way:

  - factor the call to systemd into the akmods command itself:
    - use systemd-run: simpler than the akmods@.service unit

  - only call nohup /usr/sbin/akmods --from-kernel-posttrans ... in 95-akmodsposttrans.install,
    as is done in all the akmod-foo RPMs (with --from-akmod-posttrans)

This avoids having to modify ALL the kmod-foo RPMs.

Find attached a modified version of the akmods command that does that.

Minimal test:

  bash -xp /usr/sbin/akmods --from-posttrans --akmod nvidia
  <snip>
  + '[' -n from-posttrans ']'
  + '[' -e /usr/bin/systemd-run ']'
  + /usr/bin/systemd-run --no-block --property=SyslogIdentifier=akmods-from-posttrans '--property=ExecStop=/usr/sbin/akmods --akmod nvidia' --property=TimeoutStopSec=infinity /bin/true
  Running as unit: run-rc9b78b6b9cce47b5abc8940421e72ef0.service
  + exit 0

journal:

  Jul 24 08:08:25 localhost systemd[1]: Started run-rc9b78b6b9cce47b5abc8940421e72ef0.service - /bin/true.
  Jul 24 08:08:25 localhost akmods-from-posttrans[9653]: Checking kmods exist for 6.3.12-200.fc38.x86_64[  OK  ]
  Jul 24 08:08:25 localhost systemd[1]: run-rc9b78b6b9cce47b5abc8940421e72ef0.service: Deactivated successfully.

>> - Avoid any duplicate tasks when there are both akmod version upgrade transaction and kernel transactions...
> We might need to put a temporary lock in akmod-foo pre-transaction task, so any later akmods@.service task will noop

Having looked at the akmods command, I think that this is not needed since akmods already takes a lock on
/run/akmods/akmods.lock, at the end of the init function.
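
To illustrate why such a lock makes an extra pre-transaction lock redundant, here is a generic sketch using flock(1) from util-linux. It is not the akmods code: run_locked and the demo lock file are hypothetical, and whether akmods waits for the lock or gives up is not shown in this thread; the non-blocking -n is used here only so that the demonstration terminates.

```shell
# Illustrative lock file; akmods itself uses /run/akmods/akmods.lock
LOCKFILE=${LOCKFILE:-${TMPDIR:-/tmp}/akmods-demo.lock}

# Hypothetical helper: run a command only if the lock is free.
run_locked() {
    (
        # -n: fail immediately instead of waiting for the lock
        flock -n 9 || { echo "busy"; exit 1; }
        "$@"
    ) 9>"$LOCKFILE"
}

# run_locked echo built             -> the lock is free, the command runs
# run_locked run_locked echo again  -> the inner call finds the lock held
```

With a waiting lock instead of -n, the second invocation would simply run after the first one and find nothing left to build.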

Comment 33 Francis.Montagnac 2023-07-24 06:33:25 UTC
Created attachment 1977190 [details]
akmods command