Description of problem:

When upgrading a system to RHEL 8 using leapp, a **kernel-workaround** package is installed that ships an empty /usr/sbin/new-kernel-pkg script. If the customer is **not** using BLS (GRUB_ENABLE_BLSCFG=true is missing from /etc/default/grub, likely because the customer uses Puppet with outdated templates from RHEL 7), the initramfs is not generated when kernels are updated.

The exact reason is on line 80 of /usr/lib/kernel/install.d/20-grub.install:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 80 if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
 81     eval "$(grub2-get-kernel-settings)" || true
 82     [[ -d "$BLS_DIR" ]] || mkdir -m 0700 -p "$BLS_DIR"
  :
138     /sbin/new-kernel-pkg --package "kernel${flavor}" --install "$KERNEL_VERSION" || exit $?
139     /sbin/new-kernel-pkg --package "kernel${flavor}" --mkinitrd --dracut --depmod --update "$KERNEL_VERSION" || exit $?
140     /sbin/new-kernel-pkg --package "kernel${flavor}" --rpmposttrans "$KERNEL_VERSION" || exit $?
141     # If grubby is used there's no need to run other installation plugins
142     exit 77
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Above, the code block starting on line 81 is not entered because "GRUB_ENABLE_BLSCFG" is missing from /etc/default/grub and /sbin/new-kernel-pkg, shipped by **kernel-workaround**, is present. Execution then reaches lines 138-140, which do nothing since the script is empty.

Version-Release number of selected component (if applicable):
leapp-repository-0.13.0-2.el7_9.noarch
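To make the condition on line 80 easier to reason about, here is a minimal sketch that reproduces only that test against arbitrary paths. The function name `grub_install_path` and its two parameters are hypothetical and not part of grub2 or leapp; the sketch also assumes the defaults file is read by sourcing it as shell.

```shell
#!/bin/bash
# Hypothetical helper (not part of grub2 or leapp): report which branch the
# line-80 test in 20-grub.install would select for a given defaults file
# and a given new-kernel-pkg path.
grub_install_path() {
    local defaults_file=$1 new_kernel_pkg=$2 blscfg

    # Source the defaults file in a subshell so nothing leaks into the caller.
    # Assumption: the file is plain shell variable assignments.
    blscfg=$(
        GRUB_ENABLE_BLSCFG=
        if [ -f "$defaults_file" ]; then
            . "$defaults_file" >/dev/null 2>&1
        fi
        printf '%s' "$GRUB_ENABLE_BLSCFG"
    )

    # Same test as line 80, with the script path made a parameter.
    if [[ "x${blscfg}" = "xtrue" ]] || [[ ! -f "$new_kernel_pkg" ]]; then
        echo "bls"
    else
        echo "new-kernel-pkg"
    fi
}
```

On an affected system, `grub_install_path /etc/default/grub /sbin/new-kernel-pkg` would be expected to print `new-kernel-pkg`, which is the branch that ends up calling the empty script.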
For information: before or after removing the kernel and kernel-workaround RPMs, the user currently needs to remove the old kernel entry from the bootloader manually. E.g.:

  /bin/kernel-install remove 3.10.0-1160.25.1.el7.x86_64 /lib/modules/3.10.0-1160.25.1.el7.x86_64/vmlinuz
The documentation is going to be updated to cover the problem until we fix it.
FYI, the documentation has been updated.
Hi Petr,

There is a second scenario where the issue can happen: when /etc/default/grub doesn't end with a newline **prior** to executing "leapp upgrade". This makes the reboot fail with:

~~~
[ 266.092340] upgrade[609]: ============================================================
[ 266.093274] upgrade[609]: ERRORS
[ 266.093995] upgrade[609]: ============================================================
[ 266.094790] upgrade[609]: 2021-06-05 15:20:04.141285 [ERROR] Actor: kernelcmdlineconfig
[ 266.095526] upgrade[609]: Message: Failed to append extra arguments to kernel command line.
[ 266.096381] upgrade[609]: Summary:
[ 266.096944] upgrade[609]: Details: Command ['grubby', '--update-kernel=/boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64', '--args=net.ifnames=0'] failed with exit code 1.
[ 266.098461] upgrade[609]: ============================================================
[ 266.099562] upgrade[609]: END OF ERRORS
[ 266.100421] upgrade[609]: ============================================================
~~~

The result is a broken last line in /etc/default/grub, which is equivalent to not having GRUB_ENABLE_BLSCFG=true:

~~~
...
GRUB_DISABLE_RECOVERY="true"GRUB_ENABLE_BLSCFG=true
~~~

I'll be writing a KCS on this asap.
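A pre-upgrade check for this second scenario could look like the sketch below. Both function names are hypothetical; this is not what leapp does, only an illustration of how a missing trailing newline can be detected and repaired before "leapp upgrade" runs.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file is empty or its last byte is a
# newline, fail otherwise.
ends_with_newline() {
    local file=$1
    if [ ! -s "$file" ]; then
        return 0
    fi
    # Command substitution strips a trailing newline, so an empty result
    # means the last byte of the file is a newline.
    [ -z "$(tail -c 1 -- "$file")" ]
}

# Hypothetical helper: append the missing newline so that a later
# "echo ... >> file" starts on its own line.
ensure_trailing_newline() {
    local file=$1
    ends_with_newline "$file" || printf '\n' >> "$file"
}
```

Running `ensure_trailing_newline /etc/default/grub` before the upgrade would keep an appended `GRUB_ENABLE_BLSCFG=true` from being glued onto the previous line.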
Hi Petr, comment #5 is a dup of BZ #1937383 actually, linking the KCS there as well.
Hi Renaud, thanks for the info and the KCS. I already pinged the guys about that. The second issue will probably be caught: most likely we will write an inhibitor that blocks the upgrade when the grub file is invalid (the trailing LF is missing).
A puppet-agent pushing an old default grub file can be a problem afterwards, on RHEL 8. But that does not explain why GRUB_ENABLE_BLSCFG was not added during the upgrade step, as there is no puppet-agent in this sequence.

Adding a customer case with the same symptoms: the RHEL 8 initrd was not generated and the system ends up in the emergency shell.

What happened initially

- the DNF transaction was "successful" but the RHEL 8 kernel was only partly installed, dracut not having been executed (the post-script silently failed).
- due to this, subsequent grubby commands failed, leading to the emergency shell and a broken upgrade.
- the BLS entries were not there in /boot/loader/entries and /etc/default/grub was unchanged.

Why? Because "something" prevented `/etc/default/grub` from being modified during the real upgrade (during the reboot step on the dedicated "RHEL-Upgrade-Initramfs"). I'm indeed able to reproduce the *very same behaviour* by setting the immutable bit (`chattr +i`) on the `/etc/default/grub*` files.

* grubby is upgraded, hence the old `/sbin/new-kernel-pkg` script is erased.
~~~
Jun 16 17:22:35 localhost upgrade[1912]: Upgrading : grubby-8.40-47.el8.x86_64 341/2351
~~~

* grub2-tools is upgraded
~~~
Jun 16 17:22:48 localhost upgrade[1912]: Upgrading : grub2-tools-1:2.02-148.el8.x86_64 399/2351
Jun 16 17:22:48 localhost upgrade[1912]: Running scriptlet: grub2-tools-1:2.02-148.el8.x86_64 399/2351
~~~

* so its post-script executes `/sbin/grub2-switch-to-blscfg`:
~~~
if [ "$1" = 2 ]; then
    /sbin/grub2-switch-to-blscfg --backup-suffix=.rpmsave &>/dev/null || :
fi
~~~

* this script adds `GRUB_ENABLE_BLSCFG=true` to `/etc/default/grub` if it's not there (line 280):
~~~
269 GENERATE=0
270 if grep '^GRUB_ENABLE_BLSCFG=.*' "${etcdefaultgrub}" \
271         | grep -vq '^GRUB_ENABLE_BLSCFG="*true"*\s*$' ; then
272     if ! sed -i"${backupsuffix}" \
273             -e 's,^GRUB_ENABLE_BLSCFG=.*,GRUB_ENABLE_BLSCFG=true,' \
274             "${etcdefaultgrub}" ; then
275         gettext_printf "Updating %s failed\n" "${etcdefaultgrub}"
276         exit 1
277     fi
278     GENERATE=1
279 elif ! grep -q '^GRUB_ENABLE_BLSCFG=.*' "${etcdefaultgrub}" ; then
280     if ! echo 'GRUB_ENABLE_BLSCFG=true' >> "${etcdefaultgrub}" ; then
281         gettext_printf "Updating %s failed\n" "${etcdefaultgrub}"
282         exit 1
283     fi
284     GENERATE=1
285 fi
~~~
It didn't happen, and the error was deliberately not caught (`&>/dev/null || :`). That's why you had no BLS entries.

* After that the `kernel-workaround` package is installed. It only contains an **empty** `/sbin/new-kernel-pkg` script, in order to prevent a conflict while upgrading grubby, since the RHEL 7 kernel package **requires** this script (it is used from the RPM postscript).
~~~
Jun 16 17:24:42 localhost upgrade[1912]: Installing : kernel-workaround-0.1-1.el8.noarch 1066/2351
~~~

* And finally, at the very end, the `kernel-core` postscript is executed and fails silently. Its postscript calls:
~~~
/bin/kernel-install add 4.18.0-477.13.1.el8_8.x86_64 /lib/modules/4.18.0-477.13.1.el8_8.x86_64/vmlinuz || exit $?
~~~

* The `kernel-install` script sequentially executes the files installed by grub/grubby/dracut in `/usr/lib/kernel/install.d`, in particular **`20-grub.install`**:
~~~
88 if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
~~~
Due to the lack of `GRUB_ENABLE_BLSCFG` **and** the presence of `/sbin/new-kernel-pkg`, it does not enter this code block. `/sbin/new-kernel-pkg` is then called, but it no longer does anything since the script is empty!
~~~
146 /sbin/new-kernel-pkg --package "kernel${flavor}" --install "$KERNEL_VERSION" || exit $?
147 /sbin/new-kernel-pkg --package "kernel${flavor}" --mkinitrd --dracut --depmod --update "$KERNEL_VERSION" || exit $?
~~~
As a result, no initrd is generated for the RHEL 8 kernel...
* Since the kernel was not properly installed, grubby failed while removing the "enforcing=0" parameter from the kernel cmdline, and you fall into the emergency shell.

Why did adding `GRUB_ENABLE_BLSCFG=true` before the reboot help?

* In short this time:
  - grub2-switch-to-blscfg didn't convert the existing entries to BLS because the variable was already set (lines 270-271). So it didn't call `grub2-mkconfig` again, leaving a grub.cfg that was not updated for BLS configurations and contained only RHEL 7 entries.
  - due to the presence of the variable, the kernel-core postscript later entered the code block which creates the BLS entry (line 96), and then 50-dracut.install was executed, so the initramfs was created.
  - the kernel being properly installed, grubby worked again.

* The solution here is simply to execute `grub2-mkconfig -o /boot/grub2/grub.cfg` (grub2-switch-to-blscfg is really useful only if you want to keep el7 kernels).
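Since the visible symptom in all of the above is a kernel without its initramfs, a quick way to spot it could be the following sketch. The function name is hypothetical, and it assumes the usual RHEL layout where `vmlinuz-<version>` is paired with `initramfs-<version>.img` in the same directory.

```shell
#!/bin/bash
# Hypothetical helper: print the version of every kernel image in the given
# directory (default /boot) that has no matching initramfs image.
# Assumption: kernels are named vmlinuz-<version> and their initramfs
# initramfs-<version>.img.
kernels_missing_initramfs() {
    local boot_dir=${1:-/boot} kernel version
    for kernel in "$boot_dir"/vmlinuz-*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$kernel" ] || continue
        version=${kernel##*/vmlinuz-}
        if [ ! -f "$boot_dir/initramfs-$version.img" ]; then
            printf '%s\n' "$version"
        fi
    done
}
```

An empty output from `kernels_missing_initramfs` means every kernel found in /boot has an initramfs next to it; any version printed is a candidate for the failure described in this bug.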
Hello,

I have found a new scenario where this issue happens: if the package grub2-tools is installed but is not present in the RPM database, the IPU proceeds as normal until the point where it fails with:

~~~
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]: ERRORS
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]: 2023-07-13 13:37:31.928060 [ERROR] Actor: kernelcmdlineconfig
Jul 13 11:37:32 localhost upgrade[38298]: Message: Failed to append extra arguments to kernel command line.
Jul 13 11:37:32 localhost upgrade[38298]: Summary:
Jul 13 11:37:32 localhost upgrade[38298]: Details: Command ['grubby', '--update-kernel=/boot/vmlinuz-4.18.0-477.15.1.el8_8.x86_64', '--args', 'net.ifnames=0', '--remove-args', 'enforcing=0'] failed with exit code 1.
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]: END OF ERRORS
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
~~~

The error is not caused by grub2-tools failing to be installed during the IPU; on the contrary, it will be installed. However, checking the post-install script:

~~~
if [ "$1" = 2 ]; then
    /sbin/grub2-switch-to-blscfg --backup-suffix=.rpmsave &>/dev/null || :
fi
~~~

grub2-switch-to-blscfg is only called when the transaction is an upgrade. In this case, as the package was not present in the RPM database, the transaction is treated as a new installation instead of an upgrade, and grub2-switch-to-blscfg is not executed.
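To check for this scenario before starting the IPU, something along these lines might help. It is only a sketch: both function names are hypothetical, and the `query_cmd` parameter exists purely so the logic can be exercised without a real RPM database (it defaults to `rpm`, whose `-q` option queries an installed package).

```shell
#!/bin/bash
# Hypothetical helper: predict the value of "$1" that the grub2-tools
# scriptlet would see. query_cmd is an assumption added for testability;
# by default the real "rpm -q grub2-tools" is used.
scriptlet_arg_for_grub2_tools() {
    local query_cmd=${1:-rpm}
    if "$query_cmd" -q grub2-tools >/dev/null 2>&1; then
        echo 2    # known to the RPM database: the transaction is an upgrade
    else
        echo 1    # unknown to the RPM database: treated as a new installation
    fi
}

# Hypothetical helper: succeed only if the scriptlet test [ "$1" = 2 ]
# would pass, i.e. grub2-switch-to-blscfg would be executed.
will_switch_to_blscfg() {
    [ "$(scriptlet_arg_for_grub2_tools "$@")" = 2 ]
}
```

If `will_switch_to_blscfg` fails on the RHEL 7 system, the scriptlet would be expected to skip grub2-switch-to-blscfg, which matches the behaviour described above.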