Bug 1955099 - Updating the kernel on leapp'ed systems doesn't create the initramfs depending on Grub BLS state
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: leapp-repository
Version: 7.9
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Leapp Notifications Bot
QA Contact: upgrades-and-conversions
URL:
Whiteboard:
Depends On:
Blocks: 1818077 1818088
Reported: 2021-04-29 12:45 UTC by Renaud Métrich
Modified: 2023-08-04 14:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OAMG-4839 0 None None None 2023-05-11 07:30:16 UTC
Red Hat Knowledge Base (Solution) 6004981 0 None None None 2021-04-29 13:30:30 UTC
Red Hat Knowledge Base (Solution) 6100621 0 None None None 2021-06-05 13:43:30 UTC

Description Renaud Métrich 2021-04-29 12:45:45 UTC
Description of problem:

When upgrading a system to RHEL 8 using leapp, a **kernel-workaround** package is installed that ships an empty /usr/sbin/new-kernel-pkg script.
If the customer is **not** using BLS (that is, GRUB_ENABLE_BLSCFG=true is missing from /etc/default/grub, likely because the customer uses Puppet with outdated templates from RHEL 7), the initramfs is not generated when kernels are updated.

The exact cause is on line 80 of /usr/lib/kernel/install.d/20-grub.install:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 80         if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
 81             eval "$(grub2-get-kernel-settings)" || true
 82             [[ -d "$BLS_DIR" ]] || mkdir -m 0700 -p "$BLS_DIR"
 :
138         /sbin/new-kernel-pkg --package "kernel${flavor}" --install "$KERNEL_VERSION" || exit $?
139         /sbin/new-kernel-pkg --package "kernel${flavor}" --mkinitrd --dracut --depmod --update "$KERNEL_VERSION" || exit $?
140         /sbin/new-kernel-pkg --package "kernel${flavor}" --rpmposttrans "$KERNEL_VERSION" || exit $?
141         # If grubby is used there's no need to run other installation plugins
142         exit 77
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Above, the code block starting on line 81 is not entered because "GRUB_ENABLE_BLSCFG" is missing from /etc/default/grub and /sbin/new-kernel-pkg, shipped by **kernel-workaround**, is present.

Execution then falls through to lines 138-140, which do nothing because new-kernel-pkg is an empty script.
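
For triage, the condition described above can be approximated with a small check. This is only a sketch, not part of leapp or grub2: the function names `bls_enabled`, `is_stub_script` and `is_affected` are hypothetical, and the real 20-grub.install evaluates the `GRUB_ENABLE_BLSCFG` variable itself, while this sketch only greps the file.

```shell
#!/bin/sh
# Hypothetical helpers (not part of leapp or grub2), for triage only.

# Returns 0 when the defaults file has GRUB_ENABLE_BLSCFG=true (optionally
# quoted) on a line of its own.
bls_enabled() {
    grep -Eq '^GRUB_ENABLE_BLSCFG="?true"?[[:space:]]*$' "$1" 2>/dev/null
}

# Returns 0 when the file exists but holds nothing except blank lines and
# comments (a shebang counts as a comment), so running it does no work.
is_stub_script() {
    [ -f "$1" ] || return 1
    ! grep -Evq '^[[:space:]]*(#.*)?$' "$1"
}

# Returns 0 when both conditions from the description hold: BLS is not
# enabled and a do-nothing new-kernel-pkg is present.
is_affected() {
    ! bls_enabled "$1" && is_stub_script "$2"
}

# Example invocation on an upgraded system:
#   is_affected /etc/default/grub /sbin/new-kernel-pkg && echo "affected"
```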


Version-Release number of selected component (if applicable):

leapp-repository-0.13.0-2.el7_9.noarch

Comment 2 Petr Stodulka 2021-05-03 15:20:52 UTC
For information: before or after removing the kernel and kernel-workaround RPMs, the user currently has to remove the old kernel entry from the bootloader manually. E.g.:

    /bin/kernel-install remove 3.10.0-1160.25.1.el7.x86_64 /lib/modules/3.10.0-1160.25.1.el7.x86_64/vmlinuz
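
When several RHEL 7 kernels are left behind, the same command can be generated for each of them. The sketch below only prints the commands (a dry run) so they can be reviewed first. The function name `print_el7_kernel_removals` is hypothetical, and matching kernels by `.el7` in the directory name is an assumption.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one "kernel-install remove" command per
# el7 kernel directory found under the modules directory. Nothing is removed.
print_el7_kernel_removals() {
    modules_dir=${1:-/lib/modules}
    for d in "$modules_dir"/*.el7*; do
        [ -d "$d" ] || continue
        kver=$(basename "$d")
        echo "/bin/kernel-install remove $kver $modules_dir/$kver/vmlinuz"
    done
}

# Example invocation:
#   print_el7_kernel_removals /lib/modules
```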

Comment 3 Petr Stodulka 2021-05-17 10:52:21 UTC
The documentation is going to be updated to cover the problem until we fix it.

Comment 4 Petr Stodulka 2021-05-19 13:53:03 UTC
FYI, the documentation has been updated.

Comment 5 Renaud Métrich 2021-06-05 13:25:03 UTC
Hi Petr,

There is a second scenario where the issue can happen: when /etc/default/grub doesn't end with a newline **prior** to executing "leapp upgrade".

The upgrade then fails during the reboot with:
~~~
[  266.092340] upgrade[609]: ============================================================
[  266.093274] upgrade[609]:                            ERRORS
[  266.093995] upgrade[609]: ============================================================
[  266.094790] upgrade[609]: 2021-06-05 15:20:04.141285 [ERROR] Actor: kernelcmdlineconfig
[  266.095526] upgrade[609]: Message: Failed to append extra arguments to kernel command line.
[  266.096381] upgrade[609]: Summary:
[  266.096944] upgrade[609]:     Details: Command ['grubby', '--update-kernel=/boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64', '--args=net.ifnames=0'] failed with exit code 1.
[  266.098461] upgrade[609]: ============================================================
[  266.099562] upgrade[609]:                        END OF ERRORS
[  266.100421] upgrade[609]: ============================================================
~~~

This leaves a broken last line in /etc/default/grub, which is equivalent to not having GRUB_ENABLE_BLSCFG=true:
~~~
...
GRUB_DISABLE_RECOVERY="true"GRUB_ENABLE_BLSCFG=true
~~~
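
The missing trailing newline can be detected, and repaired, before running "leapp upgrade". A minimal sketch, with hypothetical function names `ends_with_newline` and `fix_trailing_newline`:

```shell
#!/bin/sh
# Returns 0 when the file is empty or its last byte is a newline.
# Command substitution strips a trailing newline, so an empty result
# means the last byte was "\n".
ends_with_newline() {
    [ ! -s "$1" ] || [ -z "$(tail -c 1 "$1")" ]
}

# Appends the missing newline so that a later "echo ... >> file" starts on a
# fresh line instead of being glued to the previous setting.
fix_trailing_newline() {
    ends_with_newline "$1" || printf '\n' >> "$1"
}

# Example invocation:
#   ends_with_newline /etc/default/grub || echo "missing final newline"
```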

I'll be writing a KCS on this asap.

Comment 6 Renaud Métrich 2021-06-07 07:33:56 UTC
Hi Petr, comment #5 is a dup of BZ #1937383 actually, linking the KCS there as well.

Comment 7 Petr Stodulka 2021-06-07 11:53:02 UTC
Hi Renaud, thanks for the info and the KCS. I already pinged the guys about that. The second issue will probably be caught: most likely we will write an upgrade inhibitor for the case where the grub file is invalid (the trailing LF is missing).

Comment 10 Christophe Besson 2023-06-23 09:46:09 UTC
A puppet-agent pushing an old default grub file can be a problem afterwards, on RHEL 8.
But that does not explain why GRUB_ENABLE_BLSCFG was not added during the upgrade step, as there is no puppet-agent in this sequence.

Adding a customer case with the same symptoms: the RHEL 8 initrd has not been generated and the system ends up in the emergency shell.

What happened initially

- the DNF transaction was "successful" but the RHEL 8 kernel was only partly installed, because dracut was not executed (the post-script failed silently).
- due to this, subsequent grubby commands failed, leading to the emergency shell and a broken upgrade.
- there were no BLS entries in /boot/loader/entries and /etc/default/grub was unchanged.

Why?

Because "something" prevented `/etc/default/grub` from being modified during the real upgrade (during the reboot step, in the dedicated "RHEL-Upgrade-Initramfs" boot entry).
Indeed, I'm able to reproduce the *very same behaviour* by setting the immutable bit (`chattr +i`) on the /etc/default/grub* files.
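
To look for that condition before upgrading, the attribute flags reported by `lsattr` can be inspected. A rough sketch, assuming the usual `<flags> <path>` output format; the function names are hypothetical:

```shell
#!/bin/sh
# Takes one line of lsattr output ("<flags> <path>") and returns 0 when the
# flags field contains "i" (immutable). Only the text before the first space
# is examined, so letters in the path do not matter.
lsattr_line_is_immutable() {
    flags=${1%% *}
    case $flags in
        *i*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reports every /etc/default/grub* file that carries the immutable bit.
report_immutable_grub_defaults() {
    for f in /etc/default/grub*; do
        [ -e "$f" ] || continue
        line=$(lsattr -d "$f" 2>/dev/null) || continue
        if lsattr_line_is_immutable "$line"; then
            echo "immutable: $f"
        fi
    done
}

# Example invocation:
#   report_immutable_grub_defaults
```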

* grubby is upgraded, hence the old `/sbin/new-kernel-pkg` script is erased.
~~~
Jun 16 17:22:35 localhost upgrade[1912]:   Upgrading        : grubby-8.40-47.el8.x86_64                         341/2351
~~~

* grub2-tools is upgraded
~~~
Jun 16 17:22:48 localhost upgrade[1912]:   Upgrading        : grub2-tools-1:2.02-148.el8.x86_64                 399/2351
Jun 16 17:22:48 localhost upgrade[1912]:   Running scriptlet: grub2-tools-1:2.02-148.el8.x86_64                 399/2351
~~~

* so its post-script executes `/sbin/grub2-switch-to-blscfg`:
~~~
if [ "$1" = 2 ]; then
    /sbin/grub2-switch-to-blscfg --backup-suffix=.rpmsave &>/dev/null || :
fi
~~~

* this script adds `GRUB_ENABLE_BLSCFG=true` in `/etc/default/grub` if it's not there (line 280):
~~~
269 GENERATE=0
270 if grep '^GRUB_ENABLE_BLSCFG=.*' "${etcdefaultgrub}" \
271         | grep -vq '^GRUB_ENABLE_BLSCFG="*true"*\s*$' ; then
272     if ! sed -i"${backupsuffix}" \
273             -e 's,^GRUB_ENABLE_BLSCFG=.*,GRUB_ENABLE_BLSCFG=true,' \
274             "${etcdefaultgrub}" ; then
275         gettext_printf "Updating %s failed\n" "${etcdefaultgrub}"
276         exit 1
277     fi
278     GENERATE=1
279 elif ! grep -q '^GRUB_ENABLE_BLSCFG=.*' "${etcdefaultgrub}" ; then
280     if ! echo 'GRUB_ENABLE_BLSCFG=true' >> "${etcdefaultgrub}" ; then
281         gettext_printf "Updating %s failed\n" "${etcdefaultgrub}"
282         exit 1
283     fi
284     GENERATE=1
285 fi
~~~
That append didn't happen, and the error was not caught, deliberately ( &>/dev/null || : ).
That's why you had no BLS entries.

* After that the `kernel-workaround` package is installed. It only contains an **empty** `/sbin/new-kernel-pkg` script, in order to prevent a conflict while upgrading grubby, since the RHEL 7 kernel package **requires** this script (it is used by the RPM postscript).
~~~
Jun 16 17:24:42 localhost upgrade[1912]:   Installing       : kernel-workaround-0.1-1.el8.noarch               1066/2351
~~~

* And finally, at the very end, the `kernel-core` postscript is executed and fails silently.
Its postscript calls:
~~~
/bin/kernel-install add 4.18.0-477.13.1.el8_8.x86_64 /lib/modules/4.18.0-477.13.1.el8_8.x86_64/vmlinuz || exit $?
~~~

* The `kernel-install` script executes sequentially the files installed by grub/grubby/dracut in `/usr/lib/kernel/install.d`, in particular **`20-grub.install`**:
~~~
 88         if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
~~~
Due to the lack of `GRUB_ENABLE_BLSCFG` **and** the presence of `/sbin/new-kernel-pkg`, it does not enter this code block. `/sbin/new-kernel-pkg` is then called, but it no longer does anything since the script is empty!
~~~
146         /sbin/new-kernel-pkg --package "kernel${flavor}" --install "$KERNEL_VERSION" || exit $?
147         /sbin/new-kernel-pkg --package "kernel${flavor}" --mkinitrd --dracut --depmod --update "$KERNEL_VERSION" || exit $?
~~~
As a result, no initrd is generated for the RHEL 8 kernel...

* Since the kernel was not properly installed, grubby failed while removing the "enforcing=0" parameter from the kernel cmdline, and you fell into the emergency shell.
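
Since every failure in the sequence above is silent, the two visible symptoms (no BLS entries, no initramfs) are worth checking explicitly. The sketch below assumes the usual RHEL 8 layout, `/boot/initramfs-<version>.img` and `/lib/modules/<version>/vmlinuz`; the function names are hypothetical.

```shell
#!/bin/sh
# Prints the number of BLS snippet files (*.conf) in the entries directory.
count_bls_entries() {
    dir=${1:-/boot/loader/entries}
    n=0
    for f in "$dir"/*.conf; do
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Prints the version of every kernel that has a vmlinuz under the modules
# directory but no matching initramfs image in the boot directory.
list_kernels_without_initramfs() {
    modules_dir=${1:-/lib/modules}
    boot_dir=${2:-/boot}
    for vmlinuz in "$modules_dir"/*/vmlinuz; do
        [ -f "$vmlinuz" ] || continue
        kver=$(basename "$(dirname "$vmlinuz")")
        [ -f "$boot_dir/initramfs-$kver.img" ] || echo "$kver"
    done
}

# Example invocation:
#   count_bls_entries
#   list_kernels_without_initramfs
```

A count of 0 together with a non-empty list of kernels matches the symptoms described in this comment.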


Why did adding `GRUB_ENABLE_BLSCFG=true` before the reboot help?

* In short, this time:
  - grub2-switch-to-blscfg didn't convert the existing entries to BLS because the variable was already set (lines 270-271). So it didn't call `grub2-mkconfig` again, leaving a grub.cfg that was not updated for BLS and contained only RHEL 7 entries.
  - due to the presence of the variable, the kernel-core postscript later entered the code block which creates the BLS entry (line 96), and then 50-dracut.install was executed, so the initramfs was created.
  - the kernel being properly installed, grubby worked again.

* The solution here is simply to execute `grub2-mkconfig -o /boot/grub2/grub.cfg` (grub2-switch-to-blscfg is really useful only if you want to keep el7 kernels).
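
For reference, adding the variable by hand can be sketched as below. It follows the logic of the grub2-switch-to-blscfg excerpt quoted earlier, with two deliberate differences: it terminates an unterminated last line first, and it reports failures instead of discarding them. The function name `ensure_bls_setting` and the `.bak` backup suffix are assumptions, not something shipped by grub2 or leapp.

```shell
#!/bin/sh
# Hypothetical helper: makes sure the given defaults file contains
# GRUB_ENABLE_BLSCFG=true. Returns non-zero when the file cannot be updated.
ensure_bls_setting() {
    file=$1
    [ -f "$file" ] || return 1
    # Already set to true: nothing to do.
    if grep -Eq '^GRUB_ENABLE_BLSCFG="?true"?[[:space:]]*$' "$file"; then
        return 0
    fi
    if grep -q '^GRUB_ENABLE_BLSCFG=' "$file"; then
        # Present with another value: rewrite it, keeping a backup copy.
        sed -i.bak 's,^GRUB_ENABLE_BLSCFG=.*,GRUB_ENABLE_BLSCFG=true,' "$file" || return 1
    else
        # Absent: terminate the last line if needed, then append.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file" || return 1
        fi
        echo 'GRUB_ENABLE_BLSCFG=true' >> "$file" || return 1
    fi
}

# Example invocation, followed by the command from this comment:
#   ensure_bls_setting /etc/default/grub && grub2-mkconfig -o /boot/grub2/grub.cfg
```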

Comment 11 Jesús Pérez Martínez 2023-07-14 09:54:48 UTC
Hello,

I have found a new scenario where this issue happens:

If the package grub2-tools is installed but is not present in the RPM database, the IPU will proceed as normal until the point where it fails with:
~~~
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]:                            ERRORS
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]: 2023-07-13 13:37:31.928060 [ERROR] Actor: kernelcmdlineconfig
Jul 13 11:37:32 localhost upgrade[38298]: Message: Failed to append extra arguments to kernel command line.
Jul 13 11:37:32 localhost upgrade[38298]: Summary:
Jul 13 11:37:32 localhost upgrade[38298]:     Details: Command ['grubby', '--update-kernel=/boot/vmlinuz-4.18.0-477.15.1.el8_8.x86_64', '--args', 'net.ifnames=0', '--remove-args', 'enforcing=0'] failed with exit code 1.
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]:                        END OF ERRORS
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
~~~

The error is not caused by grub2-tools failing to install during the IPU; on the contrary, it will be installed. However, looking at the post-install script:
~~~
if [ "$1" = 2 ]; then
	/sbin/grub2-switch-to-blscfg --backup-suffix=.rpmsave &>/dev/null || :
fi
~~~

grub2-switch-to-blscfg will only be called when the transaction is an upgrade. In this case, as the package was not present in the RPM database, it's marked as a new installation instead of an upgrade and grub2-switch-to-blscfg is not executed.
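
The install-versus-upgrade distinction comes from the first argument RPM passes to a post-install scriptlet: 1 on a fresh install, 2 or more on an upgrade. A small sketch of that mapping, plus a pre-upgrade check that grub2-tools is actually registered in the RPM database (function names are hypothetical):

```shell
#!/bin/sh
# Maps the first argument of an RPM post-install scriptlet to the kind of
# transaction it indicates: 1 means install, 2 or more means upgrade.
post_scriptlet_kind() {
    if [ "$1" -ge 2 ] 2>/dev/null; then
        echo upgrade
    elif [ "$1" = 1 ]; then
        echo install
    else
        echo unknown
    fi
}

# Returns 0 when grub2-tools is known to the RPM database. Also returns
# non-zero when the rpm command itself is unavailable.
grub2_tools_registered() {
    rpm -q grub2-tools >/dev/null 2>&1
}

# Example invocation before running the upgrade:
#   grub2_tools_registered || echo "grub2-tools missing from the RPM database"
```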

