1955099 – Updating the kernel on leapp'ed systems doesn't create the initramfs depending on Grub BLS state

Keywords:
Status:	CLOSED MIGRATED
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	leapp-repository
Sub Component:
Version:	7.9
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Leapp Notifications Bot
QA Contact:	upgrades-and-conversions
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1818077 1818088

Reported:	2021-04-29 12:45 UTC by Renaud Métrich
Modified:	2024-06-14 01:24 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-09-12 11:09:26 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OAMG-4839	None	None	None	2023-05-11 07:30:16 UTC
Red Hat Issue Tracker	RHEL-3277	None	Migrated	None	2023-09-12 11:06:52 UTC
Red Hat Knowledge Base (Solution)	6004981	None	None	None	2021-04-29 13:30:30 UTC
Red Hat Knowledge Base (Solution)	6100621	None	None	None	2021-06-05 13:43:30 UTC

Description Renaud Métrich 2021-04-29 12:45:45 UTC

Description of problem:

When a system is upgraded to RHEL 8 using leapp, a **kernel-workaround** package is installed that ships an empty /usr/sbin/new-kernel-pkg script.
If the customer is **not** using BLS (GRUB_ENABLE_BLSCFG=true is missing from /etc/default/grub, likely because the customer uses Puppet with outdated templates from RHEL 7), the initramfs is not generated when kernels are updated.

The exact cause is on line 80 of /usr/lib/kernel/install.d/20-grub.install:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 80         if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
 81             eval "$(grub2-get-kernel-settings)" || true
 82             [[ -d "$BLS_DIR" ]] || mkdir -m 0700 -p "$BLS_DIR"
 :
138         /sbin/new-kernel-pkg --package "kernel${flavor}" --install "$KERNEL_VERSION" || exit $?
139         /sbin/new-kernel-pkg --package "kernel${flavor}" --mkinitrd --dracut --depmod --update "$KERNEL_VERSION" || exit $?
140         /sbin/new-kernel-pkg --package "kernel${flavor}" --rpmposttrans "$KERNEL_VERSION" || exit $?
141         # If grubby is used there's no need to run other installation plugins
142         exit 77
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Above, the code block starting on line 81 is not entered because "GRUB_ENABLE_BLSCFG" is missing from /etc/default/grub and /sbin/new-kernel-pkg, shipped by **kernel-workaround**, is present.

Execution then reaches lines 138-140, which do nothing because the script is empty.
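
The decision on line 80 can be mirrored in a small check to see which path a kernel update would take. This is only a minimal sketch and not part of leapp or grub2: the function name `initramfs_path_status` and its two arguments are made up for illustration, and it reads the setting with grep, which may differ from how the plugin itself obtains the variable.

```shell
#!/bin/sh
# Sketch (hypothetical helper): mirror the test on line 80 of 20-grub.install.
# Usage: initramfs_path_status <grub-defaults-file> <new-kernel-pkg-path>
# Prints "bls" when the BLS block would be entered, "legacy" otherwise.
initramfs_path_status() {
    defaults=$1
    nkp=$2
    # Accept GRUB_ENABLE_BLSCFG=true with or without double quotes.
    if grep -Eq '^GRUB_ENABLE_BLSCFG="?true"?[[:space:]]*$' "$defaults" 2>/dev/null; then
        echo bls
    elif [ ! -f "$nkp" ]; then
        # No new-kernel-pkg at all: the plugin also takes the BLS path.
        echo bls
    else
        # Variable missing and new-kernel-pkg present: lines 138-140 run.
        echo legacy
    fi
}
```

On a system hit by this bug, `initramfs_path_status /etc/default/grub /sbin/new-kernel-pkg` would be expected to print `legacy`, and with the empty script from kernel-workaround that path creates no initramfs.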


Version-Release number of selected component (if applicable):

leapp-repository-0.13.0-2.el7_9.noarch

Comment 2 Petr Stodulka 2021-05-03 15:20:52 UTC

For information: before or after removing the kernel and kernel-workaround RPMs, the user needs to remove the old kernel entry from the bootloader manually for now. For example:

    /bin/kernel-install remove 3.10.0-1160.25.1.el7.x86_64 /lib/modules/3.10.0-1160.25.1.el7.x86_64/vmlinuz

Comment 3 Petr Stodulka 2021-05-17 10:52:21 UTC

The documentation is going to be updated to cover the problem until we fix it.

Comment 4 Petr Stodulka 2021-05-19 13:53:03 UTC

FYI, the documentation has been updated.

Comment 5 Renaud Métrich 2021-06-05 13:25:03 UTC

Hi Petr,

There is a second scenario where the issue can happen: when /etc/default/grub doesn't end with a newline **prior** to executing "leapp upgrade".

The upgrade reboot then fails with:
~~~
[  266.092340] upgrade[609]: ============================================================
[  266.093274] upgrade[609]:                            ERRORS
[  266.093995] upgrade[609]: ============================================================
[  266.094790] upgrade[609]: 2021-06-05 15:20:04.141285 [ERROR] Actor: kernelcmdlineconfig
[  266.095526] upgrade[609]: Message: Failed to append extra arguments to kernel command line.
[  266.096381] upgrade[609]: Summary:
[  266.096944] upgrade[609]:     Details: Command ['grubby', '--update-kernel=/boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64', '--args=net.ifnames=0'] failed with exit code 1.
[  266.098461] upgrade[609]: ============================================================
[  266.099562] upgrade[609]:                        END OF ERRORS
[  266.100421] upgrade[609]: ============================================================
~~~

This leaves a broken line in /etc/default/grub, which is equivalent to not having GRUB_ENABLE_BLSCFG=true:
~~~
...
GRUB_DISABLE_RECOVERY="true"GRUB_ENABLE_BLSCFG=true
~~~
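
The missing newline can be detected before running "leapp upgrade". The following is a minimal sketch; the function name `lacks_trailing_newline` is hypothetical and nothing like it ships with leapp.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed when the file is non-empty and its
# last byte is not a newline. Command substitution strips a trailing newline,
# so the result is empty exactly when the file already ends with one.
lacks_trailing_newline() {
    [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]
}
```

If `lacks_trailing_newline /etc/default/grub` succeeds, appending a newline with `printf '\n' >> /etc/default/grub` before the upgrade should avoid the concatenated line shown above.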

I'll be writing a KCS on this asap.

Comment 6 Renaud Métrich 2021-06-07 07:33:56 UTC

Hi Petr, comment #5 is a dup of BZ #1937383 actually, linking the KCS there as well.

Comment 7 Petr Stodulka 2021-06-07 11:53:02 UTC

Hi Renaud, thanks for the info and the KCS. I have already pinged people about that. The second issue will probably be caught: most likely we will write an upgrade inhibitor for the case where the grub file is invalid (the trailing LF is missing).

Comment 10 Christophe Besson 2023-06-23 09:46:09 UTC

A puppet-agent pushing an old default grub file can be a problem afterwards, on RHEL 8.
But that does not explain why GRUB_ENABLE_BLSCFG was not added during the upgrade step, as there is no puppet-agent in this sequence.

Adding a customer case with the same symptoms: the RHEL 8 initrd was not generated and the system ends up in the emergency shell.

What happened initially

- the DNF transaction was "successful" but the RHEL 8 kernel was only partly installed, because dracut was not executed (the post-script silently failed).
- due to this, subsequent grubby commands failed, leading to the emergency shell and a broken upgrade.
- the BLS entries were not there in /boot/loader/entries and /etc/default/grub was unchanged.
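
A partly installed kernel like the one described above can be spotted by comparing the installed kernels with the initramfs images in /boot. This is a minimal sketch under two assumptions: each kernel has a `vmlinuz` under /lib/modules/<version>/ (as in the kernel-install call quoted later in this comment), and images are named `initramfs-<version>.img`. The function name `kernels_missing_initramfs` is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print kernel versions that have a vmlinuz
# under <modules-dir> but no matching initramfs image under <boot-dir>.
# Usage: kernels_missing_initramfs <modules-dir> <boot-dir>
kernels_missing_initramfs() {
    modules_dir=$1
    boot_dir=$2
    for kdir in "$modules_dir"/*/; do
        # Skip directories that do not hold an installed kernel image.
        [ -f "${kdir}vmlinuz" ] || continue
        version=$(basename "$kdir")
        [ -f "$boot_dir/initramfs-$version.img" ] || printf '%s\n' "$version"
    done
}
```

Running `kernels_missing_initramfs /lib/modules /boot` after the upgrade should print nothing on a healthy system.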

Why?

Because "something" prevented `/etc/default/grub` from being modified during the real upgrade (during the reboot step, in the dedicated "RHEL-Upgrade-Initramfs" entry).
I'm indeed able to reproduce the *very same behaviour* by setting an immutable bit (`chattr +i`) on /etc/default/grub* files.

* grubby is upgraded, hence the old `/sbin/new-kernel-pkg` script is erased.
~~~
Jun 16 17:22:35 localhost upgrade[1912]:   Upgrading        : grubby-8.40-47.el8.x86_64                         341/2351
~~~

* grub2-tools is upgraded
~~~
Jun 16 17:22:48 localhost upgrade[1912]:   Upgrading        : grub2-tools-1:2.02-148.el8.x86_64                 399/2351
Jun 16 17:22:48 localhost upgrade[1912]:   Running scriptlet: grub2-tools-1:2.02-148.el8.x86_64                 399/2351
~~~

* so its post-script executes `/sbin/grub2-switch-to-blscfg`:
~~~
if [ "$1" = 2 ]; then
    /sbin/grub2-switch-to-blscfg --backup-suffix=.rpmsave &>/dev/null || :
fi
~~~

* this script adds `GRUB_ENABLE_BLSCFG=true` in `/etc/default/grub` if it's not there (line 280):
~~~
269 GENERATE=0
270 if grep '^GRUB_ENABLE_BLSCFG=.*' "${etcdefaultgrub}" \
271         | grep -vq '^GRUB_ENABLE_BLSCFG="*true"*\s*$' ; then
272     if ! sed -i"${backupsuffix}" \
273             -e 's,^GRUB_ENABLE_BLSCFG=.*,GRUB_ENABLE_BLSCFG=true,' \
274             "${etcdefaultgrub}" ; then
275         gettext_printf "Updating %s failed\n" "${etcdefaultgrub}"
276         exit 1
277     fi
278     GENERATE=1
279 elif ! grep -q '^GRUB_ENABLE_BLSCFG=.*' "${etcdefaultgrub}" ; then
280     if ! echo 'GRUB_ENABLE_BLSCFG=true' >> "${etcdefaultgrub}" ; then
281         gettext_printf "Updating %s failed\n" "${etcdefaultgrub}"
282         exit 1
283     fi
284     GENERATE=1
285 fi
~~~
The append did not happen, and the error went unreported because the scriptlet deliberately discards it ( &>/dev/null || : ).
That's why you had no BLS entries.

* After that, the `kernel-workaround` package is installed. It only contains an **empty** `/sbin/new-kernel-pkg` script, to prevent a conflict while upgrading grubby, since the RHEL 7 kernel package **requires** this script (it is used by the RPM postscript).
~~~
Jun 16 17:24:42 localhost upgrade[1912]:   Installing       : kernel-workaround-0.1-1.el8.noarch               1066/2351
~~~

* And finally, at the very end, the `kernel-core` postscript is executed and fails silently.
Its postscript calls:
~~~
/bin/kernel-install add 4.18.0-477.13.1.el8_8.x86_64 /lib/modules/4.18.0-477.13.1.el8_8.x86_64/vmlinuz || exit $?
~~~

* The `kernel-install` script executes sequentially the files installed by grub/grubby/dracut in `/usr/lib/kernel/install.d`, in particular **`20-grub.install`**:
~~~
 88         if [[ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ]] || [[ ! -f /sbin/new-kernel-pkg ]]; then
~~~
Due to the lack of `GRUB_ENABLE_BLSCFG` **and** the presence of `/sbin/new-kernel-pkg`, this code block is not entered. `/sbin/new-kernel-pkg` is then called, but it no longer does anything since the script is empty!
~~~
146         /sbin/new-kernel-pkg --package "kernel${flavor}" --install "$KERNEL_VERSION" || exit $?
147         /sbin/new-kernel-pkg --package "kernel${flavor}" --mkinitrd --dracut --depmod --update "$KERNEL_VERSION" || exit $?
~~~
As a result, no initrd is generated for the RHEL 8 kernel...

* Since the kernel was not properly installed, grubby failed while removing the "enforcing=0" parameter from the kernel cmdline, and you fell into the emergency shell.
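
Since the sequence above was reproduced with an immutable bit on the /etc/default/grub* files, checking for that flag before the upgrade may be worthwhile. The following is a minimal sketch that filters the output of `lsattr`; the function name `list_immutable` is hypothetical, and it assumes the usual `lsattr` layout of an attribute field followed by the path.

```shell
#!/bin/sh
# Sketch (hypothetical helper): read `lsattr` output lines on stdin and print
# the paths whose attribute field contains the immutable flag ("i").
list_immutable() {
    while read -r flags path; do
        case $flags in
            *i*) printf '%s\n' "$path" ;;
        esac
    done
}
```

For example, `lsattr /etc/default/grub* | list_immutable` would list the files that `chattr -i` needs to be run on before grub2-switch-to-blscfg can append to them.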


Why did adding `GRUB_ENABLE_BLSCFG=true` before the reboot help?

* In short this time:
  - grub2-switch-to-blscfg didn't convert the existing entries to BLS because the variable was already set (lines 270-271). So it didn't call `grub2-mkconfig` again, leaving grub.cfg not updated for BLS configurations and containing only RHEL 7 entries.
  - due to the presence of the variable, the kernel-core postscript later entered the code block which creates the BLS entry (line 96), and then 50-dracut.install was executed, so the initramfs was created.
  - the kernel being properly installed, grubby worked again.

* The solution here is to simply execute `grub2-mkconfig -o /boot/grub2/grub.cfg` (the grub2-switch-to-blscfg is really useful only if you want to keep el7 kernels).
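
Setting the variable by hand before the reboot can be sketched as below. This is not the grub2-switch-to-blscfg code: the function name `ensure_blscfg` and the `.bak` backup suffix are made up for illustration, and the newline guard is an addition to cover a file that lacks a final newline.

```shell
#!/bin/sh
# Sketch (hypothetical helper): make sure <file> has GRUB_ENABLE_BLSCFG=true
# on a line of its own.
# Usage: ensure_blscfg <grub-defaults-file>
ensure_blscfg() {
    file=$1
    if grep -q '^GRUB_ENABLE_BLSCFG=' "$file"; then
        # Rewrite whatever value is there, keeping a .bak copy.
        sed -i.bak -e 's,^GRUB_ENABLE_BLSCFG=.*,GRUB_ENABLE_BLSCFG=true,' "$file"
    else
        # Guard against a missing final newline before appending.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file"
        fi
        echo 'GRUB_ENABLE_BLSCFG=true' >> "$file"
    fi
}
```

After `ensure_blscfg /etc/default/grub`, grub.cfg still has to be regenerated with `grub2-mkconfig -o /boot/grub2/grub.cfg`, as noted above.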

Comment 11 Jesús Pérez Martínez 2023-07-14 09:54:48 UTC

Hello,

I have found a new scenario where this issue happens:

If the package grub2-tools is installed but is not present in the RPM database, the IPU will proceed as normal until the point where it fails with:
~~~
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]:                            ERRORS
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]: 2023-07-13 13:37:31.928060 [ERROR] Actor: kernelcmdlineconfig
Jul 13 11:37:32 localhost upgrade[38298]: Message: Failed to append extra arguments to kernel command line.
Jul 13 11:37:32 localhost upgrade[38298]: Summary:
Jul 13 11:37:32 localhost upgrade[38298]:     Details: Command ['grubby', '--update-kernel=/boot/vmlinuz-4.18.0-477.15.1.el8_8.x86_64', '--args', 'net.ifnames=0', '--remove-args', 'enforcing=0'] failed with exit code 1.
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
Jul 13 11:37:32 localhost upgrade[38298]:                        END OF ERRORS
Jul 13 11:37:32 localhost upgrade[38298]: ============================================================
~~~

The error is not caused by grub2-tools being absent during the IPU; on the contrary, it is installed. However, looking at the post-install script:
~~~
if [ "$1" = 2 ]; then
	/sbin/grub2-switch-to-blscfg --backup-suffix=.rpmsave &>/dev/null || :
fi
~~~

grub2-switch-to-blscfg will only be called when the transaction is an upgrade. In this case, as the package was not present in the RPM database, it's marked as a new installation instead of an upgrade and grub2-switch-to-blscfg is not executed.
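
This situation could be detected before the upgrade by checking that a file belonging to the package is on disk while `rpm -q` does not know the package. The following is a minimal sketch; the function name `installed_but_untracked` is hypothetical, and which file to probe is left to the caller.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed when <file> exists on disk but
# <package> is not registered in the RPM database.
# Usage: installed_but_untracked <package> <file-shipped-by-package>
installed_but_untracked() {
    pkg=$1
    file=$2
    [ -e "$file" ] && ! rpm -q "$pkg" >/dev/null 2>&1
}
```

For example, `installed_but_untracked grub2-tools /usr/sbin/grub2-mkconfig` (assuming that file comes from grub2-tools on the system in question) succeeding would mean the scriptlet will see `$1` = 1 instead of 2 and skip grub2-switch-to-blscfg.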

Comment 12 RHEL Program Management 2023-09-12 11:04:25 UTC

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 13 RHEL Program Management 2023-09-12 11:09:26 UTC

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
