Bug 2166233

Summary: grubby fails to add the kernel entry when upgrading from RHEL6 using redhat-upgrade-tool
Product: Red Hat Enterprise Linux 7
Reporter: Renaud Métrich <rmetrich>
Component: kernel
Assignee: Denys Vlasenko <dvlasenk>
Sub component: Packaging
QA Contact: zhijwang <zhijwang>
Status: CLOSED MIGRATED
Severity: high
Priority: high
CC: bmader, bwelterl, dvlasenk, hkrzesin, jstancek, kernel-qe, lilu, mkluson, mreznik, nmurray, ppaddhar, prjagtap, pstodulk, ptalbert, rhandlin, tmeszaro, zhijwang
Version: 7.9
Keywords: MigratedToJIRA, Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Hardware: All
OS: Linux
Last Closed: 2023-09-12 11:55:23 UTC
Type: Bug
Bug Blocks: 2108243

Description Renaud Métrich 2023-02-01 08:37:47 UTC
Description of problem:

Since the fix for BZ #1893756, the bootloader entry is created only in the %posttrans scriptlet.
Previously (up to and including 3.10.0-1160.59.1.el7), it was done in two phases:
- %post (postinstall) to create the entry without the initrd (because the initrd has not been created yet)
- %posttrans to update the entry with the initrd

Because of this change, the upgrade from RHEL6 fails: grubby errors out when adding the kernel:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
grubby fatal error: unable to find a suitable template
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

As a result, the system reboots with no kernel entry available at all, which makes it completely unbootable.

The reason for this is detailed below:
1. After running redhat-upgrade-tool and before rebooting, there are at least two bootloader entries:

  title System Upgrade (redhat-upgrade-tool)
  --> the RHEL7 kernel used to upgrade
  title Red Hat Enterprise Linux Server (2.6.32-754.35.1.el6.x86_64)
  --> the RHEL6 kernel

2. Upon rebooting to perform the upgrade, the "System Upgrade" entry is deleted (to avoid breaking the system if the upgrade fails).
3. The upgrade runs, which removes the RHEL6 kernel and its associated entry.
4. The %posttrans scriptlet of the RHEL7 kernel executes, and grubby fails because there is no kernel entry left to copy the kernel arguments from.
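The four steps above can be replayed in a few lines of shell to see why only the %posttrans-only variant breaks. This is a simplified, hypothetical model, not grubby's code or the kernel spec file: bootloader entries are plain titles in an array, and adding an entry requires an existing one to act as the template, which is what the "unable to find a suitable template" error suggests. It also assumes, as the description implies, that the old %post scriptlet ran while the RHEL6 entry still existed.

```shell
#!/bin/bash
# Simplified, hypothetical model of steps 1-4 (NOT grubby or kernel spec code).
# Bootloader entries are just titles; adding one needs an existing template.

ENTRIES=()

# Stand-in for grubby adding a kernel entry: it copies kernel arguments from
# an existing entry, so with no entries left there is nothing to copy.
add_kernel_entry() {
    if [ "${#ENTRIES[@]}" -eq 0 ]; then
        echo "grubby fatal error: unable to find a suitable template" >&2
        return 1
    fi
    ENTRIES+=("$1")
}

# Remove every entry whose title matches $1 exactly.
remove_entry() {
    local e
    local -a keep=()
    for e in "${ENTRIES[@]}"; do
        if [ "$e" != "$1" ]; then
            keep+=("$e")
        fi
    done
    ENTRIES=("${keep[@]}")
}

# $1 = "two-phase"      (entry created in %post, up to 3.10.0-1160.59.1.el7)
#   or "posttrans-only" (entry created in %posttrans, since BZ #1893756)
simulate_upgrade() {
    local sysup="System Upgrade (redhat-upgrade-tool)"
    local el6="Red Hat Enterprise Linux Server (2.6.32-754.35.1.el6.x86_64)"
    local el7="RHEL7 kernel"
    ENTRIES=("$sysup" "$el6")          # step 1: entries before rebooting
    remove_entry "$sysup"              # step 2: "System Upgrade" entry deleted
    if [ "$1" = "two-phase" ]; then
        # old %post: the RHEL6 entry still exists and serves as the template
        add_kernel_entry "$el7" || return 1
    fi
    remove_entry "$el6"                # step 3: RHEL6 kernel and entry removed
    if [ "$1" = "posttrans-only" ]; then
        # step 4: %posttrans runs when no template entry is left
        add_kernel_entry "$el7" || return 1
    fi
    # in the two-phase case %posttrans only updates the entry with the initrd
    return 0
}
```

Calling `simulate_upgrade two-phase` leaves exactly one entry, the RHEL7 kernel, while `simulate_upgrade posttrans-only` prints the grubby error and leaves no entries at all, matching the unbootable result reported here.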


Version-Release number of selected component (if applicable):

kernel-3.10.0-1160.62.1.el7 and later

How reproducible:

Always

Steps to Reproduce:
1. Set up a RHEL6 system and update it to the latest packages
2. Install redhat-upgrade-tool

  # yum -y install preupgrade-assistant preupgrade-assistant-el6toel7 redhat-upgrade-tool
  # preupg

3. Prepare the upgrade with the latest RHEL7 packages

  # redhat-upgrade-tool --nogpgcheck --network 7.9 --instrepo http://192.168.122.1/rhel79 --addrepo=latest='http://rhsm-pulp.corp.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os' --cleanup-post

  Here, the RHEL7.9 DVD is mounted on an HTTP server at the "/rhel79" location, and Pulp is used to fetch the latest packages (including the kernel).

4. Reboot to perform the system upgrade

Actual results:

No RHEL7 entry in the Grub configuration, and the following messages are displayed during the upgrade:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[  416.029243] upgrade[2955]: grubby fatal error: unable to find a suitable template
[  416.064057] upgrade[2955]: [127/658] (80%) cleaning kernel-2.6.32-754.el6...
 :
[  443.656832] upgrade[2955]: running %posttrans script for kernel-3.10.0-1160.83.1.el7
[  473.216297] upgrade[2955]: grubby fatal error: unable to find a suitable template
[  512.360489] upgrade[2955]: grubby fatal error: unable to find a suitable template
 :
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Expected results:

A RHEL7 entry in the Grub configuration

Additional info:

I don't know whether it's up to the kernel package to fix this issue or whether we need to fix it in redhat-upgrade-tool; the latter would require an update of the tool in RHEL6, and I'm not sure there are still developers who know its internals.

At the time of the upgrade, a few facts are available to "detect" that we are running inside an upgrade:
- the running kernel is always 3.10.0-1160.el7
- UPGRADE=1 is set in the environment
- DRACUT_SYSTEMD=1 is set in the environment
- UDEVVERSION=219 is set in the environment
- NEWROOT=/sysroot is set in the environment
- action=Boot is set in the environment

Maybe we could restore the old scriptlet behavior (two-phase entry creation) when all of these conditions are met.
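The conditions listed above could be combined into a single check along these lines. This is a hypothetical helper, not part of any shipped scriptlet; its name is an assumption, and the running kernel release can be passed as `$1` (it defaults to `uname -r`) so the check can be exercised outside the upgrade environment.

```shell
#!/bin/bash
# Hypothetical helper (not from the kernel spec): succeeds only when every
# fact listed above holds, i.e. when we appear to be running inside the
# redhat-upgrade-tool upgrade environment.
# $1 = running kernel release (optional, defaults to `uname -r`).
is_el6_to_el7_upgrade() {
    local running="${1:-$(uname -r)}"
    # The upgrade always boots the RHEL 7.9 GA kernel, 3.10.0-1160.el7;
    # uname -r appends the architecture, e.g. 3.10.0-1160.el7.x86_64.
    case "$running" in
        3.10.0-1160.el7|3.10.0-1160.el7.*) ;;
        *) return 1 ;;
    esac
    # Environment variables observed during the upgrade.
    [ "${UPGRADE:-}" = "1" ] &&
        [ "${DRACUT_SYSTEMD:-}" = "1" ] &&
        [ "${UDEVVERSION:-}" = "219" ] &&
        [ "${NEWROOT:-}" = "/sysroot" ] &&
        [ "${action:-}" = "Boot" ]
}
```

A scriptlet could then, for example, fall back to the pre-BZ #1893756 two-phase entry creation only when this check succeeds, and keep the %posttrans-only path otherwise. Requiring all conditions together should keep the fallback from triggering during an ordinary `yum update`.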

Comment 3 Renaud Métrich 2023-02-01 08:51:51 UTC
I'm setting the Priority/Severity to HIGH because this issue prevents customers from upgrading their RHEL6 systems.

Upgrading with the RHEL7.9 DVD content instead of the latest packages is usually not possible when additional repositories (optional, supplementary, etc.) are enabled, because many newer packages in these repositories require more recent components than the RHEL 7.9 DVD provides.

Comment 5 Petr Stodulka 2023-03-09 10:07:14 UTC
Hi, just confirming that Renaud is right. I've investigated the issue (https://bugzilla.redhat.com/show_bug.cgi?id=2108243#c10) and I see that the valid fix - and the best way to fix the issue - is to fix the scriptlet - either by providing more robust script or just reverting the change. As far as I know, we have customers who are currently upgrading or preparing to upgrade from RHEL 6 to RHEL 7, and if they use up-to-date packages, as officially required, they will hit this critical issue.

Comment 6 Petr Stodulka 2023-03-09 10:09:34 UTC
The bug has been introduced by the fix for the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1893756

Comment 7 Denys Vlasenko 2023-03-13 08:29:50 UTC
Testing the fix originally proposed in MR 313. An interrupted install still results in a bootable system:

# yum install kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm strace tcpdump mc gimp bzip2 traceroute gdb gcc firefox
...
  Installing : kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64                       39/40 
  Installing : 1:mc-4.8.7-11.el7.x86_64                                          40/40 
^Z
[1]+  Stopped                 yum install kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm strace tcpdump mc gimp bzip2 traceroute gdb gcc firefox
# reboot
...
# uname -sr
Linux 3.10.0-1160.89.1.el7.kpq.test.x86_64



Testing rhel6->rhel7 upgrade:

Install the latest rhel6 (Server, not Client).

Get the rhel7 installation ISO image: rhel-server-7.9-x86_64-dvd.iso
(sha256sum:2cb36122a74be084c551bc7173d2d38a1cfb75c8ffbc1489c630c916d1b31b25 size:4526702592)

Get these packages:
preupgrade-assistant-2.6.2-1.el6.noarch.rpm
preupgrade-assistant-el6toel7-0.8.0-3.el6.noarch.rpm
preupgrade-assistant-el6toel7-data-0.20200704-1.el6.noarch.rpm
redhat-upgrade-tool-0.8.0-9.el6.noarch.rpm
(for example from https://access.redhat.com/downloads/content/69/ver=/rhel---6/6.10/x86_64/packages)
yum -y install *.rpm createrepo

Get the test kernel, in my case kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm.

createrepo /path/to/test_kernel  # a dir with kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm

Run a local http server which exports /path/to/test_kernel on http://127.0.0.1/

Run "preupg"; it should finish with no errors that would preclude the rhel6->rhel7 migration.

The final step is to run "redhat-upgrade-tool", reboot when prompted, and watch the
boot process to check that the grub menu is not broken.
(Note that a failed test leaves the machine unbootable.)

redhat-upgrade-tool --nogpgcheck --iso rhel-server-7.9-x86_64-dvd.iso --cleanup-post
# ^^^ this should work - old kernel with no %posttrans changes is used, from ISO image

redhat-upgrade-tool --nogpgcheck --iso rhel-server-7.9-x86_64-dvd.iso --addrepo=latest='http://rhsm-pulp.corp.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os' --cleanup-post
# ^^^ this should FAIL - kernel with buggy %posttrans change used, from rhsm-pulp

redhat-upgrade-tool --nogpgcheck --iso rhel-server-7.9-x86_64-dvd.iso --addrepo=latest='http://127.0.0.1/' --cleanup-post
# ^^^ this works in my testing (and I verified that the kernel used is indeed the test one)

Comment 9 Petr Stodulka 2023-03-13 16:37:32 UTC
*** Bug 2108243 has been marked as a duplicate of this bug. ***

Comment 10 Jan Stancek 2023-03-16 09:31:07 UTC
(In reply to Petr Stodulka from comment #5)
> Hi, just confirming that Renaud is right. I've investigated the issue
> (https://bugzilla.redhat.com/show_bug.cgi?id=2108243#c10) and I see that the
> valid fix - and the best way to fix the issue - is to fix the scriptlet -
> either by providing more robust script or just reverting the change.

Creating boot entries before there is an initrd is guaranteed, and has been proven over time, to cause issues for customers.
I'd rather see systemd (new-kernel-pkg) or grubby be made more robust - for example by storing kernel parameters somewhere if it is the last kernel being uninstalled.
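The "store kernel parameters somewhere" idea above can be sketched as two small hooks. This is a hypothetical illustration only, not grubby or new-kernel-pkg code; the state-file path and both function names are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of the idea above (not grubby/new-kernel-pkg code).
# The state-file location and the function names are assumptions.
SAVED_ARGS_FILE="${SAVED_ARGS_FILE:-/var/lib/kernel-args/last-removed}"

# On kernel removal: $1 = number of kernel entries that will remain,
# $2 = kernel arguments of the entry being removed.
save_args_if_last() {
    if [ "$1" -eq 0 ]; then
        mkdir -p "$(dirname "$SAVED_ARGS_FILE")" &&
            printf '%s\n' "$2" > "$SAVED_ARGS_FILE"
    fi
}

# On kernel install: $1 = arguments copied from an existing template entry
# (empty when there is none). Prints the arguments to use for the new entry.
args_for_new_entry() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -s "$SAVED_ARGS_FILE" ]; then
        cat "$SAVED_ARGS_FILE"
    else
        echo "no template entry and no saved kernel arguments" >&2
        return 1
    fi
}
```

The template entry still wins whenever one exists; the saved arguments are used only when the last kernel entry has already been removed. Whether such a hook would actually run in the RHEL6 -> RHEL7 case depends on which package's removal scripts execute during the upgrade transaction, which this sketch does not address.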

Comment 11 Petr Stodulka 2023-03-20 12:00:06 UTC
> Creating boot entries before there is an initrd is guaranteed, and has been proven over time, to cause issues for customers.
> I'd rather see systemd (new-kernel-pkg) or grubby be made more robust - for example by storing kernel parameters somewhere if it is the last kernel being uninstalled.

Has there been any situation in the original bug where the kernel posttrans scriptlet was not executed? If the scriptlet has always been executed, nothing should prevent the kernel from dealing with the situation. Fixing the issue anywhere other than in the kernel scriptlet seems like too much work to me for RHEL 7.9, especially since we are talking about a corner case that we know people could hit:
* if they do an in-place upgrade 6 -> 7 (in 100% of cases on intel)
* if they boot into a rescue kernel / live OS, manually remove all installed kernel packages from there, and then install a kernel again (which I would consider an unsupported action)

Comment 12 Denys Vlasenko 2023-04-09 18:38:02 UTC
(In reply to Petr Stodulka from comment #11)
> > Creating boot entries before there is an initrd is guaranteed, and has been proven over time, to cause issues for customers.
> > I'd rather see systemd (new-kernel-pkg) or grubby be made more robust - for example by storing kernel parameters somewhere if it is the last kernel being uninstalled.
> 
> Has there been any situation in the original bug where the kernel posttrans
> scriptlet was not executed?

Yes, indeed.

The typical real-world scenario in which this happens is when an admin simply runs
"yum update".

This updates many packages, and if any package's update script
is buggy in a way that makes "yum update" hang, the admin has little choice but to kill it.

In this case, if a newer kernel was already installed, there will be a new
grub entry for it, but no initramfs.

On the next reboot, grub will not be able to find the initramfs, and the boot will fail.

I think we had about 15 user complaints about this happening.
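The failure mode described in this comment (a kernel installed, but its initramfs never generated) can be spotted before rebooting by comparing kernel images with initramfs images. The helper below is a hypothetical sketch, not an existing tool. It assumes the RHEL 7 naming scheme /boot/vmlinuz-<version> and /boot/initramfs-<version>.img, and it checks only files, not the grub configuration itself.

```shell
#!/bin/bash
# Hypothetical helper: print the version of every kernel image in a boot
# directory that has no matching initramfs image, i.e. an entry that would
# fail to boot after an interrupted transaction.
# Assumes RHEL 7 naming: vmlinuz-<ver> and initramfs-<ver>.img.
kernels_missing_initramfs() {
    local bootdir="${1:-/boot}" kernel ver
    for kernel in "$bootdir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue        # the glob matched nothing
        ver="${kernel##*/vmlinuz-}"
        [ -e "$bootdir/initramfs-$ver.img" ] || echo "$ver"
    done
}
```

After an interrupted `yum update`, running `kernels_missing_initramfs /boot` would list the affected versions; each of them could then be given an initramfs, for example with `dracut -f /boot/initramfs-<version>.img <version>`, before rebooting.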

Comment 13 Petr Stodulka 2023-04-14 12:21:26 UTC
Hi Denys, thanks for the info. This is the first time I've heard of such issues on RHEL, but it's true that real systems also contain a lot of custom and 3rd-party content that could affect this, not to mention all the possible configurations of real systems.

Comment 15 Linqing Lu 2023-05-26 13:52:50 UTC
(In reply to Petr Stodulka from comment #6)
> The bug has been introduced by the fix for the following BZ:
> https://bugzilla.redhat.com/show_bug.cgi?id=1893756

@zhijwang 
Hi Zhijun,

Can you also take this bug as it's a follow up for 1893756?
Let us know if you need a hand.

Thanks!

Comment 16 zhijwang 2023-05-29 08:47:18 UTC
(In reply to Linqing Lu from comment #15)
> Hi Zhijun,
> 
> Can you also take this bug as it's a follow up for 1893756?
> Let us know if you need a hand.
> 
> Thanks!

Sure, I will take it. Thanks Linqing!

Comment 19 RHEL Program Management 2023-09-12 11:54:38 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 20 RHEL Program Management 2023-09-12 11:55:23 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.