Description of problem:

Since fixing BZ #1893756, the bootloader entry is created in the posttrans scriptlet only.
In the past (e.g. up to 3.10.0-1160.59.1.el7 included), it was done in 2 phases:
- postinstall to create the entry without the initrd (because the initrd is not created yet)
- posttrans to update the entry with the initrd

Due to this change, the upgrade from RHEL6 fails because grubby exits with an error when adding the kernel:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
grubby fatal error: unable to find a suitable template
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This leads to rebooting the system with no kernel entry available at all, making the system completely unbootable.

The reason for this is detailed below:

1. Initially after running redhat-upgrade-tool and before rebooting, there are 2 bootloader entries at least:

   title System Upgrade (redhat-upgrade-tool)
   --> the RHEL7 kernel used to upgrade

   title Red Hat Enterprise Linux Server (2.6.32-754.35.1.el6.x86_64)
   --> the RHEL6 kernel

2. Upon rebooting to perform the upgrade, the "System Upgrade" entry is deleted (this is to avoid breaking if the system upgrade failed)

3. The upgrade happens, which deletes the RHEL6 kernel and associated entry

4. %posttrans of the RHEL7 kernel executes, which makes grubby fail: it cannot copy the kernel arguments from any existing entry, since there are none left

Version-Release number of selected component (if applicable):

kernel-3.10.0-1160.62.1.el7 and later

How reproducible:

Always

Steps to Reproduce:

1. Set up a RHEL6 system and update it to latest

2. Install redhat-upgrade-tool

   # yum -y install preupgrade-assistant preupgrade-assistant-el6toel7 redhat-upgrade-tool
   # preupg

3. Prepare the upgrade with latest RHEL7 bits

   # redhat-upgrade-tool --nogpgcheck --network 7.9 --instrepo http://192.168.122.1/rhel79 --addrepo=latest='http://rhsm-pulp.corp.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os' --cleanup-post

   Here above the RHEL7.9 DVD is mounted on the HTTP server at the "/rhel79" location and Pulp is used to fetch the latest packages (including the kernel).

4. Reboot to perform the system upgrade

Actual results:

No RHEL7 entry in the Grub configuration, and the following messages displayed during the upgrade:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[ 416.029243] upgrade[2955]: grubby fatal error: unable to find a suitable template
[ 416.064057] upgrade[2955]: [127/658] (80%) cleaning kernel-2.6.32-754.el6...
:
[ 443.656832] upgrade[2955]: running %posttrans script for kernel-3.10.0-1160.83.1.el7
[ 473.216297] upgrade[2955]: grubby fatal error: unable to find a suitable template
[ 512.360489] upgrade[2955]: grubby fatal error: unable to find a suitable template
:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Expected results:

RHEL7 entry in the Grub configuration

Additional info:

I don't know if it's up to the kernel package to fix this issue, or if we need to fix this in redhat-upgrade-tool, knowing that this would require an update of the tool in RHEL6 and I'm not sure there are still developers who know the internals.

At the time of the upgrade, we have a few facts available to "detect" we are running from an upgrade:
- running kernel is always 3.10.0-1160.el7
- UPGRADE=1 is set in the environment
- DRACUT_SYSTEMD=1 is set in the environment
- UDEVVERSION=219 is set in the environment
- NEWROOT=/sysroot is set in the environment
- action=Boot is set in the environment

Maybe we could restore the old scriptlet (2 phases entry creation) when all of these conditions hold.
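
For illustration, the detection conditions listed above could be checked in a scriptlet along these lines. This is only a sketch: the helper name in_el6_upgrade is hypothetical, requiring all six facts at once is an assumption, and the optional argument exists only so the check can be exercised outside a real upgrade.

```shell
# Sketch only: in_el6_upgrade is a hypothetical helper, not part of the
# kernel spec file or of redhat-upgrade-tool.
# Returns 0 when every fact listed in the report holds, 1 otherwise.
# $1 (optional) is the running kernel release; defaults to `uname -r`.
in_el6_upgrade() {
    running=${1:-$(uname -r)}
    case "$running" in
        3.10.0-1160.el7*) ;;   # kernel the report says is always running
        *) return 1 ;;
    esac
    # Environment facts taken from the report.
    [ "${UPGRADE:-}" = "1" ] &&
    [ "${DRACUT_SYSTEMD:-}" = "1" ] &&
    [ "${UDEVVERSION:-}" = "219" ] &&
    [ "${NEWROOT:-}" = "/sysroot" ] &&
    [ "${action:-}" = "Boot" ]
}
```

A postinstall scriptlet could call in_el6_upgrade and create the entry early only when it returns 0, leaving the posttrans-only behaviour untouched for regular installs.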
I'm setting the Priority/Severity as HIGH because it's preventing customers from upgrading their RHEL6 systems. Using the RHEL7.9 DVD level for the upgrade instead of the latest bits is usually not possible when having additional repositories (optional, supplementary, etc.) because many newer packages in these repositories require more recent components than the RHEL 7.9 DVD level provides.
Hi, just confirming that Renaud is right. I've investigated the issue (https://bugzilla.redhat.com/show_bug.cgi?id=2108243#c10) and I see that the valid fix - and the best way to fix the issue - is to fix the scriptlet - either by providing a more robust script or by just reverting the change. As far as I know, we have customers who are currently upgrading or preparing for the upgrade from RHEL 6 to RHEL 7, and if they use up-to-date packages, as officially required, they will hit this critical issue.
The bug was introduced by the fix for the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1893756
Testing the original fix proposed in mr 313.

Interrupted install still works:

# yum install kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm strace tcpdump mc gimp bzip2 traceroute gdb gcc firefox
...
  Installing : kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64 39/40
  Installing : 1:mc-4.8.7-11.el7.x86_64 40/40
^Z
[1]+ Stopped yum install kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm strace tcpdump mc gimp bzip2 traceroute gdb gcc firefox
# reboot
...
# uname -sr
Linux 3.10.0-1160.89.1.el7.kpq.test.x86_64

Testing rhel6->rhel7 upgrade:

Install latest rhel6 (server, not client).

Get rhel7 install ISO image: rhel-server-7.9-x86_64-dvd.iso
(sha256sum:2cb36122a74be084c551bc7173d2d38a1cfb75c8ffbc1489c630c916d1b31b25 size:4526702592)

Get these packages:
  preupgrade-assistant-2.6.2-1.el6.noarch.rpm
  preupgrade-assistant-el6toel7-0.8.0-3.el6.noarch.rpm
  preupgrade-assistant-el6toel7-data-0.20200704-1.el6.noarch.rpm
  redhat-upgrade-tool-0.8.0-9.el6.noarch.rpm
(for example from https://access.redhat.com/downloads/content/69/ver=/rhel---6/6.10/x86_64/packages)

yum -y install *.rpm createrepo

Get the test kernel, in my case kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm.

createrepo /path/to/test_kernel # a dir with kernel-3.10.0-1160.89.1.el7.kpq.test.x86_64.rpm

Run a local http server which exports /path/to/test_kernel on http://127.0.0.1/

Run "preupg", it should finish with no errors precluding rhel6->rhel7 migration

Final step is to run "redhat-upgrade-tool", then reboot when prompted, and watch boot process to see whether grub menu is not broken. (Note that failed test makes machine unbootable).
redhat-upgrade-tool --nogpgcheck --iso rhel-server-7.9-x86_64-dvd.iso --cleanup-post
# ^^^ this should work - old kernel with no %posttrans changes is used, from ISO image

redhat-upgrade-tool --nogpgcheck --iso rhel-server-7.9-x86_64-dvd.iso --addrepo=latest='http://rhsm-pulp.corp.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os' --cleanup-post
# ^^^ this should FAIL - kernel with buggy %posttrans change used, from rhsm-pulp

redhat-upgrade-tool --nogpgcheck --iso rhel-server-7.9-x86_64-dvd.iso --addrepo=latest='http://127.0.0.1/' --cleanup-post
# ^^^ this works in my testing (and I verified that the kernel used is indeed the test one)
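
Instead of only watching the boot menu, the result of each run could also be checked from a rescue shell or a mounted disk image by counting the "title" entries left in the GRUB legacy configuration. The sketch below assumes the configuration lives at /boot/grub/grub.conf (the usual BIOS location on RHEL6; the path can be passed explicitly), and count_boot_entries is a hypothetical helper name.

```shell
# Sketch only: count_boot_entries is a hypothetical helper.
# Prints the number of "title" stanzas in a GRUB legacy config file.
# $1 (optional) is the config path; the default is an assumption.
count_boot_entries() {
    conf=${1:-/boot/grub/grub.conf}
    # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
    grep -c '^[[:space:]]*title[[:space:]]' "$conf" || true
}
```

A count of 0 after the upgrade corresponds to the unbootable state described in this bug.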
*** Bug 2108243 has been marked as a duplicate of this bug. ***
(In reply to Petr Stodulka from comment #5)
> Hi, just confirming that Renaud is right. I've investigated the issue
> (https://bugzilla.redhat.com/show_bug.cgi?id=2108243#c10) and I see that the
> valid fix - and the best way to fix the issue - is to fix the scriptlet -
> either by providing more robust script or just reverting the change.

Creating boot entries before there is initrd is guaranteed and over time proven to cause issues for customers.
I'd rather see systemd (new-kernel-pkg) or grubby be made more robust - for example by storing kernel parameters somewhere, if it is last kernel being uninstalled.
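
To make the "store kernel parameters somewhere" idea concrete, a rough sketch follows. It assumes that "grubby --info=DEFAULT" prints an args="..." line for the default entry (that is how I recall the output, so treat the format as an assumption), and both function names and the destination file are hypothetical.

```shell
# Sketch only: both helpers are hypothetical, not existing grubby or
# new-kernel-pkg functionality.

# Reads `grubby --info` style output on stdin and prints the value of
# the first args="..." line, without the surrounding quotes.
extract_grubby_args() {
    sed -n 's/^args="\(.*\)"$/\1/p' | head -n 1
}

# Saves the default entry's kernel arguments to the file given as $1,
# to be run before the last kernel entry is removed.
save_default_kernel_args() {
    dest=$1
    grubby --info=DEFAULT | extract_grubby_args > "$dest"
}
```

The saved line could later be handed back through grubby's --args option when no entry is left to act as a template.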
> Creating boot entries before there is initrd is guaranteed and over time proven to cause issues for customers.
> I'd rather see systemd (new-kernel-pkg) or grubby be made more robust - for example by storing kernel parameters somewhere, if it is last kernel being uninstalled.

Has there been any situation in the original bug, when the kernel posttrans scriptlet has not been executed? If the scriptlet has always been executed, nothing should prevent the kernel from dealing with the situation. Fixing the issue anywhere other than in the kernel scriptlet seems like too much work to me for RHEL 7.9, especially since this is a corner case that we know people can hit only:

* if they in-place upgrade 6 -> 7 (in 100% of cases on Intel)
* if they boot to a rescue kernel / live OS, remove all installed kernel packages manually from there, and then install a kernel again (which I would consider an unsupported action if someone does something like that)
(In reply to Petr Stodulka from comment #11)
> > Creating boot entries before there is initrd is guaranteed and over time proven to cause issues for customers.
> > I'd rather see systemd (new-kernel-pkg) or grubby be made more robust - for example by storing kernel parameters somewhere, if it is last kernel being uninstalled.
>
> Has there been any situation in the original bug, when the kernel posttrans
> scriptlet has not been executed?

Yes, indeed. The typical real-world scenario is an admin simply running "yum update". This tries to update many packages, and if any package's update script is buggy in a way that makes "yum update" hang, the admin has little choice but to kill it. In this case, if a newer kernel was already installed, there will be a new grub entry for it, but no initramfs. On the next reboot, grub will not be able to find the initramfs, and boot will fail.

I think we had about 15 user complaints about this happening.
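
For anyone who wants to check a system for this state after an interrupted "yum update", a minimal sketch is below. It assumes the usual RHEL7 naming, /boot/vmlinuz-<version> paired with /boot/initramfs-<version>.img, and kernels_missing_initramfs is a hypothetical helper name.

```shell
# Sketch only: kernels_missing_initramfs is a hypothetical helper.
# Prints the version of every kernel image in the boot directory that
# has no matching initramfs-<version>.img next to it.
# $1 (optional) is the boot directory; defaults to /boot.
kernels_missing_initramfs() {
    bootdir=${1:-/boot}
    for k in "$bootdir"/vmlinuz-*; do
        [ -e "$k" ] || continue          # glob matched nothing
        ver=${k##*/vmlinuz-}
        [ -e "$bootdir/initramfs-$ver.img" ] || printf '%s\n' "$ver"
    done
}
```

Any version it prints belongs to a kernel that would fail to boot in the way described above if its grub entry were selected.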
Hi Denys, thanks for the info. Hearing for the first time about such issues on RHEL, but it's true that real systems contain a lot of custom & 3rd-party content too which could affect it also. Not mentioning all possible configurations of real systems.
(In reply to Petr Stodulka from comment #6)
> The bug has been introduce by the fix for the following BZ:
> https://bugzilla.redhat.com/show_bug.cgi?id=1893756

@zhijwang

Hi Zhijun,

Can you also take this bug as it's a follow up for 1893756?
Let us know if you need a hand.

Thanks!
(In reply to Linqing Lu from comment #15)
> Hi Zhijun,
>
> Can you also take this bug as it's a follow up for 1893756?
> Let us know if you need a hand.
>
> Thanks!

Sure, I will take it. Thanks Linqing!