Bug 1755841
Summary: | After Leapp upgrade 7.6->8.0, the machine boots with 7.6 kernel | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jiri Stransky <jstransk>
Component: | leapp-repository | Assignee: | Leapp team <leapp-notifications>
Status: | CLOSED ERRATA | QA Contact: | Alois Mahdal <amahdal>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 7.6 | CC: | fmartine, jfrancoa, mbocek, michele, mreznik, msekleta, pbabbar, pstodulk
Target Milestone: | rc | Keywords: | Upgrades
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | leapp-repository-0.10.0-2.el7_8 | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-04-29 01:45:58 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1727807 | |
Description
Jiri Stransky
2019-09-26 10:35:25 UTC
Created attachment 1619445: successful upgrade 7.7->8.0
Created attachment 1619446: broken upgrade 7.7->8.0
Created attachment 1621762: sosreport broken 7.6->8.0 upgrade
Interesting thing I noticed now: the machine is upgraded to RHEL 8, and a RHEL 8 kernel is installed, but the machine is running a RHEL 7 kernel.

[root@controller-0 ~]# ll /dev/log
srw-rw-rw-. 1 root root 0 říj 1 13:01 /dev/log
[root@controller-0 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.0 (Ootpa)
[root@controller-0 ~]# uname -a
Linux controller-0 3.10.0-957.21.3.el7.x86_64 #1 SMP Fri Jun 14 02:54:29 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@controller-0 ~]# rpm -qa | grep kernel | sort
kernel-3.10.0-957.21.3.el7.x86_64
kernel-4.18.0-80.11.2.el8_0.x86_64
kernel-4.18.0-80.4.2.el8_0.x86_64
kernel-core-4.18.0-80.11.2.el8_0.x86_64
kernel-core-4.18.0-80.4.2.el8_0.x86_64
kernel-headers-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-4.18.0-80.4.2.el8_0.x86_64
kernel-modules-extra-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-extra-4.18.0-80.4.2.el8_0.x86_64
kernel-rpm-macros-116-1.el8.noarch
kernel-tools-4.18.0-80.11.2.el8_0.x86_64
kernel-tools-libs-4.18.0-80.11.2.el8_0.x86_64
kernel-workaround-0.1-1.el8.noarch

jstranky: It happens from time to time. We have no clear information on why the entry with the old kernel is used as the default, since we run a command to set the entry with the new kernel as the default. This could be resolved when we start to remove all RHEL 7 rpm leftovers (unfortunately, there is no way to calculate the upgrade transaction with removal of the original kernel).

controller-0 systemd[1]: systemd-journald-dev-log.socket: Failed to create symlink /run/systemd/journal/dev-log → /dev/log, ignoring: File exists

This looks like some kind of race condition indeed. Unfortunately I do not know what is actually creating "/dev/log". Maybe "systemd" can help to check?

I think the cause of the failure in systemd-journald-dev-log.socket could perhaps be that the service is meant for running with a RHEL 8 kernel but the system is in fact running on a RHEL 7 kernel?
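
The mismatch visible in the session above (RHEL 8 userspace, el7 kernel) can be checked mechanically, for example in upgrade test automation. The following is a minimal sketch in Python, assuming the two inputs are the contents of /etc/redhat-release and the kernel release string reported by uname -r. The function names are hypothetical and not part of Leapp.

```python
import re


def release_major(redhat_release: str) -> int:
    """Extract the major version from /etc/redhat-release text."""
    match = re.search(r"release (\d+)", redhat_release)
    if match is None:
        raise ValueError(f"unrecognised release string: {redhat_release!r}")
    return int(match.group(1))


def kernel_el_major(kernel_release: str) -> int:
    """Extract the 'elN' major version from a kernel release string."""
    match = re.search(r"\.el(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"no .elN tag in kernel release: {kernel_release!r}")
    return int(match.group(1))


def kernel_matches_release(redhat_release: str, kernel_release: str) -> bool:
    """Return True when the running kernel belongs to the installed major release."""
    return release_major(redhat_release) == kernel_el_major(kernel_release)
```

With the values from this report, "Red Hat Enterprise Linux release 8.0 (Ootpa)" and "3.10.0-957.21.3.el7.x86_64", the check returns False, which is the broken state described here.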
I don't know what could be the root cause of running on a RHEL 7 kernel though... I'll amend the title to what we presently think is the root cause (or not entirely root, but closer to root than the /dev/log issue).

Is there any chance this can be looked into? I think this is presently the nastiest bug for OpenStack because we don't have a workaround, and it intermittently breaks our upgrade testing.

The bug has higher priority now (I am setting the priority in the BZ as well to reflect it). That means that, in the worst case, I will look at it again next week. @Jiri, would you be able to prepare a reproducer? Can we expect to hit the issue at least once out of, let's say, 10 runs?

@Jiri, were there any actions done right after the upgrade? Do you have any custom actors? I see there are 2x RHEL8 kernels after the upgrade:

kernel.x86_64 4.18.0-80.4.2.el8_0 @System
kernel.x86_64 4.18.0-80.11.2.el8_0 @rhosp-rhel-8.0-baseos

We would need logs from a system right after the upgrade without any tunings. But the reproducer is preferable. Thanks...

I don't think we did anything extra after the RHEL upgrade besides investigating. I'll try to provide a reproducer; unfortunately the issue is intermittent and I haven't hit it recently (even in a single env with all 3 OpenStack controller VMs configured the same, I only hit it on some of them). I'll keep a machine on standby in case I hit it again.

Hello, This very much looks like Bug #1640979, which was caused by GRUB not being able to sort entries correctly if these started with a number. The bug was fixed by the following commit https://github.com/rhboot/grub2/commit/291907f1cf6f51ec3929e20af3a99d00d9bc9e34, but the fix is for the GRUB core, and that is not updated when the grub2 package is upgraded. The GRUB core is installed in the gap that exists between the end of the Master Boot Record (MBR) and the start of the first partition. To update the GRUB core, the grub2-install command has to be executed.
So I think that LEAPP should execute grub2-install after installing the RHEL8 packages to make sure that the GRUB used is the latest from RHEL8 and not the one from RHEL7.

Fixed in upstream: https://github.com/oamg/leapp-repository/commit/99518933eda0a7522edb2e0fd2e63b450d7706ab

The vast majority of tests have already passed, so with that, and with the nature of the original issue in mind: VERIFIED on all platforms with:
- leapp-0.10.0-2.el7_8
- leapp-repository-0.10.0-2.el7_8

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1959
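
The remedy proposed in this thread, running grub2-install once the RHEL 8 packages are in place so that the GRUB core is rewritten from the new grub2 package, can be sketched as below. This is a minimal illustration in Python under stated assumptions, not the code from the linked leapp-repository commit: the function name `reinstall_grub_core`, the injectable `run` parameter, and the example device `/dev/vda` are hypothetical, and the caller is assumed to already know which disk holds the boot loader (a BIOS/MBR setup, as described in the comment above).

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes a command line and returns its standard output.
Runner = Callable[[Sequence[str]], str]


def _run(cmd: Sequence[str]) -> str:
    """Default runner: execute the command and fail loudly on a non-zero exit."""
    result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return result.stdout


def reinstall_grub_core(device: str, run: Runner = _run) -> List[str]:
    """Rewrite the GRUB core on `device` using the currently installed grub2.

    `device` is the whole disk that holds the boot loader (assumption:
    the caller has already determined it), not a partition.
    Returns the command that was executed.
    """
    if not device.startswith("/dev/"):
        raise ValueError(f"expected a block device path, got {device!r}")
    cmd = ["grub2-install", device]
    run(cmd)
    return cmd
```

On a real system this would be called as root after the package upgrade, for example `reinstall_grub_core("/dev/vda")`. The `run` parameter exists only so the step can be exercised in tests without touching a boot sector.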