1755841 – After Leapp upgrade 7.6->8.0, the machine boots with 7.6 kernel

Summary: After Leapp upgrade 7.6->8.0, the machine boots with 7.6 kernel

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	leapp-repository
Sub Component:
Version:	7.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Leapp team
QA Contact:	Alois Mahdal
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1727807

Reported:	2019-09-26 10:35 UTC by Jiri Stransky
Modified:	2020-04-29 01:46 UTC
CC List:	8 users
Fixed In Version:	leapp-repository-0.10.0-2.el7_8
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-04-29 01:45:58 UTC
Target Upstream Version:
Embargoed:

Attachments
successful upgrade 7.7->8.0 (4.18 MB, application/gzip) 2019-09-26 10:41 UTC, Jiri Stransky	no flags
broken upgrade 7.7->8.0 (4.18 MB, application/gzip) 2019-09-26 10:42 UTC, Jiri Stransky	no flags
sosreport broken 7.6->8.0 upgrade (15.87 MB, application/x-xz) 2019-10-02 08:38 UTC, Jiri Stransky	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:1959	0	None	None	None	2020-04-29 01:46:15 UTC

Description Jiri Stransky 2019-09-26 10:35:25 UTC

Description of problem:

Sometimes our Leapp upgrades from 7.7 to 8.0 succeed, but the system is in an incorrect state afterwards; I've noticed issues with syslog and iptables. The clearest sign of the problem is an incorrect /dev/log socket: it should be a symlink to the journald socket, but it isn't.

Here are outputs from 2 machines, both were upgraded from 7.7 to 8.0 with Leapp:

Successful upgrade and correct state:

[root@controller-0 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.0 (Ootpa)
[root@controller-0 ~]# ll /dev/log
lrwxrwxrwx. 1 root root 28 Sep 25 11:47 /dev/log -> /run/systemd/journal/dev-log
[root@controller-0 ~]# logger -t test test
[root@controller-0 ~]#

Successful upgrade but incorrect state:

[root@controller-2 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.0 (Ootpa)
[root@controller-2 ~]# ll /dev/log
srw-rw-rw-. 1 root root 0 Sep 25 13:30 /dev/log
[root@controller-2 ~]# logger -t test test
logger: socket /dev/log: Connection refused
[root@controller-2 ~]#

I realize 7.7 -> 8.0 is not a supported upgrade, but I'm thinking this issue could perhaps be hit in a supported upgrade too.
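
The difference between the two machines above can be checked mechanically. The following is only a minimal sketch and not part of the original report: the function name `check_dev_log` and its output labels are made up for illustration, and the expected symlink target is simply the one shown on the healthy machine.

```shell
# Classify a log socket path. Prints one of:
#   symlink-ok     the path is a symlink to the expected journald socket
#   symlink-other  the path is a symlink to something else
#   plain-socket   the path is a real socket (the broken state shown above)
#   other          anything else, including a missing path
# Returns 0 only for "symlink-ok".
check_dev_log() {
    path="${1:-/dev/log}"
    expected="${2:-/run/systemd/journal/dev-log}"
    if [ -L "$path" ]; then
        if [ "$(readlink "$path")" = "$expected" ]; then
            echo "symlink-ok"
            return 0
        fi
        echo "symlink-other"
        return 1
    fi
    if [ -S "$path" ]; then
        echo "plain-socket"
        return 1
    fi
    echo "other"
    return 1
}
```

Calling `check_dev_log` without arguments inspects /dev/log, which makes it easy to run the same check across several upgraded machines.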


How reproducible:

Intermittent, might be some sort of race condition, occurs quite rarely in our testing (we hit it just twice so far, but by different engineers and in different environments).


Steps to Reproduce:

In testing, we run the Leapp upgrade without RHSM and skip the OS release check so that we can upgrade from 7.7. Essentially:

LEAPP_SKIP_CHECK_OS_RELEASE=1 LEAPP_DEVEL_SKIP_RHSM=1 sudo -E leapp upgrade --debug

I will upload logs from both working and broken upgrades.

Comment 2 Jiri Stransky 2019-09-26 10:41:44 UTC

Created attachment 1619445
successful upgrade 7.7->8.0

Comment 3 Jiri Stransky 2019-09-26 10:42:23 UTC

Created attachment 1619446
broken upgrade 7.7->8.0

Comment 5 Jiri Stransky 2019-10-02 08:38:43 UTC

Created attachment 1621762
sosreport broken 7.6->8.0 upgrade

Comment 6 Jiri Stransky 2019-10-02 08:42:41 UTC

An interesting thing I noticed just now: the machine is upgraded to RHEL 8 and a RHEL 8 kernel is installed, but the machine is running a RHEL 7 kernel.

[root@controller-0 ~]# ll /dev/log 
srw-rw-rw-. 1 root root 0 říj  1 13:01 /dev/log

[root@controller-0 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.0 (Ootpa)

[root@controller-0 ~]# uname -a
Linux controller-0 3.10.0-957.21.3.el7.x86_64 #1 SMP Fri Jun 14 02:54:29 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux

[root@controller-0 ~]# rpm -qa | grep kernel | sort
kernel-3.10.0-957.21.3.el7.x86_64
kernel-4.18.0-80.11.2.el8_0.x86_64
kernel-4.18.0-80.4.2.el8_0.x86_64
kernel-core-4.18.0-80.11.2.el8_0.x86_64
kernel-core-4.18.0-80.4.2.el8_0.x86_64
kernel-headers-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-4.18.0-80.4.2.el8_0.x86_64
kernel-modules-extra-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-extra-4.18.0-80.4.2.el8_0.x86_64
kernel-rpm-macros-116-1.el8.noarch
kernel-tools-4.18.0-80.11.2.el8_0.x86_64
kernel-tools-libs-4.18.0-80.11.2.el8_0.x86_64
kernel-workaround-0.1-1.el8.noarch
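
The mismatch above (RHEL 8 userspace, RHEL 7 kernel) can be detected from the kernel release string alone. This is a rough sketch under the assumption that the kernel release always carries an ".elN" tag as in the output above; the function names `kernel_el_major` and `kernel_matches_os` are hypothetical helpers, not anything Leapp provides.

```shell
# Extract the "elN" major number from a kernel release string such as
# the output of `uname -r`. Prints nothing if no ".elN" tag is present.
kernel_el_major() {
    printf '%s\n' "$1" | sed -n 's/.*\.el\([0-9][0-9]*\).*/\1/p'
}

# Return 0 if the given kernel release belongs to the expected major release.
#   $1  kernel release, e.g. "$(uname -r)"
#   $2  expected major release, e.g. 8
kernel_matches_os() {
    [ "$(kernel_el_major "$1")" = "$2" ]
}
```

On an upgraded machine, `kernel_matches_os "$(uname -r)" 8 || echo "not running an el8 kernel"` would flag the state shown above.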

Comment 7 Petr Stodulka 2019-10-02 09:04:31 UTC

jstransky: It happens from time to time. We have no clear information on why the entry with the old kernel is used as the default, since we run a command that sets the entry with the new kernel as the default. This could be resolved once we start removing all RHEL 7 RPM leftovers (unfortunately, there is no way to calculate an upgrade transaction that removes the original kernel).
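
One way to see which entry the boot configuration claims as the default is to compare it with the kernel that is expected to boot. The sketch below assumes `grubby` is installed and that its `--default-kernel` output is a path of the form /boot/vmlinuz-VERSION; the helper names are hypothetical, and which command Leapp itself runs is not stated here.

```shell
# Print the kernel version that a boot entry path such as
# /boot/vmlinuz-4.18.0-80.11.2.el8_0.x86_64 refers to.
kernel_version_from_path() {
    basename "$1" | sed 's/^vmlinuz-//'
}

# Return 0 if the default boot entry points at the expected kernel version.
# Usage on a real system (needs root):
#   default_is_expected "$(grubby --default-kernel)" 4.18.0-80.11.2.el8_0.x86_64 \
#       || grubby --set-default /boot/vmlinuz-4.18.0-80.11.2.el8_0.x86_64
default_is_expected() {
    [ "$(kernel_version_from_path "$1")" = "$2" ]
}
```

This only reports what the configuration says; it cannot tell which entry the boot loader will actually pick at boot time.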

Comment 8 Michal Reznik 2019-10-03 14:05:57 UTC

controller-0 systemd[1]: systemd-journald-dev-log.socket: Failed to create symlink /run/systemd/journal/dev-log → /dev/log, ignoring: File exists  

This does look like some kind of race condition. Unfortunately, I do not know what is actually creating "/dev/log". Maybe "systemd" can help to check?
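
To check other machines for the same message, the journal can be filtered for it. A minimal sketch, assuming the message text is exactly as quoted above; the function name `has_dev_log_symlink_failure` is made up for illustration.

```shell
# Read log lines on stdin and return 0 if the socket unit reported
# that it could not create the /dev/log symlink.
# Usage on a real system:
#   journalctl -b -u systemd-journald-dev-log.socket | has_dev_log_symlink_failure
has_dev_log_symlink_failure() {
    grep -qF 'systemd-journald-dev-log.socket: Failed to create symlink'
}
```

The match is a fixed string (`grep -F`), so the dots in the unit name are not treated as wildcards.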

Comment 9 Jiri Stransky 2019-10-03 15:26:34 UTC

I think the cause of the failure in systemd-journald-dev-log.socket could perhaps be that the service is meant to run with a RHEL 8 kernel but the system is in fact running on a RHEL 7 kernel? I don't know what the root cause of running on a RHEL 7 kernel could be, though...

Comment 10 Jiri Stransky 2019-10-07 14:37:03 UTC

I'll amend the title to what we presently think is the root cause (or not entirely root but closer to root than the /dev/log issue).

Comment 11 Jiri Stransky 2019-11-12 16:24:13 UTC

Is there any chance this can be looked into? I think this is presently the nastiest bug for OpenStack, because we don't have a workaround and it intermittently breaks our upgrade testing.

Comment 12 Petr Stodulka 2019-11-13 13:57:52 UTC

The bug has higher priority now (I am setting the priority in the BZ as well to reflect it).

Comment 13 Petr Stodulka 2019-11-13 13:58:23 UTC

That means that, in the worst case, I will look at it again next week.

Comment 14 Michal Reznik 2019-11-13 14:33:44 UTC

@Jiri, would you be able to prepare a reproducer? Can we expect to hit the issue at least once out of let's say 10 runs?

Comment 15 Michal Reznik 2019-11-13 15:03:23 UTC

@Jiri, were there any actions done right after the upgrade? Do you have any custom actors?

I see there are 2x RHEL8 kernels after upgrade:

kernel.x86_64                                         4.18.0-80.4.2.el8_0                                      @System                                 
kernel.x86_64                                         4.18.0-80.11.2.el8_0                                     @rhosp-rhel-8.0-baseos

We would need logs from a system right after the upgrade, without any tuning. A reproducer would be preferable, though. Thanks...

Comment 16 Jiri Stransky 2019-11-14 15:35:34 UTC

I don't think we did anything extra after the RHEL upgrade besides investigating. I'll try to provide a reproducer; unfortunately, the issue is intermittent and I haven't hit it recently (even in a single env with all 3 OpenStack controller VMs configured the same, I only hit it on some of them). I'll keep a machine on standby in case I hit it again.

Comment 17 Javier Martinez Canillas 2019-12-16 13:47:03 UTC

Hello,

This very much looks like Bug #1640979, which was caused by GRUB not being able to sort entries correctly if they started with a number.

The bug was fixed by the following commit https://github.com/rhboot/grub2/commit/291907f1cf6f51ec3929e20af3a99d00d9bc9e34, but the fix is for the GRUB core and that is not updated when the grub2 package is upgraded. The GRUB core is installed in the gap that exists between the end of the Master Boot Record (MBR) and the start of the first partition.

To update the GRUB core, the grub2-install command has to be executed. So I think that LEAPP should execute grub2-install after installing the RHEL8 packages to make sure that the GRUB used is the latest from RHEL8 and not the one from RHEL7.
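
As a rough illustration of the manual step described above, for the BIOS/MBR layout only: the wrapper `reinstall_grub_core` and its `DRY_RUN` switch are hypothetical, and the boot device is an assumption that must be replaced with the disk the machine actually boots from.

```shell
# Reinstall the GRUB core on the given boot disk.
# With DRY_RUN=1 the command is only printed, not executed.
#   $1  boot device, e.g. /dev/sda (an assumption, check the real boot disk)
reinstall_grub_core() {
    device="$1"
    if [ -z "$device" ]; then
        echo "usage: reinstall_grub_core DEVICE" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "grub2-install $device"
        return 0
    fi
    # Needs root; rewrites the GRUB core in the gap after the MBR.
    grub2-install "$device"
}
```

Running `DRY_RUN=1 reinstall_grub_core /dev/sda` first shows the command without touching the disk.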

Comment 19 Michal Reznik 2020-02-05 08:46:05 UTC

Fixed in upstream:

https://github.com/oamg/leapp-repository/commit/99518933eda0a7522edb2e0fd2e63b450d7706ab

Comment 24 Alois Mahdal 2020-04-24 15:22:38 UTC

The vast majority of tests have already passed, so with that, and with the nature of the original issue in mind:

VERIFIED on all platforms with:

      - leapp-0.10.0-2.el7_8
      - leapp-repository-0.10.0-2.el7_8

Comment 26 errata-xmlrpc 2020-04-29 01:45:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1959
