Description of problem:
Sometimes our Leapp upgrades from 7.7 to 8.0 succeed, but the system is in an incorrect state afterwards. I've noticed issues with syslog and iptables. The clearest sign of the problem is an incorrect /dev/log: it should be a symlink to the journald socket, but on the broken machine it is a plain socket. Here are outputs from 2 machines, both upgraded from 7.7 to 8.0 with Leapp.

Successful upgrade and correct state:

[root@controller-0 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.0 (Ootpa)
[root@controller-0 ~]# ll /dev/log
lrwxrwxrwx. 1 root root 28 Sep 25 11:47 /dev/log -> /run/systemd/journal/dev-log
[root@controller-0 ~]# logger -t test test
[root@controller-0 ~]#

Successful upgrade but incorrect state:

[root@controller-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.0 (Ootpa)
[root@controller-2 ~]# ll /dev/log
srw-rw-rw-. 1 root root 0 Sep 25 13:30 /dev/log
[root@controller-2 ~]# logger -t test test
logger: socket /dev/log: Connection refused
[root@controller-2 ~]#

I realize 7.7 -> 8.0 is not a supported upgrade, but I think this issue could be hit in a supported upgrade too.

How reproducible:
Intermittent. It might be some sort of race condition. It occurs quite rarely in our testing (we have hit it just twice so far, but by different engineers and in different environments).

Steps to Reproduce:
We run the Leapp upgrade in testing without RHSM and skip the OS release check so that we can upgrade from 7.7. Essentially:

LEAPP_SKIP_CHECK_OS_RELEASE=1 LEAPP_DEVEL_SKIP_RHSM=1 sudo -E leapp upgrade --debug

I will upload logs from both working and broken upgrades.
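The /dev/log check shown in the two transcripts above can be automated after an upgrade. The following is a minimal sketch, not part of Leapp: the function names are hypothetical, and the expected symlink target is taken from the output of the healthy machine.

```python
import os
import stat

# Symlink target observed on the correctly upgraded machine in this report.
EXPECTED_TARGET = "/run/systemd/journal/dev-log"


def classify_dev_log(path="/dev/log"):
    """Return 'symlink', 'socket', 'missing' or 'other' for the given path."""
    try:
        # lstat() inspects the path itself and does not follow symlinks.
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"


def dev_log_is_healthy(path="/dev/log"):
    """True only if the path is a symlink pointing at the journald socket."""
    if classify_dev_log(path) != "symlink":
        return False
    return os.readlink(path) == EXPECTED_TARGET
```

On the broken machine from this report, `classify_dev_log()` would return `'socket'` and `dev_log_is_healthy()` would return `False`.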
Created attachment 1619445: successful upgrade 7.7->8.0
Created attachment 1619446: broken upgrade 7.7->8.0
Created attachment 1621762: sosreport broken 7.6->8.0 upgrade
An interesting thing I noticed now: the machine is upgraded to RHEL 8, and a RHEL 8 kernel is installed, but the machine is running a RHEL 7 kernel.

[root@controller-0 ~]# ll /dev/log
srw-rw-rw-. 1 root root 0 říj 1 13:01 /dev/log
[root@controller-0 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.0 (Ootpa)
[root@controller-0 ~]# uname -a
Linux controller-0 3.10.0-957.21.3.el7.x86_64 #1 SMP Fri Jun 14 02:54:29 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@controller-0 ~]# rpm -qa | grep kernel | sort
kernel-3.10.0-957.21.3.el7.x86_64
kernel-4.18.0-80.11.2.el8_0.x86_64
kernel-4.18.0-80.4.2.el8_0.x86_64
kernel-core-4.18.0-80.11.2.el8_0.x86_64
kernel-core-4.18.0-80.4.2.el8_0.x86_64
kernel-headers-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-4.18.0-80.4.2.el8_0.x86_64
kernel-modules-extra-4.18.0-80.11.2.el8_0.x86_64
kernel-modules-extra-4.18.0-80.4.2.el8_0.x86_64
kernel-rpm-macros-116-1.el8.noarch
kernel-tools-4.18.0-80.11.2.el8_0.x86_64
kernel-tools-libs-4.18.0-80.11.2.el8_0.x86_64
kernel-workaround-0.1-1.el8.noarch
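The mismatch between the running kernel and the installed OS release can be detected by comparing the `elN` tag in the kernel release string with the major version in /etc/redhat-release. This is a rough sketch with hypothetical helper names. It assumes the `.elN` naming seen in the output above and is not how Leapp itself performs any check.

```python
import re


def el_major_from_kernel(kernel_release):
    """Extract N from the '.elN' tag of a kernel release string, or None."""
    match = re.search(r"\.el(\d+)", kernel_release)
    return int(match.group(1)) if match else None


def os_major_from_release_text(release_text):
    """Extract the major version from the text of /etc/redhat-release, or None."""
    match = re.search(r"release (\d+)", release_text)
    return int(match.group(1)) if match else None


def kernel_matches_os(kernel_release, release_text):
    """True if the kernel's elN tag equals the OS major version."""
    kernel_major = el_major_from_kernel(kernel_release)
    os_major = os_major_from_release_text(release_text)
    return kernel_major is not None and kernel_major == os_major
```

On a live system the inputs could come from `platform.release()` and from reading /etc/redhat-release. With the values reported above (kernel `3.10.0-957.21.3.el7.x86_64`, release 8.0), `kernel_matches_os()` returns `False`.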
jstranky: It happens from time to time. We have no clear information on why the entry with the old kernel is used as the default, since we run a command to set the entry with the new kernel as the default. This could be resolved once we start removing all RHEL 7 RPM leftovers (unfortunately, there is no way to calculate the upgrade transaction with removal of the original kernel).
controller-0 systemd[1]: systemd-journald-dev-log.socket: Failed to create symlink /run/systemd/journal/dev-log → /dev/log, ignoring: File exists

This does look like some kind of race condition. Unfortunately, I do not know what is actually creating "/dev/log". Maybe the systemd team can help to check?
I think the cause of the failure in systemd-journald-dev-log.socket could perhaps be that the service is meant for running with RHEL 8 kernel but the system is in fact running on RHEL 7 kernel? I don't know what could be the root cause of running on RHEL 7 kernel though...
I'll amend the title to what we presently think is the root cause (or not entirely root but closer to root than the /dev/log issue).
Is there any chance this can be looked into? I think this is presently the nastiest bug for OpenStack, because we don't have a workaround and it intermittently breaks our upgrade testing.
The bug has higher priority now (I am setting the priority in the BZ as well to reflect it).
That means - in the worst case, I will look at it again next week.
@Jiri, would you be able to prepare a reproducer? Can we expect to hit the issue at least once out of let's say 10 runs?
@Jiri, were there any actions done right after the upgrade? Do you have any custom actors? I see there are 2x RHEL 8 kernels after the upgrade:

kernel.x86_64 4.18.0-80.4.2.el8_0 @System
kernel.x86_64 4.18.0-80.11.2.el8_0 @rhosp-rhel-8.0-baseos

We would need logs from a system right after the upgrade, without any tuning. But a reproducer is preferable. Thanks...
I don't think we did anything extra after the RHEL upgrade besides investigating. I'll try to provide a reproducer. Unfortunately, the issue is intermittent and I haven't hit it recently (even in a single env with all 3 OpenStack controller VMs configured the same, I only hit it on some of them). I'll keep a machine on standby in case I hit it again.
Hello,

This looks very much like Bug #1640979, which was caused by GRUB not being able to sort entries correctly if they started with a number. The bug was fixed by the following commit: https://github.com/rhboot/grub2/commit/291907f1cf6f51ec3929e20af3a99d00d9bc9e34. However, the fix is in the GRUB core, and the core is not updated when the grub2 package is upgraded. The GRUB core is installed in the gap between the end of the Master Boot Record (MBR) and the start of the first partition. To update the GRUB core, the grub2-install command has to be executed.

So I think that Leapp should execute grub2-install after installing the RHEL 8 packages, to make sure that the GRUB in use is the latest from RHEL 8 and not the one from RHEL 7.
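As an illustration of the suggestion above, a wrapper around grub2-install might look like the sketch below. It is an assumption-laden example and not the upstream Leapp fix: the function names are hypothetical, the boot device must be supplied by the caller (`/dev/vda` in the usage note is only a placeholder), and it covers only the BIOS/MBR layout described in the comment.

```python
import subprocess


def grub_core_reinstall_cmd(boot_device):
    """Build the grub2-install command line for the given boot disk."""
    if not boot_device.startswith("/dev/"):
        raise ValueError("expected a block device path such as /dev/vda")
    return ["grub2-install", boot_device]


def reinstall_grub_core(boot_device, dry_run=True):
    """Return the command, and run it only when dry_run is False."""
    cmd = grub_core_reinstall_cmd(boot_device)
    if not dry_run:
        # check=True raises CalledProcessError if grub2-install fails.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `reinstall_grub_core("/dev/vda")` only returns the command that would be run. Passing `dry_run=False` executes it and requires root privileges and the correct boot disk.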
Fixed in upstream: https://github.com/oamg/leapp-repository/commit/99518933eda0a7522edb2e0fd2e63b450d7706ab
The vast majority of tests have already passed. With that, and with the nature of the original issue in mind: VERIFIED on all platforms with:
- leapp-0.10.0-2.el7_8
- leapp-repository-0.10.0-2.el7_8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1959