Description of problem:
We now have several customers (at least 3) facing a boot issue after updating their systems to RHEL 7.8's systemd.
They all run on VMware, but that is probably unrelated, since I can reproduce the issue on KVM myself with some hacks (see the reproducer below).
In a nutshell, boot proceeds and then enters emergency.target because initrd-switch-root.service fails:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
● initrd-switch-root.service - Switch Root
Loaded: loaded (/usr/lib/systemd/system/initrd-switch-root.service; static; vendor preset: disabled)
Active: failed (Result: signal) since Fri 2020-04-17 14:36:17 CEST; 5min ago
Process: 502 ExecStart=/usr/bin/systemctl --no-block --force switch-root /sysroot (code=killed, signal=TERM)
Main PID: 502 (code=killed, signal=TERM)
Apr 17 14:36:17 vm-up76 systemd[1]: Starting Switch Root...
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Hitting this issue seems to require both of the following:
- systemd in the initramfs is the "old" systemd from before the system update (e.g. "systemd-219-62.el7_6.7.x86_64")
- no serial console configured
In real customer scenarios, the initramfs does contain an old systemd because the customers updated in 2 phases:
- kernel + microcode
- rest of the system
Due to this, *no* initramfs is rebuilt after updating systemd.
My reproducer uses the following:
- update everything except systemd --> builds a new initramfs with old systemd inside
- reboot then update systemd --> initramfs not rebuilt
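A quick way to tell whether a system is in this state is to compare the initramfs modification time with the install time of the systemd package. The following is a minimal sketch, not part of the original report: it assumes Python is available, that the image follows the RHEL 7 default naming /boot/initramfs-<kernel release>.img, and that a file timestamp is a good enough proxy for "rebuilt after the update". The function names are hypothetical.

```python
import os
import subprocess


def initramfs_is_stale(initramfs_mtime, package_install_time):
    """True when the package was installed after the initramfs image was last written."""
    return package_install_time > initramfs_mtime


def systemd_install_time():
    # %{INSTALLTIME} is rpm's install timestamp in seconds since the epoch.
    # Several values come back if more than one systemd package is installed; keep the newest.
    out = subprocess.check_output(
        ["rpm", "-q", "--qf", "%{INSTALLTIME}\n", "systemd"],
        universal_newlines=True,
    )
    return max(int(value) for value in out.split())


def check_running_kernel():
    # Assumption: RHEL 7 default image path for the currently running kernel.
    image = "/boot/initramfs-{}.img".format(os.uname()[2])
    return image, initramfs_is_stale(os.stat(image).st_mtime, systemd_install_time())
```

Calling check_running_kernel() on a host returns the image path and True if systemd was installed after that image was written, which matches the two-phase update pattern described above.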
Version-Release number of selected component (if applicable):
systemd-219-62.el7_6.7.x86_64 -> systemd-219-73.el7_8.XX
How reproducible:
Always on customer sites
Using a hack in my lab
Steps to Reproduce:
1. Install a system with 2 CPUs with RHEL 7.6 DVD
2. Update the system to RHEL 7.6 Latest and reboot
3. Update the system to RHEL 7.8 latest *except* systemd and reboot
4. Update systemd to latest
Actual results:
Booting with an initramfs that contains the old systemd enters emergency mode; 100% reproducible.
Expected results:
Additional info:
Rebuilding the initramfs with the latest systemd fixes the issue, for reasons not yet understood.
We need to understand why.
If updating systemd does require an initramfs rebuild, then systemd's post-install script should be updated to trigger one.
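As a workaround sketch only (not the fix tracked in this bug), the rebuild can be expressed as a dracut invocation for a given kernel. The helper name is hypothetical and the image path assumes the RHEL 7 default naming under /boot; the helper only builds the argument list and does not run anything.

```python
def dracut_rebuild_command(kernel_release):
    """Build the argv for regenerating the initramfs of one kernel with dracut."""
    # Assumption: RHEL 7 default image naming under /boot.
    image = "/boot/initramfs-{}.img".format(kernel_release)
    # --force lets dracut overwrite the existing image.
    return ["dracut", "--force", image, kernel_release]
```

The returned list can be passed to subprocess.check_call() as root, with the output of `uname -r` as kernel_release for the running kernel.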
In order to reproduce easily, I perform the following hack:
1. Update the system to RHEL 7.6 Latest and reboot
2. Edit /usr/lib/systemd/system/initrd-cleanup.service to delay its end
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
ExecStart=/bin/bash -c '/usr/bin/systemctl --no-block isolate initrd-switch-root.target && sleep 5'
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
3. Update the system to RHEL 7.8 latest *except* systemd and reboot
4. Update systemd to latest
Doing so triggers the issue.
I then get the following journal (with "debug"):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Trying to enqueue job initrd-switch-root.target/start/isolate
Installed new job systemd-udevd-control.socket/stop as 80
Installed new job timers.target/stop as 90
Installed new job initrd.target/stop as 85
Installed new job swap.target/stop as 81
Installed new job paths.target/stop as 100
Installed new job remote-fs.target/stop as 96
Installed new job systemd-udev-trigger.service/stop as 91
Installed new job local-fs.target/stop as 95
Installed new job sockets.target/stop as 99
Installed new job systemd-tmpfiles-setup-dev.service/stop as 102
HERE: job canceled
Job initrd-cleanup.service/start finished, result=canceled
Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=1 reply_cookie=0 error=n/a
Installed new job initrd-cleanup.service/stop as 94
Installed new job dracut-cmdline.service/stop as 82
Installed new job systemd-udevd-kernel.socket/stop as 78
Installed new job dracut-pre-udev.service/stop as 92
Installed new job dracut-initqueue.service/stop as 88
Installed new job remote-fs-pre.target/stop as 101
Installed new job initrd-switch-root.service/start as 55
Installed new job plymouth-switch-root.service/start as 58
Installed new job initrd-switch-root.target/start as 54
Installed new job slices.target/stop as 89
Installed new job basic.target/stop as 83
Installed new job initrd-udevadm-cleanup-db.service/start as 77
Installed new job sysinit.target/stop as 97
Installed new job dracut-pre-pivot.service/stop as 86
Installed new job systemd-sysctl.service/stop as 87
Installed new job systemd-udevd.service/stop as 79
Installed new job kmod-static-nodes.service/stop as 93
Enqueued job initrd-switch-root.target/start as 54
[...]
initrd-cleanup.service changed start -> stop-sigterm
Received SIGCHLD from PID 492 (bash).
Child 492 (bash) died (code=killed, status=15/TERM)
Child 492 belongs to initrd-cleanup.service
initrd-cleanup.service: main process exited, code=killed, status=15/TERM
initrd-cleanup.service changed stop-sigterm -> dead
Job initrd-cleanup.service/stop finished, result=done
Stopped Cleaning Up and Shutting Down Daemons.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
The weird thing is that emergency.target is entered not because of initrd-cleanup.service but because of initrd-switch-root.service, which doesn't log anything suspicious!
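When the debug journal is long, the canceled job is easy to miss. The following is a minimal sketch that scans journal lines for "Job ... finished" messages whose result is not "done". The line format is taken from the excerpt above; the function name is hypothetical, and other systemd versions may word these messages differently.

```python
import re

# Matches lines such as "Job initrd-cleanup.service/start finished, result=canceled".
JOB_FINISHED = re.compile(
    r"Job (?P<unit>\S+)/(?P<type>[\w-]+) finished, result=(?P<result>[\w-]+)"
)


def unsuccessful_jobs(journal_lines):
    """Return (unit, job type, result) for every finished job whose result is not "done"."""
    found = []
    for line in journal_lines:
        match = JOB_FINISHED.search(line)
        if match and match.group("result") != "done":
            found.append((match.group("unit"), match.group("type"), match.group("result")))
    return found
```

Fed the excerpt above, it reports only the canceled initrd-cleanup.service/start job, which is consistent with initrd-switch-root.service itself logging nothing useful.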
Folks from Alibaba are also running into the same problem, and they proposed a solution upstream.
https://github.com/systemd-rhel/rhel-7/pull/117
Even though the proposed fix is a hack, we have decided to go ahead and merge it (once the issues pointed out in code review are fixed) due to the number of cases attached to the BZ.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (systemd bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:5007