Cause: When multipath is configured with "find_multipaths smart" (which it is when booting into anaconda) and a new storage device appears, it starts a systemd timer to wait for another path to the device to appear. If this timer expires while the initramfs is cleaning up to pivot to the regular filesystem during boot, it will restart multipathd, which will stop systemd from cleaning up the initramfs.
Consequence: Systems can hang while booting into anaconda during installation if storage devices appear late enough in the initramfs portion of the boot.
Fix: The systemd timers now conflict with initramfs cleanup, so they are automatically stopped when the system cleans up to pivot to the regular filesystem. They also no longer restart multipathd if it has stopped running.
Result: Systems no longer hang while booting into anaconda for installation.
+++ This bug was initially created as a clone of Bug #1916168 +++
Description of problem:
Reserving a server failed and the system dropped into emergency mode. The console log shows that the system hung at "Started cancel waiting for multipath siblings of".
Version-Release number of selected component (if applicable):
RHEL-8.4.0-20210114.n.0 BaseOS x86_64
How reproducible:
100%
Steps to Reproduce:
1. Install the OS on the server.
Actual results:
install failed
Expected results:
install successful
Additional info:
] Started Open-iSCSI.
Starting dracut initqueue hook...
[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Basic System.
[  OK  ] Started cancel waiting for multipath siblings of nvme0n1.
[  OK  ] Started cancel waiting for multipath siblings of sdh.
[  OK  ] Started cancel waiting for multipath siblings of sde.
[  OK  ] Started cancel waiting for multipath siblings of sdg.
[  OK  ] Started cancel waiting for multipath siblings of sdd.
[  OK  ] Started cancel waiting for multipath siblings of sdf.
[  OK  ] Started cancel waiting for multipath siblings of sdb.
[  OK  ] Started cancel waiting for multipath siblings of sda.
[  OK  ] Started cancel waiting for multipath siblings of sdc.
[  OK  ] Started cancel waiting for multipath siblings of nvme0n1.
[  OK  ] Started cancel waiting for multipath siblings of sdd.
[  OK  ] Started cancel waiting for multipath siblings of sdg.
[  OK  ] Started cancel waiting for multipath siblings of sde.
[  OK  ] Started cancel waiting for multipath siblings of sdh.
[  OK  ] Started cancel waiting for multipath siblings of sda.
[  OK  ] Started cancel waiting for multipath siblings of sdb.
[  OK  ] Started cancel waiting for multipath siblings of sdf.
[  OK  ] Started cancel waiting for multipath siblings of sdc.
[-- MARK -- Thu Jan 14 10:30:00 2021]
[-- MARK -- Thu Jan 14 10:35:00 2021]
[-- MARK -- Thu Jan 14 10:40:00 2021]
[-- MARK -- Thu Jan 14 10:45:00 2021]
[-- MARK -- Thu Jan 14 10:50:00 2021]
[-- MARK -- Thu Jan 14 10:55:00 2021]
[ 1632.404021] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 1639.128347] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 1645.834957] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 1652.531923] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 1659.227969] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts
https://beaker.engineering.redhat.com/recipes/9390807#tasks
< comments trimmed >
--- Additional comment from Ben Marzinski on 2022-07-18 18:54:22 UTC ---
The "Started waiting (...) siblings of sda" messages should have gone away if they booted with nompath. Were they? "nompath" should disable both multipathd and multipath path claiming. It should pretty much completely disable multipath during that boot. So if the issue still exists when the node is booted with that commandline option, then it very likely has nothing to do with multipath.
--- Additional comment from Ben Marzinski on 2022-08-19 22:31:15 UTC ---
So, in another bug that is likely a duplicate, Bug 2059813, booting with "inst.nompath" still hangs, but booting with "nompath" does not. This makes sense, since "nompath" takes effect in the initramfs, where the bug is, but "inst.nompath" only affects anaconda, which is never reached because of the bug.
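To check which of the two options a given boot actually used, something like the following could be run against the kernel command line. This is only a sketch: the function name classify_nompath and its three result labels are made up for illustration and are not part of dracut, anaconda, or multipath. It relies solely on the distinction described above: a bare "nompath" token is seen in the initramfs, while "inst.nompath" is only seen by anaconda.

```shell
# Hypothetical helper (not part of any package): report which
# multipath-disabling option, if any, a kernel command line carries.
classify_nompath() {
    result="none"
    # Intentionally unquoted so the command line splits into words.
    for arg in $1; do
        case "$arg" in
            # Bare "nompath" takes effect in the initramfs.
            nompath) result="initramfs"; break ;;
            # "inst.nompath" only affects anaconda.
            inst.nompath) result="anaconda-only" ;;
        esac
    done
    printf '%s\n' "$result"
}

# Example use on a running system:
#   classify_nompath "$(cat /proc/cmdline)"
```

A result of "anaconda-only" would match the case where the hang still occurs, since the option is never consulted before the initramfs hangs.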
--- Additional comment from Ben Marzinski on 2022-08-19 22:49:27 UTC ---
After looking into Bug 2059813, which is likely a duplicate, it seems likely that this is a multipath issue. The problem is that when multipath is configured with find_multipaths "smart", which it is when booting into anaconda, multipath creates systemd timers to wait for possible siblings of path devices. If these timers expire after the initramfs starts cleaning up, they restart multipathd, which conflicts with the initramfs cleanup and causes it to stop. The solution is to make the timers themselves conflict with the initramfs cleanup, so they will be stopped when cleanup starts. Also, even if they trigger, they will no longer start up multipathd.
To verify that this is actually the problem, could you try booting with:
https://fedorapeople.org/groups/anaconda/rhbz2059813/boot.2059813.iso
instead of your regular installation iso. This boot iso won't actually be able to install a system, since it doesn't contain any of the necessary installation sources. It will just boot you into anaconda. But since this iso has the multipath fix, you should be able to successfully boot into anaconda, without hanging in the initramfs.
Created attachment 1907478
Patch to fix the hang.
This is the patch from the test iso that fixes the issue. When multipath is configured with find_multipaths "smart" (which it is in the installer boot initramfs) it waits to see if multiple paths will appear for devices. It sets systemd timers to stop this waiting. If these timers triggered while the initramfs was cleaning up to pivot to the actual root filesystem, they would restart multipathd, which would cause the cleanup to hang. The fix makes the timers conflict with initrd-cleanup.service, so that they get disabled when the initramfs starts cleaning up. Also, they no longer force multipathd to restart if it has already been stopped.
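For readers unfamiliar with how a transient timer can be tied to initramfs cleanup, here is a rough sketch of the idea. It is not the attached patch. The helper name build_wait_timer_cmd, the unit name, the timeout argument, and the udevadm command used as the timer's payload are all assumptions chosen for illustration. Only the Conflicts= relationship with initrd-cleanup.service comes from the description above; the Before= ordering is an added assumption. The function only prints the systemd-run command line, so it can be inspected without creating any units.

```shell
# Hypothetical helper: print (do not run) a systemd-run command that would
# create a per-device timer which conflicts with initrd-cleanup.service.
build_wait_timer_cmd() {
    dev="$1"      # kernel device name, e.g. sda
    timeout="$2"  # how long to wait for sibling paths, e.g. 2s (illustrative)
    # Conflicts= on both the timer and the service it starts means that
    # starting initrd-cleanup.service stops them; Before= orders that stop
    # ahead of the cleanup (ordering is an assumption of this sketch).
    printf '%s ' systemd-run \
        --unit="cancel-multipath-wait-${dev}" \
        --no-block \
        --on-active="${timeout}" \
        --timer-property=Conflicts=initrd-cleanup.service \
        --timer-property=Before=initrd-cleanup.service \
        --property=Conflicts=initrd-cleanup.service \
        --property=Before=initrd-cleanup.service \
        udevadm trigger --action=add "/sys/class/block/${dev}"
    printf '\n'
}

# Example use:
#   build_wait_timer_cmd sda 2s
```

Note that the printed command carries no dependency on multipathd.service, which mirrors the second half of the fix: a timer that does fire should not bring multipathd back up.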
A test iso with a patch to resolve this issue is available here:
https://people.redhat.com/bmarzins/isos/bz2121277/rhel-9.1-patched-boot.iso
Can you try booting with this iso instead of your regular installation iso? This boot iso won't actually be able to install a system, since it doesn't contain any of the necessary installation sources. It will just boot you into anaconda. But since it has the multipath fix, you should be able to successfully boot into anaconda without hanging in the initramfs.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (device-mapper-multipath bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:8313