Bug 2121277 - [RHEL9.1] system hung at Started cancel waiting for multipath siblings of x
Summary: [RHEL9.1] system hung at Started cancel waiting for multipath siblings of x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: device-mapper-multipath
Version: 9.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Duplicates: 2123372 2123663
Depends On: 1916168
Blocks: 1997272 1916117 1934584 1965064 1997257 2024217 2123372
Reported: 2022-08-25 03:27 UTC by Ben Marzinski
Modified: 2022-11-15 13:04 UTC (History)
CC List: 35 users

Fixed In Version: device-mapper-multipath-0.8.7-12.el9
Doc Type: Bug Fix
Doc Text:
Cause: When multipath is configured with "find_multipaths smart" (which it is when booting into anaconda) and a new storage device appears, it starts a systemd timer to wait for another path to the device to appear. If this timer expires while the initramfs is cleaning up to pivot to the regular filesystem during boot, it restarts multipathd, which stops systemd from cleaning up the initramfs.
Consequence: Systems can hang while booting into anaconda during installation if storage devices appear late enough in the initramfs portion of the boot.
Fix: The systemd timers now conflict with initramfs cleanup, so they are automatically stopped when the system cleans up to pivot to the regular filesystem. They also no longer restart multipathd if it has stopped running.
Result: Systems no longer hang while booting into anaconda for installation.
Clone Of: 1916168
: 2123372 (view as bug list)
Environment:
Last Closed: 2022-11-15 11:16:24 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Patch to fix the hang. (1.61 KB, patch)
2022-08-25 04:03 UTC, Ben Marzinski


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-132276 | 0 | None | None | None | 2022-08-25 03:34:57 UTC
Red Hat Product Errata | RHBA-2022:8313 | 0 | None | None | None | 2022-11-15 11:16:41 UTC

Description Ben Marzinski 2022-08-25 03:27:38 UTC
+++ This bug was initially created as a clone of Bug #1916168 +++

Description of problem:
Reserving a server failed and the system dropped into emergency mode. The console log shows that the system hung at "Started cancel waiting for multipath siblings of".

Version-Release number of selected component (if applicable):
RHEL-8.4.0-20210114.n.0 BaseOS x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install the OS on the server.

Actual results:
The installation failed.

Expected results:
The installation succeeds.

Additional info:


[  OK  ] Started Open-iSCSI.
         Starting dracut initqueue hook...
[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Basic System.
[  OK  ] Started cancel waiting for multipath siblings of nvme0n1.
[  OK  ] Started cancel waiting for multipath siblings of sdh.
[  OK  ] Started cancel waiting for multipath siblings of sde.
[  OK  ] Started cancel waiting for multipath siblings of sdg.
[  OK  ] Started cancel waiting for multipath siblings of sdd.
[  OK  ] Started cancel waiting for multipath siblings of sdf.
[  OK  ] Started cancel waiting for multipath siblings of sdb.
[  OK  ] Started cancel waiting for multipath siblings of sda.
[  OK  ] Started cancel waiting for multipath siblings of sdc.
[  OK  ] Started cancel waiting for multipath siblings of nvme0n1.
[  OK  ] Started cancel waiting for multipath siblings of sdd.
[  OK  ] Started cancel waiting for multipath siblings of sdg.
[  OK  ] Started cancel waiting for multipath siblings of sde.
[  OK  ] Started cancel waiting for multipath siblings of sdh.
[  OK  ] Started cancel waiting for multipath siblings of sda.
[  OK  ] Started cancel waiting for multipath siblings of sdb.
[  OK  ] Started cancel waiting for multipath siblings of sdf.
[  OK  ] Started cancel waiting for multipath siblings of sdc.
[-- MARK -- Thu Jan 14 10:30:00 2021] 
[-- MARK -- Thu Jan 14 10:35:00 2021] 
[-- MARK -- Thu Jan 14 10:40:00 2021] 
[-- MARK -- Thu Jan 14 10:45:00 2021] 
[-- MARK -- Thu Jan 14 10:50:00 2021] 
[-- MARK -- Thu Jan 14 10:55:00 2021] 
[ 1632.404021] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts  
[ 1639.128347] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts  
[ 1645.834957] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts  
[ 1652.531923] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts  
[ 1659.227969] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts  
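
For anyone triaging a similar console log, the following is a minimal Python sketch (not part of the original report) that lists the devices for which a "cancel waiting for multipath siblings" timer was started and counts the dracut-initqueue timeout warnings. The function name and the sample text are illustrative assumptions; the two message patterns are taken from the log above.

```python
import re

# Status message printed when a cancel-wait timer unit is started (pattern from the log above).
SIBLING_RE = re.compile(r"Started cancel waiting for multipath siblings of (\S+?)\.")
# Warning printed repeatedly by dracut once the initqueue has timed out.
TIMEOUT_RE = re.compile(r"dracut-initqueue\[\d+\]: Warning: dracut-initqueue timeout")


def summarize_console_log(text):
    """Return (sorted unique device names with a timer, number of timeout warnings)."""
    devices = sorted(set(SIBLING_RE.findall(text)))
    timeouts = len(TIMEOUT_RE.findall(text))
    return devices, timeouts


# Hypothetical excerpt in the same shape as the console log in this report.
sample = """\
[  OK  ] Started cancel waiting for multipath siblings of nvme0n1.
[  OK  ] Started cancel waiting for multipath siblings of sda.
[  OK  ] Started cancel waiting for multipath siblings of sda.
[ 1632.404021] dracut-initqueue[1029]: Warning: dracut-initqueue timeout - starting timeout scripts
"""

devices, timeouts = summarize_console_log(sample)
print("devices with a cancel-wait timer:", devices)
print("dracut-initqueue timeout warnings:", timeouts)
```

A log that shows these timer messages followed only by dracut-initqueue timeouts matches the symptom described here, but the script by itself does not prove the root cause.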


https://beaker.engineering.redhat.com/recipes/9390807#tasks


< comments trimmed >

--- Additional comment from Ben Marzinski on 2022-07-18 18:54:22 UTC ---

The "Started waiting (...) siblings of sda" messages should have gone away if they booted with nompath. Did they? "nompath" should disable both multipathd and multipath path claiming. It should pretty much completely disable multipath during that boot. So if the issue still exists when the node is booted with that command-line option, then it very likely has nothing to do with multipath.

--- Additional comment from Ben Marzinski on 2022-08-19 22:31:15 UTC ---

So, in another bug that is likely a duplicate, Bug 2059813, booting with "inst.nompath" still hangs, but booting with "nompath" does not. This makes sense, since "nompath" will take effect in the initramfs where the bug is, but "inst.nompath" will only affect anaconda, which is never reached because of the bug.
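
The distinction between the two options can be sketched in Python as follows. This is only an illustration of the statement in the comment above, not how multipath or anaconda actually parse the kernel command line; the function name and the return labels are hypothetical.

```python
def multipath_disable_scope(cmdline):
    """Classify a kernel command line by where multipath would be disabled.

    Encodes only what the comment above states:
      - "nompath" takes effect in the initramfs (where this bug occurs),
      - "inst.nompath" only affects anaconda, which runs after the initramfs.
    The return labels are hypothetical names chosen for this sketch.
    """
    tokens = cmdline.split()
    if "nompath" in tokens:
        return "initramfs"
    if "inst.nompath" in tokens:
        return "anaconda-only"
    return "not-disabled"


# Hypothetical command lines for illustration.
for line in ("quiet nompath", "quiet inst.nompath", "quiet"):
    print(f"{line!r}: {multipath_disable_scope(line)}")
```

Matching whole tokens matters here: a substring check for "nompath" would also match "inst.nompath" and hide the difference that the comment describes.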

--- Additional comment from Ben Marzinski on 2022-08-19 22:49:27 UTC ---

After looking into Bug 2059813, which is likely a duplicate, it seems likely that this is a multipath issue. The problem is that when multipath is configured with find_multipaths "smart", which it is when booting into anaconda, multipath creates systemd timers to wait for possible siblings of path devices. If these timers expire after the initramfs starts cleaning up, they restart multipathd, which conflicts with the initramfs cleanup and causes it to stop. The solution is to make the timers themselves conflict with the initramfs cleanup, so they will be stopped when cleanup starts. Also, even if they trigger, they will no longer start up multipathd.

To verify that this is actually the problem, could you try booting with:

https://fedorapeople.org/groups/anaconda/rhbz2059813/boot.2059813.iso

instead of your regular installation iso.  This boot iso won't actually be able to install a system, since it doesn't contain any of the necessary installation sources. It will just boot you into anaconda. But since this iso has the multipath fix, you should be able to successfully boot into anaconda, without hanging in the initramfs.

Comment 3 Ben Marzinski 2022-08-25 04:03:27 UTC
Created attachment 1907478 [details]
Patch to fix the hang.

This is the patch from the test iso that fixes the issue. When multipath is configured with find_multipaths "smart" (which it is in the installer boot initramfs) it waits to see if multiple paths will appear for devices. It sets systemd timers to stop this waiting. If these timers triggered while the initramfs was cleaning up to pivot to the actual root filesystem, they would restart multipathd, which would cause the cleanup to hang.  The fix makes the timers conflict with initrd-cleanup.service, so that they get disabled when the initramfs starts cleaning up.  Also, they no longer force multipathd to restart if it has already been stopped.
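
The idea of the fix can be illustrated with a small Python sketch that builds a systemd-run command line for a transient timer. This is not the attached patch: the unit name, the sysfs path, and the helper function are assumptions made for illustration, and the command is only printed, never executed. The one detail taken from the comment above is that the timer declares a conflict with initrd-cleanup.service.

```python
import shlex


def build_cancel_wait_timer_cmd(dev, wait_seconds, conflict_with_cleanup=True):
    """Build (but do not run) a systemd-run command for a transient cancel-wait timer.

    Hypothetical helper: the unit name and the device path are assumptions for
    this sketch, not values copied from the patch.
    """
    cmd = [
        "systemd-run",
        f"--unit=cancel-multipath-wait-{dev}",  # assumed unit name
        f"--description=cancel waiting for multipath siblings of {dev}",
        "--no-block",
        "--timer-property=DefaultDependencies=no",
    ]
    if conflict_with_cleanup:
        # Starting initrd-cleanup.service stops any unit that conflicts with it,
        # so the timer can no longer fire in the middle of initramfs cleanup.
        cmd.append("--timer-property=Conflicts=initrd-cleanup.service")
    cmd += [
        f"--on-active={wait_seconds}",
        "udevadm", "trigger", "--action=add", f"/sys/class/block/{dev}",
    ]
    return cmd


cmd = build_cancel_wait_timer_cmd("sda", 10)
print(shlex.join(cmd))
```

The second half of the fix, no longer forcing multipathd to restart once it has been stopped, amounts to the triggered service not pulling multipathd back in; it is not modeled in this sketch.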

Comment 8 Ben Marzinski 2022-09-02 14:51:13 UTC
*** Bug 2123663 has been marked as a duplicate of this bug. ***

Comment 9 Ben Marzinski 2022-09-02 15:24:29 UTC
*** Bug 2123372 has been marked as a duplicate of this bug. ***

Comment 10 Ben Marzinski 2022-09-02 23:22:54 UTC
A test iso with a patch to resolve this issue is available here:

https://people.redhat.com/bmarzins/isos/bz2121277/rhel-9.1-patched-boot.iso

Can you try booting with this iso instead of your regular installation iso? This boot iso won't actually be able to install a system, since it doesn't contain any of the necessary installation sources. It will just boot you into anaconda. But since it has the multipath fix, you should be able to successfully boot into anaconda, without hanging in the initramfs.

Comment 21 errata-xmlrpc 2022-11-15 11:16:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (device-mapper-multipath bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8313

