Bug 1700451
Summary: | Booting with a large number of multipath devices drops into emergency shell | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Ben Marzinski <bmarzins> | |
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | |
Status: | CLOSED ERRATA | QA Contact: | Lin Li <lilin> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | --- | CC: | agk, bmarson, bmarzins, heinzm, jbrassow, lilin, msnitzer, prajnoha, rhandlin, rpeterso, toneata, ttracy, zkabelac | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | 8.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | device-mapper-multipath-0.8.0-2.el8 | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
When multipath is determining whether it should claim a block device as a path device in udev, it checks if multipathd is running by opening a socket connection to it. If multipathd hasn't started up yet and there are a large number of block devices, this can hang, causing udev to hang as well.
Consequence:
udev processing for block devices can be delayed on bootup, possibly causing bootup to fail and drop to the emergency shell.
Fix:
multipath now tries to connect to the multipathd socket in a non-blocking manner. If that fails, it examines the error to determine whether multipathd will be starting up.
Result:
multipath no longer causes udev processing of block devices to hang in setups with a large number of block devices.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1723746 | Environment: ||
Last Closed: | 2019-11-05 22:18:16 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1723746 |
Description
Ben Marzinski
2019-04-16 15:20:04 UTC
Think there is a more fundamental issue for this particular testbed: the root volume is _not_ using multipathing, so the initramfs shouldn't even be doing anything related to multipath.

The initramfs isn't doing anything with multipathing. But the /home directory isn't being mounted in the initramfs, and after the pivot root, the multipath stall waiting for multipathd on all the other devices is causing the local device used for /home to not get re-initialized in time. Now, it does seem that since the local device is currently there, with active LVs already using it for the root directory and /boot, udev should know about it and not need to recheck it before it can mount /home. But regardless, multipath shouldn't have to wait for multipathd to start before it can even begin to check whether a device should be claimed as a multipath path. The intention of the code was not to wait; it's just that the socket autoactivation makes this happen. I have a patch that checks if multipathd is running without accessing the socket, so that it doesn't trigger this. The other solution is to drop the autoactivation, since multipathd should always be running.

A different solution was agreed upon upstream, so the test packages do not reflect the actual solution. Instead of not accessing the multipathd socket, the multipath -u command now tries to open it non-blocking, and on failure checks the error code to see if multipathd will be starting up later.

Hello Barry, could you provide the test result following your steps with the fixed version? I will reproduce this issue with a large number of scsi_debug devices. Thanks in advance!

Hello Barry, could you provide me with steps to reproduce? I want to reproduce it using your steps. Thanks in advance!

I simply have a large number of multipath devices. In my case, there are 48x8 (multipath) LUNs. Upon booting, a local volume group does not properly initialize and we drop into maintenance mode. Barry

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3578
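For illustration only, below is a minimal sketch of the non-blocking approach described above. It is not the actual multipath-tools code; the abstract socket name and the exact errno handling are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Abstract-namespace socket name: an assumption for this sketch. */
#define DAEMON_SOCKET "\0/org/kernel/linux/storage/multipathd"

/* Returns 1 if the daemon is running or can be expected to start,
 * 0 if no listening socket exists, -1 on an unexpected error. */
static int daemon_will_run(void)
{
    struct sockaddr_un addr;
    socklen_t len;
    int fd, err;

    fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* Leading NUL byte selects the abstract socket namespace. */
    memcpy(addr.sun_path, DAEMON_SOCKET, sizeof(DAEMON_SOCKET) - 1);
    len = offsetof(struct sockaddr_un, sun_path) + sizeof(DAEMON_SOCKET) - 1;

    if (connect(fd, (struct sockaddr *)&addr, len) == 0) {
        close(fd);
        return 1;   /* daemon is already accepting connections */
    }

    err = errno;    /* save errno before close() can clobber it */
    close(fd);
    if (err == EAGAIN)
        return 1;   /* listening socket exists (e.g. created by systemd
                       socket activation) but nothing is accepting yet:
                       assume the daemon will be starting up */
    if (err == ECONNREFUSED || err == ENOENT)
        return 0;   /* no listening socket: daemon not expected to run */
    return -1;
}

int main(void)
{
    printf("multipathd expected: %d\n", daemon_will_run());
    return 0;
}
```

The point of this pattern is that with systemd socket activation the listening socket can exist before multipathd itself has started, so a failed non-blocking connect can still indicate that the daemon will become available, without making the udev-invoked multipath -u call block and stall device processing.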