Bug 1920795 - multipathd deadlocks with marginal paths
Summary: multipathd deadlocks with marginal paths
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: device-mapper-multipath
Version: 8.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.4
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-27 03:22 UTC by Ben Marzinski
Modified: 2021-09-06 15:19 UTC (History)

Fixed In Version: device-mapper-multipath-0.8.4-8.el8
Doc Type: Bug Fix
Doc Text:
Cause: The marginal path monitoring thread grabs locks in the wrong order when updating a path's state.
Consequence: multipathd can deadlock if marginal path detection is enabled.
Fix: The marginal path monitoring thread now grabs locks in the proper order.
Result: multipathd can no longer deadlock when the marginal path detection code is enabled.
Clone Of:
Environment:
Last Closed: 2021-05-18 15:06:56 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Ben Marzinski 2021-01-27 03:22:53 UTC
Description of problem:
It's possible for multipathd to deadlock if the marginal_path_* shaky path detection is enabled. This happens because the marginal path I/O stats thread and the path checker thread grab locks in different orders. It is also possible for the marginal path I/O stats thread to cause multipathd to crash on shutdown.
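
To illustrate the general class of problem (this is not the multipathd code, which is written in C), here is a minimal Python sketch of the usual remedy for a lock-order deadlock: every thread takes the locks in one agreed order. The lock names `path_state_lock` and `marginal_list_lock`, the helper `with_both_locks`, and the thread labels are hypothetical; the report does not name the actual locks involved.

```python
import threading

# Hypothetical stand-ins for the two locks; the real lock names in
# multipathd are not given in this report.
path_state_lock = threading.Lock()
marginal_list_lock = threading.Lock()

# The single agreed-upon order that every thread must follow.
LOCK_ORDER = (path_state_lock, marginal_list_lock)


def with_both_locks(action):
    """Acquire both locks in LOCK_ORDER, run action, release in reverse."""
    for lock in LOCK_ORDER:
        lock.acquire()
    try:
        return action()
    finally:
        for lock in reversed(LOCK_ORDER):
            lock.release()


def run_demo(iterations=1000):
    """Run two threads that each need both locks.

    Returns the per-thread counters and whether both threads finished.
    """
    counts = {"checker": 0, "io_stats": 0}

    def worker(name):
        for _ in range(iterations):
            def bump():
                counts[name] += 1
            with_both_locks(bump)

    threads = [threading.Thread(target=worker, args=(n,)) for n in counts]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return counts, all(not t.is_alive() for t in threads)
```

If one of the two workers instead took `marginal_list_lock` first, each thread could end up holding one lock while waiting for the other, which is the shape of deadlock described above.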

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.8.4-7.el8

How reproducible:
Very hard. I've seen it once.

Steps to Reproduce:
1. Configure marginal_path_* shaky path detection

defaults {
        marginal_path_double_failed_time 60
        marginal_path_err_sample_time 120
        marginal_path_err_rate_threshold 1
        marginal_path_err_recheck_gap_time 10
}

2. Fail and restore a multipath device twice within one minute

# echo offline > /sys/block/<device>/device/state

wait till multipathd notices the path has failed

# echo running > /sys/block/<device>/device/state

wait till multipathd has restored the path

# echo offline > /sys/block/<device>/device/state

wait till multipathd notices the path has failed

# echo running > /sys/block/<device>/device/state

3. Continue step 2 on multiple different devices, hoping that the checker loop will try to add the device to the list of shaky devices at the same time as a device is being removed.
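
The fail/restore cycle in step 2 could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names, the `sysfs_root` parameter (there so the code can be tried against a scratch directory), and the fixed `settle_seconds` sleep are all hypothetical. The sleep is a crude stand-in for actually waiting until multipathd has noticed the state change, and writing to the real sysfs file requires root.

```python
import os
import time


def set_device_state(device, state, sysfs_root="/sys"):
    """Write a state to <sysfs_root>/block/<device>/device/state."""
    path = os.path.join(sysfs_root, "block", device, "device", "state")
    with open(path, "w") as f:
        f.write(state + "\n")
    return path


def fail_and_restore_twice(device, settle_seconds=10.0, sysfs_root="/sys"):
    """Take a path device offline and back to running, two times.

    settle_seconds is an assumed delay, not a value from the report; it
    must be short enough that both failures land within the configured
    marginal_path_double_failed_time (60 seconds in the example config).
    """
    written = []
    for _ in range(2):
        for state in ("offline", "running"):
            set_device_state(device, state, sysfs_root)
            written.append(state)
            time.sleep(settle_seconds)
    return written
```

For step 3, the same function would be called for several different path devices, for example from separate threads, to raise the chance of hitting the race.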

Actual results:
multipathd deadlocks, no longer checks paths, and becomes unresponsive

Expected results:
multipathd does not deadlock.

Additional info:
The shutdown crash was found through code inspection while looking into this issue. I'm not sure if it can even be produced in a real-world scenario.

Comment 2 Ben Marzinski 2021-02-05 16:24:03 UTC
Fixed the deadlock, and a related potential crash on shutdown.

Comment 11 errata-xmlrpc 2021-05-18 15:06:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (device-mapper-multipath bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1685

