Bug 1086417 - multipathd stuck trying to switch to group that does not exist
Summary: multipathd stuck trying to switch to group that does not exist
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: yanfu,wang
URL:
Whiteboard:
Depends On:
Blocks: 1110007
Reported: 2014-04-10 20:41 UTC by mchristie
Modified: 2019-12-16 04:29 UTC
CC List: 11 users

Fixed In Version: device-mapper-multipath-0.4.9-76.el6
Doc Type: Bug Fix
Doc Text:
Cause: If multipathd failed to add a path to the multipath device table, it did not correctly orphan the path. This caused multipathd to treat the path as if it belonged to a multipath device when it did not.
Consequence: multipathd could keep trying to switch to a non-existent path group if it failed to add a path to the multipath device.
Fix: multipathd now correctly orphans paths that cannot be added to the multipath device table.
Result: multipathd no longer treats paths that could not be added to a multipath device as belonging to that device, and no longer keeps trying to switch to a non-existent path group.
Clone Of:
Clones: 1110007
Environment:
Last Closed: 2014-10-14 07:43:15 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2014:1555
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: device-mapper-multipath bug fix and enhancement update
Last Updated: 2014-10-14 01:27:56 UTC

Description mchristie 2014-04-10 20:41:55 UTC
Description of problem:

multipathd gets into a state where it keeps reporting this over and over:


Jan 17 10:23:21 ionr7c5 multipathd: DM message failed [switch_group 1#012]
Jan 17 10:23:21 ionr7c5 multipathd: mpathaaq: switch to path group #1


It seems the problem is that we lost all paths:

Jan 17 01:25:34 ionr7c5 multipathd: mpathaaq: load table [0 97656272 multipath 1 queue_if_no_path 1 alua 0 0]

multipathd then tried to add a path back:

Jan 17 01:25:35 ionr7c5 multipathd: sdmg: add path (uevent)

This failed multiple times:

Jan 17 01:25:35 ionr7c5 kernel: device-mapper: table: 252:9: multipath: error getting device
Jan 17 01:25:35 ionr7c5 kernel: device-mapper: ioctl: error adding target to table
Jan 17 01:25:35 ionr7c5 multipathd: mpathaaq: failed in domap for addition of new path sdmg
Jan 17 01:25:35 ionr7c5 multipathd: mpathaaq: uev_add_path sleep

multipathd gave up:

Jan 17 01:25:38 ionr7c5 multipathd: mpathaaq: giving up reload
Jan 17 01:25:38 ionr7c5 multipathd: mpathaaq: Entering recovery mode: max_retries=3
Jan 17 01:25:38 ionr7c5 multipathd: uevent trigger error

I think multipathd had already set up some paths internally by then, so the
path tester kept running and later tried to switch groups. That
failed because the table was never loaded into the kernel above.
multipathd then retried this forever, because the path tester kept
telling it the path was OK.
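
To check whether a host is showing this symptom, a log scan along these lines may help. This is only a rough sketch: the regular expression is modelled on the "switch to path group" line quoted above, and the function names and the threshold of 10 are hypothetical choices, not anything multipathd provides.

```python
import re
from collections import Counter

# Modelled on the log line quoted in this report, e.g.
#   "Jan 17 10:23:21 ionr7c5 multipathd: mpathaaq: switch to path group #1"
SWITCH_RE = re.compile(r"multipathd: (\S+): switch to path group #(\d+)")


def count_switch_attempts(lines):
    """Count 'switch to path group' messages per (map name, group number)."""
    counts = Counter()
    for line in lines:
        match = SWITCH_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def stuck_maps(lines, threshold=10):
    """Return (map, group) pairs logged at least `threshold` times.

    The threshold is an arbitrary assumption; tune it to the log window.
    """
    counts = count_switch_attempts(lines)
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Feeding it the lines of /var/log/messages (or a slice of them) would list the maps that keep retrying the same group switch. A high count alone does not prove this bug, since it says nothing about why the switch failed.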


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Remove the FC cable for longer than dev_loss_tmo seconds.
2. Plug cable back in.
3. Repeat.

Actual results:


Expected results:


Additional info:

This is difficult to reproduce. Trying to replicate with extra logging on.

The user is willing to run a multipath-tools build with debugging code added, if you want to send me an RPM with extra debug code.

Comment 2 Ben Marzinski 2014-05-07 03:27:40 UTC
It's pretty straightforward to make multipathd correctly orphan a path if it fails to add it. That will stop the repeated failing attempts to switch the path group. However, that doesn't explain the root cause of your issue, which is device-mapper not being able to get the path device. Most likely, something else had the path device open exclusively. When this occurs, does /sys/block/<devname>/holders show anything? Or does fuser or lsof show the device in use?

If it doesn't appear that the device is currently opened exclusively, you could try removing and re-adding it to see if device-mapper is now able to get the device.

# multipathd del path <devname>
# multipathd add path <devname>
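
The holders check asked about above could be scripted roughly as follows. This is a minimal sketch: the function name and the `sysfs_root` parameter are hypothetical and exist only so the lookup can be pointed at a test directory. An empty result does not rule out an exclusive open by something that is not a stacked block device (a mounted filesystem, for example), so fuser or lsof are still worth running.

```python
import os


def device_holders(devname, sysfs_root="/sys"):
    """Return the entries under <sysfs_root>/block/<devname>/holders.

    Returns an empty list if the device or its holders directory
    does not exist.
    """
    holders_dir = os.path.join(sysfs_root, "block", devname, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except FileNotFoundError:
        return []
```

For example, `device_holders("sdmg")` would list any device-mapper or md devices currently stacked on sdmg.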

Comment 3 Ben Marzinski 2014-05-23 02:29:27 UTC
Multipath will now correctly orphan paths that don't get added correctly.

Comment 5 Ben Marzinski 2014-07-29 18:20:49 UTC
The easiest way to hit this is to

1. create a multipath device
2. remove a path with

# multipathd del path <name>

3. exclusively open the path device, by doing something like mounting a filesystem on it
4. add the path back to multipath with

# multipathd add path <name>

The add will fail.  With this fix the device will be orphaned.  Without it, the device will still be listed as part of the multipath device, and the checker will keep running on it.
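
The difference between the two behaviours can be illustrated with a toy model. This is not multipathd's code: the classes, the `reload_table` callback standing in for the kernel table load, and the `orphan_on_failure` switch are all hypothetical, chosen only to show why an un-orphaned path keeps being treated as part of the map.

```python
class Path:
    def __init__(self, name):
        self.name = name
        self.mpp = None  # owning multipath map; None means orphaned


class MultipathMap:
    def __init__(self, alias):
        self.alias = alias
        self.paths = []


def add_path(mpp, path, reload_table, orphan_on_failure=True):
    """Attach `path` to `mpp`, then try to load the new table.

    `reload_table(mpp)` returns True on success. On failure the path is
    detached again only when `orphan_on_failure` is set (the fixed
    behaviour in this model).
    """
    mpp.paths.append(path)
    path.mpp = mpp
    if reload_table(mpp):
        return True
    if orphan_on_failure:
        mpp.paths.remove(path)
        path.mpp = None
    return False


def paths_still_owned(paths):
    """Names of paths that still claim to belong to a multipath map."""
    return [p.name for p in paths if p.mpp is not None]
```

With a `reload_table` that always fails, `orphan_on_failure=False` leaves the path attached to a map whose kernel table never included it, which is the state in which the group switch kept being retried.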

Comment 9 errata-xmlrpc 2014-10-14 07:43:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1555.html

