Bug 194411 - [RHEL4 U5] dm-multipath: multipath command fails when a path is added to a map with failed path.
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: beta
Target Release: ---
Assigned To: LVM and device-mapper development team
QA Contact: Cluster QE
Keywords: OtherQA
Depends On:
Blocks: 176344 198694 204573 234251 236328 487443
Reported: 2006-06-07 17:17 EDT by Kiyoshi Ueda
Modified: 2010-01-11 21:35 EST (History)
Doc Type: Bug Fix
Clones: 487443
Last Closed: 2009-02-25 20:05:55 EST

Attachments
proposed patch for multipath (929 bytes, patch)
2006-06-07 17:23 EDT, Kiyoshi Ueda
proposed patch for multipathd (2.46 KB, patch)
2006-06-07 17:24 EDT, Kiyoshi Ueda

Description Kiyoshi Ueda 2006-06-07 17:17:57 EDT
Description of problem:
When multipathd(8) is running and a map contains a failed path,
running multipath(8) to add a path to that map fails.


Version-Release number of selected component:
device-mapper-multipath-0.4.5-16.0.RHEL4


How reproducible:
Always


Steps to Reproduce:
 1. Prepare a storage device which has more than one path.
    (e.g. /dev/sda and /dev/sdb are paths to the same device.)
 2. Start multipathd.
      # /etc/init.d/multipathd start
 3. Remove one path.
      # echo 1 > /sys/block/sdb/device/delete
 4. Create a multipath map using the remaining path.
      # multipath
    (The multipath map should consist of only /dev/sda
     in this example.)
 5. Make the remaining path in the map fail.
      # echo offline > /sys/block/sda/device/state
 6. Hot-add the removed path.
      # echo "scsi add-single-device <host> <channel> <id> <lun>" \
        > /proc/scsi/scsi
 7. Run multipath to add the hot-added path to the map.
      # multipath


Actual results:
multipath command fails with the following message.
-------------------------------------------------------
device-mapper: reload ioctl failed: Invalid argument
-------------------------------------------------------


Expected results:
multipath command succeeds.


Additional info:
multipath(8) tries to reload a table which includes the failed path
(in the case above, /dev/sda), and the kernel rejects it.
The code path by which the failed path ends up in the table is:
    main()
      -> configure()
        -> cache_load()
        -> path_discovery()
        -> get_dm_mpvec()
          -> disassemble_map()
        -> coalesce_paths()
The wwid of the failed path (/dev/sda) is loaded in cache_load() and
then removed in path_discovery().  But in disassemble_map(),
it is copied back from mpp->wwid.
Therefore, the failed path (/dev/sda) is used in coalesce_paths().

By the way, once this bug is fixed, adding a path will cause
the failed path to be removed from the existing multipath map
(silently and automatically, by the hotplug script).
So, even when the failed path comes back online, it will no longer
be part of the multipath map.
Users could see this as a regression.
So both of these problems must be fixed at the same time.


Proposed fix for multipath:
Exclude the failed path from the table in coalesce_paths().
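
As a rough illustration of the idea only (the attached patch changes the C code and is not reproduced here), the filtering step could be approximated in shell by skipping any device whose sysfs state is not "running". The usable_paths function and the SYSFS_BLOCK override are hypothetical names introduced for this sketch; they are not part of multipath-tools.

```shell
#!/bin/sh
# Hypothetical helper: print only the path devices that look safe to put
# into a reloaded table. SYSFS_BLOCK can be overridden for testing.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

usable_paths() {
    for dev in "$@"; do
        state_file="$SYSFS_BLOCK/$dev/device/state"
        # A removed device has no state file; skip it.
        [ -r "$state_file" ] || continue
        # A path taken offline as in step 5 reads "offline" here.
        if [ "$(cat "$state_file")" = "running" ]; then
            echo "$dev"
        fi
    done
}
```

For the reproduction above, calling usable_paths sda sdb after step 6 would be expected to print only sdb, assuming sda is still offline.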


Proposed fix for multipathd:
Monitor the failed path, even if it isn't included in any map,
as long as its wwid is the same as the wwid of a map which is
monitored. (This behavior is already implemented.)
And when the failed path comes back online, fork() and exec() multipath(8).
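
The behavior proposed for multipathd can be approximated outside the daemon with a small shell loop: wait until the failed device reports "running" again, then run multipath once. This is only a sketch of the intent. The real daemon uses its own path checkers rather than polling sysfs, and the function name and the SYSFS_BLOCK, MULTIPATH and POLL_INTERVAL variables are assumptions made for this example.

```shell
#!/bin/sh
# Overridable settings (hypothetical, for illustration and testing).
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}
MULTIPATH=${MULTIPATH:-multipath}
POLL_INTERVAL=${POLL_INTERVAL:-5}

# Block until the given device (e.g. sda) is back online, then run
# multipath once so the path can be put back into its map.
rerun_multipath_when_online() {
    dev=$1
    until [ "$(cat "$SYSFS_BLOCK/$dev/device/state" 2>/dev/null)" = "running" ]; do
        sleep "$POLL_INTERVAL"
    done
    "$MULTIPATH"
}
```

For example, rerun_multipath_when_online sda would return the exit status of multipath once /dev/sda is usable again.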
Comment 1 Kiyoshi Ueda 2006-06-07 17:23:09 EDT
Created attachment 130709 [details]
proposed patch for multipath
Comment 2 Kiyoshi Ueda 2006-06-07 17:24:21 EDT
Created attachment 130710 [details]
proposed patch for multipathd
Comment 3 RHEL Product and Program Management 2006-08-18 11:36:40 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 5 Ben Marzinski 2006-09-20 18:52:11 EDT
I'm not totally happy with this solution.
1. It makes multipathd exec multipath. Ideally, we're trying to make
multipathd more and more self-sufficient, and the multipath program more of just
a call into it. This heads in the opposite direction.
2. More importantly, I don't think that failed paths should disappear from the
map when you add new ones.

Alasdair, is there a reason why the kernel cannot allow you to create a
multipath map with a failed path in it?

As a workaround, I believe that customers who want to add a new path while
there is a failed one can kill multipathd, rerun multipath, and start
multipathd back up. Without multipathd running, multipath will do exactly
what the patch causes: it will remove the failed path and add the new path.
Forcing this sort of manual intervention will keep the customer from being
surprised by losing the path. It is pretty unsightly, I admit, and I'd rather
just be able to reload the map with the failed path.
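
The workaround described above could be scripted roughly as follows. This is a minimal sketch, assuming the init script accepts "stop" as well as the "start" shown in the reproduction steps. The function name and the MULTIPATHD_INIT and MULTIPATH variables are hypothetical and exist only so the commands can be overridden.

```shell
#!/bin/sh
# Overridable command locations (hypothetical, for illustration and testing).
MULTIPATHD_INIT=${MULTIPATHD_INIT:-/etc/init.d/multipathd}
MULTIPATH=${MULTIPATH:-multipath}

# Stop the daemon, let multipath rebuild the map without the failed path,
# then bring the daemon back up.
add_path_with_daemon_stopped() {
    "$MULTIPATHD_INIT" stop || return 1
    "$MULTIPATH"
    rc=$?
    # Restart the daemon even if multipath failed, so monitoring resumes.
    "$MULTIPATHD_INIT" start || return 1
    return "$rc"
}
```

Note that, as said above, the failed path is dropped from the map by this procedure, so multipath has to be run again once that path comes back online.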
Comment 6 Kiyoshi Ueda 2006-09-22 11:39:52 EDT
I completely agree with Ben's comment #5.
Being able to reload a map with a failed path is a nice idea,
but it is probably not preferred on the kernel side.

Though I still want this situation to be handled automatically
by multipathd, if you can't fix it in RHEL4.5, please make sure to
include documentation about the workaround in either the release notes
or the man page.
Comment 10 RHEL Product and Program Management 2007-03-09 20:04:24 EST
This bugzilla had previously been approved for engineering
consideration but Red Hat Product Management is currently reevaluating
this issue for inclusion in RHEL4.6.
Comment 11 RHEL Product and Program Management 2007-05-09 06:11:43 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 Ben Marzinski 2007-08-02 22:28:23 EDT
This is not making 4.6
Comment 16 Suzanne Yeghiayan 2008-05-28 17:24:43 EDT
Unfortunately this bugzilla was not resolved in time for RHEL 4.7 Beta.
It has now been proposed for inclusion in RHEL 4.8 but must regain Product
Management approval.
Comment 18 RHEL Product and Program Management 2008-09-05 13:14:52 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 20 Tom Coughlan 2009-01-24 16:51:32 EST
The benefit associated with this fix does not outweigh the risk at this stage in the life of RHEL 4. I am moving this to RHEL 5.
Comment 23 Kiyoshi Ueda 2009-01-28 00:22:57 EST
Actually, this problem can be seen only on RHEL4.
This is a design problem of multipathd(8) in RHEL4,
so I understand why this problem won't be fixed in RHEL4.

But there is a workaround for this problem.
If Red Hat doesn't fix this problem, I want Red Hat to
document the workaround for users.

So this bugzilla is now for a documentation issue in RHEL4.
Please see Comment#5 and Comment#6 for details of
the workaround.
Comment 24 Ben Marzinski 2009-02-25 20:05:22 EST
As noted above, this problem is a RHEL 4 only issue.  I've cloned this bug to the RHEL4 bug 487443.
