Description of problem:
When multipathd(8) is running and a map has a failed path, multipath(8) fails to add a new path to the map.

Version-Release number of selected component:
device-mapper-multipath-0.4.5-16.0.RHEL4

How reproducible:
Always

Steps to Reproduce:
1. Prepare storage that has more than one path. (e.g. /dev/sda and /dev/sdb are paths to the same LUN.)
2. Start multipathd.
   # /etc/init.d/multipathd start
3. Remove one path.
   # echo 1 > /sys/block/sdb/device/delete
4. Create a multipath map using the remaining path.
   # multipath
   (In this example, the map consists of /dev/sda only.)
5. Make the remaining path in the map fail.
   # echo offline > /sys/block/sda/device/state
6. Hot-add the removed path.
   # echo "scsi add-single-device <host> <channel> <bus> <lun>" \
     > /proc/scsi/scsi
7. Run multipath to add the hot-added path to the map.
   # multipath

Actual results:
The multipath command fails with the following message.
-------------------------------------------------------
device-mapper: reload ioctl failed: Invalid argument
-------------------------------------------------------

Expected results:
The multipath command succeeds.

Additional info:
multipath(8) tries to reload a table that includes the failed path (/dev/sda in the case above), and the kernel rejects it. The code path by which the failed path ends up in the table is:

  main() -> configure() -> cache_load() -> path_discovery()
         -> get_dm_mpvec() -> disassemble_map() -> coalesce_paths()

The wwid of the failed path (/dev/sda) is loaded in cache_load() and removed once in path_discovery(). But disassemble_map() copies it back from mpp->wwid, so the failed path is used again in coalesce_paths().

Note that once this bug is fixed, adding a path will cause the failed path to be removed from the existing multipath map (silently and automatically, by the hotplug script). So even when the failed path comes back online, it will no longer be part of the map. Users could see this as a regression, so both problems must be fixed together.

Proposed fix for multipath:
Exclude the failed path from the table in coalesce_paths().

Proposed fix for multipathd:
Keep monitoring the failed path, even if it is not included in any map, as long as its wwid matches the wwid of a monitored map. (This behavior is already implemented.) When the failed path comes back online, fork() and exec() multipath(8).
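For convenience, the reproduction steps above can be collected into one script. This is only a sketch: the device names (sda/sdb) and the SCSI address placeholders are examples from the report and must be adjusted for the actual hardware, and it must be run as root on an affected RHEL4 system.

```shell
#!/bin/bash
# Sketch of the reproduction steps above (run as root).
# Assumes /dev/sda and /dev/sdb are two paths to the same LUN;
# adjust the device names and the SCSI address to match your setup.
set -e

/etc/init.d/multipathd start                 # step 2: start the daemon

echo 1 > /sys/block/sdb/device/delete        # step 3: remove one path

multipath                                    # step 4: map now contains sda only

echo offline > /sys/block/sda/device/state   # step 5: fail the remaining path

# step 6: hot-add the removed path (fill in the real host/channel/bus/lun)
echo "scsi add-single-device <host> <channel> <bus> <lun>" > /proc/scsi/scsi

# step 7: this reload is rejected by the kernel with
#         "device-mapper: reload ioctl failed: Invalid argument"
multipath
```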
Created attachment 130709 [details] proposed patch for multipath
Created attachment 130710 [details] proposed patch for multipathd
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I'm not totally happy with this solution.

1. It makes multipathd exec multipath. Ideally we're trying to make multipathd more and more self-sufficient, with the multipath program becoming little more than a call into it; this heads in the opposite direction.

2. More importantly, I don't think that failed paths should disappear from the map when you add new ones.

Alasdair, is there a reason why the kernel cannot allow you to create a multipath map with a failed path in it?

As a workaround, I believe customers who want to add a new path while there is a failed one can kill multipathd, rerun multipath (without multipathd running, multipath will do exactly what the patch causes: it will remove the failed path and add the new path), and start multipathd back up. Forcing this sort of manual intervention will keep the customer from being surprised by losing the path. It is pretty unsightly, I admit, and I'd rather just be able to reload the map with the failed path.
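The manual workaround described above could be scripted roughly as follows. This is a sketch, assuming the stock RHEL4 init script location, and it must be run as root.

```shell
#!/bin/bash
# Workaround sketch: add a new path while a failed path exists (run as root).
# Stopping multipathd first lets multipath(8) rebuild the map itself,
# dropping the failed path and picking up the new one.
set -e

/etc/init.d/multipathd stop    # stop the daemon so it cannot interfere
multipath                      # rebuild maps: failed path removed, new path added
/etc/init.d/multipathd start   # restore path monitoring
```

Note that, as described above, the failed path is removed from the map by this procedure; the operator performs it deliberately, so the removal does not come as a surprise.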
I completely agree with Ben's comment #5. Being able to reload a map with a failed path is a nice idea, but it is probably not preferred on the kernel side. I still want this situation to be handled automatically by multipathd, but if you can't fix it in RHEL4.5, please make sure to document the workaround, either in the release notes or in the man page.
This bugzilla had previously been approved for engineering consideration but Red Hat Product Management is currently reevaluating this issue for inclusion in RHEL4.6.
This is not making 4.6
Unfortunately this bugzilla was not resolved in time for RHEL 4.7 Beta. It has now been proposed for inclusion in RHEL 4.8 but must regain Product Management approval.
The benefit associated with this fix does not outweigh the risk at this stage in the life of RHEL 4. I am moving this to RHEL 5.
Actually, this problem can be seen only on RHEL4. It is a design problem of multipathd(8) in RHEL4, so I understand that it won't be fixed in RHEL4. However, there is a workaround. If Red Hat doesn't fix this problem, I would like Red Hat to document the workaround for users. So this bugzilla is now a documentation issue for RHEL4. Please see comment #5 and comment #6 for details of the workaround.
As noted above, this problem is a RHEL 4 only issue. I've cloned this bug to the RHEL4 bug 487443.