Bug 156280
Summary: multipath-tools tests active paths but never uses the status to fail them
Product: [Fedora] Fedora
Component: device-mapper-multipath
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Reporter: Lars Marowsky-Bree <lmb>
Assignee: Alasdair Kergon <agk>
CC: agk, christophe.varoqui, dmo, egoggin, lmb, tranlan
Doc Type: Bug Fix
Last Closed: 2005-09-04 23:05:59 UTC
Description
Lars Marowsky-Bree
2005-04-28 16:27:23 UTC
Candidate fix in 0.4.5-pre2. Please confirm the new behaviour is what is expected.

Hm, I just tried 0.4.5-pre2, but it doesn't appear fixed... After doing a switch port disable, the multipathd path checker detects that the 2 paths are down, but the internal dm path state is still 'active'. I would expect it to go to 'failed'.

1IBM 2105 739FCA30
[size=953 MB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdbn 68:16  [ready ][active]
  \_ 1:0:1:1 sdbv 68:144 [ready ][active]
  \_ 0:0:0:1 sdb  8:16   [faulty][active]
  \_ 0:0:1:1 sdj  8:144  [faulty][active]

The framework is in place, but it certainly needs debugging now: multipathd/main.c:checkerloop() calls fail_path(), which calls dm_fail_path() upon path-down events. I just verified that the log received the "checker failed path %s in map %s" message when removing a path through sysfs. If you don't beat me to it, I'll see what I can do tomorrow.

Hi Christophe,

It turns out that dm_fail_path() is never being called in my setup because the check for !pp->mpp always fails. Not sure why you removed the initial multipath reconfiguration from multipathd (see patch below), because otherwise the multipath maps are not created. And as you removed the signal handling, moving to uevents I believe, I'm not quite sure how the multipath daemon's allpaths gets updated when multipath is run. It looks like uevent is triggered only when removing/adding underlying sd devices?

I also get this odd behavior that the multipathd process will just die on me if I try to restart it when there are already maps configured. Not sure why, as I see no debug messages. (BTW, I was running 0.4.5-pre2 on the RHEL4 U1 beta1 kernel.)
--- multipath-tools-0.4.5-pre2/multipathd/main.c 2005-04-28 16:52:56.000000000 -0700
+++ multipath-tools-0.4.5-pre2-patched/multipathd/main.c 2005-05-03 05:50:47.203453424 -0700
@@ -468,7 +471,7 @@
 	}
 	log_safe(LOG_NOTICE, "initial reconfigure multipath maps");
-//	execute_program(conf->multipath, buff, 1);
+	execute_program(conf->multipath, buff, 1);
 	while (1) {

> And as you removed the signal handling, moving to uevents I believe,
> I'm not quite sure how the multipath daemon's allpaths gets updated when
> multipath is run. It looks like uevent is triggered only when
> removing/adding underlying sd devices?
multipath is run from hotplug/udev, and only for "add"s.
For each hotplug "add" event, the daemon will receive an "add" uevent.
The signal thing was safe to kill.
As for the initial multipath run, if your hotplug/udev setup is right, you don't
need it : the maps should already be configured when starting the daemon.
Even if your setup has multipath.dev disabled, you'd better put the multipath
run either in the initrd or in the multipathd startup script.
Now, multipathd dying on you needs to be fixed. But as far as I can see the
design directions are right.
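
For readers following the thread, the failure mode reported above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the multipath-tools source: the struct layouts, the enum names, and the sketch_* functions are hypothetical stand-ins for checkerloop(), fail_path() and dm_fail_path(), and the sketch assumes that the !pp->mpp guard returns early when a path is not attached to a map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; NOT the real multipath-tools structs. */
enum checker_state { CHECKER_READY, CHECKER_FAULTY };
enum dm_state { DM_ACTIVE, DM_FAILED };

struct map {
    const char *alias;
};

struct path {
    const char *dev;         /* device name, e.g. "sdb" */
    struct map *mpp;         /* owning map; NULL if the daemon never linked it */
    enum checker_state chk;  /* last result reported by the path checker */
    enum dm_state dm;        /* path state as recorded for device-mapper */
};

/* Stand-in for dm_fail_path(): here it only flips the recorded state. */
static void sketch_dm_fail_path(struct path *pp)
{
    pp->dm = DM_FAILED;
}

/* Stand-in for fail_path(). Returns true if the failure reached dm. */
static bool sketch_fail_path(struct path *pp)
{
    assert(pp != NULL);
    if (!pp->mpp)
        return false;        /* assumed early exit: path belongs to no map */
    sketch_dm_fail_path(pp);
    return true;
}

/* One checker pass for a single path, as the checker loop might do it. */
static bool sketch_check_path(struct path *pp, enum checker_state result)
{
    assert(pp != NULL);
    pp->chk = result;
    if (result == CHECKER_FAULTY && pp->dm == DM_ACTIVE)
        return sketch_fail_path(pp);
    return false;
}
```

With a path whose mpp is NULL, sketch_check_path() records the faulty checker result but leaves the dm state active, which matches the [faulty][active] lines in the listing earlier in this report. Once the path is linked to a map, the same call moves it to failed.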
Commit
http://kernel.org/git/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=a6e90b703b6c9a9a38d481a8fe4e82085ad247cc
and
http://kernel.org/git/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=e6b86e3673321852b68496d25d1a081f30d6d21b
should fix it. My bad.

MODIFIED means it should be fixed but we're awaiting confirmation of that. If there are no comments in a week or so, then we assume it is fixed and close the bug. If subsequently it's found not to have been fixed, then we simply reopen it.

Using the multipath-tools git snapshot from May 16, 2005: without any I/O running, I disabled a port (bringing down half the paths for each multipath device), and the disabled paths were correctly put into the '[faulty][failed]' state.

However, I also just tried this on the May 26 git snapshot, and it doesn't work anymore. It seems that multipathd keeps dying whenever I try to start it up using RH4 U1's '/etc/init.d/multipathd start' script. Not sure what's going on. (I'm using the RH4 U1 beta 2.6.9-9.ELsmp kernel.)

>
> Instinctively I would say I messed the case where no "failback" keyword
> is provided in the config file, meaning the culprit is the last commit.
>
Just checked out the latest from the git repository and tried failing paths with
and without I/O running. The paths are failed and recovered as expected under
both scenarios. Thanks Christophe.
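
For anyone repeating this verification, a small helper can flag the symptom this bug was about, namely a path line whose checker state is faulty while device-mapper still reports it active. This is a minimal sketch, assuming the two states appear as the last two bracketed fields in the order [checker][dm], as in the listing quoted earlier in this report; path_state_mismatch() is a hypothetical name and not part of multipath-tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true when a path line shows checker state "faulty" while the
 * device-mapper state is still "active".
 * Assumption: the line ends with two bracketed fields, [checker][dm].
 */
static bool path_state_mismatch(const char *line)
{
    assert(line != NULL);

    /* The last '[' starts the dm state field. */
    const char *dm = strrchr(line, '[');
    if (dm == NULL || dm == line)
        return false;

    /* Walk backwards to the '[' that starts the checker state field. */
    const char *chk = dm - 1;
    while (chk > line && *chk != '[')
        chk--;
    if (*chk != '[')
        return false;

    return strncmp(chk, "[faulty", 7) == 0 &&
           strncmp(dm, "[active]", 8) == 0;
}
```

Feeding it each path line from the listing before and after disabling a port gives a quick pass/fail signal: a fixed daemon should produce no line for which the helper returns true.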
Closing as per previous comment.