Bug 156280

Summary:	multipath-tools tests active paths but never uses the status to fail them
Product:	[Fedora] Fedora	Reporter:	Lars Marowsky-Bree <lmb>
Component:	device-mapper-multipath	Assignee:	Alasdair Kergon <agk>
Status:	CLOSED RAWHIDE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	rawhide	CC:	agk, christophe.varoqui, dmo, egoggin, lmb, tranlan
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2005-09-04 23:05:59 UTC	Type:	---

Description Lars Marowsky-Bree 2005-04-28 16:27:23 UTC

multipath-tools does check all paths. But when it finds that a path previously
active has failed, it never tells the kernel about it.

(Reported by Edward.)

Comment 1 Christophe Varoqui 2005-05-02 21:08:57 UTC

Candidate fix in 0.4.5-pre2
Please confirm the new behaviour is what is expected.

Comment 2 Lan Tran 2005-05-02 22:33:49 UTC

Hm, I just tried 0.4.5-pre2, but it doesn't appear fixed... 

After disabling a switch port, the multipathd path checker detects that the 2
paths are down, but the internal dm path state is still 'active'. I would expect
it to go to 'failed'.

1IBM     2105            739FCA30
[size=953 MB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdbn 68:16   [ready ][active]
  \_ 1:0:1:1 sdbv 68:144  [ready ][active]
  \_ 0:0:0:1 sdb  8:16    [faulty][active]
  \_ 0:0:1:1 sdj  8:144   [faulty][active]
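
For illustration, the mismatch in the listing above (checker state 'faulty' next to dm state 'active') can be spotted mechanically. The following is a minimal sketch in C, assuming the path lines have exactly the layout shown in this comment. The names path_line, parse_path_line() and is_stale_active() are hypothetical and are not part of multipath-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one path line of the listing above. */
struct path_line {
    char hcil[32];      /* SCSI host:channel:id:lun, e.g. "0:0:0:1" */
    char dev[32];       /* block device name, e.g. "sdb" */
    char dev_t[32];     /* major:minor, e.g. "8:16" */
    char chk_state[16]; /* first bracket: path checker state */
    char dm_state[16];  /* second bracket: device-mapper state */
};

/* Parse one path line. Returns 1 on success, 0 for any other line. */
int parse_path_line(const char *line, struct path_line *out)
{
    assert(line != NULL && out != NULL);
    /* "%15[^] ]" reads up to 15 characters that are neither ']' nor ' ',
     * so both "[ready ]" and "[faulty]" yield the bare state word. */
    int n = sscanf(line, " \\_ %31s %31s %31s [%15[^] ] ][%15[^] ]",
                   out->hcil, out->dev, out->dev_t,
                   out->chk_state, out->dm_state);
    return n == 5;
}

/* A path is "stale active" when the checker calls it faulty
 * while the dm table still lists it as active. */
int is_stale_active(const struct path_line *p)
{
    return strcmp(p->chk_state, "faulty") == 0 &&
           strcmp(p->dm_state, "active") == 0;
}
```

Applied to the four path lines above, this would flag sdb and sdj and leave sdbn and sdbv alone; the group line (round-robin) is rejected by the parser.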

Comment 3 Christophe Varoqui 2005-05-02 22:46:58 UTC

The framework is in place, certainly needs debugging now :

multipathd/main.c:checkerloop() calls fail_path(), which calls dm_fail_path()
upon path-going-down events.

I just verified the log received the "checker failed path %s in map %s" message
when removing a path through sysfs.

If you don't beat me to it, I'll see what I can do tomorrow.
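
The flow described in this comment (a checker pass notices a path going down and the kernel is then told to fail it) might be sketched as follows. This is not the multipath-tools source: struct path, check_one() and build_fail_msg() are hypothetical names, and the sketch assumes the kernel is informed through the dm-multipath "fail_path" target message, which takes the path's major:minor.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum chk_state { CHK_DOWN, CHK_UP };

/* Hypothetical per-path record kept by the checker. */
struct path {
    const char *dev_t;    /* "major:minor", e.g. "8:16" */
    enum chk_state state; /* state seen on the previous checker pass */
};

/* Build the target message text for one path.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_fail_msg(const struct path *pp, char *buf, size_t len)
{
    assert(pp != NULL && buf != NULL);
    int n = snprintf(buf, len, "fail_path %s", pp->dev_t);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* One checker pass over one path. Returns 1 and fills msg only on an
 * up -> down transition; a path that stays down is not reported again. */
int check_one(struct path *pp, enum chk_state newstate, char *msg, size_t len)
{
    assert(pp != NULL);
    int went_down = (pp->state == CHK_UP && newstate == CHK_DOWN);
    pp->state = newstate;
    if (!went_down)
        return 0;
    return build_fail_msg(pp, msg, len) == 0;
}
```

Acting only on the transition keeps the daemon from re-sending the same message on every pass while the path remains down.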

Comment 4 Lan Tran 2005-05-03 13:08:40 UTC

Hi Christophe, 

It turns out that dm_fail_path() is never called in my setup because the check
for !pp->mpp always fails. I'm not sure why you removed the initial multipath
reconfiguration from multipathd (restored in the patch below), because without
it the multipath maps are not created. And since you removed the signal
handling (moving to uevents, I believe), I'm not quite sure how the multipath
daemon's allpaths gets updated when multipath is run. It looks like a uevent is
triggered only when removing/adding underlying sd devices?

I also see odd behavior: the multipathd process just dies on me if I try to
restart it when there are already maps configured. I'm not sure why, as I see
no debug messages.

(BTW, I was running 0.4.5-pre2 on the RHEL4 U1 beta1 kernel.) 

--- multipath-tools-0.4.5-pre2/multipathd/main.c        2005-04-28 16:52:56.000000000 -0700
+++ multipath-tools-0.4.5-pre2-patched/multipathd/main.c        2005-05-03 05:50:47.203453424 -0700
@@ -468,7 +471,7 @@
        }

        log_safe(LOG_NOTICE, "initial reconfigure multipath maps");
-//     execute_program(conf->multipath, buff, 1);
+       execute_program(conf->multipath, buff, 1);

        while (1) {
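
One reading of the report above is that the path's back-pointer to its map (pp->mpp) is unset when the daemon has not learned of any map, so the fail routine returns before the kernel is told anything. The sketch below only illustrates that kind of guard under that assumption; the struct layouts, fail_path() body and record_fail() stand-in are hypothetical and do not reproduce the multipath-tools code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map record. */
struct multipath {
    const char *alias;
};

/* Hypothetical path record with a back-pointer to its owning map. */
struct path {
    const char *dev_t;     /* "major:minor" */
    struct multipath *mpp; /* NULL until the daemon associates a map */
};

enum fail_result { FAIL_SENT, FAIL_SKIPPED_NO_MAP };

/* Stand-in for the call that informs the kernel; it only counts calls. */
static int kernel_calls;
static void record_fail(const char *map, const char *dev_t)
{
    (void)map;
    (void)dev_t;
    kernel_calls++;
}

/* Guarded fail routine: a path with no known map is silently skipped. */
enum fail_result fail_path(const struct path *pp)
{
    assert(pp != NULL);
    if (!pp->mpp)
        return FAIL_SKIPPED_NO_MAP; /* no map known: kernel is never told */
    record_fail(pp->mpp->alias, pp->dev_t);
    return FAIL_SENT;
}
```

Under this assumption, a daemon that starts with no maps associated to its paths would take the skipped branch for every path, which matches the symptom of paths staying 'active' in the dm table.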

Comment 5 Christophe Varoqui 2005-05-03 16:18:21 UTC

> And as you removed the signal handling, moving to uevents I believe,
> I'm not quite sure how the multipath daemon's allpaths gets updated when
> multipath is run. It looks like uevent is triggered only when
> removing/adding underlying sd devices?

multipath is run from hotplug/udev, and only for "add"s.
For each hotplug "add" event, the daemon will receive an "add" uevent.
The signal thing was safe to kill.

As for the initial multipath run, if your hotplug/udev setup is right, you don't
need it : the maps should already be configured when starting the daemon.

Even if your setup has multipath.dev disabled, you'd better put the multipath
run either in the initrd or in the multipathd startup script.

Now, multipathd dying on you needs to be fixed. But as far as I can see the
design directions are right.
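
As a rough illustration of a daemon reacting only to "add" events, the sketch below splits a uevent header of the form ACTION@DEVPATH and checks the action. It assumes that header form; uevent_split() and is_add_event() are hypothetical names and the device path used in the usage note is only an example. It is not the listener multipathd actually uses.

```c
#include <assert.h>
#include <string.h>

/* Split "ACTION@DEVPATH" into its two parts.
 * Returns 0 on success, -1 if there is no '@', the action is empty,
 * or the action does not fit into the supplied buffer. */
int uevent_split(const char *msg, char *action, size_t alen,
                 const char **devpath)
{
    assert(msg != NULL && action != NULL && devpath != NULL);
    const char *at = strchr(msg, '@');
    if (at == NULL)
        return -1;
    size_t n = (size_t)(at - msg);
    if (n == 0 || n >= alen)
        return -1;
    memcpy(action, msg, n);
    action[n] = '\0';
    *devpath = at + 1; /* points into msg, not a copy */
    return 0;
}

/* Returns 1 only for well-formed "add" events, 0 for everything else. */
int is_add_event(const char *msg)
{
    char action[16];
    const char *devpath;
    if (uevent_split(msg, action, sizeof action, &devpath) != 0)
        return 0;
    return strcmp(action, "add") == 0;
}
```

For example, is_add_event("add@/block/sdb") would return 1, while a "remove" event or a string without '@' would return 0.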

Comment 6 Christophe Varoqui 2005-05-04 21:17:25 UTC

Commit
http://kernel.org/git/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=a6e90b703b6c9a9a38d481a8fe4e82085ad247cc
and
http://kernel.org/git/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=e6b86e3673321852b68496d25d1a081f30d6d21b
should fix it.

My bad.

Comment 7 Alasdair Kergon 2005-05-05 21:46:44 UTC

MODIFIED means it should be fixed but we're awaiting confirmation of that.
If there are no comments in a week or so, then we assume it is fixed and close
the bug.  If subsequently it's found not to have been fixed, then we simply
reopen it.

Comment 8 Lan Tran 2005-05-26 15:29:47 UTC

Using the multipath-tools git snapshot from May 16, 2005. 
Without any I/O running, I disabled a port (bringing down half the paths for
each multipath device), and the disabled paths were correctly put into the
'[faulty][failed]' state.

However, I also just tried this on the May 26 git snapshot, and it no longer
works. multipathd keeps dying whenever I try to start it using the RHEL4 U1
'/etc/init.d/multipathd start' script. I'm not sure what's going on.

(I'm using RH4 U1 beta 2.6.9-9.ELsmp kernel.)

Comment 9 Lan Tran 2005-05-27 01:07:55 UTC

> 
> Instinctively I would say I messed the case where no "failback" keyword
> is provided in the config file, meaning the culprit is the last commit.
> 

Just checked out the latest from the git repository and tried failing paths with
and without I/O running. The paths are failed and recovered as expected under
both scenarios. Thanks Christophe.

Comment 10 Rahul Sundaram 2005-09-04 23:05:59 UTC


Closing as per previous comment.