Bug 488921

Summary:	[NetApp- 4.8 bug] I/O Processes don't get killed when all the paths to the LUN are down
Product:	Red Hat Enterprise Linux 4	Reporter:	nandkumar mane <nandkumar.mane>
Component:	device-mapper-multipath	Assignee:	Ben Marzinski <bmarzins>
Status:	CLOSED ERRATA	QA Contact:	Gris Ge <fge>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.8	CC:	agk, andriusb, bdonahue, bmarzins, christophe.varoqui, coughlan, cward, ddomingo, dwysocha, egoggin, fge, heinzm, iannis, junichi.nomura, kueda, lmb, marting, mbroz, nandkumar.mane, prockai, rlerch, slevine, tranlan, xdl-redhat-bugzilla
Target Milestone:	rc	Keywords:	OtherQA
Target Release:	4.9
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	device-mapper-multipath-0.4.5-40.el4	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	419581	Environment:
Last Closed:	2011-02-16 14:24:09 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	419581
Bug Blocks:	626414

Description nandkumar mane 2009-03-06 07:56:54 UTC

+++ This bug was initially created as a clone of Bug #419581 +++

Description of problem:
I/O processes (D state processes) don't get killed by `kill -SIGKILL` when all the 
paths to the LUN they are writing to are down. multipath is 
using the "queue_if_no_path" option.

Version-Release number of selected component (if applicable):
RHEL5.1

How reproducible:
Very reproducible

Steps to Reproduce:
1. Map a LUN with multiple paths.
2. Make sure queue_if_no_path is enabled
3. Run I/O on the multipathed device.
4. Take down all the paths to the multipathed device.
5. Now try killing the I/O process (say dd). 

The process doesn't get killed. This is a big problem because it holds 
shutdown/reboot process also. The only option left is to do a machine reset.
  
Actual results:
The I/O processes don't get killed at all.

Expected results:
When the I/O processes are sent a kill signal, the kernel should be able to kill the 
process.

--- Additional comment from coughlan on 2007-12-11 12:14:04 EDT ---

Ben, Alasdair, please confirm that I have the following right. If so, Stephen,
some guidance about this should go in the dm-multipath manual.

The D state process is in an uninterruptible sleep, waiting for I/O to complete.
The process is essentially in kernel mode, and cannot be killed. 

The queue_if_no_path feature causes the kernel to retry the I/O forever, so the
process remains in D state forever. 

The solution is to set no_path_retry to a numeric value specifying the number of
times the system should retry, rather than setting queue_if_no_path. When the
I/O eventually fails, the killed process will die. 



--- Additional comment from bmarzins on 2007-12-11 14:47:19 EDT ---

Yes, this is correct. Unfortunately, in RHEL5, multipath has two ways of setting
how the multipath device will react when all paths have failed,
"queue_if_no_path" and "no_path_retry".  You should not use both at the same
time. If you are adding

no_path_retry <n>

to your configuration, you should make sure that it does not contain

features "1 queue_if_no_path"

If you need to know the existing configuration for your hardware, you can run
multipathd -k"show config" and find the devices section that matches your
vendor/model.
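
Note: the advice above (do not combine no_path_retry with features "1 queue_if_no_path") can be checked mechanically. The following is only a minimal sketch: check_queue_conflict is a hypothetical helper name, and it scans a whole file rather than matching settings per device section, so a hit should be read as a prompt to look closer, not as proof of a misconfiguration.

```shell
# Hypothetical helper (not part of multipath-tools): report whether a
# multipath.conf-style file mentions both no_path_retry and queue_if_no_path.
# Returns 0 when there is no conflict, 1 when both settings are present.
check_queue_conflict() {
    conf="$1"
    # Strip comments so that commented-out settings are ignored.
    active=$(sed 's/#.*//' "$conf")
    has_retry=$(printf '%s\n' "$active" | grep -c 'no_path_retry' || true)
    has_queue=$(printf '%s\n' "$active" | grep -c 'queue_if_no_path' || true)
    if [ "$has_retry" -gt 0 ] && [ "$has_queue" -gt 0 ]; then
        echo "conflict: both no_path_retry and queue_if_no_path are set in $conf"
        return 1
    fi
    echo "ok: no conflicting queueing settings in $conf"
    return 0
}
```

It could be run as check_queue_conflict /etc/multipath.conf, or against a file holding the saved output of multipathd -k"show config" so that built-in device defaults are covered as well.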



--- Additional comment from coughlan on 2007-12-19 18:07:17 EDT ---

We will add some documentation indicating that 

1) If the "queue_if_no_path" option is selected, then any process that issues
I/O will hang until one or more paths are restored. 

and

2) The two options no_path_retry and queue_if_no_path are mutually exclusive.

NetApp, is that sufficient? 

--- Additional comment from rsarraf on 2007-12-20 04:32:47 EDT ---

Yes, that should be good to go.

--- Additional comment from rsarraf on 2008-01-28 09:39:04 EDT ---

Created an attachment (id=293157)
Document queue_if_no_path behavior

It would be good to document this behavior in the man pages.

--- Additional comment from rsarraf on 2008-01-28 09:43:58 EDT ---

The same patch was requested for inclusion upstream:
http://www.redhat.com/archives/dm-devel/2007-December/msg00157.html

--- Additional comment from andriusb on 2008-01-28 11:02:07 EDT ---

Reassigning to slevine per email.

--- Additional comment from ddomingo on 2008-01-28 19:04:33 EDT ---

added to RHEL5.2 release notes under "Known Issues":

<quote>
When using dm-multipath, if features "1 queue_if_no_path" is specified in
/etc/multipath.conf then any process that issues I/O will hang until one or more
paths are restored.

To avoid this, set no_path_retry <N> in /etc/multipath.conf (where <N> is the
number of times the system should retry a path). When you do so, ensure that you
remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
</quote>

please advise if any further revisions are required. thanks!

--- Additional comment from andriusb on 2008-02-11 09:33:43 EDT ---

Reassigning to Ben per Tom, so that Comment #8 (the patch for man page) is
backported into RHEL 5.3.

--- Additional comment from agk on 2008-02-11 09:47:07 EDT ---

"This is a big problem because it holds shutdown/reboot process also."

Has this been addressed?  i.e. Do the machine shutdown scripts handle any
queue_if_no_path settings correctly?


--- Additional comment from rsarraf on 2008-02-20 17:15:27 EDT ---

No. Up to 5.1 the behavior is still the same.

On shutdown, the shutdown sequence is put on hold because the I/O processes 
don't get killed.

--- Additional comment from coughlan on 2008-02-29 17:47:47 EDT ---

To recap:

We have a release note in 5.2, as shown in comment 10, warning users about the
current behavior of queue_if_no_path. 

NetApp has proposed the addition of similar text to the man page. This has been
posted upstream, and should then go into 5.3. 

Netapp is also requesting that we consider changing the behavior of
queue_if_no_path, so that I/O will continue on the last available path, rather
than removing that path and causing processes (and system shutdown) to hang. 

Ben is this feasible for 5.3? 

--- Additional comment from ddomingo on 2008-04-01 22:11:33 EDT ---

Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

--- Additional comment from rsarraf on 2008-04-02 04:22:27 EDT ---

Ben/Alasdair,

Adding the following does partially solve the problem.

rrs@learner:~$ cat /tmp/kde-rrs/patch_netfs
--- netfs.orig  2008-03-21 18:19:59.000000000 -0400
+++ netfs       2008-03-27 06:04:32.000000000 -0400
@@ -97,6 +97,12 @@
   stop)
         # Unmount loopback stuff first
        __umount_loopback_loop
+
+       Devices=`dmsetup info | grep Name |  awk '{print $2}'`
+       for device in $Devices ; do
+               dmsetup message $device 0 "fail_if_no_path"
+       done
+
        if [ -n "$NETDEVMTAB" ]; then
                __umount_loop '$4 ~ /_netdev/ && $2 != "/" {print $2}' \
                        /etc/mtab \


If we execute the above when a LUN is in fail state, I/O processes to it will get 
terminated properly.

Adding the above as part of init script can overcome the problem by making 
sure to change the policy to "fail_if_no_path" before doing the unmount.

The problem we are seeing currently is that when the shutdown command is 
executed, it too goes into the same wait state, which never allows runlevel 
6/0 to execute.
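
Note: the loop in the patch above sends the message to every device-mapper device that dmsetup info lists, which may include non-multipath maps such as LVM volumes. A narrower variant is sketched below under a few assumptions: the names multipath_map_names and disable_queueing and the DMSETUP override variable are hypothetical, and the "No devices found" text is what dmsetup is assumed to print when nothing matches.

```shell
# Read "dmsetup ls --target multipath" output on stdin and print only the
# map names (first field), skipping blank lines and the assumed
# "No devices found" message.
multipath_map_names() {
    awk '$0 != "No devices found" && NF > 0 { print $1 }'
}

# Send fail_if_no_path to every map name read from stdin.
# DMSETUP is a hypothetical override for dry runs (e.g. DMSETUP=echo);
# when it is unset, the real dmsetup binary is called.
disable_queueing() {
    while read -r name; do
        "${DMSETUP:-dmsetup}" message "$name" 0 "fail_if_no_path"
    done
}
```

On a live system the pieces would be chained as: dmsetup ls --target multipath | multipath_map_names | disable_queueing. Prefixing the last command with DMSETUP=echo prints the dmsetup calls instead of running them.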

--- Additional comment from nandkumar.mane on 2008-04-04 05:54:57 EDT ---

shutdown command hangs because it calls sync() which stalls the entire 
shutdown process because for some of the devices, all the paths to them are 
down, and "queue_if_no_path" policy is set on them.

Again, if we execute shutdown with the -n option, shutdown tries to kill all 
processes and hangs while trying to kill the uninterruptible 'D' state 
processes which are pumping I/O to the LUN for which all the paths are down.

Here are two different solutions to solve these problems.

1) Changing policy of multipathd devices to "fail_if_no_path" in shutdown.c 
solves the issue.

--- sysvinit-2.86/src/shutdown.c.org    2008-04-04 02:41:33.000000000 -0400
+++ sysvinit-2.86/src/shutdown.c        2008-04-04 02:46:27.000000000 -0400
@@ -369,6 +369,10 @@
                hardsleep(1);
                stopit(0);
        }
+
+       syslog(LOG_NOTICE, "Changing multipathed device policy to fail_if_no_path");
+       system ("Devices=`dmsetup info | grep Name |  awk '{print $2}'`;for device in $Devices ; do dmsetup message $device 0 \"fail_if_no_path\";  done");
+
        openlog("shutdown", LOG_PID, LOG_USER);
        if (do_halt)
                syslog(LOG_NOTICE, "shutting down for system halt");


2) Remove sync() from shutdown.c and add policy change logic in netfs.
   Here, shutdown with -n option will still hang as shutdown will try to kill 
all processes.

--- sysvinit-2.86/src/shutdown.c.orig   2008-04-04 04:31:48.000000000 -0400
+++ sysvinit-2.86/src/shutdown.c        2008-04-04 04:32:12.000000000 -0400
@@ -392,7 +392,7 @@
        unlink(NOLOGIN);

        /* Now execute init to change runlevel. */
-       sync();
+//     sync();
        init_setenv("INIT_HALT", halttype);
        execv(INIT, args);

--- netfs.orig  2008-03-21 18:19:59.000000000 -0400
+++ netfs       2008-03-27 06:04:32.000000000 -0400
@@ -97,6 +97,12 @@
   stop)
         # Unmount loopback stuff first
        __umount_loopback_loop
+
+       Devices=`dmsetup info | grep Name |  awk '{print $2}'`
+       for device in $Devices ; do
+               dmsetup message $device 0 "fail_if_no_path"
+       done
+
        if [ -n "$NETDEVMTAB" ]; then
                __umount_loop '$4 ~ /_netdev/ && $2 != "/" {print $2}' \
                        /etc/mtab \


--- Additional comment from bmarzins on 2008-04-21 15:01:27 EDT ---

This might be able to be fixed in a similar way to how 238421 and 443380 will be
fixed. The fix for 430494 (which is the RHEL-4 version of 238421) has a built-in
"multipathd -k" command to disable queueing on devices.  It seems reasonable to
add a configuration parameter that allows you to specify that devices should not
be set to queue unless the multipath service is running.  If you don't have
multipathd running, you won't be trying failed paths once all the paths have
failed, so you will never get to issue those queued IOs. This would be instead
of modifying the shutdown command, which I'm not sure is the best way to handle
this issue.

This would create a window where IOs could fail if the multipath devices were
created before multipathd started, or if you needed to restart multipathd for
some reason.  However that doesn't seem like an unreasonable window to me.

--- Additional comment from pm-rhel on 2008-06-02 16:26:43 EDT ---

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

--- Additional comment from rsarraf on 2008-07-30 10:46:45 EDT ---

Ben,
To some extent it makes sense. Like if the user is aware that all the paths to 
a LUN are down, and on top of that LUN he's using queue_if_no_path, then the 
user could use this particular option and stop queueing and thus be able to 
avoid the queue hang problem.

What if the user executes shutdown, and in between a LUN loses all paths to 
it? If there's any init script (say a mail server) which before doing a 
shutdown does a sync() to that LUN, that'd run into the queue problem. This is 
the oddest case, but is possible. I'm not sure if we want to consider such an 
odd case.

--- Additional comment from bmarzins on 2008-07-30 14:25:54 EDT ---

Like I said in comment #21, it might be possible to add a configuration option,
so that queuing is turned off when multipathd is stopped. If you have something
like this, then all you need to do is make multipathd stop earlier in the
shutdown process. The multipath devices will still work, but they will simply
fail the IO if all paths go down, and this is probably acceptable during shutdown.

--- Additional comment from rlerch on 2008-08-07 20:33:30 EDT ---

Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. 

This Release Note is currently located in the Known Issues section.

--- Additional comment from rlerch on 2008-08-07 20:33:30 EDT ---


Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

--- Additional comment from rsarraf on 2008-08-08 03:28:50 EDT ---


Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1,10 @@
 (all architectures)
 When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
 
-To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+If "1 queue_if_no_path" is a compulsion to use, and the user hits the issue, they can use the dmsetup command to change the policy at runtime for that particular LUN (for which all the paths are unavailable).
+
+Eg:
+`dmsetup message $device 0 "fail_if_no_path"`
+where $device is the mpath device for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path"

--- Additional comment from nandkumar.mane on 2008-08-18 01:19:47 EDT ---


Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -7,4 +7,4 @@
 
 Eg:
 `dmsetup message $device 0 "fail_if_no_path"`
-where $device is the mpath device for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path"
+where $device is the mpath device name (eg. mpath2. Don't use path here.) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

--- Additional comment from bmarzins on 2008-09-08 18:12:52 EDT ---

O.k. Here are some thoughts on a solution, and a code check in.

For some people, this can be fixed simply by not using

features "1 queue_if_no_path"

in the appropriate /etc/multipath.conf entry. Instead, people can use

no_path_retry <n>

where <n> is the number of times to retry the paths (checking every 'polling_interval' seconds). After <n> failed retries, queueing is automatically disabled.  This will mean that the shutdown command will hang until this happens, but it will eventually complete.  For people that don't need to deal with long periods where all the paths to their device are down, or people who are willing to have shutdown take a while to complete, this should be fine.

Also, multipath now has an /etc/multipath.conf option, queue_without_daemon, that should help with this.  It defaults to "yes".  If set to "no", when multipathd stops, it turns off queue_if_no_path for all multipath devices. With this option set, shutdown -n will kill multipathd before syncing, and the devices will stop queueing correctly.
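
Note: the no_path_retry behavior described above implies a rough upper bound on how long I/O stays queued (and how long shutdown may stall) once the last path fails: about <n> times polling_interval seconds. The sketch below only does that arithmetic; queue_window_seconds is a hypothetical helper name, the numbers are illustrative, and the real window depends on when the path checker next runs.

```shell
# Approximate number of seconds I/O stays queued after the last path fails:
# no_path_retry retries, one every polling_interval seconds.
# Both arguments are plain integers.
queue_window_seconds() {
    retries="$1"
    polling_interval="$2"
    echo $((retries * polling_interval))
}
```

For example, queue_window_seconds 12 5 prints 60: with no_path_retry 12 and a polling_interval of 5 seconds, queueing would be turned off roughly a minute after all paths go down.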

--- Additional comment from bmarzins on 2008-09-08 18:19:27 EDT ---

Created an attachment (id=316127)
Simple shutdown replacement script

While it isn't as nice as being able to simply use the shutdown command as is, if it's really important to have init do the shutdown work, this script duplicates the shutdown command, with the most common (for me anyway) options. It handles the -h, -r, and -t options, and it deals with setting the time to "now" or +<minutes>, although it only sends one warning for delayed shutdowns.  This shutdown script disables queueing on all the paths before calling sync. However, it relies on a multipathd interactive command that won't be available until RHEL 5.3, so it is not suitable as a workaround.

--- Additional comment from xdl-redhat-bugzilla on 2008-09-11 15:27:06 EDT ---

Ben,
Using no_path_retry, or not using queue_if_no_path, doesn't look like a very promising solution. It is very difficult to determine how long to retry paths.
  
What are the chances of queue_without_daemon getting included in 5.3? This solution looks useful, given that one needs to use shutdown -n.

If you are really interested in making changes to the shutdown executable, the patch included in comment #20 gives you a good option for changes to shutdown.c. Rather than using a replacement script, I would prefer making the changes in the current shutdown.c, so that one need not pass any parameter when running shutdown.

Please tell me your views.

--- Additional comment from bmarzins on 2008-09-12 15:48:13 EDT ---

Sorry if I was not clear in comment #29.  The queue_without_daemon code is already checked in.  It will definitely be in 5.3.

As for the ability to use shutdown without options, I'm not sure if the sort of fixes in comment #20 will ever be accepted.  Simply removing the sync() call likely won't work.  I assume (although I'm not totally certain) that this is necessary to guarantee that all cached data is flushed back to disk.  Users expect this to happen on routine shutdowns.  The other option makes shutdown dependent on the multipath package, and I have doubts that this will be acceptable to the sysvinit maintainers.

I wonder if there would be a way for shutdown to signal that it is in progress, and multipathd could simply occasionally check for this, and disable queuing if it noticed it.  I was hoping that there would be something like that already in the shutdown code, but it seems like all the possible candidates (pid files, etc) get removed before shutdown calls sync(), which is where shutdown would likely be stuck if multipathd only checked occasionally. Something like this, which would not force the sysvinit package to depend on the multipath package, and which would be more generally useful, seems much more likely to be accepted.

--- Additional comment from ddomingo on 2008-09-14 22:45:43 EDT ---


Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -3,8 +3,12 @@
 
 To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
 
-If "1 queue_if_no_path" is a compulsion to use, and the user hits the issue, they can use the dmsetup command to change the policy at runtime for that particular LUN (for which all the paths are unavailable).
+#
 
-Eg:
+When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
-`dmsetup message $device 0 "fail_if_no_path"`
+
-where $device is the mpath device name (eg. mpath2. Don't use path here.) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).
+
+To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

--- Additional comment from nandkumar.mane on 2008-09-14 22:52:18 EDT ---

Ben, Thanks for the information. 
The queue_without_daemon option and shutdown -n will help us tackle the issue.

--- Additional comment from nandkumar.mane on 2008-09-18 12:28:32 EDT ---

Ben, can you please provide us with the patches applied for the queue_without_daemon feature, so that we can test the feature even before the 5.3 beta?

--- Additional comment from nandkumar.mane on 2008-09-18 12:28:32 EDT ---


Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,14 +1 @@
-(all architectures)
+(all architectures)
When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

#

When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).

To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".
-When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
-
-To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
-
-#
-
-When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
-
-To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
-
-If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).
-
-To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

--- Additional comment from bmarzins on 2008-09-29 14:36:29 EDT ---

There are device-mapper packages up at
http://people.redhat.com/rpeterso/Experimental/RHEL5.x/dm-multipath/

--- Additional comment from bmarzins on 2008-09-29 14:36:29 EDT ---


Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,14 @@
-(all architectures)
When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

#

When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).

To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".
+(all architectures)
+When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
+
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+#
+
+When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
+
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).
+
+To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

--- Additional comment from nandkumar.mane on 2008-11-05 04:22:54 EDT ---

The queue_without_daemon option helps us tackle the shutdown issue, given that we have to run shutdown with the -n option.  Thank you.

But here is what man page for shutdown says..
 -n     [DEPRECATED] Don't call init(8) to do the shutdown but do  it  ourself.
              The  use  of this option is discouraged, and its results are not always what you'd expect.

So we need to find some other solution before the -n option gets deprecated.

While the system is running, if the paths go down and the user is not able to kill the I/O process, he/she has to set the feature to fail_if_no_path manually using
dmsetup message [device] 0 "fail_if_no_path".
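
Note: before or after sending the message above, it can be useful to confirm which policy a map is currently using. The sketch below assumes that the single-device form, dmsetup table <device>, prints a line of the shape "<start> <length> multipath <nfeatures> [features...] ..." and that queue_if_no_path appears in the feature list while queueing is enabled; queue_policy_from_table is a hypothetical helper name and the sample table lines in the usage note are illustrative, not captured output.

```shell
# Read one "dmsetup table <device>" line on stdin and print "queueing" if
# the multipath feature list contains queue_if_no_path, else "failing".
# Lines for other targets produce no output.
queue_policy_from_table() {
    awk '
        $3 == "multipath" {
            nfeat = $4              # number of feature words that follow
            policy = "failing"
            for (i = 5; i < 5 + nfeat; i++)
                if ($i == "queue_if_no_path") policy = "queueing"
            print policy
        }'
}
```

Typical use would be dmsetup table mpath2 | queue_policy_from_table. A line such as "0 20971520 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000" yields "queueing", while the same line with a feature count of 0 yields "failing".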

--- Additional comment from cward on 2008-11-05 07:16:26 EDT ---

NetApp, I recommend opening a new bug describing your concerns about the deprecation of the -n option to ensure it stays on the radar. Thanks!

--- Additional comment from nandkumar.mane on 2008-11-06 02:52:41 EDT ---

We will open a new bugzilla for the deprecation issue. Thanks.

--- Additional comment from errata-xmlrpc on 2009-01-20 17:08:08 EDT ---

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0232.html

Comment 1 nandkumar mane 2009-03-06 08:05:20 UTC

queue_without_daemon feature is not included in RHEL4.8 Beta. 
device-mapper-multipath-0.4.5-33.el4

We have the queue_without_daemon feature in RHEL5.3 (device-mapper-multipath-0.4.7-23.el5), which helped us solve the above issue. Is there any chance that it will be ported to RHEL4.8?

Comment 3 Andrius Benokraitis 2009-03-31 00:40:38 UTC

I don't think this will be possible at this point in time in the RHEL 4.8 release cycle. Tom - what do you think?

Comment 6 Ben Marzinski 2010-05-25 23:34:00 UTC

backporting this shouldn't be too bad.

Comment 8 Ben Marzinski 2010-10-19 15:17:20 UTC

I've finished up the work on this one.  I should be building a package shortly.

Comment 9 Ben Marzinski 2010-10-27 22:05:30 UTC

queue_without_daemon has been backported.

Comment 11 Gris Ge 2011-01-25 09:33:10 UTC

Feature verified.
device-mapper-multipath-0.4.5-41.el4 has the feature enabled.

When "queue_without_daemon    no" is added to multipath.conf,
the process will be killed when all paths are down and multipathd is stopped.


Ben,

Do we need to add this line to multipath.conf.synthetic?
queue_without_daemon    no
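
Note: to confirm which queue_without_daemon value a given configuration file would yield, something like the following could be used. It is a minimal sketch: queue_without_daemon_value is a hypothetical helper name, it does not check which section the option sits in, and the fallback of "yes" follows the default stated earlier in this report.

```shell
# Print the queue_without_daemon value found in a multipath.conf-style file,
# ignoring comments and surrounding double quotes. Falls back to "yes"
# (the default described earlier in this report) when the option is absent.
queue_without_daemon_value() {
    value=$(sed 's/#.*//' "$1" |
        awk '$1 == "queue_without_daemon" { v = $2; gsub(/"/, "", v) } END { print v }')
    echo "${value:-yes}"
}
```

For example, queue_without_daemon_value /etc/multipath.conf would print no once the line above has been added, and yes for a file that does not mention the option.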

Comment 12 Gris Ge 2011-01-25 09:59:32 UTC

multipath.conf.annotated also needs to be updated.
Thanks.

Comment 15 errata-xmlrpc 2011-02-16 14:24:09 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0243.html