Bug 419581

Summary: [NetApp-S 5.3 bug] I/O Processes don't get killed when all the paths to the LUN are down
Product: Red Hat Enterprise Linux 5 Reporter: Ritesh Raj Sarraf <rsarraf>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: high    
Version: 5.1 CC: agk, andriusb, bmarzins, cward, ddomingo, marting, nandkumar.mane, slevine, xdl-redhat-bugzilla
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
(all architectures) When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored. To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well. If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable). To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".
Story Points: ---
Clone Of:
: 488921 (view as bug list) Environment:
Last Closed: 2009-01-20 22:08:08 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 373081, 391221, 446125, 454962, 488921    
Attachments:
Description                            Flags
Document queue_if_no_path behavior     none
Simple shutdown replacement script     none

Description Ritesh Raj Sarraf 2007-12-11 11:56:48 UTC
Description of problem:
I/O processes (processes in D state) are not killed by `kill -SIGKILL` when all the
paths to the LUN they are writing to are down. multipath is
using the "queue_if_no_path" option.

Version-Release number of selected component (if applicable):
RHEL5.1

How reproducible:
Very reproducible

Steps to Reproduce:
1. Map a LUN with multiple paths.
2. Make sure queue_if_no_path is enabled.
3. Run I/O on the multipathed device.
4. Take down all the paths to the multipathed device.
5. Now try killing the I/O process (say dd). 

The process doesn't get killed. This is a big problem because it holds 
shutdown/reboot process also. The only option left is to do a machine reset.
  
Actual results:
The I/O processes don't get killed at all.

Expected results:
When the I/O processes are sent a kill signal, the kernel should be able to kill the 
process.

Comment 1 Tom Coughlan 2007-12-11 17:14:04 UTC
Ben, Alasdair, please confirm that I have the following right. If so, Stephen,
some guidance about this should go in the dm-multipath manual.

The D state process is in an uninterruptible sleep, waiting for I/O to complete.
The process is essentially in kernel mode, and cannot be killed.

The queue_if_no_path feature causes the kernel to retry the I/O forever, so the
process remains in D state forever. 

The solution is to set no_path_retry to a numeric value specifying the number of
times the system should retry, rather than setting queue_if_no_path. When the
I/O eventually fails, the killed process will die. 
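
For reference, a minimal sketch (not part of the original comment) of how the stuck processes can be spotted. It filters `ps -eo pid,stat,comm` output down to processes whose state starts with D. The helper name `d_state_filter` is hypothetical.

```shell
#!/bin/sh
# Keep only processes in uninterruptible sleep (STAT begins with "D").
# Reads `ps -eo pid,stat,comm` style output on stdin and skips the header line.
d_state_filter() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Usage (prints "PID COMMAND" for each D state process):
#   ps -eo pid,stat,comm | d_state_filter
```

A process listed here will not react to SIGKILL until its I/O completes or fails.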



Comment 2 Ben Marzinski 2007-12-11 19:47:19 UTC
Yes, this is correct. Unfortunately, in RHEL5, multipath has two ways of setting
how the multipath device will react when all paths have failed,
"queue_if_no_path" and "no_path_retry".  You should not use both at the same
time. If you are adding

no_path_retry <n>

to your configuration, you should make sure that it does not contain

features "1 queue_if_no_path"

If you need to know the existing configuration for your hardware, you can run
multipathd -k"show config" and find the devices section that matches your
vendor/model.
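
As an illustration only, the check below looks for both settings in one configuration file. It is a coarse, file-level sketch: it does not understand multipath.conf sections, so it can report a conflict when the two options live in different device stanzas. The function name `has_queue_conflict` is hypothetical.

```shell
#!/bin/sh
# Return 0 and print a warning if the file contains both an uncommented
# no_path_retry line and an uncommented features line with queue_if_no_path.
# Return 1 otherwise. Section boundaries are NOT taken into account.
has_queue_conflict() {
    conf=$1
    if grep -Eq '^[[:space:]]*no_path_retry[[:space:]]' "$conf" &&
       grep -Eq '^[[:space:]]*features[[:space:]].*queue_if_no_path' "$conf"; then
        echo "warning: $conf sets both no_path_retry and queue_if_no_path"
        return 0
    fi
    return 1
}

# Usage:
#   has_queue_conflict /etc/multipath.conf
```

The built-in defaults are not in the file, so the `multipathd -k"show config"` output mentioned above is still the place to look for settings inherited from the hardware table.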



Comment 4 Tom Coughlan 2007-12-19 23:07:17 UTC
We will add some documentation indicating that 

1) If the "queue_if_no_path" option is selected, then any process that issues
I/O will hang until one or more paths are restored. 

and

2) The two options no_path_retry and queue_if_no_path are mutually exclusive.

NetApp, is that sufficient?

Comment 5 Ritesh Raj Sarraf 2007-12-20 09:32:47 UTC
Yes, that should be good to go.

Comment 7 Ritesh Raj Sarraf 2008-01-28 14:39:04 UTC
Created attachment 293157 [details]
Document queue_if_no_path behavior

It would be good to document this behavior in the man pages.

Comment 8 Ritesh Raj Sarraf 2008-01-28 14:43:58 UTC
The same patch was requested for inclusion upstream:
http://www.redhat.com/archives/dm-devel/2007-December/msg00157.html

Comment 9 Andrius Benokraitis 2008-01-28 16:02:07 UTC
Reassigning to slevine per email.

Comment 10 Don Domingo 2008-01-29 00:04:33 UTC
added to RHEL5.2 release notes under "Known Issues":

<quote>
When using dm-multipath, if features "1 queue_if_no_path" is specified in
/etc/multipath.conf then any process that issues I/O will hang until one or more
paths are restored.

To avoid this, set no_path_retry <N> in /etc/multipath.conf (where <N> is the
number of times the system should retry a path). When you do so, ensure that you
remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
</quote>

please advise if any further revisions are required. thanks!

Comment 14 Andrius Benokraitis 2008-02-11 14:33:43 UTC
Reassigning to Ben per Tom, so that Comment #8 (the patch for man page) is
backported into RHEL 5.3.

Comment 15 Alasdair Kergon 2008-02-11 14:47:07 UTC
"This is a big problem because it holds shutdown/reboot process also."

Has this been addressed?  i.e. Do the machine shutdown scripts handle any
queue_if_no_path settings correctly?


Comment 16 Ritesh Raj Sarraf 2008-02-20 22:15:27 UTC
No. Up to 5.1 the behavior is still the same.

On shutdown, the shutdown sequence is put on hold because the I/O processes 
don't get killed.

Comment 17 Tom Coughlan 2008-02-29 22:47:47 UTC
To recap:

We have a release note in 5.2, as shown in comment 10, warning users about the
current behavior of queue_if_no_path. 

NetApp has proposed the addition of similar text to the man page. This has been
posted upstream, and should then go into 5.3.

NetApp is also requesting that we consider changing the behavior of
queue_if_no_path, so that I/O will continue on the last available path, rather
than removing that path and causing processes (and system shutdown) to hang.

Ben, is this feasible for 5.3?

Comment 18 Don Domingo 2008-04-02 02:11:33 UTC
Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

Comment 19 Ritesh Raj Sarraf 2008-04-02 08:22:27 UTC
Ben/Alasdair,

Adding the following does partially solve the problem.

rrs@learner:~$ cat /tmp/kde-rrs/patch_netfs
--- netfs.orig  2008-03-21 18:19:59.000000000 -0400
+++ netfs       2008-03-27 06:04:32.000000000 -0400
@@ -97,6 +97,12 @@
   stop)
         # Unmount loopback stuff first
        __umount_loopback_loop
+
+       Devices=`dmsetup info | grep Name |  awk '{print $2}'`
+       for device in $Devices ; do
+               dmsetup message $device 0 "fail_if_no_path"
+       done
+
        if [ -n "$NETDEVMTAB" ]; then
                __umount_loop '$4 ~ /_netdev/ && $2 != "/" {print $2}' \
                        /etc/mtab \


If we execute the above when a LUN is in a failed state, I/O processes to it are
terminated properly.

Adding the above to the init script can overcome the problem by making
sure the policy is changed to "fail_if_no_path" before the unmount.

The problem we are seeing currently is that when the shutdown command is
executed, it too goes into the same wait state, which never allows runlevel
6/0 to execute.
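
A variation on the loop in the patch above, offered as a sketch rather than a tested init-script change: it asks dmsetup only for maps that use the multipath target, so other device-mapper devices (LVM volumes, for example) are not sent a message they do not understand. The `DMSETUP` override variable and both function names are assumptions added for illustration.

```shell
#!/bin/sh
# DMSETUP can be overridden (e.g. for a dry run); it defaults to the real tool.
DMSETUP=${DMSETUP:-dmsetup}

# Print the names of device-mapper maps that use the multipath target.
# When nothing matches, dmsetup prints "No devices found", which is skipped.
list_multipath_devices() {
    "$DMSETUP" ls --target multipath 2>/dev/null |
        awk '$1 != "No" { print $1 }'
}

# Switch every multipath map from queueing to failing I/O when no path is left.
disable_queueing_all() {
    for dev in $(list_multipath_devices); do
        "$DMSETUP" message "$dev" 0 "fail_if_no_path" ||
            echo "failed to update $dev" >&2
    done
}

# Usage (as root, before unmounting):
#   disable_queueing_all
```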

Comment 20 nandkumar mane 2008-04-04 09:54:57 UTC
The shutdown command hangs because it calls sync(), which stalls the entire
shutdown process: for some of the devices, all the paths are
down and the "queue_if_no_path" policy is set on them.

If we instead execute shutdown with the -n option, shutdown tries to kill all
processes and hangs while trying to kill the uninterruptible 'D' state
processes that are issuing I/O to the LUN for which all the paths are down.

Here are two different solutions to solve these problems.

1) Changing the policy of multipath devices to "fail_if_no_path" in shutdown.c
solves the issue.

--- sysvinit-2.86/src/shutdown.c.org    2008-04-04 02:41:33.000000000 -0400
+++ sysvinit-2.86/src/shutdown.c        2008-04-04 02:46:27.000000000 -0400
@@ -369,6 +369,10 @@
                hardsleep(1);
                stopit(0);
        }
+
+       syslog(LOG_NOTICE, "Changing multipathed device policy to fail_if_no_path");
+       system ("Devices=`dmsetup info | grep Name |  awk '{print $2}'`;for device in $Devices ; do dmsetup message $device 0 \"fail_if_no_path\";  done");
+
        openlog("shutdown", LOG_PID, LOG_USER);
        if (do_halt)
                syslog(LOG_NOTICE, "shutting down for system halt");


2) Remove sync() from shutdown.c and add policy change logic in netfs.
   Here, shutdown with -n option will still hang as shutdown will try to kill 
all processes.

--- sysvinit-2.86/src/shutdown.c.orig   2008-04-04 04:31:48.000000000 -0400
+++ sysvinit-2.86/src/shutdown.c        2008-04-04 04:32:12.000000000 -0400
@@ -392,7 +392,7 @@
        unlink(NOLOGIN);

        /* Now execute init to change runlevel. */
-       sync();
+//     sync();
        init_setenv("INIT_HALT", halttype);
        execv(INIT, args);

--- netfs.orig  2008-03-21 18:19:59.000000000 -0400
+++ netfs       2008-03-27 06:04:32.000000000 -0400
@@ -97,6 +97,12 @@
   stop)
         # Unmount loopback stuff first
        __umount_loopback_loop
+
+       Devices=`dmsetup info | grep Name |  awk '{print $2}'`
+       for device in $Devices ; do
+               dmsetup message $device 0 "fail_if_no_path"
+       done
+
        if [ -n "$NETDEVMTAB" ]; then
                __umount_loop '$4 ~ /_netdev/ && $2 != "/" {print $2}' \
                        /etc/mtab \


Comment 21 Ben Marzinski 2008-04-21 19:01:27 UTC
This might be able to be fixed in a similar way to how 238421 and 443380 will be
fixed. The fix for 430494 (which is the RHEL-4 version of 238421) has a built-in
"multipathd -k" command to disable queueing on devices.  It seems reasonable to
add a configuration parameter that allows you to specify that devices should not
be set to queue unless the multipath service is running.  If you don't have
multipathd running, you won't be trying failed paths once all the paths have
failed, so you will never get to issue those queued IOs. This would be instead
of modifying the shutdown command, which I'm not sure is the best way to handle
this issue.

This would create a window where IOs could fail if the multipath devices were
created before multipathd started, or if you needed to restart multipathd for
some reason.  However that doesn't seem like an unreasonable window to me.

Comment 22 RHEL Program Management 2008-06-02 20:26:43 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 23 Ritesh Raj Sarraf 2008-07-30 14:46:45 UTC
Ben,
To some extent it makes sense. Like if the user is aware that all the paths to 
a LUN are down, and on top of that LUN he's using queue_if_no_path, then the 
user could use this particular option and stop queueing and thus be able to 
avoid the queue hang problem.

What if the user executes shutdown, and in between a LUN loses all paths to 
it? If there's any init script (say a mail server) which before doing a 
shutdown does a sync() to that LUN, that'd run into the queue problem. This is 
the oddest case, but is possible. I'm not sure if we want to consider such an 
odd case.

Comment 24 Ben Marzinski 2008-07-30 18:25:54 UTC
Like I said in comment #21, it might be possible to add a configuration option,
so that queuing is turned off when multipathd is stopped. If you have something
like this, then all you need to do is make multipathd stop earlier in the
shutdown process. The multipath devices will still work, but they will simply
fail the IO if all paths go down, and this is probably acceptable during shutdown.

Comment 25 Ryan Lerch 2008-08-08 00:33:30 UTC
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. 

This Release Note is currently located in the Known Issues section.

Comment 26 Ryan Lerch 2008-08-08 00:33:30 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Comment 27 Ritesh Raj Sarraf 2008-08-08 07:28:50 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1,10 @@
 (all architectures)
 When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
 
-To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+If "1 queue_if_no_path" is a compulsion to use, and the user hits the issue, they can use the dmsetup command to change the policy at runtime for that particular LUN (for which all the paths are unavailable).
+
+Eg:
+`dmsetup message $device 0 "fail_if_no_path"`
+where $device is the mpath device for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path"

Comment 28 nandkumar mane 2008-08-18 05:19:47 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -7,4 +7,4 @@
 
 Eg:
 `dmsetup message $device 0 "fail_if_no_path"`
-where $device is the mpath device for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path"
+where $device is the mpath device name (eg. mpath2. Don't use path here.) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

Comment 29 Ben Marzinski 2008-09-08 22:12:52 UTC
O.k. Here are some thoughts on a solution, and a code check in.

For some people, this can be fixed simply not using

features "1 queue_if_no_path"

in the appropriate /etc/multipath.conf entry. Instead, people can use

no_path_retry <n>

where <n> is the number of times to retry the paths (checking every 'polling_interval' seconds). After <n> failed retries, queueing is automatically disabled.  This will mean that the shutdown command will hang until this happens, but it will eventually complete.  For people that don't need to deal with long periods where all the paths to their device are down, or people who are willing to have shutdown take a while to complete, this should be fine.

Also, multipath now has an /etc/multipath.conf option, queue_without_daemon, that should help with this.  It defaults to "yes".  If set to "no", when multipathd stops, it turns off queue_if_no_path for all multipath devices. With this option set, shutdown -n will kill multipathd before syncing, and the devices will stop queueing correctly.
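
To give a feel for the no_path_retry trade-off described above, here is a rough worked calculation. It assumes the queueing window is simply the retry count multiplied by polling_interval, which follows from the description in this comment but ignores any timing detail inside multipathd. The function name and the sample values are illustrative.

```shell
#!/bin/sh
# Approximate how long I/O stays queued after the last path fails:
# no_path_retry x polling_interval, both taken from /etc/multipath.conf.
queue_window_seconds() {
    retries=$1
    polling_interval=$2
    echo $((retries * polling_interval))
}

# Example: no_path_retry 12 with polling_interval 5 (seconds)
queue_window_seconds 12 5
```

With these sample values, a shutdown that hits sync() while all paths are down would be held for roughly one minute before the queued I/O is failed.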

Comment 30 Ben Marzinski 2008-09-08 22:19:27 UTC
Created attachment 316127 [details]
Simple shutdown replacement script

While it isn't as nice as being able to simply use the shutdown command as is, if it's really important to have init do the shutdown work, this script duplicates the shutdown command, with the most common (for me anyway) options. It handles the -h, -r, and -t options, and it deals with setting the time to "now" or +<minutes>, although it only sends one warning for delayed shutdowns.  This shutdown script disables queueing on all the paths before calling sync. However, it relies on a multipathd interactive command that won't be available until RHEL 5.3, so it is not suitable as a workaround.

Comment 33 NetApp filed bugzillas 2008-09-11 19:27:06 UTC
Ben,
Using no_path_retry, or not using queue_if_no_path, does not look like a very promising solution. It is very difficult to determine how long to retry paths.

What are the chances of queue_without_daemon being included in 5.3? This solution looks useful, given that one needs to use shutdown -n.

If you are really interested in making changes to the shutdown executable, the patch included in comment #20 gives you a good option for changing shutdown.c. Rather than using a replacement script, I would prefer changing the current shutdown.c, so that one need not pass any parameter when running shutdown.

Please tell me your views.

Comment 34 Ben Marzinski 2008-09-12 19:48:13 UTC
Sorry if I was not clear in comment #29.  The queue_without_daemon code is already checked in.  It will definitely be in 5.3.

As for the ability to use shutdown without options, I'm not sure if the sort of fixes in comment #20 will ever be accepted.  Simply removing the sync() call likely won't work.  I assume (although I'm not totally certain) that this is necessary to guarantee that all cached data is flushed back to disk.  Users expect this to happen on routine shutdowns.  The other option makes shutdown dependent on the multipath package, and I have doubts that this will be acceptable to the sysvinit maintainers.

I wonder if there would be a way for shutdown to signal that it is in progress, and multipathd could simply occasionally check for this, and disable queueing if it noticed it.  I was hoping that there would be something like that already in the shutdown code, but it seems like all the possible candidates (pid files, etc.) get removed before shutdown calls sync(), which is where shutdown would likely be stuck if multipathd only checked occasionally. Something like this, which would not force the sysvinit package to depend on the multipath package and would be more generally useful, seems much more likely to be accepted.

Comment 35 Don Domingo 2008-09-15 02:45:43 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -3,8 +3,12 @@
 
 To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
 
-If "1 queue_if_no_path" is a compulsion to use, and the user hits the issue, they can use the dmsetup command to change the policy at runtime for that particular LUN (for which all the paths are unavailable).
+#
 
-Eg:
+When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
-`dmsetup message $device 0 "fail_if_no_path"`
+
-where $device is the mpath device name (eg. mpath2. Don't use path here.) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).
+
+To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

Comment 36 nandkumar mane 2008-09-15 02:52:18 UTC
Ben, Thanks for the information. 
The queue_without_daemon option and shutdown -n will help us tackle the issue.

Comment 37 nandkumar mane 2008-09-18 16:28:32 UTC
Ben, can you please provide us with the patches applied for the queue_without_daemon feature, so that we can test the feature even before the 5.3 beta?

Comment 38 nandkumar mane 2008-09-18 16:28:32 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,14 +1 @@
-(all architectures)
+(all architectures)
When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

#

When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).

To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".
-When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
-
-To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
-
-#
-
-When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
-
-To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
-
-If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).
-
-To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

Comment 40 Ben Marzinski 2008-09-29 18:36:29 UTC
There are device mapper packages up at
http://people.redhat.com/rpeterso/Experimental/RHEL5.x/dm-multipath/

Comment 41 Ben Marzinski 2008-09-29 18:36:29 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,14 @@
-(all architectures)
When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

#

When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.

To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.

If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).

To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".
+(all architectures)
+When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
+
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+#
+
+When using dm-multipath, if features "1 queue_if_no_path" is specified in /etc/multipath.conf then any process that issues I/O will hang until one or more paths are restored.
+
+To avoid this, set no_path_retry [N] in /etc/multipath.conf (where [N] is the number of times the system should retry a path). When you do, remove the features "1 queue_if_no_path" option from /etc/multipath.conf as well.
+
+If you need to use "1 queue_if_no_path" and experience the issue noted here, use dmsetup to edit the policy at runtime for a particular LUN (i.e. for which all the paths are unavailable).
+
+To illustrate: run dmsetup message [device] 0 "fail_if_no_path", where [device] is the multipath device name (e.g. mpath2; do not specify the path) for which you want to change the policy from "queue_if_no_path" to "fail_if_no_path".

Comment 42 nandkumar mane 2008-11-05 09:22:54 UTC
The queue_without_daemon option helps us tackle the shutdown issue, given that we have to run shutdown with the -n option.  Thank you.

But here is what the man page for shutdown says:
 -n     [DEPRECATED] Don't call init(8) to do the shutdown but do  it  ourself.
              The  use  of this option is discouraged, and its results are not always what you'd expect.

So we need to find some other solution before the -n option is removed.

While the system is running, if the paths go down and the user is not able to kill the I/O process, he/she has to set the feature to fail_if_no_path manually using
dmsetup message [device] 0 "fail_if_no_path".
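
A small sketch of that manual step for a single map, under the assumption that the current feature list is visible in the `dmsetup table` output for the device. The function names and the `DMSETUP` override variable are illustrative, and mpath2 is just the example name used in the release note.

```shell
#!/bin/sh
# DMSETUP can be overridden (e.g. for a dry run); it defaults to the real tool.
DMSETUP=${DMSETUP:-dmsetup}

# Succeed if the map's current table still lists queue_if_no_path.
is_queueing() {
    "$DMSETUP" table "$1" | grep -q 'queue_if_no_path'
}

# Tell one multipath map to fail I/O instead of queueing it.
# Pass the map name (e.g. mpath2), not an individual path device.
stop_queueing() {
    if is_queueing "$1"; then
        "$DMSETUP" message "$1" 0 "fail_if_no_path"
    fi
}

# Usage (as root):
#   stop_queueing mpath2
```

Once queueing is off, the outstanding I/O is failed back to the blocked process, which can then act on the pending kill signal.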

Comment 43 Chris Ward 2008-11-05 12:16:26 UTC
NetApp, I recommend opening a new bug describing your concerns about the deprecation of the -n option to ensure it stays on the radar. Thanks!

Comment 44 nandkumar mane 2008-11-06 07:52:41 UTC
We will open a new bugzilla for the deprecation issue. Thanks.

Comment 47 errata-xmlrpc 2009-01-20 22:08:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0232.html