Bug 227409 - no_path_retry = # acts like no_path_retry = queue
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.0
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Ben Marzinski
QA Contact: Corey Marthaler
Reported: 2007-02-05 15:47 EST by Jonathan Earl Brassow
Modified: 2010-01-11 21:28 EST

Doc Type: Bug Fix
Last Closed: 2007-04-27 12:17:09 EDT
Description Jonathan Earl Brassow 2007-02-05 15:47:40 EST
While testing mirroring on top of multipath (MP), we noticed that the I/Os were
queued indefinitely by MP when we failed all the paths (took out the device).

For those that understand the MP device mapper table: if the problem is in
userspace, you should be able to tell from the table. Unfortunately, I don't
have those table lines to add to this bugzilla.

BTW, this was with device-mapper-multipath-0.4.5-20.RHEL4.x86_64.rpm
Comment 1 Ben Marzinski 2007-02-05 20:15:36 EST
Do you know if the table ever changed from something like:

mpath10: 0 102400000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:48 1000 

to something like:

mpath10: 0 102400000 multipath 0 0 1 1 round-robin 0 1 1 8:48 1000 


When a device is created with

no_path_retry <some number>

you will initially see "1 queue_if_no_path" in the table.  Once the device has
been failed for more than <no_path_retry> * <polling_interval> seconds, the "1
queue_if_no_path" should be replaced by "0" in the table. This works correctly
on my setup. When I interactively watch the paths being checked with

# multipathd -k
> show paths

there appears to be a printing error (it doesn't show the correct
polling_interval) on the path with queued IO, but it functions correctly. After
the correct amount of time, I get a message from multipathd that says

Feb  5 15:18:21 cypher-05 multipathd: mpath13: Disable queueing

and dmsetup table shows the correct value.
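
To make the table check above concrete, here is a minimal sketch that reads one line of "dmsetup table" output and reports whether queue_if_no_path is still listed as a feature. The function name is made up for illustration, and it assumes the field right after "multipath" is the feature-argument count, as in the two example lines quoted earlier in this comment.

```python
def queue_if_no_path_enabled(table_line):
    """Return True if a multipath table line lists queue_if_no_path as a feature."""
    fields = table_line.split()
    # `dmsetup table` prefixes each line with "<map_name>:"; drop it if present.
    if fields and fields[0].endswith(":"):
        fields = fields[1:]
    # Remaining layout (assumed): <start> <length> multipath <num_feature_args> <feature args...> ...
    if len(fields) < 4 or fields[2] != "multipath":
        raise ValueError("not a multipath table line")
    num_feature_args = int(fields[3])
    features = fields[4:4 + num_feature_args]
    return "queue_if_no_path" in features


# The two table lines quoted in this comment.
before = "mpath10: 0 102400000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:48 1000"
after = "mpath10: 0 102400000 multipath 0 0 1 1 round-robin 0 1 1 8:48 1000"
print(queue_if_no_path_enabled(before))  # True
print(queue_if_no_path_enabled(after))   # False
```

If the first form is still reported long after all paths have failed, the table was never updated to stop queueing.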


Aside from the table, you can run multipath -v3. There should be a line like:

no_path_retry = 10 (config file default)

for each device. You can see this even if the devices are already created when
you run the command. You can also run multipathd with

# multipathd -v6

This should display lines that say

<map_name>: Retrying.. No active path

and then

<map_name>: Disable queueing

when the device finally fails the IO.  Unfortunately, these print statements
don't tell you how many retries you have left.
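
Since the messages do not carry a retry count, one workaround is to count them yourself. The sketch below tallies "Retrying.. No active path" lines per map and notes whether "Disable queueing" was seen. The function name and the first two sample lines are invented for illustration; only the last sample line is taken from the log quoted above, and the exact output format of multipathd -v6 may differ.

```python
import re
from collections import defaultdict

# Matches "<map_name>: <message>" for the two messages of interest,
# with or without a syslog prefix in front.
LINE_RE = re.compile(
    r"(?P<map>\S+): (?P<msg>Retrying\.\. No active path|Disable queueing)"
)


def summarize_queueing(log_lines):
    """Count retry messages per map and record whether queueing was disabled."""
    summary = defaultdict(lambda: {"retries": 0, "queueing_disabled": False})
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        entry = summary[m.group("map")]
        if m.group("msg") == "Disable queueing":
            entry["queueing_disabled"] = True
        else:
            entry["retries"] += 1
    return dict(summary)


sample = [
    # Hypothetical retry lines, written in the format of the real line below.
    "Feb  5 15:17:51 cypher-05 multipathd: mpath13: Retrying.. No active path",
    "Feb  5 15:18:06 cypher-05 multipathd: mpath13: Retrying.. No active path",
    # Line quoted earlier in this comment.
    "Feb  5 15:18:21 cypher-05 multipathd: mpath13: Disable queueing",
]
print(summarize_queueing(sample))
```

A map that shows neither message while all of its paths are down suggests that nothing is counting retries for it at all.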
Comment 2 Jonathan Earl Brassow 2007-02-09 09:40:39 EST
I didn't originally understand that it was no_path_retry * polling_interval.  
Our values are 5 and 30 respectively.  After waiting 5 minutes... the paths 
never errored the I/O.
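
For reference, the arithmetic with these values, as a small sketch. It takes the no_path_retry * polling_interval rule from Comment 1 at face value; the function name is made up, and real timing may be off by a checker interval or so.

```python
def queueing_window_seconds(no_path_retry, polling_interval):
    """Seconds of queueing expected after all paths fail, per the rule in Comment 1."""
    return no_path_retry * polling_interval


window = queueing_window_seconds(5, 30)
print(window, window / 60)  # 150 2.5
```

So the I/O should have been failed after roughly two and a half minutes, well inside the five minutes waited.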

[root@clx12ah01 ~]# rpm -q kernel device-mapper-multipath device-mapper udev
kernel-2.6.9-42.EL
device-mapper-multipath-0.4.5-20.RHEL4
device-mapper-1.02.17-2.el4
udev-039-10.15.EL4
Comment 3 Jonathan Earl Brassow 2007-02-09 09:45:38 EST
sorry:

[root@clx12ah01 ~]# rpm -q kernel-smp device-mapper-multipath device-mapper udev
kernel-smp-2.6.9-46.EL
device-mapper-multipath-0.4.5-20.RHEL4
device-mapper-1.02.17-2.el4
udev-039-10.15.EL4
Comment 4 Jonathan Earl Brassow 2007-04-18 15:56:21 EDT
I am hitting this now and it is causing pain for HA LVM.

I am able to reproduce at will.
Comment 7 Jonathan Earl Brassow 2007-04-27 12:17:09 EDT
multipathd was not running.  As a result, the device-mapper mapping table was
not being properly updated.
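
For anyone hitting the same symptom, a quick check that the daemon is running can be sketched as follows. This is only an illustration: the function name and the proc_root parameter are made up, and it relies on the "<pid> (<comm>) <state> ..." layout of /proc/<pid>/stat.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if some process under proc_root has command name `name`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        # The command name sits between the first "(" and the last ")".
        comm = stat[stat.find("(") + 1:stat.rfind(")")]
        if comm == name:
            return True
    return False
```

Calling process_running("multipathd") on the affected host should return True before expecting the table to change from "1 queue_if_no_path" to "0".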
