While testing mirroring on top of MP, we noticed that the I/Os were queued indefinitely by MP when we
failed all the paths (took out the device).
For those who understand the MP device-mapper table: if the problem is in userspace, you should be
able to tell from the table. Unfortunately, I don't have those table lines to add to this bugzilla.
BTW, this was with device-mapper-multipath-0.4.5-20.RHEL4.x86_64.rpm
Do you know if the table ever changed from something like:
mpath10: 0 102400000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:48 1000
to something like:
mpath10: 0 102400000 multipath 0 0 1 1 round-robin 0 1 1 8:48 1000
When a device is created with
no_path_retry <some number>
you will initially see "1 queue_if_no_path" in the table. Once the device has
been failed for more than <no_path_retry> * <polling_interval> seconds, the "1
queue_if_no_path" should be replaced by "0" in the table. This works correctly
on my setup. When I interactively watch the paths being checked with
# multipathd -k
> show paths
there appears to be a printing error on the path with queued I/O (it doesn't
show the correct polling_interval), but the queueing itself functions correctly.
After the expected amount of time, I get a message from multipathd that says
Feb 5 15:18:21 cypher-05 multipathd: mpath13: Disable queueing
and dmsetup table shows the correct value.
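For anyone who wants to check this without reading the table by eye, here is a
rough Python sketch. It assumes the layout seen in the two example lines earlier
in this bug: the word "multipath" is followed by a feature count and then that
many feature words. The function names are made up for illustration and are not
part of any multipath tool.

```python
def multipath_features(table_line):
    """Return the feature words from one `dmsetup table` multipath line."""
    tokens = table_line.split()
    # Assumed layout: "... multipath <num_features> <feature words...> ..."
    idx = tokens.index("multipath")
    count = int(tokens[idx + 1])
    return tokens[idx + 2 : idx + 2 + count]


def is_queueing(table_line):
    """True if the map is still set to queue I/O when no path is available."""
    return "queue_if_no_path" in multipath_features(table_line)


# The two table lines quoted earlier in this bug.
queued = "mpath10: 0 102400000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:48 1000"
failing = "mpath10: 0 102400000 multipath 0 0 1 1 round-robin 0 1 1 8:48 1000"

print(is_queueing(queued))   # True
print(is_queueing(failing))  # False
```

If is_queueing() still returns True well after no_path_retry * polling_interval
seconds with all paths down, the table was never updated.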
Aside from the table, you can run multipath -v3. There should be a line like:
no_path_retry = 10 (config file default)
for each device. You can see this even if the devices are already created when
you run the command. You can also run multipathd with
# multipathd -v6
This should display lines that say
<map_name>: Retrying.. No active path
<map_name>: Disable queueing
when the device finally fails the I/O. Unfortunately, these print statements
don't tell you how many retries you have left.
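Since the messages don't carry a retry count, one workaround is to collect them
per map from the log. The Python sketch below matches only the two messages
quoted above; the function name is hypothetical, and it assumes the -v6 output
uses the same "<map_name>: <message>" form as the syslog line quoted earlier.
Counting the "Retrying" lines may give a rough idea of retries used, but I have
not verified that exactly one is logged per polling interval.

```python
import re

# Matches "<map_name>: Retrying.. No active path" and
# "<map_name>: Disable queueing" at the end of a line, with or without
# a syslog prefix in front of the map name.
PATTERN = re.compile(r"(\S+): (Retrying\.\. No active path|Disable queueing)\s*$")


def queueing_events(lines):
    """Group the matched messages by map name, in the order they appear."""
    events = {}
    for line in lines:
        m = PATTERN.search(line)
        if m:
            events.setdefault(m.group(1), []).append(m.group(2))
    return events


# The syslog line quoted earlier in this bug.
sample = ["Feb 5 15:18:21 cypher-05 multipathd: mpath13: Disable queueing"]
print(queueing_events(sample))  # {'mpath13': ['Disable queueing']}
```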
I didn't originally understand that it was no_path_retry * polling_interval.
Our values are 5 and 30 respectively, so queueing should have been disabled
after 150 seconds. After waiting 5 minutes, the paths still never errored the I/O.
[root@clx12ah01 ~]# rpm -q kernel device-mapper-multipath device-mapper udev
[root@clx12ah01 ~]# rpm -q kernel-smp device-mapper-multipath device-mapper
I am hitting this now and it is causing pain for HA LVM.
I am able to reproduce at will.
multipathd was not running. As a result, the device-mapper mapping table was
not being properly updated.