Bug 254148

Summary: Multipath losing paths
Product: Red Hat Enterprise Linux 4
Reporter: Bjoern Robbe <bjoern.robbe>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Corey Marthaler <cmarthal>
Severity: high
Priority: high
Version: 4.0
CC: agk, bmarzins, christophe.varoqui, dwysocha, egoggin, junichi.nomura, kueda, lmb, mbroz, prockai, tranlan
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-12 20:23:55 UTC

Description Bjoern Robbe 2007-08-24 14:05:48 UTC
Description of problem:
multipath loses paths

Version-Release number of selected component (if applicable):

device-mapper-multipath-0.4.5-21.RHEL4
Kernel-Version: 2.6.9-55.0.2.ELsmp

Environment:

Hardware: 
Server: HP DL360
SAN: 2x IBM Director 2109-48 (rebranded Brocade SilkWorm 48000)
Storage: 2x IBM Enterprise Storage Server 2105-800 (ESS)

Environment Description:

The server is connected via two Fibre Channel adapters to both Directors. The
two ESS units are also connected to these Directors, which gives four paths to
each LUN. The server gets two LUNs from each ESS, so overall it sees 16 disks
(sd devices) from the ESS pair.

On top of the multipath devices we build two md mirrors, each mirrored across
both ESS units: LUN0 on ESS1 + LUN0 on ESS2 = md0, and LUN1 on ESS1 + LUN1 on
ESS2 = md1.

A logical volume is built across both md devices.

ls -l /dev/mpath/
total 0
lrwxrwxrwx  1 root root 7 Aug 23 14:36 mpath0 -> ../dm-8
lrwxrwxrwx  1 root root 7 Aug 23 14:36 mpath1 -> ../dm-9
lrwxrwxrwx  1 root root 8 Aug 23 14:36 mpath2 -> ../dm-10
lrwxrwxrwx  1 root root 8 Aug 23 14:36 mpath3 -> ../dm-11

pvscan
PV /dev/md0 VG testvg01   lvm2 [29.80 GB / 10.89 GB free]
PV /dev/md1 VG testvg01   lvm2 [29.80 GB / 16.21 GB free]
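
For reference, the stack described above could have been created roughly like
this (a sketch only; the device pairings are taken from the listings in this
report, but the exact mdadm/LVM options, the LV name and the size are
assumptions, not the commands actually used):

[code]
# mirror LUN0 of each ESS (mpath0 + mpath2) and LUN1 of each ESS (mpath1 + mpath3)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/mpath/mpath0 /dev/mpath/mpath2
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/mpath/mpath1 /dev/mpath/mpath3

# put LVM on top of both mirrors and build one LV spanning them
pvcreate /dev/md0 /dev/md1
vgcreate testvg01 /dev/md0 /dev/md1
lvcreate -L 32G -n testlv testvg01        # size and LV name illustrative
mkfs.ext3 /dev/testvg01/testlv
mount /dev/testvg01/testlv /testlv
[/code]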


lsmod |grep -i multi
dm_multipath           22985  2 dm_round_robin
dm_mod                 64617  36 dm_multipath,dm_snapshot,dm_zero,dm_mirror

modinfo dm_mod
filename:       /lib/modules/2.6.9-55.0.2.ELsmp/kernel/drivers/md/dm-mod.ko
parm:           major:The major number of the device mapper
description:    device-mapper driver
author:         Joe Thornber <dm-devel>
license:        GPL
vermagic:       2.6.9-55.0.2.ELsmp SMP 686 REGPARM 4KSTACKS gcc-3.4
depends:        

modinfo dm_multipath
filename:      /lib/modules/2.6.9-55.0.2.ELsmp/kernel/drivers/md/dm-multipath.ko
description:    device-mapper multipath target
author:         Sistina Software <dm-devel>
license:        GPL
vermagic:       2.6.9-55.0.2.ELsmp SMP 686 REGPARM 4KSTACKS gcc-3.4
depends:        dm-mod


How reproducible:

Preparation:

Open three terminals. In the first terminal, run the following:

while true; do multipath -ll; sleep 1; done

In the second terminal, run

iostat -d /dev/md1 1

Finally, in the third terminal, run a dd command:

dd if=/dev/zero of=/testlv/test.out bs=1024 count=10000000
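
For convenience, the three steps can also be driven from a single shell
session; a minimal sketch, assuming the logical volume is mounted on /testlv
as in the dd command above:

[code]
#!/bin/sh
# watch path states and md1 throughput in the background
while true; do multipath -ll; sleep 1; done > /tmp/multipath-ll.log 2>&1 &
MPATH_PID=$!
iostat -d /dev/md1 1 > /tmp/iostat-md1.log 2>&1 &
IOSTAT_PID=$!

# generate write load on the logical volume
dd if=/dev/zero of=/testlv/test.out bs=1024 count=10000000

kill $MPATH_PID $IOSTAT_PID
[/code]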

Error Description:

In the first terminal, you initially see the following:

[code]
mpath2 (1IBM_____2105____________02A24503)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:2:0 sde 8:64  [active][ready]
 \_ 0:0:3:0 sdg 8:96  [active][ready]
 \_ 1:0:2:0 sdm 8:192 [active][ready]
 \_ 1:0:3:0 sdo 8:224 [active][ready]

mpath1 (1IBM_____2105____________15C24597)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:0:1 sdb 8:16  [active][ready]
 \_ 0:0:1:1 sdd 8:48  [active][ready]
 \_ 1:0:0:1 sdj 8:144 [active][ready]
 \_ 1:0:1:1 sdl 8:176 [active][ready]

mpath0 (1IBM_____2105____________13B24597)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:0:0 sda 8:0   [active][ready]
 \_ 0:0:1:0 sdc 8:32  [active][ready]
 \_ 1:0:0:0 sdi 8:128 [active][ready]
 \_ 1:0:1:0 sdk 8:160 [active][ready]

mpath3 (1IBM_____2105____________11624503)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:2:1 sdf 8:80  [active][ready]
 \_ 0:0:3:1 sdh 8:112 [active][ready]
 \_ 1:0:2:1 sdn 8:208 [active][ready]
 \_ 1:0:3:1 sdp 8:240 [active][ready]

[/code]

A few seconds after starting the dd command, you will see that some paths fail
and I/O stops (in the iostat window).

[code]
mpath2 (1IBM_____2105____________02A24503)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:2:0 sde 8:64  [active][ready]
 \_ 0:0:3:0 sdg 8:96  [active][ready]
 \_ 1:0:2:0 sdm 8:192 [active][ready]
 \_ 1:0:3:0 sdo 8:224 [active][ready]

mpath1 (1IBM_____2105____________15C24597)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:0:1 sdb 8:16  [failed][ready]
 \_ 0:0:1:1 sdd 8:48  [failed][ready]
 \_ 1:0:0:1 sdj 8:144 [active][ready]
 \_ 1:0:1:1 sdl 8:176 [active][ready]

mpath0 (1IBM_____2105____________13B24597)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:0:0 sda 8:0   [active][ready]
 \_ 0:0:1:0 sdc 8:32  [active][ready]
 \_ 1:0:0:0 sdi 8:128 [active][ready]
 \_ 1:0:1:0 sdk 8:160 [active][ready]

mpath3 (1IBM_____2105____________11624503)
[size=29 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:2:1 sdf 8:80  [failed][ready]
 \_ 0:0:3:1 sdh 8:112 [failed][ready]
 \_ 1:0:2:1 sdn 8:208 [active][ready]
 \_ 1:0:3:1 sdp 8:240 [active][ready]
[/code]

[code]
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md1           26677.23         0.00    213417.82          0     215552

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md1           39331.68         0.00    314653.47          0     317800

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md1           62253.54         0.00    498028.28          0     493048

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md1           16437.00         0.00    131496.00          0     131496

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md1               0.00         0.00         0.00          0          0

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md1               0.00         0.00         0.00          0          0

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md1               0.00         0.00         0.00          0          0

[/code]

Furthermore, the first terminal begins to "hang". I think this is caused by
path recovery.

Once the dd is finished, the failed paths switch back to active.

Expected results:

There is no reason for the paths to fail under I/O load. Normally, multipath
should keep using all paths.



Additional info:

Comment 1 Doug Ledford 2007-08-24 16:06:18 UTC
This is a problem in either the device-mapper-multipath user-space code or the
device-mapper kernel code.  The mdadm device is functioning properly, inasmuch
as when you start the dd it sends the data to both mpath1 and mpath3.  Once
those devices start to block up due to failed paths, the md device will never
see a failure (since you have the queue_if_no_path option enabled) and it will
lock up anything trying to access the device until the path situation is
corrected.

In any case, the device mapper people will know more about this than I will, so
I'm reassigning the bug to them.
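
One way to confirm that interaction (a sketch, not something that was requested
here): queueing can be switched off per map with a device-mapper message, so
that md sees an I/O error instead of hanging while the paths are marked failed:

[code]
# make path failures propagate to md instead of being queued
dmsetup message mpath1 0 fail_if_no_path

# ... rerun the dd test ...

# restore the original queueing behaviour afterwards
dmsetup message mpath1 0 queue_if_no_path
[/code]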

Comment 2 Bjoern Robbe 2007-08-24 17:58:09 UTC
If you need any more data (traces, logs or something else) please let me know. 

Thx for your support

Comment 3 Ben Marzinski 2007-10-10 21:11:27 UTC
First off, I don't understand why all I/O to the devices would block if only
two of the paths have failed, as shown in the initial comments on this bug. I
also don't understand why the kernel thinks the paths have failed while
multipath still thinks they are fine.

There's a bunch of information that would make this much easier to debug.

First, can you send a copy of your /etc/multipath.conf?

Second, can you run
# multipath -F
# multipath -v6
and send the output? This will require unmounting your logical volumes.
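
For comparison, a device stanza for the ESS in /etc/multipath.conf generally
looks something like the following; this is illustrative only (the product
string and settings here are assumptions, not the shipped defaults), which is
exactly why the real file is needed:

[code]
devices {
        device {
                vendor                  "IBM"
                product                 "2105800"
                path_grouping_policy    multibus
                features                "1 queue_if_no_path"
                path_checker            tur
        }
}
[/code]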

This should let me know exactly how things are set up. I'm guessing it would be
easier to identify the problem if we removed as much complexity as possible.
Could you set this up without mdadm and LVM on top of these devices and run dd
directly to the multipath devices? NOTE: this will destroy whatever data is on
those devices, so if you need to keep that data, you won't be able to do this.
It would be nice if you could at least remove mdadm; of course, this means the
two devices will no longer be in sync with each other.

At any rate, when you run this again, please save all the messages that get
logged to /var/log/messages; that should let me see what's going on when these
devices are failing.
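
A sketch of what such a simplified run could look like, assuming mdadm and LVM
have been removed and overwriting the device contents is acceptable:

[code]
# capture kernel/multipathd messages during the run
tail -f /var/log/messages > /tmp/multipath-messages.log &
TAIL_PID=$!

# write directly to one multipath device (this DESTROYS any data on it)
dd if=/dev/zero of=/dev/mpath/mpath1 bs=1024 count=10000000

kill $TAIL_PID
[/code]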

Comment 4 Ben Marzinski 2008-04-03 00:06:15 UTC
Are you still seeing this problem? Could you send me the above information?