Bug 826428 - [NetApp 5.7 z - Bug] : Device Mapper Multipath (dm-multipath) fails to update correct path entries during a fabric/switch/storage failure during IO. [NEEDINFO]
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Storage QE
Depends On:
Reported: 2012-05-30 03:49 EDT by Ranjan
Modified: 2014-04-15 15:50 EDT
CC: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2014-04-15 15:50:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Flags: bmarzins: needinfo? (ranjan.kumar)

Attachments: None
Description Ranjan 2012-05-30 03:49:38 EDT
Description of problem:
It has been observed that during fabric or storage failures, the Device Mapper
Multipath daemon (multipathd) on a RHEL 5.7.z host fails to update the path
status of the LUNs correctly. "multipath -ll" and multipathd -k"show paths"
report the path status of a few paths as failed ([failed][ready]) even though
the SCSI device actually exists and is active. Also, in some cases, when
"multipath -ll" reports the path status as active ([active][ready]), the
daemon (multipathd -k"show paths") reports the same path as failed
([failed][ready]). Both cases are explained below:

Case 1:
When fabric faults are injected while IO is running on a RHEL 5.7.z host, the
multipathd daemon reports the path status of a few paths as failed, even
though the SCSI device exists and is active.

The multipath -ll output looks like the following in such a scenario:

Note the SCSI device sdal:

360a98000486e53636934694457326548 dm-11 NETAPP,LUN
[size=10G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 1:0:0:37 sdca 68:224  [active][ready]
 \_ 0:0:0:37 sdal 66:80   [failed][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 0:0:1:37 sdex 129:144 [active][ready]
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

# multipathd -k"show paths" | grep sdal
0:0:0:37 sdal 66:80   50  [failed][ready] X......... 2/20

The dm_status of the sdal device is ready, and the device is accessible to the host.

# dd if=/dev/sdal of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 19.5835 seconds, 548 MB/s

# sg_inq /dev/sdal
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=1  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=1  3PC=0  Protect=0  BQue=0
  EncServ=0  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=117 (0x75)   Peripheral device type: disk
 Vendor identification: NETAPP
 Product identification: LUN
 Product revision level: 8020
 Unit serial number: HnSci4iDW2eH

Case 2:
For a few other paths, "multipath -ll" reports the path status as active,
whereas the daemon reports the same path as failed.

Note the SCSI device sdfe:
# multipath -ll | grep sdfe
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

# multipathd -k"show paths" | grep sdfe
1:0:1:37 sdfe 130:0   10  [failed][ready] XXXXX..... 10/20

All the SCSI devices are accessible to the host, but the path entries in the
multipath daemon are wrong.
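One quick way to spot Case 2 style mismatches is to diff the per-device state columns of the two views. The sketch below runs against sample output captured from this report (loaded via here-documents); on a live host you would redirect the real `multipath -ll` and `multipathd -k"show paths"` output into the same files. The file paths under /tmp are illustrative only.

```shell
#!/bin/sh
# Sample data taken from this report; replace with live captures, e.g.:
#   multipath -ll > /tmp/mp.out
#   multipathd -k"show paths" > /tmp/daemon.out
cat > /tmp/mp.out <<'EOF'
 \_ 1:0:0:37 sdca 68:224  [active][ready]
 \_ 0:0:0:37 sdal 66:80   [failed][ready]
 \_ 1:0:1:37 sdfe 130:0   [active][ready]
EOF

cat > /tmp/daemon.out <<'EOF'
0:0:0:37 sdal 66:80   50  [failed][ready] X......... 2/20
1:0:0:37 sdca 68:224  50  [active][ready] .......... 0/20
1:0:1:37 sdfe 130:0   10  [failed][ready] XXXXX..... 10/20
EOF

# Reduce each view to "device [state][checker]" pairs, keyed by device name.
awk '/\\_/ { print $3, $5 }' /tmp/mp.out      | sort > /tmp/mp.states
awk 'NF >= 5 { print $2, $5 }' /tmp/daemon.out | sort > /tmp/daemon.states

# Join on device name and report any path whose state differs between views.
join /tmp/mp.states /tmp/daemon.states |
awk '$2 != $3 { printf "MISMATCH %s: multipath -ll says %s, daemon says %s\n", $1, $2, $3 }' \
    > /tmp/mismatches.out
cat /tmp/mismatches.out
```

Against the sample data this flags only sdfe, matching Case 2; Case 1 (daemon state disagreeing with the actual SCSI device) needs a direct device check such as the dd/sg_inq runs shown above.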

Version-Release number of selected component (if applicable):
kernel : 2.6.18-274.18.1.el5

How reproducible:

Steps to Reproduce:
1. Map 10 LUNs with 4 paths each.
2. Create a few LVs.
3. Create a filesystem and start IO to the LVs.
4. Inject fabric/switch/storage failures while IO is running.
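The steps above can be sketched as a script. All device, VG, and LV names here are hypothetical, and the fault-injection step is site-specific; DRY_RUN=1 (the default) only prints the commands so the sequence can be reviewed before touching real storage.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps; set DRY_RUN=0 to actually execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# 1. The 10 LUNs x 4 paths are assumed to be mapped on the storage side;
#    verify that multipath sees them.
run multipath -ll

# 2. Create a few LVs on a multipath device (names are hypothetical).
run pvcreate /dev/mapper/mpath0
run vgcreate testvg /dev/mapper/mpath0
run lvcreate -L 5G -n testlv testvg

# 3. Create a filesystem and start IO to the LV.
run mkfs.ext3 /dev/testvg/testlv
run mount /dev/testvg/testlv /mnt
run dd if=/dev/zero of=/mnt/io.dat bs=1M count=1024

# 4. Inject fabric/switch/storage faults while IO runs (site-specific, e.g.
#    disabling a switch port), then compare the two path-state views.
run multipath -ll
run multipathd -k"show paths"
```

The `run` wrapper keeps the fault sequence reviewable; in dry-run mode it echoes each command instead of executing it.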
Actual results:
Path states are reported incorrectly.

Expected results:
Path states should be reported correctly.

Additional info:
Comment 1 RHEL Product and Program Management 2014-01-29 05:36:44 EST
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.
Comment 2 Ben Marzinski 2014-02-06 18:14:22 EST
Can you still reproduce this? If so, can you please attach the output of

# multipath -ll
# multipathd -k"show config"

and the syslog output from when this occurs. Also, while this is occurring, could you run

# dmsetup status <devname>

This will let me know if it's simply multipathd that doesn't have the correct status, or if the status is really wrong in the kernel.
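The per-path A/F flags in the `dmsetup status` output are the kernel's own view, which is what makes that comparison useful here. Below is a sketch of extracting them from a single status line. The sample line is hypothetical (constructed to resemble this 4-path, 2-group configuration), and the exact field layout of the multipath target status can vary between kernel versions; the sketch only assumes each path appears as "major:minor A|F fail_count".

```shell
#!/bin/sh
# Hypothetical 'dmsetup status' line for a 4-path multipath map, with one
# path (66:80) failed in the kernel's view. On a live host:
#   status=$(dmsetup status <devname>)
status='0 20971520 multipath 2 0 0 0 2 1 A 0 2 0 66:80 F 1 68:224 A 0 E 0 2 0 129:144 A 0 130:0 A 0'

# Walk the tokens; a major:minor token followed by A or F is a path entry.
echo "$status" | awk '{
    for (i = 1; i < NF; i++)
        if ($i ~ /^[0-9]+:[0-9]+$/ && ($(i+1) == "A" || $(i+1) == "F"))
            printf "%s %s\n", $i, ($(i+1) == "A" ? "active" : "failed")
}' > /tmp/kernel_paths.out
cat /tmp/kernel_paths.out
```

Comparing this list against `multipathd -k"show paths"` would show directly whether the stale state lives in the daemon or in the kernel.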
Comment 3 Ben Marzinski 2014-03-27 17:11:22 EDT
Are you still able to hit this?
Comment 4 Ben Marzinski 2014-04-08 23:36:04 EDT
I am not able to recreate this behavior.  If you are not able to reproduce this on the current packages, I'm going to close this bug.
