Bug 826428 - [NetApp 5.7 z - Bug] : Device Mapper Multipath (dm-multipath) fails to update correct path entries during a fabric/switch/storage failure during IO.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Storage QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-05-30 07:49 UTC by Ranjan
Modified: 2023-09-14 01:29 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-15 19:50:52 UTC
Target Upstream Version:
Embargoed:



Description Ranjan 2012-05-30 07:49:38 UTC
Description of problem:
It has been observed that during fabric or storage failures, the RHEL 5.7.z
host's Device Mapper Multipath daemon (multipathd) fails to update the path
status of the LUNs correctly. "multipath -ll" and multipathd -k"show paths"
report the path status of a few paths as failed ([failed][ready]) even though
the SCSI device exists and is active. In other cases, "multipath -ll" reports
the path status as active ([active][ready]) while the daemon
(multipathd -k"show paths") reports the same path as failed ([failed][ready]).
Both cases are explained below:

Case 1:
When fabric faults are injected while IO is running on a RHEL 5.7.z host, the
multipathd daemon reports the path status of a few paths as failed even though
the SCSI device exists and is active.

The multipath -ll output looks like the following in such a scenario:

Note SCSI device sdal:

360a98000486e53636934694457326548 dm-11 NETAPP,LUN
[size=10G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 1:0:0:37 sdca 68:224  [active][ready]
 \_ 0:0:0:37 sdal 66:80   [failed][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 0:0:1:37 sdex 129:144 [active][ready]
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

#multipathd -k"show paths" | grep sdal
0:0:0:37 sdal 66:80   50  [failed][ready] X......... 2/20

The sdal device itself is ready and accessible to the host:

#dd if=/dev/sdal of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 19.5835 seconds, 548 MB/s

#sg_inq /dev/sdal
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=1  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=1  3PC=0  Protect=0  BQue=0
  EncServ=0  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=117 (0x75)   Peripheral device type: disk
 Vendor identification: NETAPP
 Product identification: LUN
 Product revision level: 8020
 Unit serial number: HnSci4iDW2eH

Case 2:
For a few other paths, "multipath -ll" reports the path status as active
whereas the daemon's view of the maps reports the path as failed.

Note SCSI device sdfe:
# multipath -ll | grep sdfe
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

# multipathd -k"show paths" | grep sdfe
1:0:1:37 sdfe 130:0   10  [failed][ready] XXXXX..... 10/20

All of the SCSI devices are accessible to the host, but the entries in the
multipath daemon are wrong.
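
One quick way to cross-check every path the daemon marks as failed against the
actual SCSI devices (a sketch only; the field positions assume the "show paths"
output format shown above):

# for dev in $(multipathd -k"show paths" | awk '/\[failed\]\[ready\]/ {print $2}'); do
>     sg_inq /dev/$dev > /dev/null && echo "$dev: inquiry OK"
> done

Any device that answers the inquiry is reachable from the host even though
multipathd lists its path as failed.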



Version-Release number of selected component (if applicable):
kernel: 2.6.18-274.18.1.el5
device-mapper-event-1.02.63-4.el5
device-mapper-1.02.63-4.el5
device-mapper-multipath-0.4.7-46.el5_7.2

How reproducible:
Frequent.

Steps to Reproduce:
1. Map 10 LUNs to the host with 4 paths each.
2. Create a few LVs on the multipath devices.
3. Create a filesystem on the LVs and start IO to them.
4. Inject a fabric, switch, or storage failure while the IO is running (a rough
   host-side sketch of steps 2-3 follows below).
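
Host-side sketch of steps 2-3 (the multipath WWID is taken from the output
above; the volume group, LV, filesystem, mount point, and IO sizes are
illustrative assumptions, not values from this report):

# pvcreate /dev/mapper/360a98000486e53636934694457326548
# vgcreate testvg /dev/mapper/360a98000486e53636934694457326548
# lvcreate -L 5G -n testlv testvg
# mkfs.ext3 /dev/testvg/testlv
# mkdir -p /mnt/test
# mount /dev/testvg/testlv /mnt/test
# dd if=/dev/zero of=/mnt/test/io.dat bs=1M count=4096 oflag=direct &

Step 4 is then performed outside the host, e.g. by disabling a switch port or
failing over a storage controller while the dd is still running.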
  
Actual results:
Path states are reported incorrectly.

Expected results:
Path states should be correct.

Additional info:

Comment 1 RHEL Program Management 2014-01-29 10:36:44 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 2 Ben Marzinski 2014-02-06 23:14:22 UTC
Can you still reproduce this? If so, can you please provide the output of

# multipath -ll
# multipathd -k"show config"

and the syslog output from when this occurs. Also, while this is occurring, could you run

# dmsetup status <devname>

This will let me know if it's simply multipathd that doesn't have the correct status, or if the status is really wrong in the kernel.
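
For example (using the map name and path from the description above; the exact
field layout of the status line may differ between versions), the kernel's
per-path state could be compared with the daemon's like this:

# dmsetup status 360a98000486e53636934694457326548
# multipathd -k"show paths" | grep sdal

In the dmsetup status output the multipath target flags each path as A (active)
or F (failed); if those flags disagree with what "show paths" prints for the
same device, the stale state is in multipathd rather than in the kernel.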

Comment 3 Ben Marzinski 2014-03-27 21:11:22 UTC
Are you still able to hit this?

Comment 4 Ben Marzinski 2014-04-09 03:36:04 UTC
I am not able to recreate this behavior.  If you are not able to reproduce this on the current packages, I'm going to close this bug.

Comment 5 Red Hat Bugzilla 2023-09-14 01:29:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

