Description of problem:

It has been observed that during fabric or storage failures, the RHEL 5.7.z host's device-mapper-multipath daemon (multipathd) fails to update the path status of the LUNs correctly. "multipath -ll" and multipathd -k"show paths" report the path status of a few paths as failed ([failed][ready]) even though the SCSI device actually exists and is active. In other cases, "multipath -ll" reports the path status as active ([active][ready]) while the daemon's maps (multipathd -k"show paths") report the same path as failed ([failed][ready]). Both cases are explained below; a cross-check of the daemon's view against the kernel's is sketched after this report.

Case 1: When fabric faults are injected while I/O is running on a RHEL 5.7.z host, the multipathd daemon reports the path status of a few paths as failed even though the SCSI device exists and is active. The "multipath -ll" output looks like the following in such a scenario. Note SCSI device sdal:

360a98000486e53636934694457326548 dm-11 NETAPP,LUN
[size=10G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 1:0:0:37 sdca 68:224  [active][ready]
 \_ 0:0:0:37 sdal 66:80   [failed][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 0:0:1:37 sdex 129:144 [active][ready]
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

# multipathd -k"show paths" | grep sdal
0:0:0:37 sdal 66:80 50 [failed][ready] X......... 2/20

The dm status of the sdal device is ready and the device is accessible from the host:

# dd if=/dev/sdal of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 19.5835 seconds, 548 MB/s

# sg_inq /dev/sdal
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]  [AERC=0]  [TrmTsk=0]
  NormACA=1  HiSUP=1  Resp_data_format=2  SCCS=0  ACC=0  TPGS=1  3PC=0
  Protect=0  BQue=0  EncServ=0  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]
  Addr16=0  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
  length=117 (0x75)   Peripheral device type: disk
 Vendor identification: NETAPP
 Product identification: LUN
 Product revision level: 8020
 Unit serial number: HnSci4iDW2eH

Case 2: For a few other paths, "multipath -ll" reports the path status as active whereas the daemon's maps report it as failed. Note SCSI device sdfe:

# multipath -ll | grep sdfe
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

# multipathd -k"show paths" | grep sdfe
1:0:1:37 sdfe 130:0 10 [failed][ready] XXXXX..... 10/20

All the SCSI devices are accessible from the host, but the entries in the multipath daemon are wrong.

Version-Release number of selected component (if applicable):
kernel: 2.6.18-274.18.1.el5
device-mapper-event-1.02.63-4.el5
device-mapper-1.02.63-4.el5
device-mapper-multipath-0.4.7-46.el5_7.2

How reproducible:
Frequent.

Steps to Reproduce:
1. Map 10 LUNs with 4 paths each.
2. Create a few LVs.
3. Create a filesystem on the LVs and start I/O to them.
4. Run fabric/switch/storage failures.

Actual results:
Path states are reported incorrectly.

Expected results:
Path states should be correct.

Additional info:
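A minimal way to check whether the stale state is confined to the daemon (a sketch only, reusing the map name 360a98000486e53636934694457326548 and path device sdal from the output above; substitute the names seen on the affected host) is to compare multipathd's view with the kernel's while the mismatch is present:

# multipathd -k"show paths" | grep sdal
# dmsetup status 360a98000486e53636934694457326548
# cat /sys/block/sdal/device/state

If the "dmsetup status" line for the multipath target marks the path as A (active) and sysfs reports the SCSI device state as "running" while multipathd still shows [failed], the incorrect state exists only in the daemon, not in the kernel.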
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Can you still reproduce this? If so, can you please attach the output of

# multipath -ll
# multipathd -k"show config"

and the syslog output from when this occurs. Also, while this is occurring, could you run

# dmsetup status <devname>

This will let me know whether it's simply multipathd that doesn't have the correct status, or whether the status is really wrong in the kernel.
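For reference, the requested data could be collected along these lines (a sketch only; the output file names are arbitrary, and <devname> should be replaced with the affected map name, e.g. the dm-11 map from the report):

# multipath -ll > /tmp/multipath-ll.txt
# multipathd -k"show config" > /tmp/multipathd-config.txt
# dmsetup status <devname> > /tmp/dmsetup-status.txt
# grep multipathd /var/log/messages > /tmp/multipathd-syslog.txt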
Are you still able to hit this?
I am not able to recreate this behavior. If you are not able to reproduce this on the current packages, I'm going to close this bug.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.