Bug 826428

Summary: [NetApp 5.7 z - Bug]: Device Mapper Multipath (dm-multipath) fails to update path entries correctly during a fabric/switch/storage failure while IO is running.
Product: Red Hat Enterprise Linux 5
Component: device-mapper-multipath
Version: 5.7
Reporter: Ranjan <ranjan.kumar>
Assignee: Ben Marzinski <bmarzins>
QA Contact: Storage QE <storage-qe>
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Hardware: x86_64
OS: Linux
Target Milestone: rc
Target Release: ---
Doc Type: Bug Fix
Type: Bug
Regression: ---
Last Closed: 2014-04-15 19:50:52 UTC
CC: agk, bdonahue, bmarzins, bmr, dwysocha, heinzm, msnitzer, prajnoha, prockai, ranjan.kumar, vaughan.cao, xdl-redhat-bugzilla, zkabelac

Description Ranjan 2012-05-30 07:49:38 UTC
Description of problem:
During fabric or storage failures, the RHEL 5.7.z host's Device Mapper
Multipath daemon (multipathd) fails to update the path status of the LUNs
correctly. "multipath -ll" and multipathd -k"show paths" report the path
status of a few paths as failed ([failed][ready]) even though the SCSI device
still exists and is active. In other cases, "multipath -ll" reports the
path_status as active ([active][ready]) while the daemon's maps (multipathd
-k"show paths") report the same path as failed ([failed][ready]). Both cases
are shown below.

Case 1 :
When fabric faults are injected while IO is running on a RHEL 5.7.z host, the
multipathd daemon reports the "path_status" of a few paths as failed even
though the SCSI device exists and is active.

The multipath -ll output looks like the following in such a scenario:

Note scsi device sdal :

360a98000486e53636934694457326548 dm-11 NETAPP,LUN
[size=10G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 1:0:0:37 sdca 68:224  [active][ready]
 \_ 0:0:0:37 sdal 66:80   [failed][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 0:0:1:37 sdex 129:144 [active][ready]
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

#multipathd -k"show paths" | grep sdal
0:0:0:37 sdal 66:80   50  [failed][ready] X......... 2/20

The sdal device is ready and accessible to the host:

#dd if=/dev/sdal of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 19.5835 seconds, 548 MB/s

#sg_inq /dev/sdal
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=1  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=1  3PC=0  Protect=0  BQue=0
  EncServ=0  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=117 (0x75)   Peripheral device type: disk
 Vendor identification: NETAPP
 Product identification: LUN
 Product revision level: 8020
 Unit serial number: HnSci4iDW2eH
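
The same dd/sg_inq check can be applied to every path the daemon marks as
failed. This is only a sketch, assuming sg3_utils is installed and the
device-mapper-multipath 0.4.7 "show paths" output format shown above:

#!/bin/bash
# For each path multipathd reports as failed, probe the SCSI device
# directly with a standard INQUIRY to see whether it actually answers.
multipathd -k"show paths" | grep '\[failed\]' | awk '{print $2}' |
while read dev; do
    if sg_inq "/dev/$dev" >/dev/null 2>&1; then
        echo "/dev/$dev: daemon says failed, but the device answers INQUIRY"
    else
        echo "/dev/$dev: device really is unreachable"
    fi
done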

Case 2 :
Also, for a few paths, "multipath -ll" reports the path_status as "active"
whereas the daemon's maps report the same paths as "failed".

Note scsi device sdfe :
# multipath -ll | grep sdfe
 \_ 1:0:1:37 sdfe 130:0   [active][ready]

# multipathd -k"show paths" | grep sdfe
1:0:1:37 sdfe 130:0   10  [failed][ready] XXXXX..... 10/20

All of the SCSI devices are accessible to the host, but the path entries held
by the multipath daemon are wrong.
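
To see which of the two conflicting reports matches what the kernel is
actually using, the map status can be read straight from device-mapper. A
minimal sketch, using the WWID of the map from Case 1; in the multipath
target's status line each path's major:minor pair is followed by A (active)
or F (failed), which is the kernel's view of that path:

# Kernel's view of the path states for the affected map
dmsetup status 360a98000486e53636934694457326548

# Userspace views of the same map, for comparison
multipath -ll 360a98000486e53636934694457326548
multipathd -k"show paths"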



Version-Release number of selected component (if applicable):
kernel : 2.6.18-274.18.1.el5
device-mapper-event-1.02.63-4.el5
device-mapper-1.02.63-4.el5
device-mapper-multipath-0.4.7-46.el5_7.2

How reproducible:
Frequent.

Steps to Reproduce:
1. Map 10 LUNs with 4 paths each to the host.
2. Create a few LVs on the multipath devices.
3. Create a filesystem on each LV and start IO to it (a minimal sketch of steps 2 and 3 follows below).
4. Inject fabric/switch/storage failures while the IO is running.
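
A minimal sketch of steps 2 and 3; the multipath map names, VG/LV names, and
sizes are illustrative only, and the fault injection itself depends on the
switch/array in use:

# Step 2: LVM on top of two of the multipath maps
pvcreate /dev/mapper/mpath0 /dev/mapper/mpath1
vgcreate testvg /dev/mapper/mpath0 /dev/mapper/mpath1
lvcreate -L 5G -n testlv testvg

# Step 3: filesystem plus sustained write IO while the faults are injected
mkfs.ext3 /dev/testvg/testlv
mkdir -p /mnt/test
mount /dev/testvg/testlv /mnt/test
while true; do
    dd if=/dev/zero of=/mnt/test/io.dat bs=1M count=1024 oflag=direct
done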
  
Actual results:
Path states are reported incorrectly.

Expected results:
Path states should be reported correctly.

Additional info:

Comment 1 RHEL Program Management 2014-01-29 10:36:44 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 2 Ben Marzinski 2014-02-06 23:14:22 UTC
Can you still reproduce this? If so, can you please attach the output of

# multipath -ll
# multipathd -k"show config"

and the syslog output from when this occurs. Also, while this is occurring, could you run

# dmsetup status <devname>

This will let me know whether it's simply multipathd that doesn't have the correct status, or whether the status is really wrong in the kernel.
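
Something along these lines, run while the mismatch is visible, would collect
everything requested in one attachment (the output path is arbitrary and
/var/log/messages is assumed to be the syslog destination):

{
    multipath -ll
    multipathd -k"show config"
    multipathd -k"show paths"
    dmsetup status
    grep multipathd /var/log/messages | tail -n 200
} > /tmp/bz826428-multipath-state.txt 2>&1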

Comment 3 Ben Marzinski 2014-03-27 21:11:22 UTC
Are you still able to hit this?

Comment 4 Ben Marzinski 2014-04-09 03:36:04 UTC
I am not able to recreate this behavior.  If you are not able to reproduce this on the current packages, I'm going to close this bug.

Comment 5 Red Hat Bugzilla 2023-09-14 01:29:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.