Bug 480359 - Qlogic Unexpected Device Mapper (DM) path failures running IO with no perturbations
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: LVM and device-mapper development team
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2009-01-16 11:46 EST by Richard Wojdak
Modified: 2010-07-27 16:11 EDT
CC List: 20 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-07-27 16:02:05 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Message file with failing path issue (102.76 KB, text/plain)
2009-01-16 11:49 EST, Richard Wojdak
Emulex Message file for DM failed paths (921.48 KB, application/octet-stream)
2009-01-16 15:01 EST, Richard Wojdak
QLogic serial console output with extended error logging enabled (1.26 MB, application/octet-stream)
2009-01-21 14:57 EST, Richard Wojdak

Description Richard Wojdak 2009-01-16 11:46:05 EST
Description of problem:

While running I/O on DM multipath devices, we are seeing frequent path failures, which lead to unexpected I/O failover.

Environment:
  RH53 RC2
  HP Proliant and Integrity Blades
  Qlogic QMH2462 and Emulex LPe1105 using Inbox driver with DM
  Test: Hazard C8



Snippet of issue
> Jan 14 14:40:34 RH53-IA64 kernel: sd 1:0:6:12: SCSI error: return code = 0x00020000
> Jan 14 14:40:34 RH53-IA64 kernel: end_request: I/O error, dev sduf, sector 97312554
> Jan 14 14:40:34 RH53-IA64 kernel: device-mapper: multipath: Failing path 66:624.
> Jan 14 14:40:34 RH53-IA64 kernel: sd 1:0:6:6: SCSI error: return code = 0x00020000
> Jan 14 14:40:34 RH53-IA64 kernel: end_request: I/O error, dev sdtz, sector 97555949
> Jan 14 14:40:34 RH53-IA64 kernel: device-mapper: multipath: Failing path 66:528.
> Jan 14 14:40:34 RH53-IA64 kernel: sd 1:0:6:6: SCSI error: return code = 0x00020000
> Jan 14 14:40:34 RH53-IA64 kernel: end_request: I/O error, dev sdtz, sector 97630204
> Jan 14 14:40:34 RH53-IA64 kernel: sd
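
For reading the snippet above: the return code 0x00020000 can be split into the four bytes of the SCSI midlayer result word, and the "Failing path" major:minor pairs can be matched back to the sd names. The Python below is a minimal sketch, assuming the 2.6-era result layout (driver, host, message and status bytes from high to low) and the sd driver's numbering scheme for more than 256 disks; with those assumptions it reproduces 66:624 for sduf and 66:528 for sdtz, as in the log. The helper names are hypothetical and the host byte table is only a subset.

```python
# Hypothetical helpers for reading the kernel log lines quoted above.
# Assumed result layout: bits 24-31 driver byte, 16-23 host byte,
# 8-15 message byte, 0-7 status byte.

HOST_BYTE_NAMES = {  # subset of the DID_* host byte codes
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x05: "DID_ABORT",
    0x07: "DID_ERROR",
    0x0E: "DID_TRANSPORT_DISRUPTED",
}


def decode_result(result: int) -> dict:
    """Split a SCSI midlayer result word into its four bytes."""
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
    }


def sd_index(name: str) -> int:
    """Convert an sd name such as 'sduf' to its zero-based disk index."""
    index = 0
    for ch in name[2:]:  # bijective base-26: a=1 ... z=26
        index = index * 26 + (ord(ch) - ord("a") + 1)
    return index - 1


def sd_major(major_idx: int) -> int:
    """Map a 0..15 major slot to the sd block major numbers."""
    if major_idx == 0:
        return 8
    if major_idx <= 7:
        return 65 + major_idx - 1  # 65..71
    return 128 + major_idx - 8     # 128..135


def sd_devt(name: str) -> tuple:
    """Return (major, minor) for an sd device name (assumed scheme)."""
    index = sd_index(name)
    major = sd_major((index & 0xF0) >> 4)
    minor = ((index & 0x0F) << 4) | (index & 0xFFF00)
    return major, minor


if __name__ == "__main__":
    fields = decode_result(0x00020000)
    print(HOST_BYTE_NAMES.get(fields["host"], hex(fields["host"])))
    print(sd_devt("sduf"), sd_devt("sdtz"))
```

Read this way, every failure in the snippet carries host byte 0x02 (DID_BUS_BUSY) with a clean status byte, and each "Failing path" line names the same sd device as the end_request line just before it.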


Version-Release number of selected component (if applicable):
RH53RC2

How reproducible:
Occurs every time and continues as long as IO is running

Steps to Reproduce:
1. Run IO with no perturbations (Hazard C8).
2. Observe DM path failures reported in the messages log.

  
Actual results:
Unexpected path failures are seen during I/O.

Expected results:


Additional info:
Comment 1 Richard Wojdak 2009-01-16 11:49:26 EST
Created attachment 329227 [details]
Message file with failing path issue
Comment 2 Mike Christie 2009-01-16 14:22:49 EST
Is this something that you do not see in RHEL 5.2?

It looks like we get DID_BUS_BUSY, which, as you saw in the notes for bug 244967, is fast-failed.

You are using QLogic cards, right? If so, we should ask them whether the problem you are hitting can return DID_TRANSPORT_DISRUPTED instead of DID_BUS_BUSY.

QLogic will probably need you to run this with extended logging enabled and send them the logs, so they can see exactly why DID_BUS_BUSY is returned.
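
To make the DID_BUS_BUSY vs DID_TRANSPORT_DISRUPTED distinction concrete, here is a deliberately simplified Python model of the retry decision for a request that dm-multipath has marked transport-failfast. It is a sketch under stated assumptions, not the kernel's scsi_decide_disposition(): it assumes DID_BUS_BUSY is fast-failed for such requests (as described for bug 244967), assumes DID_TRANSPORT_DISRUPTED is retried on the same path while retries remain, and ignores every other host byte. The function name and parameters are hypothetical.

```python
# Simplified, hypothetical model of the midlayer's decision for two host
# bytes. Not the real kernel code; it only captures the distinction above.

DID_BUS_BUSY = 0x02
DID_TRANSPORT_DISRUPTED = 0x0E


def disposition(host_byte: int, failfast_transport: bool,
                retries: int, allowed: int) -> str:
    """Return 'retry' or 'fail' for a completed command (simplified)."""
    if host_byte not in (DID_BUS_BUSY, DID_TRANSPORT_DISRUPTED):
        # Other host bytes are outside the scope of this sketch.
        raise ValueError(f"host byte {host_byte:#04x} not modelled")
    if host_byte == DID_BUS_BUSY and failfast_transport:
        # Fast-failed: the error goes straight up to dm-multipath,
        # which fails the path and switches I/O to another one.
        return "fail"
    # Otherwise the command is retried on the same path while the
    # retry budget lasts.
    return "retry" if retries < allowed else "fail"


if __name__ == "__main__":
    # Assumption: dm-multipath submits its I/O as transport-failfast.
    print(disposition(DID_BUS_BUSY, True, 0, 5))
    print(disposition(DID_TRANSPORT_DISRUPTED, True, 0, 5))
```

Under this model the same transient event fails the path immediately when the driver reports DID_BUS_BUSY, but only costs a retry when it reports DID_TRANSPORT_DISRUPTED, which is why the choice of return code matters for multipath.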
Comment 3 Richard Wojdak 2009-01-16 14:59:50 EST
Hi,
We are seeing the issue on both Qlogic and Emulex cards. The first log I posted was for QLogic. I will post a log for Emulex.

I contacted QLogic and will wait for their response.


I notified Emulex of your comments, and their response is:
This response does not apply to Emulex. We updated our driver to return DID_TRANSPORT_DISRUPTED in case of a dropped frame anyway.
Comment 4 Richard Wojdak 2009-01-16 15:01:13 EST
Created attachment 329248 [details]
Emulex Message file for DM failed paths
Comment 5 Richard Wojdak 2009-01-16 15:07:57 EST
Emulex has requested that I open a separate Bugzilla for their DM path failure issue. They have determined it is different from the QLogic issue.
Can we use this one just for QLogic?

From EMULEX:
I reviewed the log file for the QLogic run. I am seeing frequent path failover due to DID_BUS_BUSY coming from QLogic, which is different from Emulex.

I believe you need to open a new Bugzilla with the Emulex log. We are encountering an issue where the new DM retry logic does not retry in the case of an aborted command.
Comment 6 Richard Wojdak 2009-01-16 15:12:13 EST
I opened Bugzilla 480394 for the Emulex DM path failure issue
Comment 7 Mike Christie 2009-01-16 15:20:41 EST
Thanks.

In the Emulex one we see "rport-2:0-2: blocked FC remote port time out: saving binding", so if the rport is going to time out, changing qla2xxx to DID_TRANSPORT_DISRUPTED is not going to make a difference. It would only make a difference if we recovered before the rport timed out.

You can probably ask QLogic to use the new values, but that will not help with this problem. We probably need the extended logging info to see what is causing the problem in the first place.
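
The timing argument in comment 7 can be written down as a tiny Python model. It is a hypothetical sketch, assuming the "blocked FC remote port time out" message corresponds to the FC transport's dev_loss_tmo expiring, and ignoring other timers such as fast_io_fail_tmo; the function and argument names are illustrative only.

```python
# Hypothetical model of comment 7: a retryable host byte only helps if the
# remote port comes back before the rport timeout expires.


def outcome(port_back_after_s, dev_loss_tmo_s, host_byte_name):
    """Return what dm-multipath ends up seeing for I/O on a blocked rport.

    port_back_after_s: seconds until the remote port reappears, or None
    if it never does. dev_loss_tmo_s: the rport timeout (assumed here to
    be the FC transport's dev_loss_tmo).
    """
    recovered_in_time = (port_back_after_s is not None
                         and port_back_after_s < dev_loss_tmo_s)
    if not recovered_in_time:
        # The rport times out and is removed; the I/O fails no matter
        # which host byte the driver returned.
        return "path failed"
    if host_byte_name == "DID_TRANSPORT_DISRUPTED":
        # Retryable code: the I/O waits out the block and completes
        # on the same path once the port is back.
        return "path kept"
    # Fast-failed code (e.g. DID_BUS_BUSY): the path was failed before
    # the port had a chance to come back.
    return "path failed"


if __name__ == "__main__":
    print(outcome(None, 30, "DID_TRANSPORT_DISRUPTED"))
    print(outcome(5, 30, "DID_TRANSPORT_DISRUPTED"))
```

Since the Emulex log shows the rport timing out, the first case applies there, which is why only the extended logs can show what triggers the failures in the first place.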
Comment 8 Richard Wojdak 2009-01-21 14:57:10 EST
Created attachment 329645 [details]
QLogic serial console output with extended error logging enabled
Comment 9 IDima 2009-05-19 06:51:07 EDT
I've got the same trouble with QLE2462 (fw 4.03.02, driver 8.02.00-k5-rhel5.2-04), CentOS 5.2 (kernel 2.6.18.92.1.18) and a Xyratex 5412 storage controller.
