Bug 223198

Summary: qla2400 Failed to load segment 0 of firmware
Product: Red Hat Enterprise Linux 4 Reporter: Didier Belhomme <didier.belhomme>
Component: kernel Assignee: Andrew Vasquez <andrew.vasquez>
Status: CLOSED CANTFIX QA Contact: Brian Brock <bbrock>
Severity: urgent Docs Contact:
Priority: medium    
Version: 4.4 CC: andriusb, coughlan, didier.belhomme, dwa, mbarrow, qlogic-redhat-ext
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-01-19 22:37:35 UTC Type: ---
Bug Blocks: 216986    
Attachments:
dmesg output (flags: none)
/var/log/messages file (flags: none)

Description Didier Belhomme 2007-01-18 11:26:10 UTC
Our system (a Sun Fire X4200) is connected through two Sun 4 Gb/s FC-AL cards (OEM
versions of the QLogic 2460) and two Brocade 200E switches to a Sun StorageTek
6340 storage array. After a while, under heavy I/O, we see disconnections from
the SAN such as the following in /var/log/messages:

Jan 18 09:58:22 hector kernel: qla2400 0000:05:01.0: ISP Request Transfer Error.
Jan 18 09:58:22 hector kernel: qla2400 0000:05:02.0: ISP Request Transfer Error.
Jan 18 09:58:22 hector kernel: qla2400 0000:05:01.0: Performing ISP error recovery - ha= 00000101fbd983c8.
Jan 18 09:58:22 hector kernel: qla2400 0000:05:02.0: Performing ISP error recovery - ha= 00000101fbed03c8.
Jan 18 09:58:52 hector kernel: qla2400 0000:05:01.0: [ERROR] Failed to load segment 0 of firmware
Jan 18 09:58:52 hector kernel: Mailbox registers:
Jan 18 09:58:52 hector kernel: scsi(1): mbox 0 0x0000
Jan 18 09:58:52 hector kernel: scsi(1): mbox 1 0x0000
Jan 18 09:58:52 hector kernel: scsi(1): mbox 2 0x0001
Jan 18 09:58:52 hector kernel: scsi(1): mbox 3 0x4000
Jan 18 09:58:52 hector kernel: scsi(1): mbox 4 0x0040
Jan 18 09:58:52 hector kernel: scsi(1): mbox 5 0x0000
Jan 18 09:58:52 hector kernel: qla2400 0000:05:02.0: [ERROR] Failed to load segment 0 of firmware
Jan 18 09:58:52 hector kernel: Mailbox registers:
Jan 18 09:58:52 hector kernel: scsi(2): mbox 0 0x0000
Jan 18 09:58:52 hector kernel: scsi(2): mbox 1 0x0000
Jan 18 09:58:52 hector kernel: scsi(2): mbox 2 0x0001
Jan 18 09:58:52 hector kernel: scsi(2): mbox 3 0x4000
Jan 18 09:58:52 hector kernel: scsi(2): mbox 4 0x0040
Jan 18 09:58:52 hector kernel: scsi(2): mbox 5 0x0000
Jan 18 09:59:22 hector kernel: qla2400 0000:05:01.0: [ERROR] Failed to load segment 0 of firmware
Jan 18 09:59:22 hector kernel: Mailbox registers:
Jan 18 09:59:22 hector kernel: scsi(1): mbox 0 0x0000
Jan 18 09:59:22 hector kernel: scsi(1): mbox 1 0x0000
Jan 18 09:59:22 hector kernel: scsi(1): mbox 2 0x0001
Jan 18 09:59:22 hector kernel: scsi(1): mbox 3 0x4000
Jan 18 09:59:22 hector kernel: scsi(1): mbox 4 0x0040
Jan 18 09:59:22 hector kernel: scsi(1): mbox 5 0x0000

The problem occurs on BOTH cards at the same time, which prevents any failover
(MPP driver).
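To confirm from a saved log that both HBAs hit the same firmware-load failure, a short filter like the one below can be used. It is a minimal sketch that assumes one syslog entry per line and the exact message text shown above; `failing_hbas` is a hypothetical helper, not part of any Red Hat or QLogic tool.

```python
import re

# Matches entries such as:
# "... kernel: qla2400 0000:05:01.0: [ERROR] Failed to load segment 0 of firmware"
# Assumes each syslog entry is on a single line (not wrapped).
FW_FAIL = re.compile(
    r"qla2400 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\[ERROR\] Failed to load segment (?P<seg>\d+) of firmware"
)


def failing_hbas(lines):
    """Return the sorted PCI addresses that logged a firmware-load failure."""
    found = set()
    for line in lines:
        m = FW_FAIL.search(line)
        if m:
            found.add(m.group("pci"))
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "Jan 18 09:58:52 hector kernel: qla2400 0000:05:01.0: [ERROR] Failed to load segment 0 of firmware",
        "Jan 18 09:58:52 hector kernel: qla2400 0000:05:02.0: [ERROR] Failed to load segment 0 of firmware",
        "Jan 18 09:59:22 hector kernel: qla2400 0000:05:01.0: [ERROR] Failed to load segment 0 of firmware",
    ]
    hbas = failing_hbas(sample)
    print(hbas)  # ['0000:05:01.0', '0000:05:02.0']
    if len(hbas) > 1:
        print("firmware reload failed on more than one HBA")
```

Run against the real file, the same function can be fed `open("/var/log/messages")` line by line.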

Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 7 [RAIDarray.mpp]SAN1:1:0 Path Failed
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039014 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039019 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039022 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039027 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039030 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039035 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039039 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: 495 [RAIDarray.mpp]SAN1:1:0:0 Cmnd failed-retry on a new path. vcmnd SN 25039043 pdev H1:C0:T1:L0 0x00/0x00/0x00 0x00010000 mpp_statu
Jan 18 10:00:56 hector kernel: 94 [RAIDarray.mpp]SAN1:1:0:0 Selection Retry count exhausted
Jan 18 10:00:56 hector kernel: st: I/O error, dev sdb, sector 485504928
Jan 18 10:00:56 hector kernel: SCSI error : <3 0 0 0> return code = 0x10000
Jan 18 10:00:56 hector kernel: end_request: I/O error, dev sdb, sector 485502648
Jan 18 10:00:56 hector kernel: SCSI error : <3 0 0 0> return code = 0x10000
Jan 18 10:00:56 hector kernel: end_request: I/O error, dev sdb, sector 485505944
Jan 18 10:00:56 hector kernel: SCSI error : <3 0 0 0> return code = 0x10000
Jan 18 10:00:56 hector kernel: end_request: I/O error, dev sdb, sector 485503664
Jan 18 10:00:56 hector kernel: SCSI error : <3 0 0 0> return code = 0x10000
Jan 18 10:00:56 hector kernel: end_request: I/O error, dev sdb, sector 485498568
Jan 18 10:00:56 hector kernel: SCSI error : <3 0 0 0> return code = 0x10000
Jan 18 10:00:56 hector kernel: end_request: I/O error, dev sdb, sector 485496440

Eventually the device goes into an error state, the ext3 journal aborts, and so on.

Comment 1 Didier Belhomme 2007-01-18 11:26:11 UTC
Created attachment 145904 [details]
dmesg output

Comment 2 Didier Belhomme 2007-01-18 11:26:43 UTC
Created attachment 145905 [details]
/var/log/messages file

Comment 3 Didier Belhomme 2007-01-19 08:30:46 UTC
I should add that with the recommended driver downloaded from QLogic (as
indicated in Sun Microsystems' documentation), I keep getting slightly
different errors. I reverted to the "standard" driver shipped with the kernel
to simplify updates. The file downloaded from QLogic is
qla2xxx-v8.01.06-dist.tgz.

Comment 4 Andrew Vasquez 2007-01-19 17:30:57 UTC
This issue has been reported to QLogic by Sun and its customers.
It stems from this platform's (X4200) inability to support
modifications to the PCI-X Maximum Memory Read Byte Count (MMRBC).
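For reference, MMRBC is the 2-bit field in bits 3:2 of the 16-bit PCI-X Command register (in the device's PCI-X capability); it encodes a maximum read size of 512, 1024, 2048 or 4096 bytes. The sketch below only decodes and re-encodes that field on a register value; it does not access hardware, and the helper names are hypothetical. Which MMRBC value the qla2xxx driver actually programs is not stated in this report.

```python
# Mask for bits 3:2 of the PCI-X Command register (Maximum Memory Read Byte Count).
PCI_X_CMD_MAX_READ = 0x000C


def mmrbc_bytes(pcix_cmd: int) -> int:
    """Decode the MMRBC field of a 16-bit PCI-X Command register value."""
    code = (pcix_cmd & PCI_X_CMD_MAX_READ) >> 2  # 0..3
    return 512 << code  # 0 -> 512, 1 -> 1024, 2 -> 2048, 3 -> 4096


def with_mmrbc(pcix_cmd: int, size: int) -> int:
    """Return pcix_cmd with only the MMRBC field changed to `size` bytes."""
    if size not in (512, 1024, 2048, 4096):
        raise ValueError("MMRBC must be 512, 1024, 2048 or 4096")
    code = (size // 512).bit_length() - 1  # 512 -> 0, ..., 4096 -> 3
    # Clear bits 3:2, keep the other 14 bits, then insert the new code.
    return (pcix_cmd & ~PCI_X_CMD_MAX_READ & 0xFFFF) | (code << 2)
```

For example, `mmrbc_bytes(0x000C)` returns 4096 and `with_mmrbc(0x0000, 2048)` returns `0x0008`. A platform that cannot cope with a changed MMRBC would be affected by exactly this kind of read-modify-write on the HBA's config space.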

I can also see that the card is installed in one of the host's
66 MHz slots:

 QLogic Fibre Channel HBA Driver: 8.01.04-d7
  QLogic QLA2460 - Sun PCI-X 2.0 to 4Gb FC, Single Channel
  ISP2422: PCI-X Mode 1 (66 MHz) @ 0000:05:01.0 hdma+, host#=1, fw=4.00.18 [IP] 
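The slot speed is read from the driver's load-time banner. A minimal sketch for extracting it follows; the regular expression assumes the exact banner format quoted above (other qla2xxx versions may word it differently), and `parse_banner` is a hypothetical helper.

```python
import re

# Matches the bus description printed by qla2xxx 8.01.04-d7 at load time, e.g.
# "ISP2422: PCI-X Mode 1 (66 MHz) @ 0000:05:01.0 hdma+, host#=1, fw=4.00.18 [IP]"
BANNER = re.compile(
    r"(?P<isp>ISP\w+): PCI-X Mode (?P<mode>\w+) \((?P<mhz>\d+) MHz\) @ (?P<pci>\S+)"
)


def parse_banner(line):
    """Return (isp, pci_address, mhz), or None if the line is not a PCI-X banner."""
    m = BANNER.search(line)
    if m is None:
        return None
    return m.group("isp"), m.group("pci"), int(m.group("mhz"))


line = "  ISP2422: PCI-X Mode 1 (66 MHz) @ 0000:05:01.0 hdma+, host#=1, fw=4.00.18 [IP] "
isp, pci, mhz = parse_banner(line)
if mhz == 66:
    print(f"{pci} ({isp}) is in a 66 MHz PCI-X slot")
```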


A potential workaround for this issue is to place the HBA in a
133 MHz slot. Beyond that, I'd suggest the customer work directly
with Sun.



Comment 5 Didier Belhomme 2007-01-19 20:15:07 UTC
The Sun X4200 has three 66 MHz PCI-X slots, one 133 MHz slot and one 100 MHz slot.
Since I have two cards to connect (for redundancy in the SAN connection), I
can put one in the 133 MHz slot and the other in the 100 MHz slot. Do you think
that workaround would work?

Meanwhile, I'll report the problem to Sun.

And thanks, Andrew, for the fast reply.

Comment 6 Andrew Vasquez 2007-01-19 20:54:17 UTC
We've only seen the issue when FC HBA cards are installed in
the 66 MHz slots.
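The slot choice discussed in comments 5 and 6 amounts to a simple placement rule: skip the 66 MHz slots and use the fastest remaining ones. This is a sketch only; the slot labels are hypothetical, and it encodes nothing more than the observation that the failure has only been seen in 66 MHz slots, not a guarantee that the 100 MHz slot is safe.

```python
# X4200 slot inventory as described in comment 5 (slot labels are hypothetical).
SLOTS = {"slot0": 66, "slot1": 66, "slot2": 66, "slot3": 133, "slot4": 100}

AVOID_MHZ = 66  # the only speed at which the failure has been observed


def place_hbas(slots, n_hbas):
    """Assign n_hbas cards to the fastest slots that are not 66 MHz."""
    candidates = sorted(
        (name for name, mhz in slots.items() if mhz != AVOID_MHZ),
        key=lambda name: slots[name],
        reverse=True,
    )
    if len(candidates) < n_hbas:
        raise ValueError(
            f"only {len(candidates)} non-{AVOID_MHZ} MHz slots for {n_hbas} HBAs"
        )
    return candidates[:n_hbas]


print(place_hbas(SLOTS, 2))  # ['slot3', 'slot4']
```

With two HBAs this yields exactly the 133 MHz + 100 MHz layout proposed in comment 5; asking for a third card raises an error because only 66 MHz slots remain.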