119702 – qla2300 driver broken

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tom Coughlan
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-04-01 16:43 UTC by Chuck Berg
Modified:	2007-11-30 22:07 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-12-22 13:15:19 UTC
Target Upstream Version:
Embargoed:

Description Chuck Berg 2004-04-01 16:43:16 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.6) Gecko/20040116

Description of problem:
My card is:
0f:01.0 Fibre Channel: QLogic Corp. QLA2300 64-bit FC-AL Adapter (rev 01)
It is plugged into an Imperial Solid State Disk:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IMPERIAL Model: MG-2000          Rev: A310
  Type:   Direct-Access                    ANSI SCSI revision: 03

firmware/driver version:
        Firmware version:  3.02.13, Driver version 6.06.00b11

With the qla2300 driver that comes with 2.4.21-9, I get terrible
(~10MB/s) performance and these messages in syslog:

scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342229.
scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342230.
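
To gauge how often the condition occurs, the QUEUE FULL lines above can be tallied per device. This is a minimal sketch: the regular expression is modeled only on the two messages quoted here, and other driver versions may word the message differently. The function name `count_queue_full` is made up for illustration.

```python
import re
from collections import Counter

# Pattern based on the two syslog lines quoted in this report (an assumption
# about the format; other driver releases may differ).
QUEUE_FULL_RE = re.compile(
    r"scsi\((\d+):(\d+):(\d+)\): QUEUE FULL status detected"
)

def count_queue_full(lines):
    """Count QUEUE FULL messages, keyed by the three numbers inside scsi(...)."""
    counts = Counter()
    for line in lines:
        match = QUEUE_FULL_RE.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts

sample = [
    "scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342229.",
    "scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342230.",
]
print(count_queue_full(sample))
```

In practice the input would be the lines of /var/log/messages rather than the two-line sample.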

When I use QLogic's latest driver (from qla2x00-v6.06.10-dist.tgz), I have no
trouble at all.

This system is an 8-CPU HP DL760.

Version-Release number of selected component (if applicable):
2.4.21-9.0.1.ELhugemem

How reproducible:
Always

Steps to Reproduce:
1. Connect the QLA2300 to the Imperial MG-2000 solid state disk
2. Load the qla2300 driver that comes with 2.4.21-9
3. Access the disk

Additional info:

Comment 1 Tom Coughlan 2004-04-02 13:49:28 UTC

The SCSI queue depth is set to a higher value in our driver than it is
in the driver from QLogic.  This improves performance in most cases,
but it is apparently over-running the queue in your storage device.

Please look at dmesg or /var/log/messages and find the line like this
that shows the queue depth setting:

kernel: scsi(4:0:0:0): Enabled tagged queuing, queue depth 64.

Please post this information for the disks that are getting the errors.

The QLogic driver has a module load parameter that lets you adjust the
queue depth. Please try:

rmmod qla2300
modprobe qla2300 ql2xmaxqdepth=32

See if that fixes the problem.

Thanks.
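
The lookup requested above can be scripted. This is a rough sketch that assumes the log line has exactly the form of the example quoted in this comment; `tagged_queue_depths` is a hypothetical helper name, and the text passed in would come from dmesg output or /var/log/messages.

```python
import re

# Pattern modeled on the single example line quoted in this comment
# (an assumption; the exact wording may vary between driver versions).
DEPTH_RE = re.compile(
    r"scsi\(([\d:]+)\): Enabled tagged queuing, queue depth (\d+)"
)

def tagged_queue_depths(log_text):
    """Map each scsi(...) address string to the queue depth last reported for it."""
    return {addr: int(depth) for addr, depth in DEPTH_RE.findall(log_text)}

example = "kernel: scsi(4:0:0:0): Enabled tagged queuing, queue depth 64.\n"
print(tagged_queue_depths(example))
```

An empty result means no such line was logged, in which case the /proc/scsi/qla2300 entry for the adapter is the other place to look.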

Comment 2 Chuck Berg 2004-04-05 22:27:28 UTC

Neither "tagged" nor "depth" shows up in syslog on this machine.
However, /proc/scsi/qla2300/0 says "Device queue depth = 0x20" for both
drivers.

Setting ql2xmaxqdepth to 32, 16, or 1 with the 2.4.21-9 qla2300 driver
has no effect on the problem.

Setting ql2xmaxqdepth to 64 with QLogic's qla2300 driver does not
trigger the problem.

Is the description for this parameter accurate? It says "Maximum queue
depth to report for target devices," but the qla2300 driver is the
initiator here, not the target.

When using QLogic's driver I had been loading the qla2300_conf module
as well; however, loading it or not makes no difference to this problem.

The qla2300 driver that comes with 2.6.3-2.1.253.2.1custom works fine
(though it doesn't support the ql2xmaxqdepth option).

Here's /proc/scsi/qla2300/0 for the 2.4.21-9 qla2300 driver (loaded
with ql2xmaxqdepth=16):
QLogic PCI to Fibre Channel Host Adapter for QLA2300/2310:
        Firmware version:  3.02.13, Driver version 6.06.00b11
Entry address = f8bda060
HBA: QLA2300 , Serial# J53908
Request Queue = 0xeb3d0000, Response Queue = 0xe1bc0000
Request Queue count= 512, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 5
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
    Device queue depth = 0x10
Number of free request entries = 503
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0x48e0a13
Dpc flags = 0x0
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 030
Port down retry = 030
Login retry count = 030
Commands retried with dropped frame(s) = 0


SCSI Device Information:
scsi-qla0-adapter-node=200000e08b0e348e;
scsi-qla0-adapter-port=210000e08b0e348e;
scsi-qla0-target-0=20020002340000d6;

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 8, Pending reqs 0, flags 0x0, 0:0:02,

Here's /proc/scsi/qla2300/0 for QLogic's driver (loaded with
ql2xmaxqdepth=64):
QLogic PCI to Fibre Channel Host Adapter for QLA2300/2310:
        Firmware version:  3.02.16, Driver version 6.06.10
Entry address = f8c2a060
HBA: QLA2300 , Serial# J53908
Request Queue = 0xe761c000, Response Queue = 0xeb7d0000
Request Queue count= 128, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 284586
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
    Device queue depth = 0x40
Number of free request entries = 59
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0x8e0a13
Dpc flags = 0x0
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 030
Port down retry = 030
Login retry count = 030
Commands retried with dropped frame(s) = 0


SCSI Device Information:
scsi-qla0-adapter-node=200000e08b0e348e;
scsi-qla0-adapter-port=210000e08b0e348e;
scsi-qla0-target-0=20020002340000d6;

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 420115, Pending reqs 0, flags 0x0, 0:0:02,
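
The two dumps above are easier to compare once a few fields are pulled out and the hex queue depth is converted to decimal. The sketch below reads only the fields named in its regular expressions. The helper names are invented for illustration, and the inline strings are excerpts of the dumps above rather than the full text.

```python
import re

def parse_proc_fields(text):
    """Extract a few fields from a /proc/scsi/qla2300/N dump (format assumed
    to match the two dumps shown in this comment)."""
    fields = {}
    m = re.search(r"Firmware version:\s*([\d.]+), Driver version (\S+)", text)
    if m:
        fields["firmware"], fields["driver"] = m.groups()
    m = re.search(r"Request Queue count=\s*(\d+)", text)
    if m:
        fields["request_queue_count"] = int(m.group(1))
    m = re.search(r"Device queue depth = (0x[0-9a-fA-F]+)", text)
    if m:
        fields["device_queue_depth"] = int(m.group(1), 16)  # hex to decimal
    return fields

def differing_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

redhat = parse_proc_fields("""\
        Firmware version:  3.02.13, Driver version 6.06.00b11
Request Queue count= 512, Response Queue count= 512
    Device queue depth = 0x10
""")
qlogic = parse_proc_fields("""\
        Firmware version:  3.02.16, Driver version 6.06.10
Request Queue count= 128, Response Queue count= 512
    Device queue depth = 0x40
""")
for name, values in differing_fields(redhat, qlogic).items():
    print(name, values)
```

Besides the queue depth set at load time, the firmware version and the request queue count also differ between the two drivers.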

Also, I got this panic several times after doing rmmod qla2300;
modprobe qla2300. I'm inclined to ignore it since I've never
considered rmmod to be safe. However I'm including it in case it is
relevant.

wait_on_irq, CPU 3:
irq:  1 [ 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ]
bh:   1 [ 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ]
Stack dumps:
CPU 0:<1>Unable to handle kernel paging request at virtual address fffd4900
 printing eip:
0210c6a4
*pde = 00000000
Oops: 0000
qla2300 tun nfs lockd sunrpc lp parport autofs e1000 floppy sg
microcode ide-cd cdrom keybdev mousedev hid input usb-ohci usbcore
ext3 jbd lvm-mod cciss sd_mo
CPU:    3
EIP:    0060:[<0210c6a4>]    Not tainted
EFLAGS: 00010046

EIP is at show_stack [kernel] 0x44 (2.4.21-9.0.1.ELhugemem/i686)
eax: fffd4900   ebx: fffd4900   ecx: 00000020   edx: 00000080
esi: 00000000   edi: 00000001   ebp: 024eb730   esp: 1c10be70
ds: 0068   es: 0068   ss: 0068
Process swapper (pid: 0, stackpage=1c10b000)
Stack: 1c10be8c 00000000 00000000 00000003 0210dbe9 fffd4900 00000000 00000003
       ffffffff 00000003 0210ebd8 022ad358 00000003 00000004 0210dd02 00000003
       efc6de80 021c2d6b 02436080 00000003 1c10bf00 021c2d20 024eb300 02380180
Call Trace:   [<0210dbe9>] show [kernel] 0x139 (0x1c10be80)
[<0210ebd8>] wait_on_irq [kernel] 0xf8 (0x1c10be98)
[<0210dd02>] __global_cli [kernel] 0x62 (0x1c10bea8)
[<021c2d6b>] rs_timer [kernel] 0x4b (0x1c10beb4)
[<021c2d20>] rs_timer [kernel] 0x0 (0x1c10bec4)
[<021346a5>] __run_timers [kernel] 0xb5 (0x1c10bed4)
[<021344e2>] timer_bh [kernel] 0x62 (0x1c10bf00)
[<0212f274>] bh_action [kernel] 0x54 (0x1c10bf14)
[<0212f112>] tasklet_hi_action [kernel] 0x62 (0x1c10bf1c)
[<0212eed5>] do_softirq [kernel] 0xd5 (0x1c10bf34)
[<0210e146>] do_IRQ [kernel] 0x146 (0x1c10bf50)
[<0210e000>] do_IRQ [kernel] 0x0 (0x1c10bf74)
[<02109100>] default_idle [kernel] 0x0 (0x1c10bf7c)
[<02109129>] default_idle [kernel] 0x29 (0x1c10bfa4)
[<021091c2>] cpu_idle [kernel] 0x42 (0x1c10bfb0)
[<02128e21>] printk [kernel] 0x141 (0x1c10bfcc)

Code: 8b 03 46 83 c3 04 c7 04 24 16 d1 2a 02 89 44 24 04 e8 26 c6

Kernel panic: Fatal exception
In interrupt handler - not syncing

Comment 3 Tom Coughlan 2004-04-06 16:09:36 UTC

> Is the description for this parameter accurate? It says "Maximum queue
> depth to report for target devices," but the qla2300 driver is the
> initiator here, not the target.

This parameter is the max number of commands that the SCSI midlayer
should queue to the SCSI target.  It is reported to the midlayer by
the qla2300 driver, on behalf of the SCSI targets it controls.
Reducing this parameter _should_ reduce the occurrence of queue full
errors.
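
The intended effect of the parameter can be illustrated with a toy model. This is not the SCSI midlayer's actual logic: `target_capacity` is a hypothetical figure (the thread does not say how many commands the MG-2000 accepts), and the model completes one command per step purely for simplicity. In this report lowering ql2xmaxqdepth did not help, so the sketch shows only what the parameter is meant to do.

```python
def queue_full_events(host_depth, target_capacity, total_commands):
    """Count rejected submissions in a toy initiator/target model.

    host_depth      -- most commands the host keeps outstanding (the role
                       ql2xmaxqdepth is described as playing)
    target_capacity -- hypothetical number of commands the target can hold
    """
    outstanding = 0          # commands accepted by the target, not yet done
    remaining = total_commands
    rejected = 0
    while remaining > 0 or outstanding > 0:
        # Host submits commands until it reaches its own depth limit.
        while remaining > 0 and outstanding < host_depth:
            if outstanding < target_capacity:
                outstanding += 1
                remaining -= 1
            else:
                rejected += 1    # target answers QUEUE FULL
                break            # host backs off until a completion
        # Target completes one command per step.
        if outstanding > 0:
            outstanding -= 1
    return rejected

print(queue_full_events(host_depth=64, target_capacity=32, total_commands=100))
print(queue_full_events(host_depth=32, target_capacity=32, total_commands=100))
```

In this model rejections occur only when the host depth exceeds what the target can hold, which is why a lower setting would normally be expected to help.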

We have updated the QLogic driver in RHEL 3 U2 (a variant of QLogic's
6.07).  We have also removed a patch from the I/O subsystem that was
causing performance problems (see bugzilla 104633).  I expect that
this will fix your problem.  Please try the RHEL 3 U2 beta when it
becomes available.

Comment 4 Tom Coughlan 2004-08-27 16:09:29 UTC

Any updates? Have you re-tested with U2 (or U3 beta)? U3 final will
ship in the next week or two, with yet another QLogic driver update.

Comment 5 Tom Coughlan 2004-12-22 13:15:19 UTC

This has been NEEDINFO for four months.  Please re-open if the problem
still exists in a recent RHEL 3 update (U4 shipped this week).

Comment 6 Ernie Petrides 2004-12-23 01:21:53 UTC

The RHEL3 U4 kernel advisory is RHBA-2004:550.
