From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.6) Gecko/20040116

Description of problem:
My card is:
0f:01.0 Fibre Channel: QLogic Corp. QLA2300 64-bit FC-AL Adapter (rev 01)

It is plugged into an Imperial Solid State Disk:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IMPERIAL Model: MG-2000 Rev: A310
  Type: Direct-Access ANSI SCSI revision: 03

firmware/driver version:
  Firmware version: 3.02.13, Driver version 6.06.00b11

With the qla2300 driver that comes with 2.4.21-9, I get terrible (~10MB/s) performance and these messages in syslog:
scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342229.
scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342230.

When I use qlogic's latest (from qla2x00-v6.06.10-dist.tgz), I have no trouble at all.

This system is an 8-CPU HP DL760.

Version-Release number of selected component (if applicable):
2.4.21-9.0.1.ELhugemem

How reproducible:
Always

Steps to Reproduce:
1. plug in stuff
2. load driver
3. access disk

Additional info:
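To put a number on how often the error fires, the syslog can be scanned for the message quoted in the report. The following is a minimal Python sketch, not part of the original report: the regular expression is modeled only on the two sample lines above, and the function name count_queue_full is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the two syslog lines quoted in the report (an assumption;
# other driver versions may word the message differently).
QFULL_RE = re.compile(r"scsi\(([\d:]+)\): QUEUE FULL status detected")

def count_queue_full(lines):
    """Count QUEUE FULL messages per device address string, e.g. '0:0:0'."""
    counts = Counter()
    for line in lines:
        m = QFULL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342229.",
    "scsi(0:0:0): QUEUE FULL status detected 0x1c-0x28, pid=342230.",
]
print(count_queue_full(sample))
```

Against a live system the same function could be fed an open log file instead of the sample list; comparing the count before and after a driver change gives a rough measure of whether the change helped.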
The SCSI queue depth is set to a higher value in our driver than it is in the driver from QLogic. This improves performance in most cases, but it is apparently over-running the queue in your storage device.

Please look at dmesg or /var/log/messages and find the line like this that shows the queue depth setting:

kernel: scsi(4:0:0:0): Enabled tagged queuing, queue depth 64.

Please post this information for the disks that are getting the errors.

The QLogic driver has a module load parameter that lets you adjust the queue depth. Please try:

rmmod qla2300
modprobe qla2300 ql2xmaxqdepth=32

See if that fixes the problem. Thanks.
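For anyone scripting the log check requested above, here is a minimal Python sketch that pulls the queue depth out of lines shaped like the sample. The pattern is an assumption based only on that one sample line, and find_queue_depths is a hypothetical helper name; the four numbers in parentheses are kept as an opaque address tuple rather than interpreted.

```python
import re

# Pattern modeled on the single sample line quoted above (assumed format).
QDEPTH_RE = re.compile(
    r"scsi\((\d+):(\d+):(\d+):(\d+)\): Enabled tagged queuing, queue depth (\d+)"
)

def find_queue_depths(lines):
    """Return {address_tuple: queue_depth} for every matching log line."""
    depths = {}
    for line in lines:
        m = QDEPTH_RE.search(line)
        if m:
            # First four groups form the device address, the last is the depth.
            *addr, depth = (int(g) for g in m.groups())
            depths[tuple(addr)] = depth
    return depths

sample = ["kernel: scsi(4:0:0:0): Enabled tagged queuing, queue depth 64."]
print(find_queue_depths(sample))  # {(4, 0, 0, 0): 64}
```

If no line matches, the function returns an empty dict, which is itself worth reporting since it means the driver in use does not log the depth in this form.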
Neither "tagged" nor "depth" show up in syslog on this machine. However /proc/scsi/qla2300/0 says "Device queue depth = 0x20" for both drivers.

Setting ql2xmaxqdepth to 32, 16, or 1 with the 2.4.21-9 qla2300 driver has no effect on the problem. Setting ql2xmaxqdepth to 64 with qlogic's qla2300 driver does not trigger the problem.

Is the description for this parameter accurate? It says "Maximum queue depth to report for target devices," but the qla2300 driver is the initiator here, not the target.

When using qlogic's driver I had been loading the qla2300_conf module as well, however loading it or not loading it doesn't affect this problem.

The qla2300 driver that comes with 2.6.3-2.1.253.2.1custom works fine (though doesn't support the ql2xmaxqdepth option).

Here's /proc/scsi/qla2300/0 for the 2.4.21-9 qla2300 driver (loaded with ql2xmaxqdepth=16)

QLogic PCI to Fibre Channel Host Adapter for QLA2300/2310:
        Firmware version: 3.02.13, Driver version 6.06.00b11
Entry address = f8bda060
HBA: QLA2300 , Serial# J53908
Request Queue = 0xeb3d0000, Response Queue = 0xe1bc0000
Request Queue count= 512, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 5
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
    Device queue depth = 0x10
Number of free request entries = 503
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0x48e0a13
Dpc flags = 0x0
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 030
Port down retry = 030
Login retry count = 030
Commands retried with dropped frame(s) = 0

SCSI Device Information:
scsi-qla0-adapter-node=200000e08b0e348e;
scsi-qla0-adapter-port=210000e08b0e348e;
scsi-qla0-target-0=20020002340000d6;

SCSI LUN Information:
(Id:Lun) * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 8, Pending reqs 0, flags 0x0, 0:0:02,

Here's /proc/scsi/qla2300/0 for qlogic's driver: (loaded with ql2xmaxqdepth=64)

QLogic PCI to Fibre Channel Host Adapter for QLA2300/2310:
        Firmware version: 3.02.16, Driver version 6.06.10
Entry address = f8c2a060
HBA: QLA2300 , Serial# J53908
Request Queue = 0xe761c000, Response Queue = 0xeb7d0000
Request Queue count= 128, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 284586
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
    Device queue depth = 0x40
Number of free request entries = 59
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0x8e0a13
Dpc flags = 0x0
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 030
Port down retry = 030
Login retry count = 030
Commands retried with dropped frame(s) = 0

SCSI Device Information:
scsi-qla0-adapter-node=200000e08b0e348e;
scsi-qla0-adapter-port=210000e08b0e348e;
scsi-qla0-target-0=20020002340000d6;

SCSI LUN Information:
(Id:Lun) * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 420115, Pending reqs 0, flags 0x0, 0:0:02,

Also, I got this panic several times after doing rmmod qla2300; modprobe qla2300. I'm inclined to ignore it since I've never considered rmmod to be safe. However I'm including it in case it is relevant.
wait_on_irq, CPU 3:
irq: 1 [ 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ]
bh: 1 [ 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ]
Stack dumps:
CPU 0:<1>Unable to handle kernel paging request at virtual address fffd4900
 printing eip:
0210c6a4
*pde = 00000000
Oops: 0000
qla2300 tun nfs lockd sunrpc lp parport autofs e1000 floppy sg microcode ide-cd cdrom keybdev mousedev hid input usb-ohci usbcore ext3 jbd lvm-mod cciss sd_mo
CPU: 3
EIP: 0060:[<0210c6a4>] Not tainted
EFLAGS: 00010046

EIP is at show_stack [kernel] 0x44 (2.4.21-9.0.1.ELhugemem/i686)
eax: fffd4900 ebx: fffd4900 ecx: 00000020 edx: 00000080
esi: 00000000 edi: 00000001 ebp: 024eb730 esp: 1c10be70
ds: 0068 es: 0068 ss: 0068
Process swapper (pid: 0, stackpage=1c10b000)
Stack: 1c10be8c 00000000 00000000 00000003 0210dbe9 fffd4900 00000000 00000003
       ffffffff 00000003 0210ebd8 022ad358 00000003 00000004 0210dd02 00000003
       efc6de80 021c2d6b 02436080 00000003 1c10bf00 021c2d20 024eb300 02380180
Call Trace:
[<0210dbe9>] show [kernel] 0x139 (0x1c10be80)
[<0210ebd8>] wait_on_irq [kernel] 0xf8 (0x1c10be98)
[<0210dd02>] __global_cli [kernel] 0x62 (0x1c10bea8)
[<021c2d6b>] rs_timer [kernel] 0x4b (0x1c10beb4)
[<021c2d20>] rs_timer [kernel] 0x0 (0x1c10bec4)
[<021346a5>] __run_timers [kernel] 0xb5 (0x1c10bed4)
[<021344e2>] timer_bh [kernel] 0x62 (0x1c10bf00)
[<0212f274>] bh_action [kernel] 0x54 (0x1c10bf14)
[<0212f112>] tasklet_hi_action [kernel] 0x62 (0x1c10bf1c)
[<0212eed5>] do_softirq [kernel] 0xd5 (0x1c10bf34)
[<0210e146>] do_IRQ [kernel] 0x146 (0x1c10bf50)
[<0210e000>] do_IRQ [kernel] 0x0 (0x1c10bf74)
[<02109100>] default_idle [kernel] 0x0 (0x1c10bf7c)
[<02109129>] default_idle [kernel] 0x29 (0x1c10bfa4)
[<021091c2>] cpu_idle [kernel] 0x42 (0x1c10bfb0)
[<02128e21>] printk [kernel] 0x141 (0x1c10bfcc)

Code: 8b 03 46 83 c3 04 c7 04 24 16 d1 2a 02 89 44 24 04 e8 26 c6

Kernel panic: Fatal exception
In interrupt handler - not syncing
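Note that the /proc/scsi/qla2300/0 dumps in this comment print the device queue depth in hex, so 0x10, 0x20 and 0x40 correspond to ql2xmaxqdepth values of 16, 32 and 64. To compare dumps from two drivers side by side, a small parser helps. This is a minimal Python sketch assuming the field labels appear exactly as in the dumps above; parse_qla_proc is a hypothetical helper, not a tool shipped with either driver.

```python
import re

def parse_qla_proc(text):
    """Extract a few fields from a /proc/scsi/qla2300/0 style dump.

    Field labels are copied from the dumps in this comment; any field
    that is not found is simply left out of the result.
    """
    fields = {}
    m = re.search(r"Driver version\s+(\S+)", text)
    if m:
        fields["driver_version"] = m.group(1)
    m = re.search(r"Device queue depth\s*=\s*(0x[0-9a-fA-F]+)", text)
    if m:
        # The driver prints this value in hex; convert to decimal.
        fields["device_queue_depth"] = int(m.group(1), 16)
    m = re.search(r"Request Queue count\s*=\s*(\d+)", text)
    if m:
        fields["request_queue_count"] = int(m.group(1))
    return fields

sample = (
    "Firmware version: 3.02.13, Driver version 6.06.00b11\n"
    "Request Queue count= 512, Response Queue count= 512\n"
    "Device queue depth = 0x10\n"
)
print(parse_qla_proc(sample))
```

Run over the two dumps above, this surfaces the differences worth looking at: driver version 6.06.00b11 against 6.06.10, device queue depth 16 against 64, and request queue count 512 against 128.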
> Is the description for this parameter accurate? It says "Maximum queue
> depth to report for target devices," but the qla2300 driver is the
> initiator here, not the target.

This parameter is the max number of commands that the SCSI midlayer should queue to the SCSI target. It is reported to the midlayer by the qla2300 driver, on behalf of the SCSI targets it controls. Reducing this parameter _should_ reduce the occurrence of queue full errors.

We have updated the QLogic driver in RHEL 3 U2 (a variant of QLogic's 6.07). We have also removed a patch from the I/O subsystem that was causing performance problems (see bugzilla 104633). I expect that this will fix your problem. Please try the RHEL 3 U2 beta when it becomes available.
Any updates? Have you re-tested with U2 (or U3 beta)? U3 final will ship in the next week or two, with yet another QLogic driver update.
This has been NEEDINFO for four months. Please re-open if the problem still exists in a recent RHEL 3 update (U4 shipped this week).
The RHEL3 U4 kernel advisory is RHBA-2004:550.