The test was carried out on a 2.6.9-42.25 kernel with:
- add-qla4xxx2.patch (taken from Mike Christie from BZ 180363: LTC17917-FEAT 7154: Qlogic iSCSI TOE adapters driver for Power).
- qla3xxx.patch (pretty much a back-port of the driver from 2.6.19 - BZ # 209341: LTC27612-FEAT: 200804: Include the qla3xxx networking driver).
- qla_fixes.patch (back-port of the patch from bugzilla # 215641: Patch to fix reset issue when ethernet interface is enabled).
With the qla4xxx loaded and not doing anything, the machine panics when a ping test is performed. The patches and the kernel are all available at: http://www.darnok.org/qlogic/
Attached is the serial output that has: dmesg, cat of the test script, and the panic.
Created attachment 142295 [details] Serial output with dmesg, cat of test script, and the panic.
This is what lspci tells me (this is a Compaq DL380 box with the iSCSI/NIC PCI card):

00:00.0 Host bridge: Broadcom CNB20LE Host Bridge (rev 05)
        Flags: bus master, medium devsel, latency 64

00:00.1 Host bridge: Broadcom CNB20LE Host Bridge (rev 05)
        Flags: bus master, medium devsel, latency 64

00:01.0 RAID bus controller: LSI Logic / Symbios Logic 53C1510 (rev 02)
        Subsystem: Compaq Computer Corporation Integrated Array Controller
        Flags: bus master, medium devsel, latency 192, IRQ 177
        I/O ports at 2000 [size=256]
        Memory at c5000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c4000000 (32-bit, non-prefetchable) [size=16M]
        Capabilities: [40] Power Management version 2

00:02.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Compaq Computer Corporation NC3163 Fast Ethernet NIC (embedded, WOL)
        Flags: bus master, medium devsel, latency 64, IRQ 185
        Memory at c3fff000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 2400 [size=64]
        Memory at c3e00000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2

00:03.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC] (rev 7a) (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc Rage IIC
        Flags: bus master, stepping, medium devsel, latency 64
        Memory at c2000000 (32-bit, prefetchable) [size=16M]
        I/O ports at 2800 [size=256]
        Memory at c3dff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [5c] Power Management version 1

00:04.0 System peripheral: Compaq Computer Corporation Advanced System Management Controller
        Subsystem: Compaq Computer Corporation: Unknown device b0f3
        Flags: medium devsel, IRQ 193
        I/O ports at 1800 [size=256]
        Memory at c3dfef00 (32-bit, non-prefetchable) [size=256]

00:0f.0 ISA bridge: Broadcom OSB4 South Bridge (rev 4f)
        Subsystem: Broadcom OSB4 South Bridge
        Flags: bus master, medium devsel, latency 0

00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
        Flags: bus master, medium devsel, latency 64
        I/O ports at 2c00 [size=16]

03:06.0 Ethernet controller: QLogic Corp. QLA3022 Network Adapter (rev 03)
        Subsystem: QLogic Corp.: Unknown device 0123
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 201
        I/O ports at 3000 [size=256]
        Memory at c6fff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2
        Capabilities: [4c] PCI-X non-bridge device.
        Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

03:06.1 Network controller: QLogic Corp. QLA4022 iSCSI TOE Adapter (rev 03)
        Subsystem: QLogic Corp.: Unknown device 0124
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 209
        I/O ports at 3400 [size=256]
        Memory at c6ffe000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2
        Capabilities: [4c] PCI-X non-bridge device.
        Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Changing the title as the ql_process_mac_tx_intr is in the QLA3xxx code.
We are working on this issue.
Is it possible to give me the output of ifconfig -a?
eth0      Link encap:Ethernet  HWaddr 00:02:A5:37:E6:70
          inet addr:192.168.79.212  Bcast:192.168.79.255  Mask:255.255.252.0
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:15234 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1224360 (1.1 MiB)  TX bytes:5157 (5.0 KiB)

eth1      Link encap:Ethernet  HWaddr 00:C0:DD:08:76:0D
          inet addr:192.168.78.223  Bcast:192.168.79.255  Mask:255.255.252.0
          inet6 addr: fe80::2c0:ddff:fe08:760d/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:63549 errors:0 dropped:0 overruns:0 frame:0
          TX packets:328 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:5589021 (5.3 MiB)  TX bytes:46294 (45.2 KiB)
          Interrupt:201

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:35 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4540 (4.4 KiB)  TX bytes:4540 (4.4 KiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

[root@perf8 ~]#
I just noticed that the problem is with RHEL4.5. You should not be using the qla4xxx open-iscsi driver for this release. You need to use the v5.00.04-d4 qla4xxx driver, which is currently in RHEL4.4, along with patches which we are going to provide in the next few days.
David, The qla4xxx I am using is v5.00.04-d4. In what BZ are the patches you refer to?
Sorry for the confusion, you are using the correct qla4xxx driver. We haven't submitted the required patches yet and plan to do so in a few days.
Konrad, Can you increase debug output for the qla4xxx driver? I want to eliminate the possibility of a driver interaction problem. It can be done as below.
# insmod ./qla4xxx.ko extended_error_logging=2
Regards, Ron Mercer (author of qla3xxx)
Created attachment 142635 [details] screen output when qla4xxx inits with extended debugging.
Created attachment 142636 [details] screen output when qla3xxx panics with qla4xxx with extended debugging.
Created attachment 142898 [details] Fixes panic caused by successive inbound ping frames. Please do the following, as the patch doesn't contain path information:
# cd /usr/src/YourKernel/drivers/net
# cp /path_to_this_patch/qla3xxx-v2.02.00-k37RH.patch .
# patch -p1 < qla3xxx-v2.02.00-k37RH.patch
Then rebuild the drivers/modules. This is the RHEL04 qla3xxx network driver with changes to inbound completion handling. It passes Konrad's a.sh ping test in our lab.
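The patch steps above could be wrapped in a small helper that checks the patch with a dry run before touching the tree. This is only a sketch: apply_driver_patch is a hypothetical name, the dry-run pass is an addition that is not part of the instructions in this comment, and the strip level is left as a parameter because the right value depends on how the patch was generated.

```shell
# Sketch only: apply_driver_patch is a hypothetical helper, not part of the attachment.
# Usage: apply_driver_patch <directory> <patch-file> [strip-level]
apply_driver_patch() {
    dir=$1
    patch_file=$2
    strip=${3:-1}

    # Refuse to continue if either input is missing.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    [ -f "$patch_file" ] || { echo "no such patch: $patch_file" >&2; return 1; }

    # Make the patch path absolute so it is still valid after the cd below.
    case $patch_file in
        /*) ;;
        *) patch_file=$PWD/$patch_file ;;
    esac

    (
        cd "$dir" || exit 1
        # First pass changes nothing; it only reports whether every hunk applies.
        patch --dry-run -p"$strip" < "$patch_file" >/dev/null || exit 1
        # Second pass modifies the files.
        patch -p"$strip" < "$patch_file"
    )
}
```

With the paths from this comment, the call would look like: apply_driver_patch /usr/src/YourKernel/drivers/net /path_to_this_patch/qla3xxx-v2.02.00-k37RH.patch 1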
Created attachment 142975 [details] Screen log of qla3xxx - v2.02.00-k37RH - crashing. I tested it with your patch and can still reproduce the error. Attached is the screen log. I booted the kernel using 'selinux=0' so that you won't see all of those avc: denied { rawip_recv } for pid=6110 comm=... messages.
FYI: Here is the kernel: http://darnok.com/kernels/kernel-smp-2.6.9-42.25.EL_qla3xxx_qla4xxx_qla_fixes_5.i686.rpm and the src RPM: http://darnok.com/kernels/kernel-2.6.9-42.25.EL_qla3xxx_qla4xxx_qla_fixes_5.src.rpm
Perhaps I am missing a firmware update for the QLogic card I have? What is the firmware version of the one that you are using in your lab?
I don't think it is a firmware issue. I was able to reproduce it with the initial driver and no qla4xxx loaded. I can no longer reproduce it with the latest driver. I will drop your kernel on my machine and continue.
Please comment out the following line in qla3xxx_probe() and run the test again:
ndev->features = NETIF_F_LLTX;
This is left over from previous code. It tells the net device layer that the driver is locking the tx queue. We removed the locking a few revs back, but never removed the flag. If the test runs to completion I will issue another release.
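To confirm whether a given copy of the driver source still sets the leftover flag, a quick search is enough. The sketch below assumes nothing about the file layout: find_lltx is a hypothetical helper name and the source path is whatever file holds qla3xxx_probe() in your tree.

```shell
# Sketch only: find_lltx is a hypothetical helper name.
# Prints every line (with its line number) of a source file that mentions NETIF_F_LLTX.
# Exit status: 0 if the flag is referenced, 1 if it is not, 2 if the file is missing.
find_lltx() {
    src=$1
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 2; }
    grep -n 'NETIF_F_LLTX' "$src"
}
```

An exit status of 1 after commenting out or deleting the assignment means no reference to the flag remains in that file. A commented-out line still matches, since the search is purely textual.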
Ron, That fixed it. The ping tests ran to completion. Thank you! Next week I will try exercising the iSCSI and the qla3xxx at the same time, if such a configuration can be done.
Konrad, Excellent news. I have a couple of questions. First, I would like to update you with my latest driver that supports our 4032 chip. I have been pushing the changes to the upstream kernel, and will update you when they're accepted. Secondly, I was able to get your kernel running, but I am not able to build my drivers on it. Would it be possible to get the source code in a tarball so I can build my own image?
Ron, Sounds good. If you want, you can just send me the qla3xxx.* files (with the support for the 4032 chipset) and I will replace the patches which I have been applying to the RHEL4 U4 kernel, which are:
1) http://darnok.com/qlogic/qla3xxx.patch
2) http://darnok.com/qlogic/qla_fixes.patch
3) http://darnok.com/qlogic/qla3xxx_fix.patch
4) http://darnok.com/qlogic/qla3xxx_fix_2.patch
and instead have just one patch. The source tarball, with all of these patches and the qla4xxx driver (without the patch for 4032 support that Mike posted recently), is http://darnok.com/qlogic/linux-2.6.9-qla4xxx-qla3xxx-fixes_6.tgz
Ron, With regard to comment #21, is patch #2) actually necessary?
Ron, Just ignore my comment #22. Can you provide me with the qla3xxx.* files this week, or should I go ahead with the composite patches (qla3xxx.patch, qla3xxx_fix.patch without the printk's, qla3xxx_fix_2.patch)? Thanks.
Konrad, I will update you with the two source files this week. I think that will be easier. It should be tomorrow or Wednesday. Regards, Ron
Created attachment 143562 [details] QLA3xxx driver from main-line with the two patches from this BZ. Ron, This is the patch that I will propose tomorrow unless you have a more up-to-date one. We can always provide bug-fix updates to this driver after the deadline.
Konrad, It is better to use your patch as the code I was planning to send you doesn't have enough testing done on it.
Created attachment 143586 [details] Adds 1us delay on NVRAM access.
Update. I ran the tests overnight with the qla4xxx in action (dd-ing a file to iSCSI storage) while rmmod-ing/modprobing the qla3xxx driver, followed by a ping test. The machine panicked with this:

------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:199!
invalid operand: 0000 [#1]
SMP
Modules linked in: qla3xxx qla4xxx md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc dm_multipath button battery ac e100 mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod cpqarray sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c02d3b90>]    Not tainted VLI
EFLAGS: 00010213   (2.6.9-42.25.EL_qla3xxx_qla4xxx_qla_fixes_8smp)
EIP is at _read_lock+0x9/0x1d
eax: c1a521f0   ebx: 00000000   ecx: f64fa8c0   edx: 0d0000e1
esi: c1a521e0   edi: 0d0000e1   ebp: 00000011   esp: c03cff00
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03cf000 task=c0323a80)
Stack: c02be1d3 f64fa8c0 c1a521e0 0d0000e1 f64fa8c0 f1b40000 c0295eae 00000011
       004b0830 db4afec0 db4b0830 db4b0830 db4afec0 f1b40000 c0297ca3 00000000
       f1b40000 db4afec0 c034f318 00000008 00000000 c0281255 db4afec0 00000001
Call Trace:
 [<c02be1d3>] ip_check_mc+0x1b/0x94
 [<c0295eae>] ip_route_input+0xcd/0x16b
 [<c0297ca3>] ip_rcv+0x1d1/0x438
 [<c0281255>] netif_receive_skb+0x2ac/0x2ec
 [<f8901892>] ql_process_macip_rx_intr+0x196/0x1ca [qla3xxx]
 [<f8901954>] ql_tx_rx_clean+0x8e/0x1ad [qla3xxx]
 [<f8901aa5>] ql_poll+0x32/0xa5 [qla3xxx]
 [<c0281440>] net_rx_action+0xae/0x160
 [<c0126a08>] __do_softirq+0x4c/0xb1
 [<c010819f>] do_softirq+0x4f/0x56
 =======================
 [<c0107ab4>] do_IRQ+0x1a2/0x1ae
 [<c02d5998>] common_interrupt+0x18/0x20
 [<c0104018>] default_idle+0x0/0x2f
 [<c01e007b>] acpi_ex_load_table_op+0xcd/0x14f
 [<c0104041>] default_idle+0x29/0x2f
 [<c01040a0>] cpu_idle+0x26/0x3b
 [<c0396786>] start_kernel+0x199/0x19d
Code: 5b c3 81 78 04 ed 1e af de 74 08 0f 0b cf 00 7d 69 2e c0 f0 81 28 00 00 00 01 74 05 e8 7a ea ff ff c3 81 78 04 ed 1e af de 74 08 <0f> 0b c7 00 7d 69 2e c0 f0 83 28 01 79 05 e8 7d ea ff ff c3 81
<0>Kernel panic - not syncing: Fatal exception in interrupt

I will coordinate with Mike Christie to see if there is another bug for this or if we should open a new one, as this one is fixed (pings don't panic the driver anymore).
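When triaging a trace like the one above, it can help to list only the frames that belong to a given module. The following is a rough sketch, not a tool used in this BZ: module_frames is a hypothetical name, it assumes the "symbol+0xoff/0xlen [module]" frame format shown in the trace, and it relies on grep -o, which GNU, BSD and BusyBox grep provide but POSIX does not require.

```shell
# Sketch only: module_frames is a hypothetical helper.
# Reads oops text on stdin and prints, in order, the symbol of every
# call-trace frame tagged with the given module name.
module_frames() {
    mod=$1
    # Match "symbol+0xoffset/0xlength [module]" anywhere in the input, so the
    # trace may be one frame per line or flattened onto a single line.
    grep -o "[A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]* \[$mod\]" |
        # Keep only the symbol name.
        sed 's/+0x.*//'
}
```

Fed the panic text from this comment with the argument qla3xxx, it prints ql_process_macip_rx_intr, ql_tx_rx_clean and ql_poll, which are the only driver frames between net_rx_action and netif_receive_skb.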
Ron, I am testing with your patch from comment #27 along with an updated qla4xxx driver from Mike Christie. The drivers are at: http://darnok.com/qlogic/kernel-2.6.9-42.25.EL_qla3xxx_qla4xxx_qla_fixes_9.src.rpm So far the above panic in comment #28 is _not_ showing up.
Created attachment 143658 [details] QLA3xxx driver from main-line with the three patches from this BZ. Which is now part of this RPM: http://darnok.com/qlogic/kernel-2.6.9-42.25.EL_qla3xxx_qla4xxx_qla_fixes_9.src.rpm i686: http://darnok.com/qlogic/kernel-smp-2.6.9-42.25.EL_qla3xxx_qla4xxx_qla_fixes_9.i686.rpm and x86_64: http://darnok.com/qlogic/kernel-smp-2.6.9-42.25.EL_qla3xxx_qla4xxx_qla_fixes_9.x86_64.rpm
Created attachment 143659 [details] QLA3xxx driver from main-line with the three patches from this BZ. Whoops. Wrong file uploaded.
Konrad, The patch from #27 is a fix to prevent a problem seen on some platforms when the driver is loading. It just adds a delay to NVRAM access and wouldn't play a part in the panic from comment #28. The panic is in the kernel code when it's processing a multicast packet sent up by qla3xxx. The kernel is trying to take a lock that is in a structure that is not accessed by the driver. I am looking to see if there is anything the driver could have done to contribute to this, but it would be a good idea to have the kernel guys take a look too. Could I take a look at the test script? I will try to reproduce it here. I am guessing that it does not need the presence of the iSCSI driver for this to happen. Ron
Created attachment 143671 [details] Screen log. Here is the screen output along with the output of the test-scripts.
Two missing scripts from the screen output:

[konrad@dhcp83-154 tmp]$ more c.sh
#!/bin/bash
cd /mnt/konrad-junk-ignore-it
while (true)
do
        ls -al
        uptime
        sleep 10
done

[konrad@dhcp83-154 tmp]$ more a.sh
#!/bin/bash
COUNT=1
while (true)
do
        expr $COUNT \> 65500 1>/dev/null
        if [ "$?" == 0 ]; then exit 0; fi
        COUNT=`expr $COUNT + 1` 1>/dev/null
        ping pbladem -c 5 -s$COUNT 1>/dev/null 2>/dev/null &
done
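For anyone who wants to reproduce with a smaller or repeatable range of payload sizes, the a.sh loop can be sketched as a bounded function with a dry-run default. This is an illustration, not the script that was run: ping_sweep and its arguments are hypothetical, and unlike a.sh it runs the pings one after another instead of putting each one in the background.

```shell
# Sketch only: ping_sweep is a hypothetical, bounded variant of the a.sh loop.
# Usage: ping_sweep <host> <first-size> <last-size> [runner]
# The runner defaults to "echo", so the commands are only printed unless a
# real runner such as "command" is passed as the fourth argument.
ping_sweep() {
    host=$1
    size=$2
    last=$3
    runner=${4:-echo}
    while [ "$size" -le "$last" ]; do
        # Same ping arguments as a.sh: five echo requests with a payload of $size bytes.
        $runner ping "$host" -c 5 -s "$size"
        size=$((size + 1))
    done
}
```

Running ping_sweep pbladem 1 3 prints the three ping commands it would issue. Passing "command" as the fourth argument sends the pings for real.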
I noticed several places in that qla3xxx patch where the driver sleeps with a spin lock held and interrupts off. Could you guys fix that up. Doing this fixes all the soft lockups I am seeing.
(In reply to comment #35)
> I noticed several places in that qla3xxx patch where the driver sleeps with a
> spin lock held and interrupts off. Could you guys fix that up. Doing this fixes
> all the soft lockups I am seeing.

Ignore the comment about fixing the soft lockups.
I opened another BZ: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=219783 ": qla3xxx panics in ql_process_macip_rx_intr" to track the panic from comment #28. I will close this BZ, as the initial problem (panic when sending pings) has been fixed.
Konrad, Have you pushed the fixes for this bug into RHEL5, or should we do it? Ron
Ron, I have not pushed the updates to RHEL5. Let me clone this BZ for RHEL5 and ask RH management to make an exception to get this in. Have the fixes that are in this BZ been pushed upstream? If so, do you have the GIT commit?
QE ack for RHEL4.5.
committed in stream U5 build 42.38. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html