Description of problem:
When jumbo frames are enabled on an IBM HS21 blade using the bnx2 driver (1.7.9-2) with this Broadcom NIC, a kernel panic occurs:

04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)

Steps to Reproduce:
I have reproduced this in the GSS lab on machine ibm-hs21-7995-2.gsslab.rdu.redhat.com.

1. Boot into the 2.6.9-89 kernel.
2. Run:
   # ifconfig eth0 mtu 9000
3. Within 1-3 minutes, the box kernel panics:

general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core cpufreq_powersave ib_srp ib_sdp ib_ipoib inet_lro rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core ide_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery ac md5 ipv6 i5000_edac edac_mc hw_random bnx2 ext3 jbd qla2400 ata_piix libata qla2xxx scsi_transport_fc usb_storage uhci_hcd ohci_hcd ehci_hcd sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-89.ELsmp
RIP: 0010:[<ffffffff802b3d31>] <ffffffff802b3d31>{skb_drop_list+14}
RSP: 0018:ffffffff8046db08  EFLAGS: 00010202
RAX: 0000010037d10500 RBX: 4c43502c434c4d2c RCX: 000001000000ea90
RDX: 0000010037d10500 RSI: 0000000000000000 RDI: 4c43502c434c4d2c
RBP: 0000010122e36040 R08: 0000010129dc9000 R09: 000000011f040000
R10: 000001019f040000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000ab01ab R14: 0000000000000000 R15: 000000000000ab01
FS:  0000000000000000(0000) GS:ffffffff80504500(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80508000, task ffffffff803e1f00)
Stack: 0000000000000001 ffffffff802b3dc7 0000000000000001 0000010122e36040
       0000010122e36040 ffffffff802b3c0b 000000000000013f ffffffff802f3c4e
       3d65640a3d65640a e602640a010000e0
Call Trace:<IRQ> <ffffffff802b3dc7>{skb_release_data+106} <ffffffff802b3c0b>{kfree_skbmem+9}
       <ffffffff802f3c4e>{udp_rcv+1042} <ffffffff802d2bed>{ip_local_deliver+298}
       <ffffffff802d3386>{ip_rcv+1046} <ffffffff802b9820>{netif_receive_skb+957}
       <ffffffffa012a07c>{:bnx2:bnx2_poll+4765} <ffffffff801340e4>{rebalance_tick+133}
       <ffffffff80132c9f>{activate_task+124} <ffffffff802b9a44>{net_rx_action+208}
       <ffffffff8013d864>{__do_softirq+88} <ffffffff8013d90d>{do_softirq+49}
       <ffffffff801132f3>{do_IRQ+328} <ffffffff801108c3>{ret_from_intr+0} <EOI>
       <ffffffff8010e88c>{mwait_idle+86} <ffffffff8010e81c>{cpu_idle+26}

Code: 48 8b 1b 8b 87 e8 00 00 00 ff c8 75 05 0f ae e8 eb 0e f0 ff
RIP <ffffffff802b3d31>{skb_drop_list+14} RSP <ffffffff8046db08>

This is fixed in RHEL5.4 (bz#475567). The patch 0011-bnx2-Update-5706-5708-firmware.patch is enough to fix this bug on RHEL4u8. However, the patch 0012-bnx2-Eliminate-TSO-header-modifications.patch needs to be applied along with it, because the driver no longer has to modify the TCP/IP header fields when transmitting TSO packets.

Brew build of testing package: https://brewweb.devel.redhat.com/taskinfo?taskID=1829611

Feedback: These two patches work on an in-house system. The customer also gave good feedback.
Created attachment 360222 bnx2-firmware-update.patch

Attaching the tested patch.
We are also having problems with kernel -89.0.9 on an HP DL585G2, which has a Broadcom NIC:

Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)

After some "heavy" traffic, the network stops working. If we restart the service, it works again for a little while. The server does not panic, and I am able to log into the iLO console and check the stats. So far, we see a lot of drops while it is working, and then it stops once errors start being reported. For example:

eth0      Link encap:Ethernet  HWaddr 00:1B:78:BE:E8:5C
          inet addr:169.185.XXX.YYY  Bcast:169.185.XXX.YYY  Mask:255.255.255.128
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:314613 errors:74 dropped:5503 overruns:0 frame:74
          TX packets:188633 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:459977654 (438.6 MiB)  TX bytes:45648574 (43.5 MiB)
          Interrupt:209 Memory:dc000000-dc012100

The problem disappears when the MTU is set to 1500, or when we roll back to kernel -78.0.22.

This is very easy to reproduce in our environment: we only need to create a big tar file on the local disk from an NFS volume. Do you have a kernel with your patches included that we could test?
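The drop rate can be expressed as a number. The sketch below parses the RX counter line from the ifconfig output above and computes dropped and errored packets as a percentage of received packets. `parse_counters` and `rx_loss_rates` are hypothetical helper names and are not part of any Red Hat or net-tools tooling. The sketch assumes the counters are cumulative since the interface came up, so the result is an average over that whole period, not a rate at the moment of failure.

```python
import re

# RX counter line copied verbatim from the eth0 ifconfig output above (MTU 9000).
RX_LINE = "RX packets:314613 errors:74 dropped:5503 overruns:0 frame:74"


def parse_counters(line: str) -> dict[str, int]:
    """Parse the 'name:value' pairs of an ifconfig counter line into a dict."""
    # The leading 'RX' token has no ':value', so the regex skips it.
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def rx_loss_rates(line: str) -> tuple[float, float]:
    """Return (dropped %, errors %) relative to the received packet count."""
    counters = parse_counters(line)
    packets = counters["packets"]
    dropped_pct = 100.0 * counters["dropped"] / packets
    error_pct = 100.0 * counters["errors"] / packets
    return dropped_pct, error_pct


if __name__ == "__main__":
    dropped_pct, error_pct = rx_loss_rates(RX_LINE)
    print(f"dropped: {dropped_pct:.2f}%  errors: {error_pct:.3f}%")
```

With these counters, the script prints a drop rate of about 1.75% and an error rate of about 0.024% of received packets.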
I built RPMs for x86_64 and i686 that include the updated firmware and the bnx2.c patch from comment #1. Please see my people page: http://people.redhat.com/jfeeney/.rhel4-bnx2/ Note that the firmware included in these RPMs is a newer version than the one comment #1 specifies (the patch has version 4.6.16, while comment #1 has 4.4.2). I would appreciate it if this could be tested and the results reported back here. Thanks.
Sure, John, but would it be possible to have the kernel-smp package too? We would like to test the kernel with all 8 CPUs available. Thanks.
Okay, the smp kernels are now on my people page too. Thanks.
2.6.9-89.14.EL.jfeeney.522119smp works on ibm-hs21-7995-2. Flavio
Thank you for the update, Flavio. With this news, I am going to close this bz as a duplicate of bz523691, since it has the same fix as the one you provided.
*** This bug has been marked as a duplicate of bug 523691 ***