Bug 522119
| Summary: | bnx2 driver with jumbo frames (MTU 9000) enabled causes kernel panic on IBM Blade HS21 with RHEL4.8 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Flavio Leitner <fleitner> | ||||
| Component: | kernel | Assignee: | John Feeney <jfeeney> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 4.8 | CC: | orkcu, peterm, tao | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2009-11-12 16:04:25 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Created attachment 360222
bnx2-firmware-update.patch
Attaching the tested patch.
We are also seeing problems with kernel -89.0.9 on an HP DL585G2, which has a Broadcom NIC:
Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
After some "heavy" traffic the network stops working. If we restart the service, it works again for a little while. The server does not panic, and I am able to log into the iLO console and check the stats.
So far, we see a lot of drops while the network is still working, and then it stops once errors start being reported.
For example:
eth0 Link encap:Ethernet HWaddr 00:1B:78:BE:E8:5C
inet addr:169.185.XXX.YYY Bcast:169.185.XXX.YYY Mask:255.255.255.128
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:314613 errors:74 dropped:5503 overruns:0 frame:74
TX packets:188633 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:459977654 (438.6 MiB) TX bytes:45648574 (43.5 MiB)
Interrupt:209 Memory:dc000000-dc012100
The problem disappears when the MTU is set to 1500, or when we roll back to kernel -78.0.22.
This is very easy to replicate in our environment, just by creating a big tar file on local disk from an NFS volume.
I am wondering if you have a kernel to test with your patches included.
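To compare runs at MTU 9000 and MTU 1500, it may help to turn the raw counters into rates. The following is a minimal sketch, assuming the net-tools `ifconfig` line format shown above; `parse_counters` and `rate` are hypothetical helper names, not part of any existing tool.

```python
import re

# Sample line copied from the ifconfig output above.
RX_LINE = "RX packets:314613 errors:74 dropped:5503 overruns:0 frame:74"


def parse_counters(line):
    """Return the name:value pairs of an ifconfig counter line as integers."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def rate(counters, key):
    """Fraction of packets counted under `key` (0.0 if no packets were seen)."""
    packets = counters.get("packets", 0)
    return counters.get(key, 0) / packets if packets else 0.0


if __name__ == "__main__":
    rx = parse_counters(RX_LINE)
    print("dropped: %.2f%%" % (100 * rate(rx, "dropped")))
    print("errors:  %.3f%%" % (100 * rate(rx, "errors")))
```

Sampling the same line before and after a test run and comparing the two rates shows whether drops grow faster than traffic.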
I built rpms for x86_64 and i686 that have the firmware updated and the patch to bnx2.c as found in comment #1. Please see my people page http://people.redhat.com/jfeeney/.rhel4-bnx2/ Note that the firmware included in these rpms is a newer version than what comment #1 specifies (patch has version 4.6.16 and comment #1 has 4.4.2). I would appreciate it if this could be tested and the results reported back here. Thanks.
Sure, John, but would it be possible to have the kernel-smp too? We would like to test the kernel with all 8 CPUs available. Thanks.
Okay, now the smps are on my people page too. Thanks.
2.6.9-89.14.EL.jfeeney.522119smp works on ibm-hs21-7995-2. Flavio
Thank you, Flavio, for the update. With this news, I am going to close this bz as a duplicate of bz523691, since it has the same fix as you provided.
*** This bug has been marked as a duplicate of bug 523691 ***
Description of problem:
When setting jumbo frames on an IBM HS21 Blade using the bnx2 driver (1.7.9-2) with this Broadcom NIC, a kernel panic occurs:
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
Steps to Reproduce:
I have reproduced this in the GSS lab on machine ibm-hs21-7995-2.gsslab.rdu.redhat.com.
1. Boot into the 2.6.9-89 kernel.
2. Run: # ifconfig eth0 mtu 9000
3. Within 1-3 minutes the box kernel panics:
general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core cpufreq_powersave ib_srp ib_sdp ib_ipoib inet_lro rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core ide_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery ac md5 ipv6 i5000_edac edac_mc hw_random bnx2 ext3 jbd qla2400 ata_piix libata qla2xxx scsi_transport_fc usb_storage uhci_hcd ohci_hcd ehci_hcd sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-89.ELsmp
RIP: 0010:[<ffffffff802b3d31>] <ffffffff802b3d31>{skb_drop_list+14}
RSP: 0018:ffffffff8046db08 EFLAGS: 00010202
RAX: 0000010037d10500 RBX: 4c43502c434c4d2c RCX: 000001000000ea90
RDX: 0000010037d10500 RSI: 0000000000000000 RDI: 4c43502c434c4d2c
RBP: 0000010122e36040 R08: 0000010129dc9000 R09: 000000011f040000
R10: 000001019f040000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000ab01ab R14: 0000000000000000 R15: 000000000000ab01
FS: 0000000000000000(0000) GS:ffffffff80504500(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80508000, task ffffffff803e1f00)
Stack: 0000000000000001 ffffffff802b3dc7 0000000000000001 0000010122e36040 0000010122e36040 ffffffff802b3c0b 000000000000013f ffffffff802f3c4e 3d65640a3d65640a e602640a010000e0
Call Trace:<IRQ>
<ffffffff802b3dc7>{skb_release_data+106}
<ffffffff802b3c0b>{kfree_skbmem+9}
<ffffffff802f3c4e>{udp_rcv+1042}
<ffffffff802d2bed>{ip_local_deliver+298}
<ffffffff802d3386>{ip_rcv+1046}
<ffffffff802b9820>{netif_receive_skb+957}
<ffffffffa012a07c>{:bnx2:bnx2_poll+4765}
<ffffffff801340e4>{rebalance_tick+133}
<ffffffff80132c9f>{activate_task+124}
<ffffffff802b9a44>{net_rx_action+208}
<ffffffff8013d864>{__do_softirq+88}
<ffffffff8013d90d>{do_softirq+49}
<ffffffff801132f3>{do_IRQ+328}
<ffffffff801108c3>{ret_from_intr+0}
<EOI>
<ffffffff8010e88c>{mwait_idle+86}
<ffffffff8010e81c>{cpu_idle+26}
Code: 48 8b 1b 8b 87 e8 00 00 00 ff c8 75 05 0f ae e8 eb 0e f0 ff
RIP <ffffffff802b3d31>{skb_drop_list+14} RSP <ffffffff8046db08>
This is fixed in RHEL5.4/bz#475567. The patch 0011-bnx2-Update-5706-5708-firmware.patch is enough to fix this bug on RHEL4u8. However, the patch 0012-bnx2-Eliminate-TSO-header-modifications.patch needs to be applied together with it, because the driver no longer has to modify the TCP/IP header fields when transmitting TSO packets.
Brew build of testing package: https://brewweb.devel.redhat.com/taskinfo?taskID=1829611
Feedback: Those two patches work on an in-house system. The customer gave good feedback too.
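One way to read the register dump in the trace above: the RBX/RDI value does not look like a kernel pointer. The sketch below checks whether it is a canonical x86_64 address (assuming 48-bit virtual addressing) and prints its bytes in memory order. The helper names `is_canonical` and `as_memory_bytes` are hypothetical, and the interpretation is not a statement from the report: a non-canonical value made of printable characters may suggest that a pointer was overwritten with data, but the trace alone does not prove it.

```python
RBX = 0x4C43502C434C4D2C  # RBX/RDI value copied from the oops above


def is_canonical(addr):
    """True if bits 63..47 are all equal (x86_64 with 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def as_memory_bytes(value):
    """The 8 bytes of `value` in the order they sit in little-endian memory."""
    return value.to_bytes(8, "little")


if __name__ == "__main__":
    print("canonical:", is_canonical(RBX))
    print("bytes in memory:", as_memory_bytes(RBX))
```

Dereferencing a non-canonical address on x86_64 raises a general protection fault rather than a page fault, which matches the first line of the oops.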