Bug 522119

Summary: bnx2 driver with jumbo frames (MTU 9000) enabled causes kernel panic on IBM Blade HS21 with RHEL4.8
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Target Milestone: rc
Reporter: Flavio Leitner <fleitner>
Assignee: John Feeney <jfeeney>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: orkcu, peterm, tao
Doc Type: Bug Fix
Last Closed: 2009-11-12 16:04:25 UTC
Attachments:
Description Flags
bnx2-firmware-update.patch none

Description Flavio Leitner 2009-09-09 14:12:39 UTC
Description of problem:

When setting jumbo frames on an HS21 IBM Blade using the bnx2 driver (1.7.9-2)
with this Broadcom NIC (04:00.0 Ethernet controller: Broadcom Corporation
NetXtreme II BCM5708S Gigabit Ethernet (rev 12)) a kernel panic occurs.

Steps to Reproduce:
I have reproduced this in the GSS lab on machine:
ibm-hs21-7995-2.gsslab.rdu.redhat.com

1.  Boot into 2.6.9-89 kernel
2.  Run: # ifconfig eth0 mtu 9000
3.  Within 1-3 minutes the kernel panics.


general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core cpufreq_powersave ib_srp ib_sdp ib_ipoib inet_lro rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core ide_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery ac md5 ipv6 i5000_edac edac_mc hw_random bnx2 ext3 jbd qla2400 ata_piix libata qla2xxx scsi_transport_fc usb_storage uhci_hcd ohci_hcd ehci_hcd sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-89.ELsmp
RIP: 0010:[<ffffffff802b3d31>] <ffffffff802b3d31>{skb_drop_list+14}
RSP: 0018:ffffffff8046db08  EFLAGS: 00010202
RAX: 0000010037d10500 RBX: 4c43502c434c4d2c RCX: 000001000000ea90
RDX: 0000010037d10500 RSI: 0000000000000000 RDI: 4c43502c434c4d2c
RBP: 0000010122e36040 R08: 0000010129dc9000 R09: 000000011f040000
R10: 000001019f040000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000ab01ab R14: 0000000000000000 R15: 000000000000ab01
FS:  0000000000000000(0000) GS:ffffffff80504500(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80508000, task ffffffff803e1f00)
Stack: 0000000000000001 ffffffff802b3dc7 0000000000000001 0000010122e36040
      0000010122e36040 ffffffff802b3c0b 000000000000013f ffffffff802f3c4e
      3d65640a3d65640a e602640a010000e0
Call Trace:<IRQ> <ffffffff802b3dc7>{skb_release_data+106} <ffffffff802b3c0b>{kfree_skbmem+9}
      <ffffffff802f3c4e>{udp_rcv+1042} <ffffffff802d2bed>{ip_local_deliver+298}
      <ffffffff802d3386>{ip_rcv+1046} <ffffffff802b9820>{netif_receive_skb+957}
      <ffffffffa012a07c>{:bnx2:bnx2_poll+4765} <ffffffff801340e4>{rebalance_tick+133}
      <ffffffff80132c9f>{activate_task+124} <ffffffff802b9a44>{net_rx_action+208}
      <ffffffff8013d864>{__do_softirq+88} <ffffffff8013d90d>{do_softirq+49}
      <ffffffff801132f3>{do_IRQ+328} <ffffffff801108c3>{ret_from_intr+0}
       <EOI> <ffffffff8010e88c>{mwait_idle+86} <ffffffff8010e81c>{cpu_idle+26}

Code: 48 8b 1b 8b 87 e8 00 00 00 ff c8 75 05 0f ae e8 eb 0e f0 ff
RIP <ffffffff802b3d31>{skb_drop_list+14} RSP <ffffffff8046db08>
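
One way to read the register dump above, offered as a sketch rather than a confirmed diagnosis: on x86-64, dereferencing a non-canonical address raises a general protection fault instead of a page fault. RBX and RDI hold the same value here, and that value is both non-canonical and made up entirely of printable ASCII bytes, which is at least consistent with a pointer in the skb having been overwritten by packet data. The helper names below are made up for illustration, and the check assumes 48-bit virtual addresses.

```python
# Sketch only: inspects register values copied from the oops above.
# Assumes x86-64 with 48-bit virtual addresses; helper names are illustrative.

RBX = 0x4C43502C434C4D2C  # RBX (and RDI) at the time of the fault
RAX = 0x0000010037D10500  # RAX, shown for comparison


def is_canonical(addr: int) -> bool:
    """Return True if bits 63..47 of addr are all equal."""
    top = (addr >> 47) & 0x1FFFF
    return top in (0, 0x1FFFF)


def as_memory_bytes(value: int) -> bytes:
    """Return the 8 bytes of value in little-endian (in-memory) order."""
    return value.to_bytes(8, "little")


if __name__ == "__main__":
    for name, reg in (("RBX", RBX), ("RAX", RAX)):
        kind = "canonical" if is_canonical(reg) else "non-canonical"
        print(name, hex(reg), kind, as_memory_bytes(reg))
```

RAX passes the canonical check while RBX does not, and the bytes of RBX decode as text rather than anything resembling a kernel address.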


This is fixed in RHEL5.4/bz#475567. The patch 0011-bnx2-Update-5706-5708-firmware.patch is enough to fix this bug on RHEL4u8. However, the patch
0012-bnx2-Eliminate-TSO-header-modifications.patch needs to be applied
together with it, because the driver no longer has to modify the TCP/IP
header fields when transmitting TSO packets.

Brew build of testing package:
https://brewweb.devel.redhat.com/taskinfo?taskID=1829611

Feedback:
Those two patches work on an in-house system. The customer gave good feedback too.

Comment 1 Flavio Leitner 2009-09-09 14:15:40 UTC
Created attachment 360222
bnx2-firmware-update.patch

Attaching the tested patch.

Comment 2 Roger Pena-Escobio 2009-10-07 20:01:16 UTC
We are also seeing problems with kernel -89.0.9 on an HP DL585G2, which has a Broadcom NIC:
Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)

After some "heavy" traffic the network stops working. If we restart the service, it works again for a little while. The server does not panic, and I am able to log into the iLO console and check the stats.
So far, we see a lot of drops while it is working, and then it stops once errors start being reported.

for example:
eth0      Link encap:Ethernet  HWaddr 00:1B:78:BE:E8:5C  
          inet addr:169.185.XXX.YYY  Bcast:169.185.XXX.YYY  Mask:255.255.255.128
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:314613 errors:74 dropped:5503 overruns:0 frame:74
          TX packets:188633 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:459977654 (438.6 MiB)  TX bytes:45648574 (43.5 MiB)
          Interrupt:209 Memory:dc000000-dc012100 

The problem disappears when the MTU is set to 1500, or when we roll back to kernel -78.0.22.

This is very easy to replicate in our environment, just by creating a big tar file on local disk from an NFS volume.

Wondering if you have a kernel to test with your patches included.
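
To put the ifconfig counters quoted above in proportion, here is a small sketch that parses the RX line and computes the ratio of dropped packets to the RX packet counter. The function names are hypothetical, and whether dropped packets are also counted in "RX packets" depends on the driver, so the ratio is only a rough indicator.

```python
import re

# RX counter line copied from the ifconfig output in this comment.
RX_LINE = "RX packets:314613 errors:74 dropped:5503 overruns:0 frame:74"


def parse_counters(line: str) -> dict:
    """Turn 'name:value' pairs from an ifconfig counter line into a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def drop_ratio(counters: dict) -> float:
    """Dropped packets divided by the packet counter (rough indicator only)."""
    return counters["dropped"] / counters["packets"]


if __name__ == "__main__":
    rx = parse_counters(RX_LINE)
    print(rx)
    print(f"drop ratio: {drop_ratio(rx):.2%}")
```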

Comment 4 John Feeney 2009-10-30 18:34:06 UTC
I built rpms for x86_64 and i686 that have the firmware updated and the patch to bnx2.c as found in comment #1. Please see my people page
http://people.redhat.com/jfeeney/.rhel4-bnx2/

Note that the firmware included in these rpms is a newer version than what comment #1 specifies (patch has version 4.6.16 and comment #1 has 4.4.2).

I would appreciate it if this could be tested and the results reported back here.
Thanks.

Comment 6 Roger Pena-Escobio 2009-10-30 19:55:01 UTC
sure John

But would it be possible to have the kernel-smp too?
We would like to test the kernel with all 8 CPUs available.

thanks

Comment 7 John Feeney 2009-10-30 21:42:25 UTC
Okay, now the smps are on my people page too. 


Thanks.

Comment 8 Flavio Leitner 2009-11-12 13:26:15 UTC
2.6.9-89.14.EL.jfeeney.522119smp works on ibm-hs21-7995-2.
Flavio

Comment 9 John Feeney 2009-11-12 16:03:47 UTC
Thank you, Flavio, for the update. With this news, I am going to close this bz as a duplicate of bz523691, since it has the same fix as the one you provided.

Comment 10 John Feeney 2009-11-12 16:04:25 UTC

*** This bug has been marked as a duplicate of bug 523691 ***