Description of problem:
When iptables is enabled, network communication with bnx2x fails. The following messages are logged.

Oct 2 15:56:06 localhost kernel: [bnx2x_stats_update:4549(eth0)]storm stats were not updated for 3 times
Oct 2 15:56:06 localhost kernel: [bnx2x_stats_update:4550(eth0)]driver assert
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:632(eth0)]begin crash dump -----------------
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:640(eth0)]def_c_idx(3240) def_u_idx(0) def_x_idx(0) def_t_idx(0) def_att_idx(6) attn_state(0) spq_prod_idx(171)
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:651(eth0)]fp0: rx_bd_prod(9154) rx_bd_cons(156) *rx_bd_cons_sb(0) rx_comp_prod(92dd) rx_comp_cons(82dd) *rx_cons_sb(82dd)
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:656(eth0)] rx_sge_prod(400) last_max_sge(0) fp_u_idx(15fa) *sb_u_idx(15fa)
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:666(eth0)]fp1: tx_pkt_prod(a717) tx_pkt_cons(a707) tx_bd_prod(fae1) tx_bd_cons(faab) *tx_cons_sb(a707)
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:670(eth0)] fp_c_idx(9718) *sb_c_idx(9718) tx_db_prod(fae1)
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:685(eth0)]fp0: rx_bd[2d3]=[0:34fe7010] sw_bd=[f46a5c00]
Oct 2 15:56:06 localhost kernel: [bnx2x_panic_dump:685(eth0)]fp0: rx_bd[2d4]=[0:347bc010] sw_bd=[f5cf2800]

Version-Release number of selected component (if applicable):

How reproducible:
This problem is reproducible on BL460c G6.

1) Create /etc/sysconfig/iptables as follows:

# cat /etc/sysconfig/iptables
# Generated by iptables-save v1.2.11 on Fri Oct 2 15:54:04 2009
*nat
:PREROUTING ACCEPT [14:2015]
:POSTROUTING ACCEPT [3:628]
:OUTPUT ACCEPT [3:628]
-A POSTROUTING -s 192.168.67.0/255.255.255.0 -p tcp -m tcp --sport 20 -j MASQUERADE
COMMIT
# Completed on Fri Oct 2 15:54:04 2009

2) service iptables start
3) ftp xxxxx
ftp> put <file>    (with a file bigger than 1 MB, the problem occurs)

If the above problem occurs, network communication with the bnx2x NIC will fail.
To recover, it is necessary to run the following command:
# rmmod bnx2x ; service iptables off; service network restart

Actual results:
Network stalls, bnx2x driver asserts.

Expected results:
Normal workflow.

Additional info:
As per adjacent BZs, switching off TSO was suggested. Will see if that helps.
> This problem is reproducible on BL460c G6.

How can I get access to this machine?
> The following messages are logged.
>
> Oct 2 15:56:06 localhost kernel: [bnx2x_stats_update:4549(eth0)]storm stats
> were not updated for 3 times

We have no such message in the RHEL4 driver; it is in RHEL5, however.

> Version-Release number of selected component (if applicable):

This information is missing. It looks like a newer Broadcom module is used instead of the one shipped with RHEL4, correct? Otherwise, is this RHEL5 rather than RHEL4?
Taken from sosreport:
Broadcom NetXtreme II 5771x 10Gigabit Ethernet Driver bnx2x 1.50.13 ($DateTime: 2009/07/22 07:22:59 $)

Hence bnx2x is an external module. I guess Broadcom should support this; that would be the best way, as this is a very low-level bug.
Event posted on 10-08-2009 07:01pm JST by tumeya

Apologies for not having checked this in-shop. I've confirmed the simple reproducer again, ruling out a few things from their original report.

1. Run modprobe:
# modprobe iptable_nat
# lsmod | grep ip
iptable_nat 27613 0
ip_conntrack 46085 1 iptable_nat
ip_tables 23105 1 iptable_nat
ipv6 244833 26 #<---not relevant but anyways...

2. Confirm the rule set is blank:
# iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

3. ftp to any box; put a 1MB+ file; the network hangs.

I've pinged HP and they said this would cause the same issue they are seeing.

This event sent from IssueTracker by tumeya
issue 350828
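For repeated testing, step 3 can be scripted. The sketch below is a minimal version using Python's standard ftplib. The 1 MB threshold comes from this report. The 2 MB payload size, the file name and the host/credentials you pass in are arbitrary or hypothetical choices.

```python
import ftplib
import io

# Reported threshold: uploads larger than about 1 MB trigger the stall.
THRESHOLD_BYTES = 1 * 1024 * 1024


def make_payload(size: int = 2 * THRESHOLD_BYTES) -> bytes:
    """Build a payload of exactly `size` bytes, by default well above the threshold."""
    # Repeat a 256-byte pattern so the data is non-trivial but deterministic.
    pattern = bytes(range(256))
    repeats, remainder = divmod(size, len(pattern))
    return pattern * repeats + pattern[:remainder]


def upload(host: str, user: str, password: str, name: str = "bnx2x-repro.bin") -> None:
    """Upload the payload once; with iptable_nat loaded this is where the link hung."""
    data = make_payload()
    # A timeout keeps the script from blocking forever if the NIC stalls.
    with ftplib.FTP(host, timeout=60) as ftp:
        ftp.login(user, password)
        ftp.storbinary(f"STOR {name}", io.BytesIO(data))
```

Calling, for example, `upload("ftp.example.test", "testuser", "testpass")` against a real test server (all three values hypothetical) after `modprobe iptable_nat` should exercise the same TSO path as a manual `put`.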
Eilon, we have a problem with bnx2x on RHEL4. The issue is also reproducible with the (newest?) driver version 1.50.13. Any help would be very much appreciated. If you want more information, please let us know.
I will try to build a similar setup locally and reproduce. Stay tuned.
I could not reproduce on a different system. I’m trying to get BL460 to test again.
(In reply to comment #10)
> I could not reproduce on a different system. I’m trying to get BL460 to test
> again.

Thank you for your effort. Here are some more details about the configuration:
BL460c G6
RHEL4.8 (2.6.9-89.ELsmp) 32bit (i386)
BCM57711E 100/1GB/10GB NIC

We do not have information about the connected switch; do you think this is important?
I think there might be an earlier error, prior to the statistics collection failure. Can you please enable debug prints with the mask 0xef00f7 and send the log?

Thanks,
Eilon
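The mask is presumably passed through the bnx2x `debug` module parameter, the same way `debug=0xffffff` is used for the later logs. The sketch below is a minimal Python decoder for such a mask. It assumes the driver uses the standard Linux NETIF_MSG_* values for the low 15 bits. The higher bits are driver-specific and their meaning is not stated in this report, so the sketch only returns them as leftover bits.

```python
from typing import List, Tuple

# Standard Linux NETIF_MSG_* bit values (assumed to apply to bnx2x's low bits).
NETIF_MSG = [
    (0x0001, "DRV"), (0x0002, "PROBE"), (0x0004, "LINK"), (0x0008, "TIMER"),
    (0x0010, "IFDOWN"), (0x0020, "IFUP"), (0x0040, "RX_ERR"), (0x0080, "TX_ERR"),
    (0x0100, "TX_QUEUED"), (0x0200, "INTR"), (0x0400, "TX_DONE"),
    (0x0800, "RX_STATUS"), (0x1000, "PKTDATA"), (0x2000, "HW"), (0x4000, "WOL"),
]


def decode_msglvl(mask: int) -> Tuple[List[str], int]:
    """Split a debug mask into known NETIF_MSG names and leftover driver-specific bits."""
    names = [name for bit, name in NETIF_MSG if mask & bit]
    known = sum(bit for bit, _ in NETIF_MSG)
    # Anything outside the standard bits is reported untouched.
    return names, mask & ~known
```

Under that assumption, `decode_msglvl(0xef00f7)` yields DRV, PROBE, LINK, IFDOWN, IFUP, RX_ERR and TX_ERR, and leaves 0xef0000 as driver-specific bits.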
Created attachment 364536
log provided by HP
This log does not seem to be sufficient. I think a log from the 1.50.13 driver is needed (this one seems to be from the stock RHEL4 driver). The log should start when the bnx2x module is loaded and end after bnx2x_panic_dump, or even better after rmmod. Eilon, am I right?
Created attachment 364796
bnx2x: Changing the Disabled state to a flag

Hi,

Indeed, the log is very limited and it is missing the prints I was hoping to see – especially the FW dump.

Though I’m still unable to reproduce this issue, I was able to reproduce and fix (at least I hope so ;)) a race condition when loading and unloading the driver on an HP system with DCC (device control channel) enabled. I posted a full fix for the DCC issues, which also includes fixes for 2 other issues, to netdev. The problem is only seen with DCC, which is part of version 1.50.13 but not part of the current RH4.8 version. I think this issue is addressed by one of the fixes in patches 4, 5 or 6, but I don’t have enough information to know for sure. If one of those race conditions occurs, statistics collection will eventually stop.

If possible, please try to run with those patches. I’m attaching them here as well.

Thanks,
Eilon
Created attachment 364886
netxtreme2-5.0.17-dcc-fixes.patch

Eilon, your patches do not apply, and even if I force them to apply, compilation fails. I fixed that. Please check whether this patch contains all the intended fixes. The patch is against the driver from the Broadcom site.
Sorry about that, it was based on net-next - this patch looks good.
Created attachment 365807
message provided by HP
Current status: DCC fixes do not help with this bug. I got access to a BL460c G6 at Red Hat and I'm able to reproduce the bug. It is enough to load the iptable_nat module (without any iptables rules) and transmit data to reproduce it.
Created attachment 366296
logs.debug.bad.gz

Logs with iptable_nat loaded and bnx2x debug=0xffffff, plus stack dumps from the iptables functions ip_nat_fn() and ip_nat_out().
Created attachment 366297
logs.debug.good.gz

Logs of data transmission without the iptable_nat module, with bnx2x debug=0xffffff.
With iptable_nat loaded, TSO is used (xmit_type 8), and this seems to make the firmware crash: we do not see any TSO transmissions in the "good" logs (there are only xmit_type 0, 1 and 5). Eilon, any hints?
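The xmit_type numbers read naturally as a bitmask. The sketch below is a minimal Python decoder. It assumes the XMIT_* flag values used by the upstream bnx2x driver of that era (CSUM_V4=0x1, CSUM_V6=0x2, CSUM_TCP=0x4, GSO_V4=0x8, GSO_V6=0x10). These values are not stated in this report and should be checked against the 1.50.13 source. Under that assumption, 8 is GSO_V4 with no checksum bit set. This is consistent with a segmentation request that carries no checksum offload request.

```python
from typing import List

# Assumed XMIT_* bit values, modeled on the upstream bnx2x driver of that era.
XMIT_FLAGS = {
    0x1: "CSUM_V4",
    0x2: "CSUM_V6",
    0x4: "CSUM_TCP",
    0x8: "GSO_V4",
    0x10: "GSO_V6",
}


def decode_xmit_type(value: int) -> List[str]:
    """Return the names of the flags set in an xmit_type value ("PLAIN" if none)."""
    names = [name for bit, name in XMIT_FLAGS.items() if value & bit]
    return names or ["PLAIN"]


def gso_without_csum(value: int) -> bool:
    """True when segmentation is requested but no checksum offload bit is set."""
    gso = value & (0x8 | 0x10)
    csum = value & (0x1 | 0x2)
    return bool(gso) and not csum
```

For the values in the two logs, `decode_xmit_type(5)` gives `['CSUM_V4', 'CSUM_TCP']` and `decode_xmit_type(8)` gives `['GSO_V4']`.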
I’m able to reproduce now and I’m looking into it. Stay tuned…
Created attachment 366466
CSUM is always set when GSO is set

Apparently, when iptable_nat is loaded, we receive GSO requests without an explicit CSUM offload request. This patch takes care of the CSUM when GSO is set. This patch fixes the problem on my setup; please let me know the status on yours.
The patch works, but it is not good enough. I will work on something cleaner and send it out.
Created attachment 366473
Set CSUM on GSO

This fix is more like it. Sorry for the mess.
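The fix, as described in these comments, is that whenever GSO is requested the driver also configures checksum offload, regardless of skb->ip_summed. The sketch below is a minimal Python model of that classification, not the attached patch itself. It assumes the upstream bnx2x XMIT_* bit values. The `Packet` fields are hypothetical stand-ins for the relevant sk_buff state, and `set_csum_on_gso=False` models the behaviour before the fix.

```python
from dataclasses import dataclass

# Assumed XMIT_* bit values (modeled on upstream bnx2x, not taken from the patch).
XMIT_PLAIN = 0x0
XMIT_CSUM_V4 = 0x1
XMIT_CSUM_V6 = 0x2
XMIT_CSUM_TCP = 0x4
XMIT_GSO_V4 = 0x8
XMIT_GSO_V6 = 0x10


@dataclass
class Packet:
    """Hypothetical stand-in for the sk_buff fields the driver inspects."""
    csum_partial: bool   # stack requested checksum offload (CHECKSUM_HW/PARTIAL)
    ipv6: bool           # IPv6 rather than IPv4
    tcp: bool            # L4 protocol is TCP
    gso_tcpv4: bool = False
    gso_tcpv6: bool = False


def xmit_type(pkt: Packet, set_csum_on_gso: bool = True) -> int:
    """Classify a packet into XMIT_* flags."""
    rc = XMIT_PLAIN
    if pkt.csum_partial:
        rc = XMIT_CSUM_V6 if pkt.ipv6 else XMIT_CSUM_V4
        if pkt.tcp:
            rc |= XMIT_CSUM_TCP

    if pkt.gso_tcpv4:
        rc |= XMIT_GSO_V4
        if set_csum_on_gso:
            # GSO implies TCP checksum offload, even if the stack did not ask for it.
            rc |= XMIT_CSUM_V4 | XMIT_CSUM_TCP
    elif pkt.gso_tcpv6:
        rc |= XMIT_GSO_V6
        if set_csum_on_gso:
            rc |= XMIT_CSUM_V6 | XMIT_CSUM_TCP
    return rc
```

For a NAT-style packet (`csum_partial=False`, `gso_tcpv4=True`), the model returns 8 without the fix, matching the failing logs. With the fix it returns 13 (GSO_V4 | CSUM_V4 | CSUM_TCP).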
Event posted on 10-29-2009 03:04pm JST by tumeya It's confirmed that the patch works. The issue is gone. This event sent from IssueTracker by tumeya issue 350828
I also confirm fix works in my setup. Many, many thanks Eilon.
Created attachment 366563
set_csum_on_gso for RHEL4

Patch for RHEL4. I'm going to prepare kernel packages now.
Brew build: https://brewweb.devel.redhat.com/taskinfo?taskID=2052216 Test packages (i686, x86_64, src.rpm) for public download: http://people.redhat.com/sgruszka/bnx2x-rhel4/
Eilon, is the patch queued for upstream submission? I need that information before posting to RKML.
Hi,

I’m sorry about the delay, but I’m under the gun for another project… Yes – I plan to submit this patch to Dave Miller’s net-next soon (within days).

BTW - Any idea why iptable_nat is not setting the CHECKSUM_HW in the skb->ip_summed? The reason will not affect this patch, since bnx2x should always configure the FW/HW for checksum offload when setting GSO, but it is still interesting to know.

Thanks,
Eilon
(In reply to comment #48)
> BTW - Any idea why iptable_nat is not setting the CHECKSUM_HW in the
> skb->ip_summed?

I guess it is only because they want to have the same code for incoming, outgoing and forwarded packets, and to avoid complications with the pseudo header. I do not see any other reason, but I don't know the netfilter code very well. Anyway, checksumming in upstream netfilter looks better.
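Some background on the pseudo-header point: the TCP checksum covers a pseudo header that includes the IP source and destination addresses. When MASQUERADE rewrites the source address, the TCP checksum therefore has to change too. The sketch below is a minimal Python version of the generic incremental update from RFC 1624 (HC' = ~(~HC + ~m + m')). It is shown as the standard technique, not as netfilter's own code, and the addresses and other words in the usage example are arbitrary.

```python
import ipaddress
from typing import List


def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit values with end-around carry (one's complement addition)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def checksum(words: List[int]) -> int:
    """Internet checksum over 16-bit words: complement of their one's complement sum."""
    total = 0
    for w in words:
        total = ones_complement_add(total, w)
    return ~total & 0xFFFF


def incremental_update(old_csum: int, old_words: List[int], new_words: List[int]) -> int:
    """RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), applied once per changed word."""
    total = ~old_csum & 0xFFFF
    for m, m_new in zip(old_words, new_words):
        total = ones_complement_add(total, ~m & 0xFFFF)
        total = ones_complement_add(total, m_new)
    return ~total & 0xFFFF


def ipv4_words(addr: str) -> List[int]:
    """Split an IPv4 address into the two 16-bit words it contributes to the pseudo header."""
    packed = ipaddress.IPv4Address(addr).packed
    return [int.from_bytes(packed[0:2], "big"), int.from_bytes(packed[2:4], "big")]
```

Rewriting a source address such as 192.168.67.10 to another address changes only its two pseudo-header words. Passing the old and new words to `incremental_update` gives the same result as recomputing `checksum` over the whole rewritten word list.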
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 89.15.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0263.html