Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 660506

Summary: [Broadcom 5.6 bug] tg3: Increase tx jumbo bd flag threshold
Product: Red Hat Enterprise Linux 5 Reporter: Matt Carlson <mcarlson>
Component: kernelAssignee: John Feeney <jfeeney>
Status: CLOSED ERRATA QA Contact: Network QE <network-qe>
Severity: urgent Docs Contact:
Priority: high    
Version: 5.6CC: andriusb, benlu, gideonn, hjia, jfeeney, kzhang, mcarlson
Target Milestone: rcKeywords: OtherQA
Target Release: 5.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-13 22:03:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Additional patches to fix the problem none

Description Matt Carlson 2010-12-06 22:55:07 UTC
Description of problem:

Setting the jumbo BD flag too soon seems to cause some tx recovery problems on 5719.  Please apply the recent upstream commit, entitled "tg3: Raise the jumbo frame BD flag threshold".

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 RHEL Program Management 2010-12-06 23:19:45 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Matt Carlson 2010-12-07 18:01:39 UTC
> Do you consider this bug as a blocker for RHEL 5.6? Note that we are *very*
> late in the release cycle and plan to build RC kernel in a few days.

FYI, maybe the problem is my web browser (Opera), but I didn't see the quoted part of this comment (above) in the comment history.

Unfortunately, yes.  Common transmit patterns can easily bring about the problem.  This modification would be the correct thing to do, even if a problem were not present.  IMO, it is low risk.

Comment 4 John Feeney 2010-12-07 18:49:42 UTC
When applying the "Raise the jumbo frame BD frame threshold" patch I noticed
that the upstream patch changes tg3_start_xmit() and tg3_start_xmit_dma_bug()
but RHEL5.6 code would be changing tg3_start_xmit_dma_bug() only.

--- linux-2.6.18.noarch/drivers/net/tg3.c.657097.6.repost
+++ linux-2.6.18.noarch/drivers/net/tg3.c
@@ -5390,7 +5390,7 @@ static int tg3_start_xmit_dma_bug(struct
 #endif

        if ((tp->tg3_flags3 & TG3_FLG3_USE_JUMBO_BDFLAG) &&
-           !mss && skb->len > ETH_DATA_LEN)
+           !mss && skb->len > VLAN_ETH_FRAME_LEN)
                base_flags |= TXD_FLAG_JMB_PKT;

        /* Queue skb data, a.k.a. the main skb fragment. */
@@ -5416,7 +5416,7 @@ static int tg3_start_xmit_dma_bug(struct
                would_hit_hwbug = 1;

        if ((tp->tg3_flags3 & TG3_FLG3_USE_JUMBO_BDFLAG) &&
-           !mss && skb->len > ETH_DATA_LEN)
+           !mss && skb->len > VLAN_ETH_FRAME_LEN)
                base_flags |= TXD_FLAG_JMB_PKT;

        tg3_set_txd(tnapi, entry, mapping, len, base_flags,

One of these if statements appears to be superfluous. I tracked this mis-step
down to the trouble I had a year ago when posting the 3.104 patch set. If you
remember, there was a pointer problem that caused panics. The original patch
set posted had the correct code, as far as TXD_FLAG_JMB_PKT goes, but then the
patch set had to be reposted in a piecemeal fashion and it looks like both if
statements ended up in tg3_start_xmit_dma_bug(), as opposed to one in each
function. (Yuck)

So given that RHEL5.5 onward does not set TXD_FLAG_JMB_PKT in tg3_start_xmit(),
does this bring up another problem and does tg3_start_xmit_dma_bug() still need
to change to VLAN_ETH_FRAME_LEN? I assume it does, but does fixing it still
remain critical at this point in the RHEL5.6 cycle?

Comment 5 Matt Carlson 2010-12-07 19:20:48 UTC
(In reply to comment #4)
> One of these if statements appears to be superfluous. I tracked this mis-step
> down to the trouble I had a year ago when posting the 3.104 patch set. If you
> remember, there was a pointer problem that caused panics. The original patch
> set posted had the correct code, as far as TXD_FLAG_JMB_PKT goes, but then the
> patch set had to be reposted in a piecemeal fashion and it looks like both if
> statements ended up in tg3_start_xmit_dma_bug(), as opposed to one in each
> function. (Yuck)
>
> So given that RHEL5.5 onward does not set TXD_FLAG_JMB_PKT in tg3_start_xmit(),
> does this bring up another problem and does tg3_start_xmit_dma_bug() still need
> to change to VLAN_ETH_FRAME_LEN? I assume it does, but does fixing it still
> remain critical at this point in the RHEL5.6 cycle?

Yuck is right.  I didn't catch that.  The TXD_FLAG_JMB_PKT flag is needed in tg3_start_xmit() too for the 57765 asic rev.  We need to add that part or jumbo frames won't work correctly.

So, the patch is needed in both places, though it only fixes a bug for the 5719 in the tg3_start_xmit_dma_bug() case.  I'd consider the full patch as necessary.

Comment 6 John Feeney 2010-12-07 19:43:46 UTC
Okay, how does this look? Add if to tg3_start_xmit(), modify tg3_start_xmit_dma_bug(), and remove superflous.

@@ -5197,6 +5197,10 @@ static int tg3_start_xmit(struct sk_buff
        tnapi->tx_buffers[entry].skb = skb;
        pci_unmap_addr_set(&tnapi->tx_buffers[entry], mapping, mapping);

+       if ((tp->tg3_flags3 & TG3_FLG3_USE_JUMBO_BDFLAG) &&
+           !mss && skb->len > VLAN_ETH_FRAME_LEN)
+               base_flags |= TXD_FLAG_JMB_PKT;
+
        tg3_set_txd(tnapi, entry, mapping, len, base_flags,
                    (skb_shinfo(skb)->nr_frags == 0) | (mss << 1));

@@ -5390,7 +5394,7 @@ static int tg3_start_xmit_dma_bug(struct
 #endif

        if ((tp->tg3_flags3 & TG3_FLG3_USE_JUMBO_BDFLAG) &&
-           !mss && skb->len > ETH_DATA_LEN)
+           !mss && skb->len > VLAN_ETH_FRAME_LEN)
                base_flags |= TXD_FLAG_JMB_PKT;

        /* Queue skb data, a.k.a. the main skb fragment. */
@@ -5415,10 +5419,6 @@ static int tg3_start_xmit_dma_bug(struct
        if (tp->tg3_flags3 & TG3_FLG3_5701_DMA_BUG)
                would_hit_hwbug = 1;

-       if ((tp->tg3_flags3 & TG3_FLG3_USE_JUMBO_BDFLAG) &&
-           !mss && skb->len > ETH_DATA_LEN)
-               base_flags |= TXD_FLAG_JMB_PKT;
-
        tg3_set_txd(tnapi, entry, mapping, len, base_flags,
                    (skb_shinfo(skb)->nr_frags == 0) | (mss << 1));

Comment 7 Matt Carlson 2010-12-07 19:53:15 UTC
Looks good to me.

Comment 8 John Feeney 2010-12-07 21:31:45 UTC
See http://people.redhat.com/jfeeney/.bz660506
It has the patch provided above. Very quick testing feedback would be appreciated.

Comment 9 Matt Carlson 2010-12-07 21:36:38 UTC
Ben, can you test this?

Comment 10 Ben 2010-12-08 00:22:03 UTC
OK, update the test soon.

Comment 12 Jarod Wilson 2010-12-14 14:28:42 UTC
in kernel-2.6.18-237.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 14 Hushan Jia 2010-12-16 09:39:46 UTC
Hi Ben,
When you finish the testing, could you post a feedback to indicate the testing status of this bug. The test kernels can be found in Comment #12.

Thanks.

Comment 15 Matt Carlson 2010-12-20 20:34:16 UTC
Created attachment 469849 [details]
Additional patches to fix the problem

Hi John.  It looks like some more code is needed to fix the problem.  The attached tarball contains two patches that will take care of the bug.  Any chance these can make it in?

Comment 16 John Feeney 2010-12-20 21:20:27 UTC
Matt, 
That's too bad. This bz would not be appropriate because it is all locked and loaded for RHEL5.6. I would need another. Still, I would think it is doubtful that any bz of this nature would make it into RHEL5.6 at this point.

Comment 17 Andrius Benokraitis 2010-12-20 22:30:16 UTC
Comment on attachment 469849 [details]
Additional patches to fix the problem

needs to be in a new bugzilla since ON_QA

Comment 19 errata-xmlrpc 2011-01-13 22:03:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html