Bug 483646 - bridge: Fix LRO crash with tun (tun_chr_read()) [NEEDINFO]
bridge: Fix LRO crash with tun (tun_chr_read())
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.3
All Linux
urgent Severity urgent
: rc
: ---
Assigned To: Andy Gospodarek
Red Hat Kernel QE team
: ZStream
: 503851 522458 (view as bug list)
Depends On:
Blocks: 522458 522636 5.5TechNotes-Updates
  Show dependency treegraph
 
Reported: 2009-02-02 13:55 EST by Mark Wagner
Modified: 2014-06-29 19:01 EDT (History)
21 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 02:49:18 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
agospoda: needinfo? (tao)


Attachments (Terms of Use)
rhel5-lro-test.patch (3.30 KB, patch)
2009-02-09 09:35 EST, Andy Gospodarek
no flags Details | Diff
rhel5-lro-test2.patch (3.24 KB, patch)
2009-02-09 10:19 EST, Andy Gospodarek
no flags Details | Diff
rhel5-tun-fix.patch (2.70 KB, patch)
2009-09-10 16:20 EDT, Andy Gospodarek
no flags Details | Diff

  None (edit)
Description Mark Wagner 2009-02-02 13:55:37 EST
Description of problem:
While trying to test some network performance to a guest, we switched to a bnx2x based card. When trying to run a high load from an external box to the guest we can make the kernel panic every time.  The crashes disappear when we disable the TPA functionality of the bnx2x driver. 

Version-Release number of selected component (if applicable):
This is "stock" RHEL5.3


How reproducible:
Everytime

Steps to Reproduce:
1. Start guest
2. Run netperf to guest from external box over bnx2x device
3.
  
Actual results:


Expected results:


Additional info:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/tun.c:444
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/net/lo/ifindex
CPU 0 
Modules linked in: tun ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter d
Pid: 6912, comm: qemu-kvm Tainted: G      2.6.18-128.el5 #1
RIP: 0010:[<ffffffff886f57b0>]  [<ffffffff886f57b0>] :tun:tun_chr_readv+0x2b1/0x3a6
RSP: 0018:ffff8102202c5e48  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8102202c5e98 RCX: 0000000004010000
RDX: ffff810227063680 RSI: ffff8102202c5e9e RDI: ffff8102202c5e92
RBP: 0000000000010ff6 R08: 0000000000000000 R09: 0000000000000001
R10: ffff8102202c5e94 R11: 0000000000000202 R12: ffff8102275357c0
R13: ffff81022755e500 R14: 0000000000000000 R15: ffff8102202c5ef8
FS:  00002ae4398db980(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002ae4ab514000 CR3: 0000000221344000 CR4: 00000000000026e0
Process qemu-kvm (pid: 6912, threadinfo ffff8102202c4000, task ffff81022e58d820)
Stack:  00000000498735cb ffff810229d1a3c0 0000000000000000 ffff81022e58d820
 ffffffff8008a461 ffff81022755e528 ffff81022755e528 ffffffff8009f925
 000005ea05ea0000 ffff8102209d0000 00001051143e1600 ffffffff8003c00e
Call Trace:
 [<ffffffff8008a461>] default_wake_function+0x0/0xe
 [<ffffffff8009f925>] enqueue_hrtimer+0x55/0x70
 [<ffffffff8003c00e>] hrtimer_start+0xbc/0xce
 [<ffffffff886f58bf>] :tun:tun_chr_read+0x1a/0x1f
 [<ffffffff8000b3f3>] vfs_read+0xcb/0x171
 [<ffffffff800117d4>] sys_read+0x45/0x6e
 [<ffffffff8005d116>] system_call+0x7e/0x83


Code: 0f 0b 68 40 62 6f 88 c2 bc 01 f6 42 0a 08 74 0c 80 4c 24 41 
RIP  [<ffffffff886f57b0>] :tun:tun_chr_readv+0x2b1/0x3a6
 RSP <ffff8102202c5e48>
 <0>Kernel panic - not syncing: Fatal exception
Comment 1 Mark McLoughlin 2009-02-02 14:14:38 EST
The code is:

  if (sinfo->gso_type & SKB_GSO_TCPV4)
    gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
  else if (sinfo->gso_type & SKB_GSO_TCPV6)
    gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
  else
    BUG();

Since this seems to be bnx2x rx path specific, one potential theory is that the card is overrunning the linear data buffer supplied to it and overwriting skb_shared_info which lives at the end of the linear data.
Comment 2 Andy Gospodarek 2009-02-05 10:51:49 EST
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.  Without immediate
feedback there is a good chance this or any other fix for this driver
will not be included in the upcoming update.
Comment 3 Andy Gospodarek 2009-02-06 17:51:24 EST
Eilon,

We are noticing BUG-halts when using bnx2x inside a bridging interface with TPA enabled.  When I remove the place where gso_size is set: 

diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 19ddcc2..2fec4a9 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -1214,11 +1214,6 @@ static int bnx2x_fill_frag_skb(struct bnx2x *bp, struct bnx2x_fastpath *fp,
        frag_size = le16_to_cpu(fp_cqe->pkt_len) - len_on_bd;
        pages = BCM_PAGE_ALIGN(frag_size) >> BCM_PAGE_SHIFT;
 
-       /* This is needed in order to enable forwarding support */
-       if (frag_size)
-               skb_shinfo(skb)->gso_size = min((u32)BCM_PAGE_SIZE,
-                                              max(frag_size, (u32)len_on_bd));
-
 #ifdef BNX2X_STOP_ON_ERROR
        if (pages > 8*PAGES_PER_SGE) {
                BNX2X_ERR("SGL length is too long: %d. CQE index is %d\n",

the box doesn't hit the same bug-halt, but I'm not sure we should use TPA when the bnx2x cards are part of a bridge.  Performance is terrible with the above patch too.

The upstream driver does not allow TPA to be enabled when the device is a member of a bridge group since the ethtool call to disable LRO (TPA) is made whenever the device is inserted into a bridge (or when routing is enabled).  Since we don't have the same ethtool options available in RHEL5, should we just advise customers to disable TPA when using these cards with virtualization?

If you can elaborate on the purpose of the chunk of code I'm removing with the patch above that would be helpful too.  The comment indicates it was there for forwarding purposes, but it sure does introduce a panic when we are forwarding.

Thanks!
Comment 4 Eilon Greenstein 2009-02-08 05:09:44 EST
Hi,

As you wrote, TPA (or any other LRO) should be disabled when working with forwarding/bridging. The function skb_warn_if_lro should catch the case where the user mistakenly turns TPA (LRO) on when using forwarding/bridging – it does so by checking the gso_size != 0 while gso_type == 0. This is why we need those lines in the driver.

The problem is that not all devices can support fragmented SKB and so we cannot create such SKBs in forwarding. If we set the gso_type as well as the gso_size then we get good performance when using only the bnx2x as both the receiver and the transmitter – but this is not an acceptable solution since we cannot be sure which device will be used as the transmitter.

I think that the only current alternative is to set the gso_size and to document this limitation (no TPA when using forwarding/bridging).

Do you agree?

Eilon
Comment 5 Herbert Xu 2009-02-09 05:03:23 EST
Please see the patch I just posted to netdev (bridge: Fix LRO crash with tun).
Comment 6 Andy Gospodarek 2009-02-09 09:19:36 EST
Inclusion of skb_warn_lro_forwarding and friends should probably be added to RHEL5 along with the patch you just posted to netdev.  That will resolve the panic.  If we did that I would feel OK leaving TPA enabled by default.  If not TPA should probably be disabled by default.
Comment 7 Andy Gospodarek 2009-02-09 09:35:36 EST
Created attachment 331312 [details]
rhel5-lro-test.patch

This is a patch against rhel5 that would give us the functions that would look out for LRO/TPA usage with forwarding.  This would be a nice workaround since we can't use ethtool to disable it automatically.

Backport of this:

    commit 4497b0763cb1afae463f5e144c28b5d806e28b60
    Author: Ben Hutchings <bhutchings@solarflare.com>
    Date:   Thu Jun 19 16:22:28 2008 -0700

        net: Discard and warn about LRO'd skbs received for forwarding

        Add skb_warn_if_lro() to test whether an skb was received with LRO and
        warn if so.

        Change br_forward(), ip_forward() and ip6_forward() to call it) and
        discard the skb if it returns true.

and this:

    http://marc.info/?l=linux-netdev&m=123417379122404&w=2
Comment 8 Andy Gospodarek 2009-02-09 10:19:29 EST
Created attachment 331318 [details]
rhel5-lro-test2.patch

Sorry, that last patch applied, but didn't compile.  Here is an updated one.
Comment 9 Andy Gospodarek 2009-02-09 10:21:38 EST
(In reply to comment #4)
> Hi,
> 
> As you wrote, TPA (or any other LRO) should be disabled when working with
> forwarding/bridging. The function skb_warn_if_lro should catch the case where
> the user mistakenly turns TPA (LRO) on when using forwarding/bridging – it does
> so by checking the gso_size != 0 while gso_type == 0. This is why we need those
> lines in the driver.
> 
> The problem is that not all devices can support fragmented SKB and so we cannot
> create such SKBs in forwarding. If we set the gso_type as well as the gso_size
> then we get good performance when using only the bnx2x as both the receiver and
> the transmitter – but this is not an acceptable solution since we cannot be
> sure which device will be used as the transmitter.
> 
> I think that the only current alternative is to set the gso_size and to
> document this limitation (no TPA when using forwarding/bridging).
> 
> Do you agree?
> 
> Eilon

I do agree, Eilon.  I will try and get the patch in comment #8 included in RHEL5 at some point.  In the interim we will have to make sure that customers know to disable LRO/TPA when using virtualization of any kind since the use of bridging or routing for networking can obviously cause problems.
Comment 12 Andy Gospodarek 2009-09-10 16:20:32 EDT
Created attachment 360568 [details]
rhel5-tun-fix.patch

I am proposing the following patch.  Testing feedback is appreciated.
Comment 16 Sandy Garza 2009-09-17 11:01:38 EDT
(In reply to comment #12)
> Created an attachment (id=360568) [details]
> rhel5-tun-fix.patch
> I am proposing the following patch.  Testing feedback is appreciated.  

HP verified the patch successfully.
Comment 17 Sandy Garza 2009-09-17 14:54:58 EDT
(In reply to comment #16)
> (In reply to comment #12)
> > Created an attachment (id=360568) [details] [details]
> > rhel5-tun-fix.patch
> > I am proposing the following patch.  Testing feedback is appreciated.  
> HP verified the patch successfully.  

The patch appears to fix the panic issue, but doesn't address the second step of the fix which is how to disable LRO.  The patch will drop the packet and log a message "received packets cannot be forwarded while LRO is enabled".  

Do we need have a knowledgebase to let customers know how to disable LRO for specific servers and NIC's?
Comment 18 Don Zickus 2009-09-22 16:17:09 EDT
in kernel-2.6.18-166.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 20 Mark Wagner 2009-09-25 06:36:51 EDT
Hushan

I was testing guest performance so the I was bridging the interface to the guest.
If you are running netperf TCP_STREAM to a guest successfully from an external box  then you aren't hitting the problem.
Comment 28 Prarit Bhargava 2010-03-28 08:27:33 EDT
*** Bug 522458 has been marked as a duplicate of this bug. ***
Comment 29 errata-xmlrpc 2010-03-30 02:49:18 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
Comment 30 Herbert Xu 2010-05-21 06:58:56 EDT
*** Bug 503851 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.