Bug 518531 - [RHEL5]: Need "disable_tpa" parameter with bnx2x to get networking to work for guests
Status: CLOSED DUPLICATE of bug 582367
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Xen Maintainance List
QA Contact: Red Hat Kernel QE team
Blocks: 514491
Reported: 2009-08-20 19:05 UTC by Mike Overbo
Modified: 2013-01-09 21:53 UTC (History)

Doc Type: Bug Fix
Last Closed: 2010-04-16 14:30:04 UTC

Description Mike Overbo 2009-08-20 19:05:35 UTC
Description of problem:

Network and disk I/O throughput on Windows VMs is abnormally low unless you put this in modprobe.conf on dom0:
options bnx2x disable_tpa=1

Version-Release number of selected component (if applicable):

bnx2x module version 1.45.23q
kernel-xen version 2.6.18-128.4.1.el5xen

How reproducible:

We have 10 blades, and the problem reproduces reliably.

Steps to Reproduce:
1. Have an HP BL460c G6 with bnx2x passthrough Ethernet.
2. Boot a Xen kernel or try the KVM 5.4 beta.
3. Create a domU Windows Server guest: 2008 or 2003, 32-bit or 64-bit.
4. From the Windows Server guest, download something.

Actual results:

Nearly no network throughput on the guest: 1-2 kb/s until the network eventually stalls. Traffic that doesn't incur disk I/O behaves normally: you can RDP to the virtual guest and it works as expected, and the system never stops responding to pings. If you download something to that guest, incurring both network and disk I/O, the network stalls.

Expected results:

normal virtualized network throughput

Additional info:

'options bnx2x disable_tpa=1' in modprobe.conf fixes this issue.

Comment 1 Chris Lalancette 2009-08-21 07:22:30 UTC
Thanks for the report.  When you say:

1. Have an HP BL460c G6 with bnx2x passthrough Ethernet.

What does that mean, exactly?  Are you doing PCI passthrough of the device to the guest?  Or are you just bridging the device, and connecting the guest into the bridge?

Chris Lalancette

Comment 2 Mike Overbo 2009-08-21 17:02:54 UTC
It's a description of the hardware on the HP 7000 blade system that hosts the BL460c blade server.

We're bridging the device and connecting the guest into the bridge.

Comment 3 Pasi Karkkainen 2009-11-16 15:53:09 UTC
I'm seeing this problem as well.

The host has a bnx2x NIC and is running RHEL 5.4 with Xen, currently with the -173 dzickus el5xen test kernel.

Xen PV guests using the virbr0 NAT/DHCP bridge have _really_ slow networking, around 1-5 kB/sec. It's impossible to install new guests using virt-install because of this. Dom0 networking is OK/fast.

Host/dom0 dmesg has huge amounts of:
<name_of_the_outbound_bridge>: received packets cannot be forwarded while LRO is enabled

Adding "options bnx2x disable_tpa=1" to /etc/modprobe.conf and rebooting fixes the problem for guest vms.

Comment 4 Dean Hamstead 2009-12-09 00:26:10 UTC
This also occurs with KVM and Linux guests.

Comment 5 Andreas Thienemann 2009-12-14 10:14:57 UTC
Comment #3 claims that Dom0 networking is okay.
This is wrong.

iperf output for the machine running 2.6.18-164.6.1.el5:

[root@minos ~]# iperf -c buildvirt-01 -p 80
------------------------------------------------------------
Client connecting to buildvirt-01, TCP port 80
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.150.82.81 port 58780 connected with 10.147.103.16 port 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  87.2 MBytes  72.8 Mbits/sec



iperf output for the machine running 2.6.18-164.6.1.el5.xen:
[root@minos ~]# iperf -c buildvirt-01 -p 80
------------------------------------------------------------
Client connecting to buildvirt-01, TCP port 80
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.150.82.81 port 57288 connected with 10.147.103.16 port 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.6 sec    176 KBytes    136 Kbits/sec

Comment 6 Andreas Thienemann 2009-12-14 10:17:40 UTC
With TPA disabled:

[root@minos ~]# iperf -c buildvirt-01 -p 80
------------------------------------------------------------
Client connecting to buildvirt-01, TCP port 80
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.150.82.81 port 38737 connected with 10.147.103.16 port 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  66.2 MBytes  55.3 Mbits/sec
[root@minos ~]#

Comment 8 Andy Gospodarek 2010-01-15 15:13:50 UTC
The difference between the 128- and 164-based kernels is pretty interesting.  The 128 kernels have a bug where they do not enforce the limit that would normally be imposed by the NAPI weight.

That should explain some of the increased throughput, but I would be shocked if the interrupt load was so great that the performance was ~3X better just because we were consuming 300 frames instead of 64 with each NAPI poll.

Comment 9 Stanislaw Gruszka 2010-01-15 15:45:23 UTC
To confirm or rule out the NAPI limit as the cause of the performance degradation, we can set the weight back to 300, as it was before, i.e.: echo 300 > /sys/class/net/ethX/weight
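
As a rough illustration of the weight question, the following C sketch counts how many NAPI poll rounds it would take to drain a backlog of frames at a budget of 64 versus 300. It assumes every poll consumes its full budget, which real drivers do not guarantee. The helper name polls_needed and the 9600-frame backlog are hypothetical and chosen only for the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: number of poll rounds needed to drain `frames`
 * frames when each round may consume at most `weight` frames.
 * Assumes every round uses its full budget (a simplification). */
static unsigned int polls_needed(unsigned int frames, unsigned int weight)
{
        assert(weight > 0);
        return (frames + weight - 1) / weight;  /* ceiling division */
}
```

For a backlog of 9600 frames, polls_needed(9600, 64) gives 150 rounds and polls_needed(9600, 300) gives 32. Whether that difference in poll rounds translates into a throughput difference is what the sysfs experiment would show.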

Comment 10 Herbert Xu 2010-01-16 07:03:09 UTC
TPA unfortunately is a form of LRO.  Unlike GRO, it is fundamentally incompatible with bridging.  So it has to be disabled when bridging is in use.
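
A toy model of the incompatibility, written as plain userspace C rather than kernel code. It assumes a 1500-byte egress MTU and a simplified bridge that can only forward a packet if it fits in the MTU or can be split back into its original segments. The LRO-like aggregate discards the segment boundaries and the GRO-like one keeps them. All names (struct pkt, aggregate, bridge_can_forward) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MTU 1500  /* assumed egress MTU in bytes */

struct pkt {
        size_t len;            /* total length in bytes */
        bool   can_resegment;  /* true if original boundaries are known */
        size_t seg_size;       /* original segment size, if known */
};

/* Merge `n` segments of `seg` bytes each into one aggregated packet.
 * keep_boundaries=false models LRO, true models GRO. */
static struct pkt aggregate(size_t n, size_t seg, bool keep_boundaries)
{
        assert(n > 0 && seg > 0);
        struct pkt p = { n * seg, keep_boundaries, keep_boundaries ? seg : 0 };
        return p;
}

/* A forwarding device can send the packet if it fits in the MTU, or if
 * it can be split back into segments that each fit. */
static bool bridge_can_forward(const struct pkt *p)
{
        if (p->len <= MTU)
                return true;
        return p->can_resegment && p->seg_size > 0 && p->seg_size <= MTU;
}
```

In this model, four 1448-byte segments merged the LRO way cannot be forwarded, while the same four merged the GRO way can.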

Comment 17 Andrew Jones 2010-03-22 10:03:25 UTC
This bug needs clarification on whether there is also a performance regression beyond what is corrected by disabling TPA. In other words, does disabling TPA resolve all issues for this bug? If not, we should open a new bug with the regression details.

Thanks,
Andrew

Comment 19 Stanislaw Gruszka 2010-03-22 12:43:33 UTC
(In reply to comment #17)
> This bug needs clarification on whether there is also a performance regression
> beyond what is corrected by disabling TPA. In other words, does disabling
> TPA resolve all issues for this bug?
 
The customer is seeing a regression; some more details are in Comment 7. In my test environment I did not see a regression, but I only changed the kernel and ran the tests with RHEL 5.4 userland. Perhaps with a full RHEL 5.3 installation we would see the regression as well; I'm going to check that.

> If not, we should open a new bug with the regression details.
Agree.

Comment 21 Mike Overbo 2010-03-24 15:02:12 UTC
Just curious -- any good reason that TPA needs to be enabled by default?

Comment 22 Stanislaw Gruszka 2010-03-25 08:12:17 UTC
For performance reasons :) on non-virtualized machines. The installation program should probably add the option disable_tpa=1 to modprobe.d by default when bridges are used. Failing that, we could do something like this in the bnx2x driver:

#ifdef XEN
static int disable_tpa = 1;
#else
static int disable_tpa = 0; 
#endif

But this does not look like a good solution; for example, what about KVM and RHEL6?

Comment 25 Stanislaw Gruszka 2010-03-25 09:56:12 UTC
(In reply to comment #22) 
> #ifdef XEN
> static int disable_tpa = 1;
> #else
> static int disable_tpa = 0; 
> #endif
> 
> But this does not look like a good solution; for example, what about KVM and
> RHEL6?

OK, there is a better way. We can backport this commit:

commit 0187bdfb05674147774ca79a79942537f3ad54bd
Author: Ben Hutchings <bhutchings>
Date:   Thu Jun 19 16:15:47 2008 -0700

    net: Disable LRO on devices that are forwarding

That will be the best solution since this is the way it is done upstream.

Comment 26 Andy Gospodarek 2010-03-25 14:22:08 UTC
(In reply to comment #25)
> (In reply to comment #22) 
> > #ifdef XEN
> > static int disable_tpa = 1;
> > #else
> > static int disable_tpa = 0; 
> > #endif
> > 
> > But this does not look like a good solution; for example, what about KVM and
> > RHEL6?
> 
> OK, there is a better way. We can backport this commit:
> 
> commit 0187bdfb05674147774ca79a79942537f3ad54bd
> Author: Ben Hutchings <bhutchings>
> Date:   Thu Jun 19 16:15:47 2008 -0700
> 
>     net: Disable LRO on devices that are forwarding
> 
> That will be the best solution since this is the way it is done upstream.    

That exact patch is a kABI breaker (the ethtool set_flags bits).  This is why we have not included it.  Dave Miller had an interesting suggestion to create a way to allow a device to register if it is capable of LRO and when needed we could call down and disable it.

This was taken from an email from Dave:


Therefore we can implement the handling using whatever datastructures
and interfaces we want.

For example, we could have:

typedef int (*lro_func_t)(struct net_device *, bool enable);

int register_lro_netdev(struct net_device *dev,
                        lro_func_t func);
void unregister_lro_netdev(struct net_device *dev);

and then a driver goes:

        int ret = register_netdevice(dev);
        if (ret)
                goto err_register;
        ret = register_lro_netdev(dev, mydev_lro_func);

We maintain a simple linked list of LRO netdevs, and when
bridging or routing wants to turn it off it calls some
interface we provide like:

struct netdev_lro_entry {
        struct list_head list;
        struct net_device *dev;
        lro_func_t func;        /* callback passed to register_lro_netdev() */
};

static struct list_head lro_netdevs;

int netdev_lro_disable(struct net_device *dev)
{
        struct netdev_lro_entry *p;
        int err = -ENODEV;

        /* Find the entry registered for this device and ask its
         * driver to turn LRO off. */
        list_for_each_entry(p, &lro_netdevs, list) {
                if (p->dev == dev) {
                        err = p->func(dev, false);
                        break;
                }
        }
        return err;
}
EXPORT_SYMBOL(netdev_lro_disable);

and there's an equivalent netdev_lro_enable().
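
The pattern in the email can be tried outside the kernel. The sketch below is a userspace mock, not kernel code: struct net_device is a stand-in, a plain singly linked list replaces list_head, and locking is omitted. It keeps the names from the email where possible. The lro_enabled field and the body of mydev_lro_func are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct net_device (assumption). */
struct net_device {
        const char *name;
        bool lro_enabled;
};

typedef int (*lro_func_t)(struct net_device *, bool enable);

struct netdev_lro_entry {
        struct netdev_lro_entry *next;
        struct net_device *dev;
        lro_func_t func;
};

static struct netdev_lro_entry *lro_netdevs;  /* head of the registry */

int register_lro_netdev(struct net_device *dev, lro_func_t func)
{
        struct netdev_lro_entry *e;

        assert(dev && func);
        e = malloc(sizeof(*e));
        if (!e)
                return -ENOMEM;
        e->dev = dev;
        e->func = func;
        e->next = lro_netdevs;   /* push onto the front of the list */
        lro_netdevs = e;
        return 0;
}

void unregister_lro_netdev(struct net_device *dev)
{
        struct netdev_lro_entry **pp = &lro_netdevs;

        while (*pp) {
                if ((*pp)->dev == dev) {
                        struct netdev_lro_entry *dead = *pp;

                        *pp = dead->next;
                        free(dead);
                        return;
                }
                pp = &(*pp)->next;
        }
}

/* Called by bridging or routing code. Returns -ENODEV if the device
 * never registered an LRO callback. */
int netdev_lro_disable(struct net_device *dev)
{
        struct netdev_lro_entry *p;

        for (p = lro_netdevs; p; p = p->next)
                if (p->dev == dev)
                        return p->func(dev, false);
        return -ENODEV;
}

/* Example driver callback (hypothetical). */
int mydev_lro_func(struct net_device *dev, bool enable)
{
        dev->lro_enabled = enable;
        return 0;
}
```

A device that never registered gets -ENODEV from netdev_lro_disable(). After register_lro_netdev(&dev, mydev_lro_func), the same call returns 0 and clears the device's lro_enabled flag.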

Comment 27 Stanislaw Gruszka 2010-03-25 15:12:24 UTC
(In reply to comment #26)
> > commit 0187bdfb05674147774ca79a79942537f3ad54bd
> > Author: Ben Hutchings <bhutchings>
> > Date:   Thu Jun 19 16:15:47 2008 -0700
> > 
> >     net: Disable LRO on devices that are forwarding
> > 
> > That will be the best solution since this is the way it is done upstream.    
> 
> That exact patch is a kABI breaker (the ethtool set_flags bits).  This is why
> we have not included it.  Dave Miller had an interesting suggestion to create a
> way to allow a device to register if it is capable of LRO and when needed we
> could call down and disable it.

Why not just create an additional ethtool_ops_ext structure?

struct ethtool_ops_ext {
   struct ethtool_ops *ops;
   struct ethtool_aux *aux;
};

Plus a bit in netdev->flags that indicates whether the driver uses ethtool_ops_ext or ethtool_ops. This would allow adding other ethtool methods in the future.
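
A userspace mock of this idea, to show how a flag bit could select between the old and the extended layout without changing the size of the original structure. Everything here is an assumption for illustration: the NETDEV_HAS_ETHTOOL_EXT bit, the set_flags member of ethtool_aux, the dev_set_flags helper, and the trimmed-down net_device and ethtool_ops do not match the real kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct net_device;

/* Frozen by kABI: must not grow (trimmed-down stand-in). */
struct ethtool_ops {
        unsigned int (*get_link)(struct net_device *);
};

/* New methods live here instead (hypothetical contents). */
struct ethtool_aux {
        int (*set_flags)(struct net_device *, unsigned int flags);
};

struct ethtool_ops_ext {
        struct ethtool_ops *ops;
        struct ethtool_aux *aux;
};

#define NETDEV_HAS_ETHTOOL_EXT 0x1u  /* hypothetical flag bit */

struct net_device {
        unsigned int flags;
        void *ethtool;           /* ethtool_ops or ethtool_ops_ext, per flag */
        unsigned int lro_flags;  /* illustrative driver state */
};

/* Core-side helper: call the extended method only if the driver
 * advertises the extended layout and actually provides the method. */
static int dev_set_flags(struct net_device *dev, unsigned int flags)
{
        struct ethtool_ops_ext *ext;

        assert(dev != NULL);
        if (!(dev->flags & NETDEV_HAS_ETHTOOL_EXT))
                return -EOPNOTSUPP;
        ext = dev->ethtool;
        if (!ext->aux || !ext->aux->set_flags)
                return -EOPNOTSUPP;
        return ext->aux->set_flags(dev, flags);
}

/* Example driver method (hypothetical). */
static int mydev_set_flags(struct net_device *dev, unsigned int flags)
{
        dev->lro_flags = flags;
        return 0;
}
```

Drivers that keep the old layout are untouched and get -EOPNOTSUPP from the helper. Drivers that set the flag bit opt in to the new methods.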

Comment 28 Andy Gospodarek 2010-03-25 17:14:58 UTC
(In reply to comment #27)
> 
> Why not just create an additional ethtool_ops_ext structure?
> 
> struct ethtool_ops_ext {
>    struct ethtool_ops *ops;
>    struct ethtool_aux *aux;
> };
> 
> Plus a bit in netdev->flags that indicates whether the driver uses
> ethtool_ops_ext or ethtool_ops. This would allow adding other ethtool methods
> in the future.

I see that as a hack that should only be used when no other option exists.

Comment 29 Stanislaw Gruszka 2010-03-26 08:26:58 UTC
(In reply to comment #28)
> I see that as a hack that should only be used when no other option exists.
    
It's not pretty, but:
1) It's the standard method in the RHEL kernel for adding fields to a structure without breaking kABI, for example signal_with_aux_struct.
2) It's extensible: with it in place, we can easily add new ethtool methods.
3) It allows us to keep the code close to upstream.

For me this is a better way than what Dave proposed.

Comment 30 Andy Gospodarek 2010-03-26 13:16:38 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > I see that as a hack that should only be used when no other option exists.
> 
> It's not pretty, but:
> 1) It's the standard method in the RHEL kernel for adding fields to a structure
> without breaking kABI, for example signal_with_aux_struct.
> 2) It's extensible: with it in place, we can easily add new ethtool methods.
> 3) It allows us to keep the code close to upstream.
> 
> For me this is a better way than what Dave proposed.

I'm well aware of how this can be used to hack around kABI limitations.

My personal opinion is that I would rather see a small deviation from upstream than a kABI workaround like you have proposed.  I do not like to see those used unless no other reasonable option exists.

Comment 32 Andy Gospodarek 2010-04-15 15:18:18 UTC
Broadcom posted fixes two weeks ago to remove LRO and use GRO for bnx2x and David Miller has added them to net-next.  I suggest we take that patch rather than focus energy finding creative ways to disable LRO on this driver.

commit 4fd89b7af28292e190650b9b9bc4308658d81dd1
Author: Dmitry Kravkov <dmitry>
Date:   Thu Apr 1 19:45:34 2010 -0700

    bnx2x: Added GRO support

Comment 33 Eilon Greenstein 2010-04-15 16:01:25 UTC
Actually, that patch just adds GRO on top of LRO and if LRO is still active, LRO is still the one that will be used. This is because our LRO is HW/FW based (TPA) and it is much better (about double) than the SW GRO solution.
Please see more information in Bug 573114

Comment 34 Andy Gospodarek 2010-04-15 18:18:37 UTC
Thanks, Eilon.  Though your hardware LRO is still used (as is the case with other network drivers that now support GRO), this does get around the problems LRO has when asked to be in a forwarding device, right?

Comment 35 Stanislaw Gruszka 2010-04-16 14:30:04 UTC

*** This bug has been marked as a duplicate of bug 582367 ***

Comment 36 Eilon Greenstein 2010-04-18 06:47:36 UTC
Bug 582367 really contains the answer to this question. With the enhancement from that bug, the user does not need to manually disable LRO. Without it, the user should disable LRO so that only GRO is used.

Comment 37 Stanislaw Gruszka 2010-04-28 15:14:08 UTC
Here are kernel packages with GRO and automatic LRO disabling for bnx2x, if anyone is interested in testing:
http://people.redhat.com/sgruszka/rhel5/bz573114/

Comment 38 Siert Z. 2010-05-13 11:23:07 UTC
Stanislaw,

I tested your xen kernel on RHEL5.5 x86_64. The hardware: HP BL460c + bnx2x (Virtual connect - Broadcom Corporation NetXtreme II BCM57711E 10-Gigabit PCIe).

[root@hsl0000 ~]# uname -a
Linux hsl0000.domain.local 2.6.18-197.el5.bnx2x_testxen #1 SMP Wed Apr 28 08:58:56 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Everything works like a charm now.

