Bug 854066

Summary: [rhel6] lvs: issues with GRO / icmp fragmentation needed
Product: Red Hat Enterprise Linux 6 Reporter: Marcelo Ricardo Leitner <mleitner>
Component: kernel Assignee: Jesper Brouer <jbrouer>
Status: CLOSED ERRATA QA Contact: Jan Tluka <jtluka>
Severity: high Docs Contact:
Priority: medium    
Version: 6.3 CC: akrherz
Target Milestone: rc Keywords: Patch
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.32-328.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 854067 (view as bug list) Environment:
Last Closed: 2013-02-21 06:33:53 UTC Type: Bug
Attachments:
Description Flags
0001-Backport-of-8f1b03a-ipvs-allow-transmit-of-GRO-aggre.patch none
0002-Also-handle-GSO-at-ip_vs_dr_xmit_v6.patch none

Description Marcelo Ricardo Leitner 2012-09-03 21:31:52 UTC
Created attachment 609477 [details]
0001-Backport-of-8f1b03a-ipvs-allow-transmit-of-GRO-aggre.patch

When using an LVS load balancer with GRO enabled, the server will often drop incoming packets and reply with ICMP Fragmentation Needed, nuking the performance.

This happens because GRO makes packets seem larger than they really are, which confuses the sender.

Upstream commit 8f1b03a4c18e8f3f0801447b62330faa8ed3bb37 fixes this.

Attached is my backport of it for RHEL 6.

Comment 2 Marcelo Ricardo Leitner 2012-09-03 21:32:37 UTC
Created attachment 609478 [details]
0002-Also-handle-GSO-at-ip_vs_dr_xmit_v6.patch

This patch is also needed.

Comment 3 RHEL Program Management 2012-09-12 10:01:05 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 4 Jesper Brouer 2012-09-27 10:47:46 UTC
Fixing this is important because of the bug's effect.

The bug will result in extremely bad TCP performance when
enabling GRO/GSO on a machine running IPVS/LVS.  The TCP
connection will continue to "work", but only by retransmitting
all data (almost three times), as only TCP segments with a single
packet will be allowed through (without causing an ICMP frag
needed).
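
The effect described above can be illustrated with a small model. This is a simplified sketch in Python, not the kernel code: the `Packet` class, the function names, and the 1500-byte MTU are all assumptions made for illustration. It assumes the pre-fix transmit path rejected every packet longer than the MTU, and that the backported fix exempts aggregated packets from that length check.

```python
from dataclasses import dataclass

MTU = 1500  # assumed path MTU in bytes (illustrative value)


@dataclass
class Packet:
    length: int       # total length as seen by the forwarding code
    aggregated: bool  # True if GRO merged several wire segments into one


def rejected_before_fix(pkt: Packet, mtu: int = MTU) -> bool:
    """Assumed pre-fix behaviour: any packet longer than the MTU is dropped
    and answered with ICMP Fragmentation Needed."""
    return pkt.length > mtu


def rejected_after_fix(pkt: Packet, mtu: int = MTU) -> bool:
    """Assumed post-fix behaviour: aggregated packets are let through,
    since they can be split into MTU-sized segments again on transmit."""
    return pkt.length > mtu and not pkt.aggregated


single = Packet(length=1500, aggregated=False)
merged = Packet(length=3 * 1448 + 52, aggregated=True)  # three segments merged

for name, pkt in (("single", single), ("merged", merged)):
    print(name, rejected_before_fix(pkt), rejected_after_fix(pkt))
```

In this model only the single-segment packet passes before the fix, which matches the retransmission pattern described above.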

Comment 15 Jesper Brouer 2012-10-05 22:34:29 UTC
Simply make sure that GSO and TSO are enabled on all hosts.
 "ethtool -K ethX tso on gso on"

And run e.g. an iperf test through the LVS/IPVS setup.

It's the exact same test as in bug 854067 comment #3 (which is the RHEL5 equivalent).

On my KVM system I see the following performance numbers:
 - With GSO enabled, and no patch:  58 Kbit/sec (very low, lots of TCP retrans)
 - Without GSO, and no patch:      1.3 Gbits/sec
 - With GSO, and with patch:      12.4 Gbits/sec

You can just tcpdump the traffic and see that big packets are transmitted, and observe that no ICMP error messages or TCP retransmits occur.

Comment 16 Jarod Wilson 2012-10-10 19:52:06 UTC
Patch(es) available on kernel-2.6.32-328.el6

Comment 19 Jan Tluka 2012-10-22 12:50:52 UTC
Reproduced on -279.el6: iperf only reaches about 90 Kbit/s and TCP retransmissions occur during the test:

------------------------------------------------------------
Client connecting to 192.168.122.6, TCP port 5001
TCP window size: 23.2 KByte (default)
------------------------------------------------------------
[  3] local 192.168.122.1 port 39810 connected with 192.168.122.6 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-23.2 sec   256 KBytes  90.4 Kbits/sec


Verified on the -330.el6 kernel: with GSO/TSO enabled I get a throughput of over 2 Gbit/s and no retransmits occur:

------------------------------------------------------------
Client connecting to 192.168.122.6, TCP port 5001
TCP window size: 23.2 KByte (default)
------------------------------------------------------------
[  3] local 192.168.122.1 port 39834 connected with 192.168.122.6 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.74 GBytes  2.35 Gbits/sec
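
For comparing such runs, the bandwidth column of the two iperf summary lines above can be extracted with a short script. This is a minimal sketch: the function name `bandwidth_bps` is hypothetical, and it assumes iperf reports bit rates with decimal prefixes (K = 1000).

```python
import re

# Assumed decimal prefixes for bit rates (K = 1000, M = 10^6, G = 10^9).
UNIT_FACTORS = {"": 1, "K": 1e3, "M": 1e6, "G": 1e9}

# Matches the bandwidth column of an iperf summary line, e.g. "90.4 Kbits/sec".
BANDWIDTH_RE = re.compile(r"([\d.]+)\s+([KMG]?)bits/sec")


def bandwidth_bps(line: str) -> float:
    """Return the bandwidth reported on an iperf summary line in bits/sec."""
    match = BANDWIDTH_RE.search(line)
    if match is None:
        raise ValueError(f"no bandwidth found in: {line!r}")
    value, prefix = match.groups()
    return float(value) * UNIT_FACTORS[prefix]


# The two summary lines quoted in this comment.
before = bandwidth_bps("[  3]  0.0-23.2 sec   256 KBytes  90.4 Kbits/sec")
after = bandwidth_bps("[  3]  0.0-10.0 sec  2.74 GBytes  2.35 Gbits/sec")
print(f"before: {before:.0f} bit/s, after: {after:.0f} bit/s, "
      f"ratio: {after / before:.0f}x")
```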

Comment 22 errata-xmlrpc 2013-02-21 06:33:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html

Comment 23 daryl herzmann 2013-06-11 12:14:56 UTC
For what it's probably not worth, I am still seeing this problem with RHEL6.4 2.6.32-358.11.1.el6.x86_64

I have a LVS NAT setup with a Broadcom Corporation NetXtreme II BCM5709, I get brutal throughput with GRO enabled.  Turning it off and things are 'fine'.

# ethtool -k eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
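
When checking several hosts, the offload state can be read out of this listing programmatically. The following is a minimal sketch, assuming the `ethtool -k` output format shown above; the function name `parse_features` is hypothetical, and the sample text is copied from this comment rather than read from a live interface.

```python
SAMPLE = """\
Features for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
"""


def parse_features(text: str) -> dict:
    """Map each feature name in `ethtool -k` style output to True (on) or False (off)."""
    features = {}
    for line in text.splitlines():
        name, sep, state = line.partition(":")
        words = state.split()
        # Skip the "Features for ethX:" header and any line without an on/off state.
        if sep and words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


features = parse_features(SAMPLE)
print("GRO enabled:", features["generic-receive-offload"])
print("GSO enabled:", features["generic-segmentation-offload"])
```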

Comment 24 Marcelo Ricardo Leitner 2013-06-11 12:22:43 UTC
(In reply to Daryl Herzmann from comment #23)
> For what it's probably not worth, I am still seeing this problem with RHEL6.4
> 2.6.32-358.11.1.el6.x86_64
> 
> I have a LVS NAT setup with a Broadcom Corporation NetXtreme II BCM5709, I
> get brutal throughput with GRO enabled.  Turning it off and things are
> 'fine'.

Are you also seeing icmp fragmentation needed?

Anyway, as this is already in Errata state, please open a new bug. Feel free to Cc me on the new one.

Comment 25 daryl herzmann 2013-06-11 12:51:34 UTC
(In reply to Marcelo Ricardo Leitner from comment #24)
> Anyway, as this is already in Errata state, please open a new bug. Feel free
> to Cc me on the new one.

Thanks, I opened https://bugzilla.redhat.com/show_bug.cgi?id=973190