Bug 1424076
Summary: | vxlan: performance can suffer unless GRO is disabled on vxlan interface | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Patrick Talbert <ptalbert> | |
Component: | kernel | Assignee: | Jiri Benc <jbenc> | |
kernel sub component: | Tunnel | QA Contact: | Jan Tluka <jtluka> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | atomlin, atragler, brault, dhoward, ealcaniz, fbaudin, hsowa, jbenc, jeharris, jiji, mleitner, mmilgram, network-qe, pneedle, qding, rmanes | |
Version: | 7.3 | Keywords: | ZStream | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | kernel-3.10.0-599.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1431197 | Environment: | |
Last Closed: | 2017-08-02 05:42:35 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1298243, 1323132, 1429597, 1431197 |
Description
Patrick Talbert
2017-02-17 15:49:23 UTC
I was able to reproduce the problem, identify the fix and verify it.

This is fixed by the following upstream commit:

commit 88340160f3ad22401b00f4efcee44f7ec4769b19
Author: Martin KaFai Lau <kafai>
Date:   Fri Jan 16 10:11:00 2015 -0800

    ip_tunnel: Create percpu gro_cell

    In the ipip tunnel, the skb->queue_mapping is lost in ipip_rcv().
    All skb will be queued to the same cell->napi_skbs. The
    gro_cell_poll is pinned to one core under load. In production
    traffic, we also see severe rx_dropped in the tunl iface and it is
    probably due to this limit:
    skb_queue_len(&cell->napi_skbs) > netdev_max_backlog.

    This patch is trying to alloc_percpu(struct gro_cell) and schedule
    gro_cell_poll to process the skb in the same core.

    Signed-off-by: Martin KaFai Lau <kafai>
    Acked-by: Eric Dumazet <edumazet>
    Signed-off-by: David S. Miller <davem>

Reproduction script:

#!/bin/bash
iface=em1
h=1 # h=2 for the other side
oh=$((3 - h))
ip l s em1 mtu 9000 up
ip -4 a f em1
ip a a 192.168.99.$h/24 dev em1
ethtool -U em1 rx-flow-hash udp4 sdfn
ovs-vsctl del-br ovs0
ovs-vsctl add-br ovs0
ovs-vsctl add-port ovs0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=192.168.99.$oh
ovs-vsctl add-port ovs0 i0 -- set interface i0 type=internal
ip l s i0 up
ip a a 192.168.98.$h/24 dev i0
if [[ $h = 2 ]]; then
    iperf3 -s
else
    iperf3 -c 192.168.98.2 -P 100 -w 200K
fi

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Patch(es) available on kernel-3.10.0-599.el7

Reproduced on 3.10.0-514.6.1.el7 on ixgbe NIC. Throughput at ~8 Gbit/s

Verified on kernel 3.10.0-655.el7 on ixgbe NIC. The performance numbers got back to ~9.1 Gbit/s.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842
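
For anyone checking whether a host is already at or past the build listed under "Fixed In Version" (kernel-3.10.0-599.el7), the comparison could be sketched roughly as below. The function name kernel_has_fix is hypothetical and not part of this report or of any Red Hat tooling. The check only compares the main build number after "3.10.0-", so it assumes a RHEL 7 release string and does not recognize z-stream kernels (such as a 7.3.z backport tracked in the clone bug) whose build number is below 599.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not from the report):
# succeeds if an el7 kernel release string has a main build number >= 599,
# the build named in "Fixed In Version".
# Returns 0 if at or past the fixed build, 1 if older, 2 if the string is
# not a RHEL 7 3.10.0 kernel release.
kernel_has_fix() {
    local release=${1:-$(uname -r)}
    local build
    # Expected shape: 3.10.0-<build>[.<zstream>...].el7[.<arch>]
    if [[ $release =~ ^3\.10\.0-([0-9]+)(\.[0-9]+)*\.el7 ]]; then
        build=${BASH_REMATCH[1]}
        # Limitation: z-stream backports with a lower build number
        # are reported as "older" even if they carry the patch.
        (( build >= 599 ))
    else
        return 2
    fi
}
```

Called with no argument, kernel_has_fix inspects the running kernel via uname -r; passing a release string such as 3.10.0-514.6.1.el7 checks that value instead.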