Bug 1288124

Summary: [UPSTREAM] net: performance regression on ixgbe (Intel 82599EB 10-Gigabit NIC)
Product: [Fedora] Fedora
Reporter: Otto Sabart <osabart>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: aokuliar, gansalmon, itamar, jbrouer, jhladky, jonathan, kernel-maint, madhu.chinakonda, mchehab, nhorman
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-12-09 15:11:50 UTC
Type: Bug
Attachments:
Description    Flags
TCP maerts     none
TCP stream     none

Description Otto Sabart 2015-12-03 15:31:48 UTC
Created attachment 1101835
TCP maerts

Description of problem:
We are using netperf for network performance testing.

We see a regression of approximately 8% when testing TCP Stream and of more
than 15% when testing TCP Maerts on ixgbe when comparing 4.3 with 4.4-rc3 (see
attachments).

For every test setup we do five runs (30 seconds each) and then calculate the
arithmetic mean of the results.
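
The averaging step can be sketched as a small shell helper. This is only an illustration, not the script used by the test system: the function name `mean_throughput` is hypothetical, and it assumes the throughput of each run has already been extracted as a plain number.

```shell
# Hypothetical helper (not part of netperf): print the arithmetic mean of
# the throughput values passed as arguments, rounded to two decimals.
mean_throughput() {
    # Refuse to run without values, to avoid dividing by zero.
    [ "$#" -gt 0 ] || return 1
    # Emit one value per line, then let awk sum them and divide by the count.
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}
```

With five runs, it would be called as `mean_throughput "$run1" "$run2" "$run3" "$run4" "$run5"`, where the `run` variables are placeholders for the per-run throughput figures.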

The problem seems to have started with 4.4-rc1.

Version-Release number of selected component (if applicable):
- Kernel 4.4-rc3 on RHEL7.2

How reproducible:
100%

Steps to reproduce:
1. Install the latest upstream kernel and boot it.

2. Run the netperf tests.
For the simple TCP stream test, run:
$ netperf -P0 -cC -t TCP_STREAM -l 30 -L 172.16.29.10 -H 172.16.29.20 -T ,0  -T 0, -- -m $msg_size

For the TCP maerts test, run:
$ netperf -P0 -cC -t TCP_MAERTS -l 30 -L 172.16.32.10 -H 172.16.32.20 -- -M $msg_size


Actual results:

TCP STREAM:
running command:  netperf -P0 -cC -t TCP_STREAM  -l 30  -L 172.16.29.10  -H 172.16.29.20  -T ,0  -T 0, -- -m 1024
testsys - INFO -  87380  16384   1024    30.01      8579.79   3.23     4.30     0.740   0.986
DEBUG - netperf test output: 87380  16384   1024    30.01      8579.79   3.23     4.30     0.740   0.986
exitcode:0
INFO - test results: rcv_socket_size : 87380 snd_socket_size : 16384 msg_size : 1024 time : 30.01 throughput : 8579.79 loc_util : 3.23 rem_util : 4.30 service_local : 0.740 service_remote : 0.986

TCP MAERTS:
running command:  netperf -P0 -cC -t TCP_STREAM  -l 30  -L 172.16.29.10  -H 172.16.29.20  -T ,0  -T 0, -- -M 1024
testsys - INFO -  87380  16384  16384    30.01      5287.22   0.83     4.31     0.310   1.603
DEBUG - netperf test output: 87380  16384  16384    30.01      5287.22   0.83     4.31     0.310   1.603
exitcode:0
INFO - test results: rcv_socket_size : 87380 snd_socket_size : 16384 msg_size : 16384 time : 30.01 throughput : 5287.22 loc_util : 0.83 rem_util : 4.31 service_local : 0.310 service_remote : 1.603
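
The raw netperf line and the "test results" line above carry the same nine values in the same order (rcv_socket_size, snd_socket_size, msg_size, time, throughput, loc_util, rem_util, service_local, service_remote). A minimal sketch of extracting the most relevant ones from a raw `-P0 -cC` line, assuming that column order, could look like this. The function name `parse_netperf_line` is hypothetical.

```shell
# Hypothetical helper: take one raw "netperf -P0 -cC" result line as the
# first argument and print a few of its fields by name.
# Column order is assumed from the "test results" lines in this report.
parse_netperf_line() {
    printf '%s\n' "$1" | awk 'NF == 9 {
        # $3 = msg_size, $4 = time, $5 = throughput, $6 = loc_util, $7 = rem_util
        printf "msg_size=%s time=%s throughput=%s loc_util=%s rem_util=%s\n",
               $3, $4, $5, $6, $7
    }'
}
```

Lines that do not have exactly nine whitespace-separated columns are skipped silently, so log prefixes such as `testsys - INFO -` would need to be stripped first.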


Expected results:
Throughput restored to v4.3 levels (see attached graphs).

That is:
TCP STREAM: ~9379.92 (msg size 1024 B), ~9377.99 (msg size 2048 B)
TCP MAERTS: ~7503.38 (msg size 1024 B), ~9377.02 (msg size 2048 B)
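
As a quick check of the size of the drop, the relative regression can be computed from a baseline and a measured value. The helper below is a sketch; the name `regression_pct` is hypothetical and not part of any tool mentioned here.

```shell
# Hypothetical helper: percentage by which "current" falls short of "baseline".
# Usage: regression_pct <baseline> <current>
regression_pct() {
    awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%.2f\n", (base - cur) / base * 100 }'
}
```

For the TCP STREAM case with a 1024 B message size, `regression_pct 9379.92 8579.79` prints `8.53`, which matches the roughly 8% regression reported in the description.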


Additional info:
If you need more information, just let me know.

Comment 1 Otto Sabart 2015-12-03 15:33:29 UTC
Created attachment 1101836
TCP stream

Comment 2 Otto Sabart 2015-12-07 11:42:36 UTC
After some discussion on the netdev mailing list [0], we found that this
'regression' is caused by LRO (large receive offload) being disabled.

$ ethtool -k ixgbe | grep large-receive-offload
large-receive-offload: off

LRO on ixgbe has been disabled by _default_ since commit 72bfd32d2f84 ("ixgbe: disable LRO by default").

It can be turned back on by running:
$ ethtool -K ixgbe lro on
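
The check and the fix above can be combined so that LRO is only switched on when it is actually off. The sketch below assumes the `large-receive-offload: <state>` line format shown in this comment; the function name `lro_state` is hypothetical.

```shell
# Hypothetical helper: read "ethtool -k <dev>" output on stdin and print
# the state of large-receive-offload ("on" or "off").
lro_state() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }          # drop any leading indentation
        $1 == "large-receive-offload" {
            split($2, words, " ")            # ignore suffixes such as "[fixed]"
            print words[1]
        }'
}
```

A possible use, with `$dev` as a placeholder for the interface name (this report uses `ixgbe`), is `[ "$(ethtool -k "$dev" | lro_state)" = "off" ] && ethtool -K "$dev" lro on`. Changing offload settings normally requires root privileges.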


[0] http://www.spinics.net/lists/netdev/msg355053.html