Bug 814051 - [virtio-win][performance] degression performance in 2/8 RX TCP sessions tests w/ rx checksumming in Win2008 R2 guest
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virtio-win
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Dmitry Fleytman
QA Contact: Virtualization Bugs
Keywords: Reopened
Depends On:
Blocks:
Reported: 2012-04-19 02:40 EDT by Quan Wenli
Modified: 2013-11-21 18:56 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Do not document. Bug found between releases. Code paths added to support Windows 8\Windows 2012 caused performance degradations on older OSes.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 18:56:29 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
rx checksum off vs on result w/virtio-win-prewhql-0.1-24 (5.55 KB, text/plain)
2012-04-19 02:40 EDT, Quan Wenli

Description Quan Wenli 2012-04-19 02:40:12 EDT
Created attachment 578515
rx checksum off vs on result w/virtio-win-prewhql-0.1-24

Description of problem:

The attached results compare performance without and with rx checksum. They lead to the following conclusions:

1. In the single TCP_STREAM session tests, the normalized result (measured as rx_throughput/host_cpu%) in the guest improves by over 15% with rx checksum enabled.
2. In the 4 TCP_STREAM session tests, the ratio of the normalized results is almost one, i.e. there is nearly no difference.
3. In the 2/8 TCP_STREAM session tests, the normalized result drops by around 10% or more with rx checksum enabled. This means lower total rx throughput at higher or almost equal host CPU consumption with rx checksumming enabled.
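
For readers who want to redo the comparison, the normalized metric above can be sketched as follows. This is only an illustration: the function names are hypothetical and the input numbers are made up, not taken from the attachment.

```shell
# normalized RX_THROUGHPUT HOST_CPU_PERCENT
# Prints throughput per 1% of host CPU (the rx_throughput/host_cpu% metric).
normalized() {
    awk -v t="$1" -v c="$2" 'BEGIN { printf "%.2f\n", t / c }'
}

# pct_change BASELINE NEW
# Prints the relative change in percent; a negative value is a regression.
pct_change() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - b) * 100 / b }'
}

# Illustrative numbers only (assumed, not measured):
off=$(normalized 9000 45)   # rx checksum off
on=$(normalized 8100 45)    # rx checksum on
pct_change "$off" "$on"     # prints -10.0
```

A result near 0 corresponds to the "almost one" ratio in the 4-session case, while a value of -10 or lower corresponds to the 2/8-session regression reported here.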


Version-Release number of selected component (if applicable):

kernel-2.6.32-251.el6.x86_64
qemu-kvm-0.12.1.2-2.265.el6.x86_64
virtio-win-prewhql-0.1-24

How reproducible:

always

Steps to Reproduce:
1. Turn off GRO on the host.

# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
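
The listing above only shows the resulting state. A minimal sketch of how GRO could be switched off with `ethtool -K` is given below; the `disable_gro` helper and the `ETHTOOL` override are hypothetical conveniences, not part of the original report.

```shell
# Hypothetical helper: disable generic receive offload on one interface.
# ETHTOOL may be overridden (e.g. ETHTOOL=echo) to print the command
# instead of running it, which is handy for a dry run.
disable_gro() {
    iface="$1"
    "${ETHTOOL:-ethtool}" -K "$iface" gro off
}

# Usage (as root), followed by a check of the new state:
#   disable_gro eth2 && ethtool -k eth2 | grep generic-receive-offload
```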

2. Boot the guest on the host and pin the vhost and vCPU threads to the same
NUMA node (node 1).

numactl -m 1  /usr/libexec/qemu-kvm -name vm1 -drive
file=/usr/local/autotest/tests/kvm/images/win2008r2-64-virtio.raw,index=0,if=none,id=drive-ide0-0-0,media=disk,cache=none,format=raw,aio=native
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device
virtio-net-pci,netdev=idsVdEtL,mac=9a:d9:46:25:27:51,id=ndev00idsVdEtL,bus=pci.0,addr=0x5
-netdev tap,id=idsVdEtL,vhost=on -m 4096 -smp 2,cores=1,threads=1,sockets=2
-cpu qemu64,+sse2 -spice port=8000,disable-ticketing -vga qxl -rtc
base=localtime,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off -M
rhel6.3.0 -usb -device usb-tablet -enable-kvm -monitor stdio

taskset -p 20 $vhost_thread
taskset -p 40 $vcpu1_thread
taskset -p 80 $vcpu2_thread
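
The `taskset` arguments above are hexadecimal CPU masks. As a reading aid, the sketch below decodes such a mask into CPU numbers; `mask_to_cpus` is a hypothetical helper, and whether those CPUs belong to NUMA node 1 depends on the host topology, which the report does not show.

```shell
# Hypothetical helper: decode a hexadecimal affinity mask into a CPU list.
# The body runs in a subshell so its variables do not leak to the caller.
mask_to_cpus() (
    hex="${1#0x}"            # accept the mask with or without a 0x prefix
    mask=$((0x$hex))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out,}$cpu"   # append the CPU, comma-separated
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
)

mask_to_cpus 20   # prints 5 (vhost thread)
mask_to_cpus 40   # prints 6 (first vCPU thread)
mask_to_cpus 80   # prints 7 (second vCPU thread)
```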

3. Turn on the rx checksum feature in the guest.
4. Run netserver on the guest and the netperf test on the external host
to get the first batch of results.
5. Turn off the rx checksum feature in the guest.
6. Run netserver on the guest and the netperf test on the external host
to get the second batch of results.
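
The netperf side of steps 4 and 6 might look roughly like the sketch below, run on the external host. The exact netperf options used for this bug are not given in the report, so the run length, the guest address placeholder, and the `run_sessions` helper are all assumptions.

```shell
# Hypothetical helper: start N parallel TCP_STREAM sessions against the guest.
# NETPERF may be overridden (e.g. NETPERF=echo) to print the commands only.
run_sessions() {
    sessions="$1"          # parallel sessions: 1, 2, 4 or 8 in this report
    guest="$2"             # guest address (placeholder, not from the report)
    duration="${3:-60}"    # seconds per run (assumed value)
    i=0
    while [ "$i" -lt "$sessions" ]; do
        "${NETPERF:-netperf}" -H "$guest" -t TCP_STREAM -l "$duration" &
        i=$((i + 1))
    done
    wait                   # block until every session has finished
}

# Usage, with netserver already running on the guest:
#   for n in 1 2 4 8; do run_sessions "$n" <guest-address>; done
```

Host CPU usage has to be sampled separately during each run in order to compute the normalized result.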

Actual results:

Performance degrades in the 2/8 RX TCP session tests with rx checksumming enabled.

Expected results:


Additional info:
Comment 2 Ronen Hod 2012-04-24 05:26:58 EDT
This requires some more research, so I am postponing this bug to 6.4.
Anyhow, our default is checksum=off.
It seems as if the pinning and the checksum calculation somehow do not work well together.
Still, I would expect CPU consumption to be lower with "rx-checksumming: on".
Comment 3 Yan Vugenfirer 2012-04-24 09:07:20 EDT
Several suggestions from Michael Tsirkin regarding the research:

1. Disable rx checksumming on the host
2. Try to run the test on a different host
3. Try to run a busy loop on the guest during the test (Michael referenced a different bug with benchmarks).
Comment 4 Quan Wenli 2012-04-25 05:10:42 EDT
(In reply to comment #3)
> Several suggestions from Michael Tsirkin regarding the research:
> 
> 1. Disable rx checksumming on the host

I will test with rx checksumming disabled on the host.

> 2. Try to run the test on a different host

That is hard for me to try, since the two 10Gb network cards are dedicated to these two specific hosts.

> 3. Try to run a busy loop on the guest during the test (Michael referenced a
> different bug with benchmarks).
Could you describe this research in more detail?
Comment 6 Ronen Hod 2012-10-23 13:31:08 EDT
Too late for 6.4. Deferring again, to 6.5
Comment 8 Yan Vugenfirer 2013-04-22 09:05:06 EDT
Please retest with build 59.

Best regards,
Yan.
Comment 9 Quan Wenli 2013-04-23 23:10:55 EDT
(In reply to comment #8)
> Please retest with build 59.
> 
> Best regards,
> Yan.

The tests on build 59 are running.
Comment 13 Quan Wenli 2013-05-06 04:59:41 EDT
According to the results in comment #10, the issue does not exist on build 59. Changing the bug to closed.
Comment 15 Mike Cao 2013-05-06 05:03:22 EDT
Moving status to VERIFIED based on comment #10
Comment 22 errata-xmlrpc 2013-11-21 18:56:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1729.html
