| Summary: | [virtio-win][performance] degression performance in 2/8 RX TCP sessions tests w/ rx checksumming in Win2008 R2 guest | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Quan Wenli <wquan> | ||||
| Component: | virtio-win | Assignee: | Dmitry Fleytman <dfleytma> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 6.3 | CC: | acathrow, areis, bcao, bsarathy, dfleytma, jasowang, juzhang, lnovich, michen, mst, rhod, ybendito, yvugenfi | ||||
| Target Milestone: | rc | Keywords: | Reopened | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Do not document. Bug found between releases. Code paths added to support Windows 8\Windows 2012 caused performance degradations on older OSes.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-11-21 23:56:29 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
This requires more research, so I am postponing this bug to 6.4. In any case, our default is checksum=off. It seems that the pinning and the checksum calculation do not work well together. Still, I would expect CPU consumption to be lower with "rx-checksumming: on".

Several suggestions from Michael Tsirkin regarding the research:
1. Disable rx checksumming on the host.
2. Try to run the test on a different host.
3. Try to run a busy loop on the guest during the test (Michael referenced a different bug with benchmarks).

(In reply to comment #3)
> Several suggestions from Michael Tsirkin regarding the research:
>
> 1. Disable rx checksumming on the host
I will test with rx checksumming disabled on the host.
> 2. Try to run the test on a different host
This is hard for me to try, since the two 10Gb network cards are dedicated to these two particular hosts.
> 3. Try to run a busy loop on the guest during the test (Michael referenced a
> different bug with benchmarks).
Could you describe this research in more detail?

Too late for 6.4. Deferring again, to 6.5.

Please retest with build 59.
Best regards, Yan.

(In reply to comment #8)
> Please retest with build 59.
>
> Best regards,
> Yan.
The tests on build 59 are running.

According to the results in comment #10, the issue does not exist on build 59. Changing the bug to closed.

Moving status to VERIFIED based on comment #10.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1729.html |
Created attachment 578515 [details]
rx checksum off vs on result w/virtio-win-prewhql-0.1-24

Description of problem:
From the attached results comparing performance without and with rx checksum, we can draw the following conclusions:
1. In single TCP_STREAM session tests, the normalized result (measured as rx_throughput/host_cpu%) in the guest improves by over 15% with rx checksum enabled.
2. In 4 TCP_STREAM session tests, the normalized results are almost equal (ratio close to one).
3. In 2/8 TCP_STREAM session tests, the normalized result drops by around 10% or more with rx checksum enabled. This also means lower total rx throughput with higher or almost equal host CPU consumption when rx checksumming is enabled.

Version-Release number of selected component (if applicable):
kernel-2.6.32-251.el6.x86_64
qemu-kvm-0.12.1.2-2.265.el6.x86_64
virtio-win-prewhql-0.1-24

How reproducible:
always

Steps to Reproduce:
1. Turn off GRO on the host.
# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
2. Boot the guest on the host and pin the vhost and vcpu threads to the same NUMA node (node 1).
numactl -m 1 /usr/libexec/qemu-kvm -name vm1 -drive file=/usr/local/autotest/tests/kvm/images/win2008r2-64-virtio.raw,index=0,if=none,id=drive-ide0-0-0,media=disk,cache=none,format=raw,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device virtio-net-pci,netdev=idsVdEtL,mac=9a:d9:46:25:27:51,id=ndev00idsVdEtL,bus=pci.0,addr=0x5 -netdev tap,id=idsVdEtL,vhost=on -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu qemu64,+sse2 -spice port=8000,disable-ticketing -vga qxl -rtc base=localtime,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off -M rhel6.3.0 -usb -device usb-tablet -enable-kvm -monitor stdio
taskset -p 20 $vhost_thread
taskset -p 40 $vcpu1_thread
taskset -p 80 $vcpu2_thread
3. Turn on the rx checksum feature in the guest.
4. Run netserver on the guest and run the netperf test on the external host to get the first batch of results.
5. Turn off the rx checksum feature in the guest.
6. Run netserver on the guest and run the netperf test on the external host to get the second batch of results.

Actual results:
Performance degradation in 2/8 RX TCP session tests with rx checksumming.

Expected results:

Additional info:
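
For readers comparing the two result batches, the normalized result used in the description (rx_throughput/host_cpu%) and its relative change between checksum off and on could be computed roughly as follows. This is only a sketch: the function names and the sample figures are hypothetical and are not taken from attachment 578515.

```python
def normalized(rx_throughput: float, host_cpu_pct: float) -> float:
    """Throughput per percentage point of host CPU (rx_throughput/host_cpu%)."""
    if host_cpu_pct <= 0:
        raise ValueError("host CPU percentage must be positive")
    return rx_throughput / host_cpu_pct


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate against baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def compare(samples: dict) -> dict:
    """Map each session count to the change in normalized result (on vs off)."""
    changes = {}
    for sessions, batch in samples.items():
        off = normalized(*batch["off"])
        on = normalized(*batch["on"])
        changes[sessions] = percent_change(off, on)
    return changes


# Hypothetical (throughput, host CPU %) pairs, for illustration only.
SAMPLES = {
    1: {"off": (4000.0, 50.0), "on": (4600.0, 50.0)},
    2: {"off": (6000.0, 60.0), "on": (5400.0, 60.0)},
}

if __name__ == "__main__":
    for sessions, change in sorted(compare(SAMPLES).items()):
        print(f"{sessions} session(s): {change:+.1f}% normalized change")
```

A negative change means that enabling rx checksumming delivered less throughput per unit of host CPU, which is the situation reported for the 2 and 8 session tests.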