Created attachment 422566 [details]
win2k3_64_nuttcp log for 30 sec

Description of problem:
Running the nuttcp benchmark, a Win2003-64 guest got ~30% lower network throughput than a rhel5 guest.

Version-Release number of selected component (if applicable):
rhev-hypervisor-5.5-2.2.4.el5rhev
vdsm 2.2.0.62
kvm-83-164.el5_5.10

How reproducible:
100%

Setup:
1. A rhel5 host working as the nuttcp server
2. A rhev-h host running both the Win2003 and rhel5 guests
3. rhev-h connected to the rhel5 host with a direct machine-to-machine link; no switch/hub was used

Steps to Reproduce:
1. Start "nuttcp -s" on the rhel5 host
2. Start only the Win2003 guest with a virtio NIC, and run "nuttcp -t -R1024M -T30s -i1 $rhel5host"
3. Start only the rhel5 guest with a virtio NIC, and run "nuttcp -t -R1024M -T30s -i1 $rhel5host"
4. Compare the network throughputs returned from steps 2 and 3

Actual results:
The Win2003-64 guest (560.9239 Mbps) got ~30% lower network throughput than the rhel5 guest (820.7432 Mbps).

Expected results:
No such large difference between the Windows virtio NIC and the RHEL virtio NIC.

Additional info:
nuttcp-5.1.11

CLI for Windows:
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -startdate 2010-06-09T19:59:52 -name Net_win03_64 -smp 1,cores=1 -k en-us -m 2048 -boot c -net nic,vlan=1,macaddr=00:1a:4a:a8:02:20,model=virtio -net tap,vlan=1,ifname=virtio_11_1,script=no -net nic,vlan=2,macaddr=00:1a:4a:a8:02:23,model=virtio -net tap,vlan=2,ifname=virtio_11_2,script=no -drive file=/rhev/data-center/6bfdfd58-c65c-4f26-8699-3f26b6266722/684efde0-002d-4590-b898-5ee571876796/images/8342d3f2-6382-4951-a607-6aee3476a291/02ff8dc4-1036-4a0d-9d9b-b95918095a08,media=disk,if=virtio,cache=off,serial=51-a607-6aee3476a291,boot=on,format=raw,werror=stop -pidfile /var/vdsm/50c9a0da-13e6-4d46-bb2a-acb4ca2594f9.pid -vnc 0:11,password -cpu qemu64,+sse2,+cx16 -M rhel5.5.0 -notify all -balloon none -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=5.5-2.2-4,serial=58A72E46-7236-DF11-BBDA-2905B918001F_00:1b:21:55:b3:b8,uuid=50c9a0da-13e6-4d46-bb2a-acb4ca2594f9 -vmchannel di:0200,unix:/var/vdsm/50c9a0da-13e6-4d46-bb2a-acb4ca2594f9.guest.socket,server -monitor unix:/var/vdsm/50c9a0da-13e6-4d46-bb2a-acb4ca2594f9.monitor.socket,server

CLI for RHEL:
/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate 2010-06-09T05:48:15 -name networ_rhel5u5 -smp 1,cores=1 -k en-us -m 1024 -boot c -net nic,vlan=1,macaddr=00:1a:4a:a8:02:21,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=no -net nic,vlan=2,macaddr=00:1a:4a:a8:02:22,model=virtio -net tap,vlan=2,ifname=virtio_10_2,script=no -drive file=/rhev/data-center/6bfdfd58-c65c-4f26-8699-3f26b6266722/684efde0-002d-4590-b898-5ee571876796/images/e3651fcf-1142-40fd-94ea-8b0cf1166ee3/72c91367-c018-40b4-8a33-5017fb744ce2,media=disk,if=virtio,cache=off,serial=fd-94ea-8b0cf1166ee3,boot=on,format=raw,werror=stop -pidfile /var/vdsm/9cdfedd2-4581-42c8-a9cb-3e76243ec634.pid -vnc 0:10,password -cpu qemu64,+sse2,+cx16 -M rhel5.5.0 -notify all -balloon none -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=5.5-2.2-4,serial=58A72E46-7236-DF11-BBDA-2905B918001F_00:1b:21:55:b3:b8,uuid=9cdfedd2-4581-42c8-a9cb-3e76243ec634 -vmchannel di:0200,unix:/var/vdsm/9cdfedd2-4581-42c8-a9cb-3e76243ec634.guest.socket,server -monitor unix:/var/vdsm/9cdfedd2-4581-42c8-a9cb-3e76243ec634.monitor.socket,server
Created attachment 422567 [details] rhel5 nuttcp log for 30 secs
Have you verified the nuttcp implementation on Windows is not 30% less efficient than the Linux implementation? I warmly suggest running more tests before reaching a conclusion here (though I suspect you are right and Windows is slower than Linux). For Windows I suggest using http://www.microsoft.com/whdc/device/network/tcp_tool.mspx .
(In reply to comment #3)
> Have you verified the nuttcp implementation on Windows is not 30% less
> efficient than the Linux implementation?
> I warmly suggest running more tests before reaching a conclusion here (though I
> suspect you are right and Windows is slower than Linux).
> For Windows I suggest using
> http://www.microsoft.com/whdc/device/network/tcp_tool.mspx .

Thank you for the suggestion. I will test with more perf tools and post the results here later.
Following the suggestion in comment 3, today I re-tested this issue with the NTttcp tool, in both directions:

(1) Windows2003 guest -------> WinXP remote host : 540 Mbit/s throughput
(2) WinXP remote host -------> Windows2003 guest : 550 Mbit/s throughput

Sender command:   NTttcps.exe -m 1,0,192.168.5.1 -a 2
Receiver command: NTttcpr.exe -m 1,0,192.168.5.1 -a 6

From these results, it seems the Windows guest still only gets about 550 Mbps of bandwidth.
Did you use the registry config from
http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry ?

Is TSO on in the guest and the host?

What's the CPU consumption on the host?
kvm_stat?
(In reply to comment #5)
> Following the suggestion in comment 3, today I re-tested this issue with the
> NTttcp tool, in both directions:
> (1) Windows2003 guest -------> WinXP remote host : 540 Mbit/s throughput
> (2) WinXP remote host -------> Windows2003 guest : 550 Mbit/s throughput

Who is saturated? Is the guest at 100% CPU?

> Sender command:   NTttcps.exe -m 1,0,192.168.5.1 -a 2
> Receiver command: NTttcpr.exe -m 1,0,192.168.5.1 -a 6

We are also using '-l 256k -p'.

> From these results, it seems the Windows guest still only gets about 550 Mbps
> of bandwidth.
I guess XinSun will answer comment 6.

Below are the iperf results I got. They show that a Windows guest can reach good network throughput with a 128 KB or 256 KB TCP window size, whether vhost is on or off.

qemu-kvm-0.12.1.2-2.71.el6.x86_64

Setup:
Host A: RHEL6 host running the iperf server
  # iperf -s -w $TCPWinSize
Host B: host running the guests, which run the iperf client
  # iperf -c $serverip -w $TCPWinSize
Both Host A and Host B have a "Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express" NIC.

----------------------------------------------------------------------------------------
guest\TCP Window Size    16KB(Mb/s)  32KB(Mb/s)  64KB(Mb/s)  128KB(Mb/s)  256KB(Mb/s)
----------------------------------------------------------------------------------------
win2008r2-vhost-off         38          148         429         728          780
win2008r2-vhost-on         148          381         667         806          822
win7-64-vhost-off           75.5        157         409         711          790
win7-64-vhost-on           145          369         689         833          859
rhel6-vhost_off             73.8         63.8       217         702          906
rhel6-vhost_on             172          378         638         901          891
rhel5.5-64-vhost-off        81.8        164         346         794          896
rhel5.5-64-vhost-on        206          395         669         804          838
rhel4.9-64-vhost-off        87.6        135         317         578          597
rhel4.9-64-vhost-on        184          303         401         656          691
----------------------------------------------------------------------------------------

Again, the table shows that network throughput increases significantly with vhost on when the TCP window size is below 128 KB. Above 128 KB, enabling vhost brings little additional benefit, especially for RHEL guests.
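For context on what the "-w" option above actually controls: iperf requests a larger socket buffer with setsockopt(SO_SNDBUF/SO_RCVBUF) before connecting, and the kernel may adjust or cap the request. The C sketch below is only a minimal illustration of that mechanism, not iperf's actual source; the 256 KB size, the 192.168.5.1 server address, and iperf's default port 5001 are just example values.

/* Minimal sketch (not iperf code) of what "iperf -w 256K" amounts to on the
 * client side.  The socket buffers are requested before connect(), because
 * the TCP window scaling factor is negotiated at connection setup.  Linux
 * may double the granted value and caps it at net.core.wmem_max /
 * net.core.rmem_max, which may be why window sizes above a certain limit
 * cannot be obtained in the guest. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    int requested = 256 * 1024;                       /* "-w 256K" */

    setsockopt(s, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested));
    setsockopt(s, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));

    int granted = 0;
    socklen_t len = sizeof(granted);
    getsockopt(s, SOL_SOCKET, SO_SNDBUF, &granted, &len);
    printf("requested %d bytes, kernel granted %d bytes\n", requested, granted);

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5001);                        /* iperf default port */
    inet_pton(AF_INET, "192.168.5.1", &srv.sin_addr);  /* example server IP */
    connect(s, (struct sockaddr *)&srv, sizeof(srv));
    /* ... send the test stream here ... */
    close(s);
    return 0;
}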
(In reply to comment #8)
> I guess XinSun will answer comment 6.
>
> Below are the iperf results I got. They show that a Windows guest can reach
> good network throughput with a 128 KB or 256 KB TCP window size, whether
> vhost is on or off.

Again, where is the bottleneck? If the VM is at 100% CPU, can you see whether the time is spent in user space or in the kernel?

In any case, if possible, can you try with iperf 2.05b1? At least on the Linux side it should be easy to compile. On Windows you might need to build it with cygwin, which may be more complex.

BTW, it seems wrong to test with (only) a 1 Gb/s interface. How do you expect to go above 1 Gb/s?
(In reply to comment #9)
>> Below are the iperf results I got. They show that a Windows guest can reach
>> good network throughput with a 128 KB or 256 KB TCP window size, whether
>> vhost is on or off.
> Again, where is the bottleneck? If the VM is at 100% CPU, can you see whether
> the time is spent in user space or in the kernel?

Sorry, I am not sure I understand the question. What I can tell from the table in comment 8 is that network throughput is strongly affected by the TCP window size (considering only TCP performance here), up to a certain point (<256 KB). On RHEL6 I cannot set a TCP window size larger than 256 KB with "iperf -w $SIZE"; Windows can go higher, though.

Here is the CPU usage inside the RHEL6-64 guest while running the iperf client:
"
top - 09:55:32 up 13 min, 3 users, load average: 0.06, 0.03, 0.00
Tasks: 100 total, 1 running, 99 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 8.9%sy, 0.0%ni, 76.8%id, 0.0%wa, 1.3%hi, 12.6%si, 0.0%st
Mem: 2055508k total, 229536k used, 1825972k free, 10872k buffers
Swap: 4128760k total, 0k used, 4128760k free, 60752k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1739 root      20   0 97.7m 1284 1096 S 41.1  0.1   0:06.37 iperf
1743 root      20   0 14936 1136  880 R  0.7  0.1   0:00.05 top
  10 root      20   0     0    0    0 S  0.3  0.0   0:00.48 events/1
   1 root      20   0 19236 1408 1140 S  0.0  0.1   0:00.81 init
"

and the corresponding CPU usage on the host running the VM:
"
top - 09:50:40 up 3 days, 17:51, 3 users, load average: 0.17, 0.06, 0.01
Tasks: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.8%us, 10.6%sy, 0.0%ni, 70.7%id, 0.9%wa, 0.5%hi, 2.6%si, 0.0%st
Mem: 7786200k total, 6797204k used, 988996k free, 38944k buffers
Swap: 9764856k total, 0k used, 9764856k free, 5832556k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16254 root      20   0 2426m 349m 3228 S 79.2  4.6   1:27.54 qemu-kvm
 2018 root      20   0     0    0    0 R 30.6  0.0   4:56.69 vhost
"

> In any case, if possible, can you try with iperf 2.05b1? At least on the
> Linux side it should be easy to compile. On Windows you might need to build
> it with cygwin, which may be more complex.

I used iperf 2.0.4 compiled from the source at http://sourceforge.net/projects/iperf/. On Windows, I used iperf 1.7.0 from http://www.noc.ucf.edu/Tools/Iperf/, which does not require cygwin. I will try iperf 2.05b1, but I expect no big difference.

> BTW, it seems wrong to test with (only) a 1 Gb/s interface. How do you expect
> to go above 1 Gb/s?

Testing only a 1 Gigabit NIC may not be sufficient, but I don't think it is wrong to produce results for such an environment.
(In reply to comment #6)
> Did you use the registry config from
> http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry ?
>
> Is TSO on in the guest and the host?

Both are on.

host# ethtool -k eth5
...
tcp segmentation offload: on
...

> What's the CPU consumption on the host?
> kvm_stat?

Host CPU consumption and kvm_stat will be posted later for the new test results. Please note that the following status is NOT related to comment 5, but to the new tests instead.
(In reply to comment #6)
> Did you use the registry config from
> http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry ?

The results in comment 5 were obtained without the registry settings mentioned above; the newer tests below do use them.

> Is TSO on in the guest and the host?
> What's the CPU consumption on the host?
> kvm_stat?
Created attachment 423271 [details] kvm_stat of 6sec interval
Setup:
Host A: WinXP-64
Host B: RHEV-H hosting the Win2003-64 guest

1. The Win2003 guest is the sender running NTttcps.exe, and WinXP is the receiver running NTttcpr.exe

sender:   NTttcps.exe -m 1,0,$IP_WinXP -a 2 $sender_option
receiver: NTttcpr.exe -m 1,0,$IP_WinXP -a 6 $recv_option

---------------------------------------------------------------------------
ID   sender_option   recv_option   guest_cpu   Throughput (Mbps)
---------------------------------------------------------------------------
1    --              --            89.42%      523.394
2    -l 256k         --            92.45%      551.426
3    -l 256k         -rb 128k      94.29%      738.787
4    -l 256k         -rb 256k      95.86%      861.297 *
5    -l 256k         -rb 2048k     94.61%      804.663
6    -l 1024k        -rb 2048k     89.83%      776.489
7    -l 512k         -rb 512k      93.57%      818.409
8    -l 128k         -rb 128k      94.29%      712.764
---------------------------------------------------------------------------

From the table we can see that the Win2003 guest got its maximum throughput in case 4. Also note that case 1 is exactly the setup mentioned in comment 5.

In these cases, the CPU usage on Host B (which ran the win2003 guest) was as follows:
"
top - 10:13:19 up 1 day, 1:07, 2 users, load average: 0.67, 0.36, 0.27
Tasks: 226 total, 1 running, 225 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.5%us, 4.2%sy, 0.0%ni, 89.3%id, 0.0%wa, 0.1%hi, 1.0%si, 0.0%st
Mem: 32835872k total, 2899552k used, 29936320k free, 112996k buffers
Swap: 24809464k total, 0k used, 24809464k free, 492388k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 9519 vdsm      16   0 2272m 2.0g 3872 S 125.1  6.5  62:21.19 qemu-kvm
 7665 ntp       15   0 19188 4888 3788 S   0.0  0.0   0:00.00 ntpd
 7998 vdsm      10  -5  368m  12m 3004 S   0.3  0.0   5:37.21 vdsm
 4047 root      RT   0 87620 3676 2800 S   0.0  0.0   0:00.22 multipathd
24366 root      15   0 87972 3344 2620 S   0.0  0.0   0:00.07 sshd
 8825 root       0 -15  4476 2432 1656 S   0.0  0.0   0:00.00 iscsid
 7563 haldaemo  15   0 31196 4380 1640 S   0.0  0.0   0:00.64 hald
 7994 vdsm      20  -5 82048 4988 1364 S   0.0  0.0   0:00.02 vdsm
26117 root      15   0 10900 1472 1136 S   0.0  0.0   0:00.00 bash
"

kvm_stat sampled at a 6 sec interval can be found in the attachment in comment 13.
Running the iperf client in the Win2003 guest and the iperf server on the WinXP host, both with a 256 KB TCP window size, I got a throughput of 752 Mbits/sec, which is lower than the maximum value (861.297 Mbps) listed in comment 14. (I believe this is related to the different ways the Winsock APIs are used; maybe I am wrong.)

The nuttcp benchmark tool used in comment 0 does not make the best use of the Winsock API, not to mention that it links against the cygwin library. I think Yaniv was right when he suggested in comment 3 that nuttcp may not be efficient on Windows.

All in all, based on the NTttcp test results, virtio network performance on Windows is generally good. Could I close this as NOTABUG?
FYI (http://www.myri.com/serve/cache/511.html#windows):

Benchmarking Network Performance on Windows?

The performance of socket applications under the Windows operating system is very sensitive with respect to the underlying socket API. A key aspect for getting good performance is that the Winsock2 API has been used. Winsock2 introduces overlapping of communication and allows multiple outstanding send or recv requests at a time. Sockets need to be created using WSASocket with the overlap flag.

The network benchmarking program
  * ntttcp
is a good example of a benchmark program which uses the Winsock2 API. NTTTCP is a closed-source benchmark available from Microsoft at
  * http://www.microsoft.com/whdc/device/network/TCP_tool.mspx
and is based on the original ttcp benchmark. Performance results can vary and are dependent on CPU type and the Windows operating system version. Refer to the Myri-10G 10-Gigabit Ethernet Performance Measurements web page for further details.

In contrast, some UNIX network benchmarking tools like
  * iperf,
  * netperf,
  * nuttcp,
  * ttcp,
  * NetIO
and others do not use the Winsock2 API and may not perform well (and sometimes not even function correctly) on Windows. Even worse can be the performance when an additional intermediate library such as cygwin.dll is required to run the application.

You can check your source code and look for WSASend and WSARecv which will indicate the use of the Winsock2 API.
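To make the Winsock2 point above concrete, here is a minimal, hypothetical sketch of the overlapped-send pattern that page describes. It is not code from NTttcp or any of the other benchmarks mentioned: the socket is created with WSA_FLAG_OVERLAPPED and data is posted with WSASend plus an OVERLAPPED structure and event, which is what allows several requests to be outstanding at once. The 256 KB buffer, the 192.168.5.1 receiver address, and port 5001 are illustrative values only.

/* Hypothetical sketch of an overlapped (Winsock2) send -- not taken from
 * NTttcp, iperf or nuttcp.  Build on Windows and link with ws2_32.lib. */
#include <winsock2.h>
#include <stdio.h>

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    /* WSA_FLAG_OVERLAPPED is what allows multiple sends to be in flight. */
    SOCKET s = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP,
                         NULL, 0, WSA_FLAG_OVERLAPPED);

    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5001);                      /* example port */
    srv.sin_addr.s_addr = inet_addr("192.168.5.1");  /* example receiver */
    connect(s, (struct sockaddr *)&srv, sizeof(srv));

    static char payload[256 * 1024];                 /* example 256 KB message */
    WSABUF buf = { sizeof(payload), payload };
    WSAOVERLAPPED ov = {0};
    ov.hEvent = WSACreateEvent();

    /* Post the send; it may complete asynchronously, so the application can
     * queue further work instead of blocking in send() as the BSD-sockets
     * benchmarks do. */
    DWORD sent = 0, flags = 0;
    if (WSASend(s, &buf, 1, NULL, 0, &ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() == WSA_IO_PENDING) {
        WSAWaitForMultipleEvents(1, &ov.hEvent, TRUE, WSA_INFINITE, FALSE);
    }
    WSAGetOverlappedResult(s, &ov, &sent, FALSE, &flags);
    printf("sent %lu bytes\n", (unsigned long)sent);

    WSACloseEvent(ov.hEvent);
    closesocket(s);
    WSACleanup();
    return 0;
}

The check the Myricom page suggests (grep the source for WSASend/WSARecv) is essentially looking for this pattern.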
I think the underlying problem may be that your buffer size is the same as or smaller than your message size. That would imply that the buffer can hold at most one message.

Can you try making the buffer size much bigger (a factor of 4 or more) or using a much smaller message size, say 8K? Either of those cases with the large buffer size should allow multiple messages to be handled more efficiently.
(In reply to comment #17)
> I think the underlying problem may be that your buffer size is the same as or
> smaller than your message size. That would imply that the buffer can hold at
> most one message.
>
> Can you try making the buffer size much bigger (a factor of 4 or more) or
> using a much smaller message size, say 8K?

Those cases are already included in comment 14: case 4 uses the same message size (-l 256k) and receiver buffer size (-rb 256k), while case 5 uses a receiver buffer (-rb 2048k) eight times the message size (-l 256k), yet case 4 gives higher throughput than case 5. Bigger is not always better, IMHO.

> Either of those cases with the large buffer size should allow multiple
> messages to be handled more efficiently.
In addition, see the registry settings for improved performance: http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry
Closed according to comments 14-16 and 19.