Description of problem:
When starting a bridged guest under Xen, a vif gets created with a txqueuelen of 32. This kills throughput as the value is too low, especially on a 10-gigabit network or with guest-to-guest traffic on the same Dom0. A good default would be 1000.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Create two Xen guests that share a bridge.
2. Run a netperf TCP_STREAM test between the guests and record the results.
3. Use ifconfig vifX.Y txqueuelen 1000 on both vifs.
4. Repeat step #2.
5. Compare/contrast the results from steps 2 and 4.

Actual results:
Guest-to-guest throughput was roughly 3 Gb/s.

Expected results:
Much higher throughput. On my test system, I was able to go from 3 Gb/s to 16 Gb/s by increasing txqueuelen.

Additional info:
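Purely as a convenience for anyone reproducing this, a minimal sketch of the steps above (the vif names and the guest A address are placeholders for whatever your setup uses; netserver must be running in guest A):

guestA# netserver
guestB# netperf -t TCP_STREAM -H <guestA-ip>    # baseline run with the default txqueuelen of 32
host# ifconfig vif1.0 txqueuelen 1000           # vif names depend on the domain IDs
host# ifconfig vif2.0 txqueuelen 1000
guestB# netperf -t TCP_STREAM -H <guestA-ip>    # repeat and compare against the baseline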
Mark,

So, a couple of questions here. First, what does a normal e1000 NIC use as a default txqueuelen? How about a normal 10G card? Second, what is the downside to using a very large txqueuelen?

Thanks,
Chris Lalancette
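Not an answer to the defaults question, but for checking what a given device currently uses, either of these works (eth0 is just an example device name):

host# cat /sys/class/net/eth0/tx_queue_len
host# ip -o link show eth0    # the "qlen" value in the output is the txqueuelen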
I've done some quick performance testing, guest-to-guest on the same host. These tests ran a single netperf stream while I varied the txqueuelen of the vifs to get an idea of the performance curve. Note that these are "quick and dirty" numbers, but they should suffice for the purposes of this evaluation. Throughput is in 10^6 bits/sec, as netperf reports it.

            netperf message size
txQlen      512 byte     4096 byte
-----------------------------------
   32        1634.13      8402.32
   64        1292.05     14198.48
  128        4142.58     14677.39
  256        4439.77     14626.80
  512        5251.48     14809.59
 1024        4875.96     15358.55

The data shows that throughput increases up through a txqueuelen of 512. This is best seen with the smaller 512-byte message size, as the 4096-byte messages are hitting the memory bandwidth limit of the system.

I have not tried multiple external boxes driving a single guest. That scenario would stress the vif queue length even more, as you would need the extra queue space to help buffer the traffic.
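If anyone wants to repeat the sweep, a rough sketch of how it can be driven from the Dom0 (this assumes netserver is already running in guest A, ssh access to guest B, and that vif1.0/vif2.0 are the two guests' vifs):

host# for qlen in 32 64 128 256 512 1024; do
          ifconfig vif1.0 txqueuelen $qlen
          ifconfig vif2.0 txqueuelen $qlen
          echo "txqueuelen=$qlen"
          ssh root@<guestB-ip> "netperf -H <guestA-ip> -l 60 -- -m 512"
      done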
If people are concerned about memory utilization, could someone please quantify it? In other words, how much memory does a vif with a txqueuelen of 32 use vs. one with a txqueuelen of 500?
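A rough back-of-the-envelope, under the assumption that the queue only costs memory for packets actually sitting in it and that a queued packet is at most roughly MTU-sized plus some skb overhead (both are assumptions on my part, not measurements), i.e. worst-case queued bytes ~= txqueuelen * (MTU + per-packet overhead):

host# echo $((  32 * (1500 + 500) ))    #  64000 bytes, roughly 64 KB at txqueuelen 32
host# echo $(( 500 * (1500 + 500) ))    # 1000000 bytes, roughly 1 MB at txqueuelen 500

So under those assumptions the worst-case difference is on the order of a megabyte per vif, and the queue costs nothing extra while it is empty.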
Also, please note that KVM creates the tap device with a txqueuelen of 500. So, based on the data above and the value KVM uses, perhaps we should go with 500?
I'm investigating this one. I don't know what the default is for an e1000 card, but on the e1000e card in my laptop the txqueuelen is set to 1000. Also, on my workstation with a Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express card, the real ethernet device (pethX, running kernel-xen) has a txqueuelen of 1000, while the txqueuelen of vifX.Y is 32, as stated above. On KVM the vnet0 device is created for the guest with txqueuelen set to 500, so I guess 500 would be a good value.

Michal
I did the following testing on my Xen host:
1. started up 2 HVM guests
2. downloaded netperf 2.4.5 from the official FTP site [1], recompiled it, and installed it in those guests
3. ran netserver on guest A and also disabled iptables
4. ran `netperf -t TCP_STREAM -H guestA` on guest B

The results were:

guestB# netperf -t TCP_STREAM -H 10.34.26.225
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.34.26.225 (10.34.26.225) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00      63.18

host# ifconfig vifX.Y txqueuelen 500   (on the vif devices for both guests)

guestB# netperf -t TCP_STREAM -H 10.34.26.225
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.34.26.225 (10.34.26.225) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00      65.32

host# ifconfig vifX.Y txqueuelen 1000  (on the vif devices for both guests)

guestB# netperf -t TCP_STREAM -H 10.34.26.225
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.34.26.225 (10.34.26.225) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00      83.73

So there's almost no change between txqueuelen 32 and 500, but there is a difference between txqueuelen 500 and 1000.

Tested on a quad-core Intel(R) Xeon(R) CPU X5460 with 8 GB RAM, using RHEL-5 i386 and x86_64 HVM guests with 1 GB RAM each. Component versions were xen-3.0.3-117virttest31 and kernel-xen-2.6.18-222.el5.

Based on those results, where there's no significant change between txqueuelen 32 and 500, there's no point in changing it. But re-reading comment #0, I realize this may be hardware dependent as well. We could consider changing it to the same value KVM uses, but I'm not sure about this one.

Michal

[1] ftp://ftp.netperf.org/netperf/
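For anyone repeating step 2, the usual source build should do; something like the following (the exact tarball name is a guess from the version number, so adjust as needed):

guest# wget ftp://ftp.netperf.org/netperf/netperf-2.4.5.tar.gz
guest# tar xzf netperf-2.4.5.tar.gz && cd netperf-2.4.5
guest# ./configure && make && make install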
Mark, could you please review my steps and results and let me know whether I did something wrong with the procedure? Re-reading all the comments again, I'm not sure my procedure was correct or that I used netperf properly (this was the first time I used the netperf tool at all).

Thanks,
Michal
My testing showed no significant throughput increase with a higher txqueuelen. Since there is a module parameter, netbk_queue_length, that can be used to set the default txqueuelen value, I recommended just adding a technical note and closing this bz without a change.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The default txqueuelen of Xen vif devices can be set using the netbk_queue_length parameter of the netback module. The default value of this parameter is 32.
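For RHEL-5 that boils down to something like the following on the Dom0, assuming netback is loaded as a module rather than built into the kernel (500 is just an example value):

host# echo "options netback netbk_queue_length=500" >> /etc/modprobe.conf

The new value takes effect the next time the netback module is loaded, e.g. after a reboot of the Dom0.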
Closing, as no fix is needed for this bz - only a technical note has to be added mentioning the way to change the txqueuelen default.
The developer tests were clearly flawed if you are only getting a maximum of 83 Mbit/s. Guest-to-guest traffic is typically limited by memory speed; the results here should be up in the Gb/s range. In addition, going from 65 Mb/s to 83 Mb/s is still a very significant change, which seems to have been ignored. Reopening, as this resolution is not acceptable from a performance perspective.
Hi Mark, why is setting the default txqueuelen via the module parameter, instead of hard-coding the value, not acceptable?
(In reply to comment #19)
> The developer tests were clearly flawed if you are only getting a maximum of
> 83 Mbit/s. Guest-to-guest traffic is typically limited by memory speed; the
> results here should be up in the Gb/s range.

Yes, this was guest-to-guest communication in my testing. Nevertheless, I have to agree with Mirek: if there's the option to change this using the netbk_queue_length parameter of the netback module, why should we hard-code it? It's not as if we were seeing a 200-500% improvement; a 30% increase is not enough of a reason to change the default. Considering that there is the option to change the value, I recommend closing this again. Any objections, Mark? If so, why?

Michal
> why is setting the default txqueuelen via the module parameter, instead of
> hard-coding the value, not acceptable?

If the perf guys can find a better default, we should definitely use it. Changing the default is not "hardcoding" the value; anybody who wants a smaller txqueuelen for some reason (memory usage) can throttle it down.

I agree with Mark that Michal's experiments are probably invalid, since 83 Mb/s guest-to-guest is definitely below the expected guest-to-guest performance. Do you recall which guests you were using?

With some differences, probably due to the "beefiness" of the machines, I could reproduce Mark's results (at least for 512-byte messages) between two RHEL5 guests. With two different message sizes, and requesting a 5% confidence interval from netperf, I got (throughput in 10^6 bits/sec):

txqueuelen   4096 byte    512 byte
----------------------------------
    32        5386.56     1013.13
    64        4933.22     1375.50
   128        4711.75     1564.99
   256        4733.52     1694.21
   500        4773.61     1690.33

which also suggests a setting of 256 or 500.
(In reply to comment #22)
> > why is setting the default txqueuelen via the module parameter, instead of
> > hard-coding the value, not acceptable?
>
> If the perf guys can find a better default, we should definitely use it.
> Changing the default is not "hardcoding" the value; anybody who wants a
> smaller txqueuelen for some reason (memory usage) can throttle it down.
>
> I agree with Mark that Michal's experiments are probably invalid, since 83
> Mb/s guest-to-guest is definitely below the expected guest-to-guest
> performance. Do you recall which guests you were using?

I've been using RHEL-5 guests for this kind of communication. However, Mark never told me the right procedure (mine is described in comment #14). According to Mirek, his test results were about the same as mine, so if a bad procedure was used in the testing I'd welcome a description of the procedure Mark used.

> With some differences, probably due to the "beefiness" of the machines, I
> could reproduce Mark's results (at least for 512-byte messages) between two
> RHEL5 guests. With two different message sizes, and requesting a 5%
> confidence interval from netperf, I got (throughput in 10^6 bits/sec):
>
> txqueuelen   4096 byte    512 byte
> ----------------------------------
>     32        5386.56     1013.13
>     64        4933.22     1375.50
>    128        4711.75     1564.99
>    256        4733.52     1694.21
>    500        4773.61     1690.33
>
> which also suggests a setting of 256 or 500.

Interesting - what procedure did you use to get those results? I was using RHEL-5 guests as well, and my results were the ones in comment #14.

Michal
Here is what I used for one run:

netperf -H ... -p ... -i 50,3 -- -m <message size>

(The -i option does not affect the order of magnitude of the results; it only repeats the run until netperf reaches a good confidence level.)

Paolo
The problem is more apparent at smaller packet sizes. For netperf you should use something like this:

netperf -l 60 -H <ip addr> -- -m 512

In the data I shared in comment #8 you can see that we go from a throughput of 1.6 Gb/s to over 5 Gb/s using a 512-byte message in netperf (command line above). That is greater than a 3x improvement in throughput.

I am not sure why you are only getting 63 Mb/s. Please make sure the guests have at least two VCPUs and are using the PV NICs. Also, I had been testing with a 64-bit Dom0 and DomU everywhere, and make sure you are testing on a server-class machine. In addition, this would have been with RHEL 5.x from around the time I filed the BZ (11/2009).
Just a note... When I said I got the same result as Michal, I was talking about the speed improvement: I observed only a really small increase in throughput (about 2%). The throughput I was able to measure was around 6 Gb/s. Anyway, I still think the performance impact is not big enough to change the existing default in the code. As finding the proper value is complicated and different machines give different results, I still prefer to leave it at 32 by default and advise users to change it via the module parameter if they need better performance. We have to take into account that a longer queue means more memory consumed by queued data, and this can be a problem on a machine with high memory usage.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-The default txqueuelen of Xen vif devices can be set using the netbk_queue_length parameter of the netback module. The default value of this parameter is 32.
+The default txqueuelen of Xen vif devices can be set using the netbk_queue_length parameter of the netback module. The default value of this parameter is 500.
Based on the data, we recommend changing the default txqueuelen to 500 (if this is what you are saying in comment #27, then we agree).

- Mark is measuring up to a 3x difference on 10 Gbit on a high-end Nehalem EP box.
- Miroslav is measuring 27% (83 vs. 65 Mb/s; emulation must be happening for it to be this low).

Thus let's make the change to get better out-of-the-box performance with Xen on RHEL.
Fixed in kernel-2.6.18-229.el5. You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5. Detailed testing feedback is always welcomed.
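In case it saves someone a step, installing the test kernel on a Dom0 is just the usual rpm dance; the exact file name below is an assumption based on the version above (x86_64 assumed), so check the download page for the real one:

host# rpm -ivh kernel-xen-2.6.18-229.el5.x86_64.rpm
host# reboot    # then check the vif txqueuelen after the guests come up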
Tested with:
host: x86 and x86_64, xen-3.0.3-120.el5

Test steps:
1. Create several guests, including PV and HVM; for HVM, both ioemu and netfront vifs are used.
2. Check the txqueuelen of the vifs from the host with ifconfig (see the sketch below).

Reproduced the bug with kernel-xen-2.6.18-226.el5: the default txqueuelen for vifX.Y is 32.
Verified the fix with kernel-xen-2.6.18-238.el5: the default txqueuelen for vifX.Y is 500.

According to the test results above, moving to VERIFIED.
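Step 2 can be done with a quick loop on the host; this just prints every vif with its current queue length (the vif naming is whatever xend assigned):

host# for v in /sys/class/net/vif*; do echo "$(basename $v): $(cat $v/tx_queue_len)"; done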
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html