Bug 539626 - default txqueuelen of vif device is too small
Summary: default txqueuelen of vif device is too small
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assignee: Miroslav Rezanina
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514500
 
Reported: 2009-11-20 18:20 UTC by Mark Wagner
Modified: 2011-01-13 20:55 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The default txqueuelen can be set with the netbk_queue_length parameter of the netback module. The default value of this parameter is 500.
Clone Of:
Environment:
Last Closed: 2011-01-13 20:55:54 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID: Red Hat Product Errata RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Mark Wagner 2009-11-20 18:20:49 UTC
Description of problem:
When starting a bridged guest under Xen, a vif gets created with a txqueuelen of 32. This kills throughput, as the value is too low, especially on a 10-gigabit network or for guest-to-guest traffic on the same Dom0. A good default would be 1000.

Version-Release number of selected component (if applicable):


How reproducible:
every time

Steps to Reproduce:
1. create two Xen guests that share a bridge
2. run a netperf TCP_STREAM test between the guests, record results
3. run `ifconfig vifX.Y txqueuelen 1000` on both vifs (see the sketch after these steps)
4. repeat step 2
5. compare/contrast the results from steps 2 and 4
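A minimal sketch of the procedure (the guest IP address and the vifX.Y names are illustrative placeholders; use the interfaces Xen actually created for the two guests):

# on guest A: start the netperf server
netserver
# on guest B: baseline TCP_STREAM run against guest A
netperf -t TCP_STREAM -H <guestA-ip>
# on the Dom0 host: raise the queue length on the backend vif of each guest
ifconfig vifX.Y txqueuelen 1000
# repeat the ifconfig for the second guest's vif, then re-run netperf on guest B and compare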
  
Actual results:
Guest-to-guest throughput was roughly 3 Gb/s.

Expected results:

Much higher throughput. On my test system, I was able to go from 3 Gb/s to 16 Gb/s by increasing txqueuelen.

Additional info:

Comment 1 Chris Lalancette 2009-11-24 23:36:16 UTC
Mark,
     So, a couple of questions here.  First, what does a normal e1000 NIC use as a default txqueuelen?  How about a normal 10G card?  Second, what is the downside to using a very large txqueuelen?

Thanks,
Chris Lalancette

Comment 8 Mark Wagner 2009-11-30 18:08:30 UTC
I've done some quick performance testing, guest -> guest on the same host. These tests ran a single netperf stream, and I varied the txqueuelen of the vifs to get an idea of the performance curves. Note that these are "quick and dirty" numbers but should suffice for the purposes of this evaluation.

txqueuelen   netperf message size (throughput in 10^6 bits/sec)
             512 byte      4096 byte
----------------------------------------------------
32            1634.13       8402.32
64            1292.05      14198.48
128           4142.58      14677.39
256           4439.77      14626.80
512           5251.48      14809.59
1024          4875.96      15358.55


The data shows that throughput increases up through a txqueuelen of 512. This is best shown with the smaller message size of 512 bytes, as the 4096-byte messages are hitting the memory bandwidth limitation of the system. I have not tried multiple external boxes driving a single guest. That scenario would stress the vif queue length as well, since the extra queue space would be needed to help buffer the traffic.

Comment 9 Mark Wagner 2009-11-30 18:13:03 UTC
If people are concerned about memory utilization, could someone please quantify it?

In other words, how much memory does a vif with a txqueuelen of 32 use vs. a txqueuelen of 500?
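For scale, a rough back-of-the-envelope bound (an assumption-laden sketch; real numbers would still be welcome): txqueuelen counts packets, and memory is only consumed while packets are actually queued, so the worst case is roughly the queue length times the per-packet buffer size. Assuming ~1500-byte MTU frames and ignoring per-skb overhead:

 32 x 1500 bytes  ~  48 KB per vif
500 x 1500 bytes  ~ 750 KB per vif

Real skb allocations are somewhat larger, but the order of magnitude stays well below a few MB per vif.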

Comment 10 Mark Wagner 2009-11-30 21:04:19 UTC
Also, please note that KVM creates the tap device with a txqueuelen of 500.

So, based on the data above and the txqueuelen used by KVM, perhaps we should go with 500?

Comment 13 Michal Novotny 2010-10-06 12:02:28 UTC
I'm investigating this one. I don't know what the default value is for an e1000 card, but on the e1000e card in my laptop the txqueuelen is set to 1000.

Also, on my workstation with a Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express card, the real ethernet device (pethX, running kernel-xen) has a txqueuelen of 1000, while the txqueuelen of vifX.Y is 32, as stated above.

On KVM, the vnet0 device is created for the guest with txqueuelen set to 500, so I guess 500 would be a good value.
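For reference, the current value of any interface can be read from the host with standard tools; a quick sketch (vifX.Y is a placeholder for the real backend interface name):

ifconfig vifX.Y | grep txqueuelen
ip link show vifX.Y      # the trailing "qlen" value, when one is set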

Michal

Comment 14 Michal Novotny 2010-10-07 11:47:58 UTC
I did the following testing on my Xen host:

1. started up 2 HVM guests
2. downloaded and recompiled netperf 2.4.5 from the official FTP site [1] and installed it in both guests
3. ran netserver on guest A and also disabled iptables
4. ran `netperf -t TCP_STREAM -H guestA` on guest B; the results were:
guestB# netperf -t TCP_STREAM -H 10.34.26.225
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.34.26.225 (10.34.26.225) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00      63.18   

host# ifconfig vifX.Y txqueuelen 500    (run on the vif devices of both guests)
guestB# netperf -t TCP_STREAM -H 10.34.26.225
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.34.26.225 (10.34.26.225) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00      65.32

host# ifconfig vifX.Y txqueuelen 1000    (run on the vif devices of both guests)
guestB# netperf -t TCP_STREAM -H 10.34.26.225
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.34.26.225 (10.34.26.225) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00      83.73

So there's almost no change between txqueuelen 32 and 500, but there is a difference between txqueuelen 500 and 1000. Tested on a quad-core Intel(R) Xeon(R) X5460 with 8 GB RAM, using RHEL-5 i386 and x86_64 HVM guests with 1 GB RAM each. Component versions were xen-3.0.3-117virttest31 and kernel-xen-2.6.18-222.el5.

Based on those results, where there is no significant change between txqueuelen 32 and 500, there's no point in changing it. But re-reading comment #0, I realized the outcome may be hardware dependent as well. We could consider changing this to the same value KVM uses, but I'm not sure about this one.

Michal

[1] ftp://ftp.netperf.org/netperf/

Comment 15 Michal Novotny 2010-10-07 11:53:03 UTC
Mark, could you please review my steps and results and let me know whether I did something wrong with the procedure? Re-reading all the comments again, I'm not sure my procedure was correct or that I used netperf properly (it was the first time I used the netperf tool at all).

Thanks,
Michal

Comment 16 Miroslav Rezanina 2010-10-20 06:39:20 UTC
My testing showed no significant throughput increase with a higher txqueuelen.

As there is a module parameter, netbk_queue_length, that can be used to set the default txqueuelen value, I recommend just adding a technical note and closing this bz without a change.
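For the record, a sketch of how that parameter could be set (assuming the backend driver is built as a loadable module called netback on this kernel-xen; on some builds the module may be named differently or built in, in which case the option would go on the kernel command line instead):

# persistent, in /etc/modprobe.conf:
options netback netbk_queue_length=500
# or at module load time:
modprobe netback netbk_queue_length=500
# the active value can usually be read back via sysfs, if the parameter is exposed there:
cat /sys/module/netback/parameters/netbk_queue_length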

Comment 17 Miroslav Rezanina 2010-10-20 06:39:21 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The default txqueuelen can be set with the netbk_queue_length parameter of the netback module. The default value of this parameter is 32.

Comment 18 Miroslav Rezanina 2010-10-20 12:36:35 UTC
Closing as no fix is needed for this bz; only a technical note has to be added mentioning the way to change the txqueuelen default.

Comment 19 Mark Wagner 2010-10-20 14:41:38 UTC
The developer tests were clearly flawed if you are only getting a max of 83 Mbits/sec. Guest-to-guest is typically limited by memory speed. The results here should be up in the Gb/s range.

In addition, going from 65 Mb/s to 83 Mb/s is still a very significant change, which seems to be ignored.

Reopening as this solution is not acceptable from a performance perspective.

Comment 20 Miroslav Rezanina 2010-10-21 04:33:14 UTC
Hi Mark,
why is setting the default txqueuelen via a module parameter, instead of hardcoding the value, not acceptable?

Comment 21 Michal Novotny 2010-10-21 10:32:04 UTC
(In reply to comment #19)
> The developer tests were clearly flawed if you are only getting a max of 83
> Mbits / sec.  Guest to guest is typically limited by memory speed. The results
> here should be up in the Gb/sec range. 
> 

Yes, this was guest-to-guest communication in my testing. Nevertheless, I have to agree with Mirek: if there is an option to change it using the netbk_queue_length parameter of the netback module, why should we hard-code this? A 200-500% improvement would be significant, but a 30% performance increase is not enough to change the default. Considering that the value can already be changed, I recommend closing this again. Any objections, Mark? If so, why?

Michal

Comment 22 Paolo Bonzini 2010-10-21 15:14:15 UTC
> why is setting the default txqueuelen via a module parameter, instead of
> hardcoding the value, not acceptable?

If the perf guys can find a better default, we should definitely use it.  Changing the default is not "hardcoding" the value; anybody who wishes a smaller txqueuelen for some reason (memory usage) can throttle it down.

I agree with Mark that Michal's experiments are probably invalid since 83 Mb/sec guest-to-guest is definitely below the expected guest-to-guest performance.  Do you recall which guests you were using?

With some differences probably due to the "beefiness" of the machines, I could reproduce Mark's results (at least on 512 byte messages) between two RHEL5 guests.  With two different message sizes, and requesting a 5% confidence interval from netperf, I got:

        4096      512
----------------------------
32      5386.56   1013.13
64      4933.22   1375.50
128     4711.75   1564.99
256     4733.52   1694.21
500     4773.61   1690.33

which also suggests a setting of 256 or 500.

Comment 23 Michal Novotny 2010-10-21 16:07:40 UTC
(In reply to comment #22)
> > why is setting the default txqueuelen via a module parameter, instead of
> > hardcoding the value, not acceptable?
> 
> If the perf guys can find a better default, we should definitely use it. 
> Changing the default is not "hardcoding" the value; anybody who wishes a
> smaller txqueuelen for some reason (memory usage) can throttle it down.
> 
> I agree with Mark that Michal's experiments are probably invalid since 83
> Mb/sec guest-to-guest is definitely below the expected guest-to-guest
> performance.  Do you recall which guests you were using?

I've been using RHEL-5 guests for this kind of communication; nevertheless, Mark didn't tell me the right procedure (my procedure is described in comment #14). According to Mirek, his test results were the same, so if a bad procedure was used in the testing, I'd welcome a description of the procedure Mark used.

> 
> With some differences probably due to the "beefiness" of the machines, I could
> reproduce Mark's results (at least on 512 byte messages) between two RHEL5
> guests.  With two different message sizes, and requesting a 5% confidence
> interval from netperf, I got:
> 
>         4096      512
> ----------------------------
> 32      5386.56   1013.13
> 64      4933.22   1375.50
> 128     4711.75   1564.99
> 256     4733.52   1694.21
> 500     4773.61   1690.33
> 
> which also suggests a setting of 256 or 500.

Interesting. What procedure did you use to get those results? I was using RHEL-5 guests as well, and the test results for my machine were those mentioned in comment #14.

Michal

Comment 24 Paolo Bonzini 2010-10-21 16:36:38 UTC
Here is what I used for one run:

netperf -H ... -p ... -i 50,3 -- -m <message size>

(The -i option does not affect the order of magnitude of the results; it only repeats the run until netperf reaches a good confidence level.)

Paolo

Comment 25 Mark Wagner 2010-10-21 19:28:55 UTC
The problem is more apparent at smaller packet sizes. For netperf you should use something like this:

netperf -l 60 -H <ip addr> -- -m 512 

In the data I shared in comment #8 you can see that we go from a throughput of 1.6 Gb/s to over 5 Gb/s using a 512-byte message in netperf (cmdline above). This is greater than a 3x improvement in throughput.

I am not sure why you are only getting 63 Mb/sec. Please make sure that the guests have at least two VCPUs and are using the PV NICs. I had been testing with a 64-bit Dom0 and DomU everywhere. Also, make sure you are testing on a server-class machine. In addition, this would have been with the RHEL 5.x from around the time I filed the BZ (11/2009).

Comment 26 Miroslav Rezanina 2010-10-22 07:33:45 UTC
Just a note: when I said I got the same result as Michal, I was talking about the speed improvement. I was observing a really small increase in throughput (about 2%). The throughput I was able to measure was around 6 Gb/s.

Anyway, I still think the performance impact is not big enough to change the existing default in the code. As finding the proper value is complicated and different machines give different results, I still prefer to leave it at 32 by default and advise users to change it via the module parameter if they need better performance.

We have to take into account that a longer queue means more memory consumed by queued data, and this can be a problem on a machine with high memory usage.

Comment 27 Miroslav Rezanina 2010-10-22 10:04:42 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-The default txqueuelen can be set with the netbk_queue_length parameter of the netback module. The default value of this parameter is 32.
+The default txqueuelen can be set with the netbk_queue_length parameter of the netback module. The default value of this parameter is 500.

Comment 28 John Shakshober 2010-10-22 14:42:13 UTC
Based on the data, we recommend changing the default txqueuelen to 500.
(If this is what you are saying in comment #27, then we agree.)

 - Mark is measuring up to a 3x difference on 10Gbit on a high-end Nehalem EP box.
 - Miroslav is measuring a 27% improvement (83 vs. 65 Mb/s; emulation must be happening for it to be this low).

Thus let's make the change to get better out-of-the-box performance with Xen on RHEL.

Comment 30 Jarod Wilson 2010-10-28 20:08:23 UTC
in kernel-2.6.18-229.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 32 Lei Wang 2010-12-22 07:23:46 UTC
Test with:
host: x86 and x86_64
xen-3.0.3-120.el5

Test steps:
1. create several guests, including PV and HVM; for HVM, both ioemu and netfront vifs are used
2. check the txqueuelen with ifconfig from the host (see the sketch below)
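A sketch of the check, assuming the tx_queue_len sysfs attribute is available on this kernel (the vif names are whatever Xen created):

for v in /sys/class/net/vif*; do echo "$v: $(cat $v/tx_queue_len)"; done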

reproduced the bug with kernel-xen-2.6.18-226.el5:
default txqueuelen for vifX.Y is 32

verified the bug with kernel-xen-2.6.18-238.el5:
default txqueuelen for vifX.Y is 500

According to the test results above, move to VERIFIED.

Comment 34 errata-xmlrpc 2011-01-13 20:55:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

