Bug 569476

Summary:

Bonded virtio-net does not exceed 1Gb/s

Product:

Red Hat Enterprise Linux 5

Reporter:

Didier <d.bz-redhat>

Component:

kvm

Assignee:

Michael S. Tsirkin <mst>

Status:

CLOSED WONTFIX

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

medium

Docs Contact:

Priority:

low

Version:

5.5

CC:

ehabkost, lihuang, syeghiay, tburke, virt-maint, ykaul

Target Milestone:

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2011-01-14 15:32:40 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

580948

Attachments:

Description	Flags
KVM network test results (VH with 1 VG)	none
bnx2 GRO patch against 2.6.18-194.8.1.el5 (but fails to compile)	none
Add GRO support	none
bnx2x: Add GRO support	none
netserver strace on bare metal, with 'Interrupted system call'	none
bnx2x: Add GRO support (v2)	none

Description Didier 2010-03-01 15:14:05 UTC

Description of problem:
Total bandwidth to bridged virtio guests on a 802.3ad multiple GbE-bonded host is limited to 1 Gb/s.


Version-Release number of selected component (if applicable):
kvm-83-105.el5_4.22

How reproducible:
Always

Steps to Reproduce:
1. create 3x 1GbE bonded interface on host
2. verify that the host has 3 Gb/s bandwidth available
3. connect from several workstations to 3 virtio guests simultaneously
  
Actual results:
Total bandwidth to the guests never exceeds 1 Gb/s

Expected results:
As the host has a working 3 Gb/s bonded interface, total bandwidth to the guests should be higher than 1 Gb/s.


Additional info:

- Dell R805with 4x 1GbE network interfaces.  We are consolidating servers by means of KVM. To increase bandwidth and resiliency/failover, 3 host network interfaces are bonded. A capped bandwidth renders our virtualization useless.


- Bandwidth tests (with iperf) :

* hostA : 3x 1 GbE 802.3ad-bonded KVM host
    virtA1-virtA2-virtA3 : bridged virtio KVM guests
* hostB, hostC hostD : 1 GbE each (situated at different locations within the network)

- Tests (simultaneous connections from clients -> servers in Mb/s) :
1. B,C,D -> A : 990,600,700 Mb/s = 2.3 Gb/s
2. B -> A1 : 980 Mb/s
3. C -> A2 : 650 Mb/s (iperf variance ?)
4. B,C -> A1,A2 : 900,100 Mb/s
5. B,C,D -> A1,A2,A3 : 750,150,100 Mb/s

- Result [1] confirms a successful 802.3ad setup for hostA ;
- Results [2] and [3] indicate that each virtio guest has 1 Gb/s available ;
- Results [4] and [5] illustrate the problem : incoming bandwidth to all
virtio guests is capped at 1 Gb/s.

Comment 1 Dor Laor 2010-03-04 09:33:59 UTC

B->A1 does 980Mb, this is more than what a 1G nic can do. This means that at least bonding can work and there is no cap of 1Gb.

Can you provide cpu consumption and kvm_stat data?
Also, we recommend of using netperf and not iperf which is inefficient.

Comment 2 Didier 2010-03-05 09:44:19 UTC

Dor, thank you for your reply.

1. I've compiled and installed netperf (v2.4.5), and am running some initial tests with it ; I will provide a summary of these tests somewhere next week, as thorough tests will take some extra days.

Currently, I am executing these preliminary tests as "# netperf -H xxx,4 -D 1 -f M" (yes, as root). As I have no prior experience with netperf, do you recommend certain command-line parameters for reproducible results ?

2. As which format would you preferably like me to provide cpu & kvm_stat data ? I still need to install kvm_stat.

3. Would it be beneficial to install RHEL 5.5b1 as KVM guests, and provide test results with these ? I am a bit hesitant to install RHEL 5.5b1 on the KVM host(s), unless there is a good reason to do this (e.g. kernel/KVM updates).

Comment 3 Dor Laor 2010-03-14 15:15:02 UTC

(In reply to comment #2)
> Dor, thank you for your reply.
> 
> Currently, I am executing these preliminary tests as "# netperf -H xxx,4 -D 1
> -f M" (yes, as root). As I have no prior experience with netperf, do you
> recommend certain command-line parameters for reproducible results ?

You can play with msg size and buffer len like http://markmc.fedorapeople.org/virtio-netperf/2009-04-15/scripts/

> 
> 2. As which format would you preferably like me to provide cpu & kvm_stat data
> ? I still need to install kvm_stat.

It's enough to get %idle time and total of vmexits.
kvm_stat exist in the kvm package itself.

> 
> 3. Would it be beneficial to install RHEL 5.5b1 as KVM guests, and provide test
> results with these ? I am a bit hesitant to install RHEL 5.5b1 on the KVM
> host(s), unless there is a good reason to do this (e.g. kernel/KVM updates).    

It's up to you, it should be an issue

Comment 4 Herbert Xu 2010-04-07 12:10:15 UTC

Didier, so a single iperf can sustain 3Gbps on the host? If so can you please test a single guest (as opposed to 3 guests) and see what its throughput is?

Once we get a single guest sorted out then we can progress to the issue with 3 guests.

Thanks!

Comment 5 Didier 2010-04-07 14:24:59 UTC

Herbert, by "a single guest", do you mean :
- a single client (B, C or D), or
- a single virtualized guest (virtA1, VirtA2) ?

The scenario of a single client connecting to a single virtualized guest is covered by results [2] and [3] from the original comment (i.e. limited to 1 Gbs).



(also, please note that in the meantime, I've upgraded the target infrastructure to RHEL5.5 ; I'll update shortly with new tests).

Comment 6 Herbert Xu 2010-05-21 06:18:39 UTC

No Didier, what I mean is the scenario B,C,D => A1.  If we still have a problem with this, then that narrows our field considerably since it would rule out interference between guests.

Anyway, please let us know how your new tests went.  Thanks!

Comment 7 Didier 2010-07-16 15:38:46 UTC

Created attachment 432417 [details]
KVM network test results (VH with 1 VG)

Herbert,
My apologies for not getting back sooner (this issue was/is on the top of my to-do, so go figure).

As requested, in attachment the tests with 1 virtual guest in 1 host.

FYI, I tested with both RHEL5.5 and RHEL6b2.





Notes and observations :

1. Although the bond is 4x 1GbE, the throughput to the host maxes out at approx. 3000 Gb/s ; this may be a local topology issue, which I'll need to investigate.

2. Dynamically changing numbers (e.g. %CPU, kvm_stat) are guesstimated averages, based on visual observation ...

3. Both hosts RHEL5 (result {1}) and RHEL6 (result {4}) max out at 3 Gb/s, regardless of number of connecting clients ;

4. Guest RHEL5 on host RHEL5 maxes out at ~1 Gb/s (result {2}), and decreases with increasing number of clients ; this is very worrisome, and illustrates my original concern in comment #1 ;

5. For the sake of testing : guest RHEL6 on host RHEL5 performs abysmally (result {3}) ;

6. Guest RHEL6 on host RHEL6 maxes out at ~1.7 Gb/s (result {5}), and performs steady at ~ 1.5 Gb/s. This is still a 50% performance hit compared to bare metal.



As one of the virtual guests would serve SMB/CIFS data to some tens of Windows clients and NFS data to other servers (indeed, I'd like to isolate fileservers in guest sessions), I am quite uncertain on how to proceed with virtualization in view of these network performance degradations.

Comment 8 Herbert Xu 2010-07-19 06:02:37 UTC

Thanks that's very helpful!

Your data show that the CPU processing guest traffic is probably maxed out.

The most obvious to do in this case is to activate GRO.  Unfortunately it would appear that the bnx2 driver you're using doesn't support GRO.

I'll see if I can get you a patch to enable GRO on bnx2.

Comment 9 Didier 2010-07-22 20:21:37 UTC

Dear Herbert,

(In reply to comment #8)

> I'll see if I can get you a patch to enable GRO on bnx2.    


- Is http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115 the patch which should be applied, against both RHEL5.5 and/or RHEL6b2 ?

- What would be the best practice for recompiling the bnx2 module from source ?

- Is there a reasonable assessment of the risks involved in applying this patch (considering this will be a production system) ?

Comment 10 Herbert Xu 2010-07-23 08:20:53 UTC

(In reply to comment #9)
> - Is
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115
> the patch which should be applied, against both RHEL5.5 and/or RHEL6b2 ?

Yes.

> - What would be the best practice for recompiling the bnx2 module from source ?

What I do (not necessarily the best practice :) is grab the kernel source rpm of the exact same version as you're currently using, unpack it with rpmbuild, ensure the config file is identical to the one you're using, then apply the patch and build the bnx2x driver:

make SUBDIRS=drivers/net

You should then be able to load that module without even rebooting (you'll need some form of access other than through the bnx2x NIC to be on the safe side).

> - Is there a reasonable assessment of the risks involved in applying this patch
> (considering this will be a production system) ?    

While the change itself is not very risky, any changes to the kernel always carries an element of risk.  So I cannot make any guarantees.

Comment 11 Didier 2010-07-23 08:56:00 UTC

Herbert,

Apologies, I should rephrase my comment :

Is the patch from comment #9 applicable to the bnx2 module source from current (e.g. 2.6.18-194.8.1.el5) RHEL5.5 kernels ?

Comment 12 Herbert Xu 2010-07-23 15:04:33 UTC

It should be, even if it doesn't apply cleanly, the changes should be fairly easy to make by hand.  Let me know if you hit any snags.

Comment 13 Didier 2010-07-27 13:14:31 UTC

Created attachment 434685 [details]
bnx2 GRO patch against 2.6.18-194.8.1.el5 (but fails to compile)

(In reply to comment #12)
> It should be, even if it doesn't apply cleanly, the changes should be fairly
> easy to make by hand.  Let me know if you hit any snags.    

Unfortunately, the patch does not apply cleanly against 2.6.18-194.8.1.el5-x86_64.

Applying the patch manually (see attachment), yields :


drivers/net/bnx2.c: In function ‘bnx2_rx_int’:
drivers/net/bnx2.c:3186: error: ‘struct bnx2_napi’ has no member named ‘napi’
drivers/net/bnx2.c:3189: error: ‘struct bnx2_napi’ has no member named ‘napi’

Comment 14 Herbert Xu 2010-07-28 07:46:03 UTC

Created attachment 434942 [details]
Add GRO support

Here's a totally untested patch, use at your own risk!

Comment 15 Herbert Xu 2010-07-28 07:47:15 UTC

Created attachment 434943 [details]
bnx2x: Add GRO support

Still untested, but at least this time it's the right file :)

Comment 16 Didier 2010-07-28 09:48:10 UTC

Patch applies cleanly.
GRO is by default enabled in eth[0-3], but not in the bond :

# ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: on

# ethtool -k bond0
Offload parameters for bond0:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off



* The results are not entirely satisfying :
- netperf to the host and the guest aborts most of the time with "Interrupted system call  netperf: remote error 4" (see attachment for an netserver strace on the host) ;
- iperf performance to the host is in the Kbit range.

* Disabling GRO on the 4 physical interfaces ('ethtool -K ethX gro off') restores standard performance.



(Herbert, if that would be of help, I can provide you with shell access to the host, the guest, the RHEL5.5 compilation guest and/or a local netperf client.)

Comment 17 Didier 2010-07-28 09:49:33 UTC

Created attachment 434965 [details]
netserver strace on bare metal, with 'Interrupted system call'

Comment 18 Herbert Xu 2010-07-28 11:31:59 UTC

Created attachment 434986 [details]
bnx2x: Add GRO support (v2)

Sorry, I forgot to add a call to flush GRO packets which is needed on RHEL5.

Comment 19 Didier 2010-07-28 14:49:29 UTC

Thank you for your quick interaction, Herbert ; Red Hat's technical support (mostly ;) ) never ceases to satisfy.


* Previous results (see https://bugzilla.redhat.com/attachment.cgi?id=432417)

{1} TARGET = [A_rh5]
	cl#	tput Mbs - host %CPU(hi$si)/cum.netserver %CPU
	[B-H]	2700 - 0hi28si/240

{2} TARGET = [A1_rh5]   (host = [A_rh5])
	cl#	tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU
	[B]	 990 - 15/125 - 60000 - 8hi11si/35
	[BC]	1043 - 23/180 - 47000 - 13hi22si/160
	[BCD]	 740 - 25/190 - 41000 - 13hi27si/170
	[BCDE]	 640 - 23/185 - 43000 - 13hi27si/155
	[B-H]	 640 - 26/210 - 45000 - 10hi38si/186

* Results with bnx2 GRO support :
(kvm_stat : idling : ~6000)

{1} TARGET = [A_rh5]
	cl#	tput Mbs - host %CPU(hi$si)/cum.netserver %CPU	
	[B-H]	2900 - 0hi13si/115


{2} TARGET = [A1_rh5]   (host = [A_rh5])
	cl#	tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU
	[B]	 940 - 22/175 - 62000 - 1hi1si/2
	[BC]	1680 - 25/202 - 52000 - 4hi6si/30
	[BCD]	1430 - 23/194 - 41000 - 5hi7si/36
	[BCDE]	1360 - 24/194 - 43000 - 9hi13si/(45-80)
	[B-H]	 940 - 27/220 - 55000 - 9hi24si/(45-100)



* GRO clearly improves the throughput (and decreases CPU usage), but it would be interesting to know why :

1. tput decreases to 940 Mbs with 7 clients (CPU starvation ? In guest or host ?) ?
2. max. tput to a KVM guest is limited to ~1700 Mbs (~2900 Mbs to host) ?


* From here, I can either :
3. test with RHEL6b2 host/guest with bnx2+GRO support ;
4. test with multiple KVM guests on an RHEL5.5 host (which is a not-too-critical production system).

* Additionally :
5. Is SR-IOV (e.g. Intel E1G44ET with Intel 82576) a hardware solution for this issue, either in RHEL5.5 or RHEL6b2 ?

Comment 20 Didier 2010-07-29 08:56:32 UTC

My sincere apologies : in comment #19, I mistakenly tested an RHEL6b2 guest in the RHEL5.5 host.

In that comment's data set, the results with bnx2 GRO support should read (and be compared to) :
{3} TARGET = [A1_rh6]   (host = [A_rh5])
instead of :
{2} TARGET = [A1_rh5]   (host = [A_rh5])

Conclusion : the very large drop in performance with an RHEL6 guest in an RHEL5 host is resolved with the GRO patch ; of course, this is only of academic interest, as there is no immediate benefit of running a beta guest on a host in a production system.



I reran the GRO-patch tests with an RHEL5.5 guest on the GRO-patched RHEL 5.5 host. To prevent CPU starvation, I increased the number of guest CPU's to 8 (8 available on the bare metal host) and the memory to 4 GB (24 GB available on host).

Results :

{2} TARGET = [A1_rh5]   (host = [A_rh5])
 cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU

- without GRO (see https://bugzilla.redhat.com/attachment.cgi?id=432417)
 [BCD]   740 - 25/190 - 41000 - 13hi27si/170
 [B-H]   640 - 26/210 - 45000 - 10hi38si/186
- with GRO
 [BCD]  1360 - / -  - 0hi0si/
 [B-H]  1010 - 25/205 - 53000 - 6hi14si/165

GRO clearly improves the situation (> 1 Gbs with 3 clients), but throughput with 7 clients is still only 1/3 of the bare metal throughput.


Would you like me to perform/apply additional tests/patches (see also the raised points in comment #19 wrt multiple guests and/or SR-IOV) ?

Comment 21 Herbert Xu 2010-07-29 09:14:48 UTC

If you could test with our latest RHEL6 host that would be great.  I haven't checked in a while but it is possible that the RHEL5 host/guest virtio_net path isn't passing GRO packets through as is and *may* be refragmenting them.

Comment 22 Herbert Xu 2010-07-29 09:17:22 UTC

Oh, I see that you've already tested with RHEL6 as a host, were you using vhost?

Comment 23 Didier 2010-07-29 15:28:04 UTC

- Retested with RHEL6b2 host & guest, both with latest updates :
   kernel-2.6.32-44.2.el6.x86_64 (unpatched)
   qemu-kvm-0.12.1.2-2.90.el6.x86_64

- vhost-net is loaded :

# lsmod |grep vhost
vhost_net              23098  1 
macvtap                 7701  1 vhost_net
tun                    16583  3 vhost_net

- GRO is by default disabled :

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off


Netperf test results :
{5} TARGET = [A1_rh6]   (host = [A_rh6])

kvm_stat idle : ~300

GRO=off

 cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest
%CPU(hi&si)/cum.netserver %CPU
 [BCD]  1950 - / -  - 0hi0si/
 [B-H]  1590 - 33/240 - 190000 - 0hi20si/(176-200)

GRO=on

 cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest
%CPU(hi&si)/cum.netserver %CPU
 [BCD]  1930 - / -  - 0hi0si/
 [B-H]  1590 - 36/240 - 210000 - 0hi19si/(184-189)


Observations :

1. Test results slightly better than with kernel-2.6.32-37.el6.x86_64 (see result set {5} in https://bugzilla.redhat.com/attachment.cgi?id=432417) ;
2. 30% performance drop with 3 clients [BCD] compared to bare metal ;
3. An additional 20% performance drop with 7 clients compared to 3 clients ;
4. GRO on/off does not seem to make a difference.

I guess kernel-2.6.32-44.2.el6.x86_64 does need a bnx2 GRO patch too (which probably needs some love compared to http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115 ?

Comment 24 Dor Laor 2010-07-29 21:05:33 UTC

(In reply to comment #23)
> 
> Observations :
> 
> 1. Test results slightly better than with kernel-2.6.32-37.el6.x86_64 (see
> result set {5} in https://bugzilla.redhat.com/attachment.cgi?id=432417) ;
> 2. 30% performance drop with 3 clients [BCD] compared to bare metal ;
> 3. An additional 20% performance drop with 7 clients compared to 3 clients ;
> 4. GRO on/off does not seem to make a difference.
> 
> I guess kernel-2.6.32-44.2.el6.x86_64 does need a bnx2 GRO patch too (which
> probably needs some love compared to
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115
> ?    

Stay tuned for this bug: 615118   	 mst   	creates a vhost thread per device

Comment 25 Didier 2010-08-31 08:43:18 UTC

BZ #615118 changed state three weeks ago (patches in kernel-2.6.32-61.el6), but it appears the rhel6 beta channel has not been updated for a month ...

Any chance to test this kernel/patch through other means ?

Comment 26 Dor Laor 2010-09-02 14:05:27 UTC

We decided to keep vhost disabled by default due to stability reasons in 6.0.
So userspace is still the primary preferred option in 6.0 as well.

Comment 29 RHEL Program Management 2011-01-11 20:43:58 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 30 RHEL Program Management 2011-01-11 22:49:59 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 32 Eduardo Habkost 2011-01-14 15:32:40 UTC

Closing for RHEL-5, testing this on RHEL-6 instead of RHEL-5 is recommended.