Bug 569476
Summary: | Bonded virtio-net does not exceed 1Gb/s | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Didier <d.bz-redhat> |
Component: | kvm | Assignee: | Michael S. Tsirkin <mst> |
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 5.5 | CC: | ehabkost, lihuang, syeghiay, tburke, virt-maint, ykaul |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-01-14 15:32:40 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 580948 | ||
Attachments: |
Description
Didier
2010-03-01 15:14:05 UTC
B->A1 does 980Mb, which is more than a single 1G NIC can deliver. This means that at least bonding can work and there is no cap of 1Gb. Can you provide CPU consumption and kvm_stat data? Also, we recommend using netperf rather than iperf, which is inefficient.

Dor, thank you for your reply.

1. I've compiled and installed netperf (v2.4.5), and am running some initial tests with it; I will provide a summary of these tests sometime next week, as thorough tests will take some extra days. Currently, I am executing these preliminary tests as "# netperf -H xxx,4 -D 1 -f M" (yes, as root). As I have no prior experience with netperf, do you recommend certain command-line parameters for reproducible results?

2. In which format would you like me to provide CPU & kvm_stat data? I still need to install kvm_stat.

3. Would it be beneficial to install RHEL 5.5b1 as KVM guests, and provide test results with these? I am a bit hesitant to install RHEL 5.5b1 on the KVM host(s), unless there is a good reason to do so (e.g. kernel/KVM updates).

(In reply to comment #2)
> Currently, I am executing these preliminary tests as "# netperf -H xxx,4 -D 1
> -f M" (yes, as root). As I have no prior experience with netperf, do you
> recommend certain command-line parameters for reproducible results ?

You can play with the message size and buffer length, as in the scripts at http://markmc.fedorapeople.org/virtio-netperf/2009-04-15/scripts/

> 2. In which format would you like me to provide CPU & kvm_stat data ?
> I still need to install kvm_stat.

It's enough to get the %idle time and the total number of vmexits. kvm_stat ships in the kvm package itself.

> 3. Would it be beneficial to install RHEL 5.5b1 as KVM guests, and provide test
> results with these ? I am a bit hesitant to install RHEL 5.5b1 on the KVM
> host(s), unless there is a good reason to do this (e.g. kernel/KVM updates).

It's up to you; it shouldn't be an issue.

Didier, so a single iperf can sustain 3Gbps on the host?
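For reproducible results, it usually helps to fix the run length and sweep the send size explicitly rather than relying on netperf's defaults. A minimal sketch of such a sweep follows; the target name, the sizes, and the dry-run `run` wrapper are illustrative, not taken from this report (swap the `echo` for `"$@"` to actually execute the commands):

```shell
#!/bin/sh
# Dry-run sketch of a reproducible netperf sweep: prints the invocations
# rather than executing them, so the plan can be reviewed first.
run() { echo "+ $*"; }

HOST=virtA1            # hypothetical target guest
DURATION=60            # fixed run length in seconds, for comparable samples

# -f M reports in MB/s (as in the report); "-- -m" sets the send message size
for msg in 1024 4096 16384 65536; do
    run netperf -H "$HOST" -l "$DURATION" -f M -t TCP_STREAM -- -m "$msg"
done
```

Fixing `-l` and `-m` per run makes consecutive samples directly comparable, which matters when eyeballing %CPU and kvm_stat averages alongside throughput.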
If so, can you please test a single guest (as opposed to 3 guests) and see what its throughput is? Once we get a single guest sorted out we can progress to the issue with 3 guests. Thanks!

Herbert, by "a single guest", do you mean:
- a single client (B, C or D), or
- a single virtualized guest (virtA1, virtA2)?

The scenario of a single client connecting to a single virtualized guest is covered by results [2] and [3] from the original comment (i.e. limited to 1 Gb/s). (Also, please note that in the meantime I've upgraded the target infrastructure to RHEL 5.5; I'll update shortly with new tests.)

No Didier, what I mean is the scenario B,C,D => A1. If we still have a problem with this, then that narrows our field considerably, since it would rule out interference between guests. Anyway, please let us know how your new tests went. Thanks!

Created attachment 432417 [details]
KVM network test results (VH with 1 VG)

Herbert, my apologies for not getting back sooner (this issue was/is at the top of my to-do list, so go figure). As requested, attached are the tests with 1 virtual guest on 1 host. FYI, I tested with both RHEL 5.5 and RHEL 6b2.

Notes and observations:
1. Although the bond is 4x 1GbE, the throughput to the host maxes out at approx. 3000 Mb/s; this may be a local topology issue, which I'll need to investigate.
2. Dynamically changing numbers (e.g. %CPU, kvm_stat) are guesstimated averages, based on visual observation.
3. Both the RHEL 5 host (result {1}) and the RHEL 6 host (result {4}) max out at 3 Gb/s, regardless of the number of connecting clients.
4. A RHEL 5 guest on a RHEL 5 host maxes out at ~1 Gb/s (result {2}), and throughput decreases with an increasing number of clients; this is very worrisome, and illustrates my original concern in comment #1.
5. For the sake of testing: a RHEL 6 guest on a RHEL 5 host performs abysmally (result {3}).
6. A RHEL 6 guest on a RHEL 6 host maxes out at ~1.7 Gb/s (result {5}), and performs steadily at ~1.5 Gb/s.
This is still a 50% performance hit compared to bare metal. As one of the virtual guests would serve SMB/CIFS data to some tens of Windows clients and NFS data to other servers (indeed, I'd like to isolate fileservers in guest sessions), I am quite uncertain how to proceed with virtualization in view of these network performance degradations.

Thanks, that's very helpful! Your data show that the CPU processing guest traffic is probably maxed out. The most obvious thing to do in this case is to activate GRO. Unfortunately, it would appear that the bnx2 driver you're using doesn't support GRO. I'll see if I can get you a patch to enable GRO on bnx2.

Dear Herbert,

(In reply to comment #8)
> I'll see if I can get you a patch to enable GRO on bnx2.

- Is http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115 the patch which should be applied, against both RHEL 5.5 and/or RHEL 6b2?
- What would be the best practice for recompiling the bnx2 module from source?
- Is there a reasonable assessment of the risks involved in applying this patch (considering this will be a production system)?

(In reply to comment #9)
> - Is
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115
> the patch which should be applied, against both RHEL5.5 and/or RHEL6b2 ?

Yes.

> - What would be the best practice for recompiling the bnx2 module from source ?

What I do (not necessarily the best practice :) is grab the kernel source rpm of the exact same version as you're currently using, unpack it with rpmbuild, ensure the config file is identical to the one you're using, then apply the patch and build the bnx2x driver:

    make SUBDIRS=drivers/net

You should then be able to load that module without even rebooting (you'll need some form of access other than through the bnx2x NIC to be on the safe side).
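The rebuild procedure described above can be sketched end-to-end as follows. This is a dry-run outline (it prints the commands instead of running them); the rpm/spec names and the patch file name `bnx2-gro.patch` are illustrative assumptions based on typical RHEL 5 layouts, not taken verbatim from this bug:

```shell
#!/bin/sh
# Dry-run sketch: rebuild the bnx2 module from the matching kernel source
# rpm, then build only the net drivers. Swap echo for "$@" to execute.
run() { echo "+ $*"; }

KVER=2.6.18-194.8.1.el5        # must match the running kernel exactly

run rpm -ivh kernel-"$KVER".src.rpm
run rpmbuild -bp /usr/src/redhat/SPECS/kernel-2.6.spec     # unpack + apply distro patches
run cd /usr/src/redhat/BUILD/kernel-"$KVER"/linux-"$KVER".x86_64
run cp /boot/config-"$KVER".x86_64 .config                 # reuse the running config
run patch -p1 -i bnx2-gro.patch                            # the GRO patch from this bug
run make oldconfig
run make SUBDIRS=drivers/net modules                       # build just drivers/net
# Then: rmmod bnx2 && insmod drivers/net/bnx2.ko
# (keep out-of-band access in case the NIC does not come back)
```

As Herbert notes, the module can then be reloaded without a reboot, provided there is some access path other than the NIC being replaced.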
> - Is there a reasonable assessment of the risks involved in applying this patch
> (considering this will be a production system) ?

While the change itself is not very risky, any change to the kernel always carries an element of risk, so I cannot make any guarantees.

Herbert, apologies, I should rephrase my question: is the patch from comment #9 applicable to the bnx2 module source from current (e.g. 2.6.18-194.8.1.el5) RHEL 5.5 kernels?

It should be; even if it doesn't apply cleanly, the changes should be fairly easy to make by hand. Let me know if you hit any snags.

Created attachment 434685 [details]
bnx2 GRO patch against 2.6.18-194.8.1.el5 (but fails to compile)

(In reply to comment #12)
> It should be, even if it doesn't apply cleanly, the changes should be fairly
> easy to make by hand. Let me know if you hit any snags.

Unfortunately, the patch does not apply cleanly against 2.6.18-194.8.1.el5-x86_64. Applying the patch manually (see attachment) yields:

    drivers/net/bnx2.c: In function ‘bnx2_rx_int’:
    drivers/net/bnx2.c:3186: error: ‘struct bnx2_napi’ has no member named ‘napi’
    drivers/net/bnx2.c:3189: error: ‘struct bnx2_napi’ has no member named ‘napi’

Created attachment 434942 [details]
Add GRO support
Here's a totally untested patch, use at your own risk!
Created attachment 434943 [details]
bnx2x: Add GRO support
Still untested, but at least this time it's the right file :)
Patch applies cleanly. GRO is enabled by default on eth[0-3], but not on the bond:

    # ethtool -k eth0
    Offload parameters for eth0:
    Cannot get device udp large send offload settings: Operation not supported
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: on
    udp fragmentation offload: off
    generic segmentation offload: off
    generic-receive-offload: on

    # ethtool -k bond0
    Offload parameters for bond0:
    Cannot get device rx csum settings: Operation not supported
    rx-checksumming: off
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: on
    udp fragmentation offload: off
    generic segmentation offload: off
    generic-receive-offload: off

* The results are not entirely satisfying:
  - netperf to the host and the guest aborts most of the time with "Interrupted system call netperf: remote error 4" (see attachment for a netserver strace on the host);
  - iperf performance to the host is in the Kbit range.
* Disabling GRO on the 4 physical interfaces ('ethtool -K ethX gro off') restores standard performance.

(Herbert, if that would be of help, I can provide you with shell access to the host, the guest, the RHEL 5.5 compilation guest and/or a local netperf client.)

Created attachment 434965 [details]
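The per-slave toggle used above ('ethtool -K ethX gro off') can be scripted across all bond slaves. A dry-run sketch follows; the hard-coded slave list matches the eth0-eth3 bond in this report, though on a live system it could be read from /sys/class/net/bond0/bonding/slaves (swap the `echo` for `"$@"` to apply for real):

```shell
#!/bin/sh
# Dry-run sketch: set GRO on every slave of the bond and re-query the
# offload state, mirroring the ethtool -K / -k commands quoted above.
run() { echo "+ $*"; }

STATE=${1:-off}                 # "on" or "off"

for dev in eth0 eth1 eth2 eth3; do
    run ethtool -K "$dev" gro "$STATE"
    run ethtool -k "$dev"       # verify: check the generic-receive-offload line
done
```

Toggling all slaves at once avoids the half-configured state where some slaves deliver aggregated GRO frames and others do not.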
netserver strace on bare metal, with 'Interrupted system call'
Created attachment 434986 [details]
bnx2x: Add GRO support (v2)
Sorry, I forgot to add a call to flush GRO packets which is needed on RHEL5.
Thank you for your quick interaction, Herbert; Red Hat's technical support (mostly ;) ) never ceases to satisfy.

* Previous results (see https://bugzilla.redhat.com/attachment.cgi?id=432417):

{1} TARGET = [A_rh5]
cl# tput Mbs - host %CPU(hi&si)/cum.netserver %CPU
[B-H] 2700 - 0hi28si/240

{2} TARGET = [A1_rh5] (host = [A_rh5])
cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU
[B] 990 - 15/125 - 60000 - 8hi11si/35
[BC] 1043 - 23/180 - 47000 - 13hi22si/160
[BCD] 740 - 25/190 - 41000 - 13hi27si/170
[BCDE] 640 - 23/185 - 43000 - 13hi27si/155
[B-H] 640 - 26/210 - 45000 - 10hi38si/186

* Results with bnx2 GRO support (kvm_stat, idling: ~6000):

{1} TARGET = [A_rh5]
cl# tput Mbs - host %CPU(hi&si)/cum.netserver %CPU
[B-H] 2900 - 0hi13si/115

{2} TARGET = [A1_rh5] (host = [A_rh5])
cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU
[B] 940 - 22/175 - 62000 - 1hi1si/2
[BC] 1680 - 25/202 - 52000 - 4hi6si/30
[BCD] 1430 - 23/194 - 41000 - 5hi7si/36
[BCDE] 1360 - 24/194 - 43000 - 9hi13si/(45-80)
[B-H] 940 - 27/220 - 55000 - 9hi24si/(45-100)

* GRO clearly improves the throughput (and decreases CPU usage), but it would be interesting to know why:
1. throughput decreases to 940 Mb/s with 7 clients (CPU starvation? In guest or host?);
2. max. throughput to a KVM guest is limited to ~1700 Mb/s (vs ~2900 Mb/s to the host)?

* From here, I can either:
3. test with a RHEL 6b2 host/guest with bnx2+GRO support;
4. test with multiple KVM guests on a RHEL 5.5 host (which is a not-too-critical production system).

* Additionally:
5. Is SR-IOV (e.g. Intel E1G44ET with Intel 82576) a hardware solution for this issue, either in RHEL 5.5 or RHEL 6b2?

My sincere apologies: in comment #19, I mistakenly tested a RHEL 6b2 guest on the RHEL 5.5 host.
In that comment's data set, the results with bnx2 GRO support should read (and be compared to):

    {3} TARGET = [A1_rh6] (host = [A_rh5])

instead of:

    {2} TARGET = [A1_rh5] (host = [A_rh5])

Conclusion: the very large drop in performance with a RHEL 6 guest on a RHEL 5 host is resolved by the GRO patch; of course, this is only of academic interest, as there is no immediate benefit to running a beta guest on a host in a production system.

I reran the GRO-patch tests with a RHEL 5.5 guest on the GRO-patched RHEL 5.5 host. To prevent CPU starvation, I increased the number of guest CPUs to 8 (8 available on the bare-metal host) and the memory to 4 GB (24 GB available on the host). Results:

{2} TARGET = [A1_rh5] (host = [A_rh5])
cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU
- without GRO (see https://bugzilla.redhat.com/attachment.cgi?id=432417):
[BCD] 740 - 25/190 - 41000 - 13hi27si/170
[B-H] 640 - 26/210 - 45000 - 10hi38si/186
- with GRO:
[BCD] 1360 - / - - 0hi0si/
[B-H] 1010 - 25/205 - 53000 - 6hi14si/165

GRO clearly improves the situation (> 1 Gb/s with 3 clients), but throughput with 7 clients is still only 1/3 of the bare-metal throughput. Would you like me to perform/apply additional tests/patches (see also the points raised in comment #19 wrt multiple guests and/or SR-IOV)?

If you could test with our latest RHEL 6 host that would be great. I haven't checked in a while, but it is possible that the RHEL 5 host/guest virtio_net path isn't passing GRO packets through as is and *may* be refragmenting them. Oh, I see that you've already tested with RHEL 6 as a host; were you using vhost?
- Retested with RHEL 6b2 host & guest, both with the latest updates:

    kernel-2.6.32-44.2.el6.x86_64 (unpatched)
    qemu-kvm-0.12.1.2-2.90.el6.x86_64

- vhost-net is loaded:

    # lsmod | grep vhost
    vhost_net              23098  1
    macvtap                 7701  1 vhost_net
    tun                    16583  3 vhost_net

- GRO is disabled by default:

    # ethtool -k eth0
    Offload parameters for eth0:
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp-segmentation-offload: on
    udp-fragmentation-offload: off
    generic-segmentation-offload: on
    generic-receive-offload: off
    large-receive-offload: off

Netperf test results:

{5} TARGET = [A1_rh6] (host = [A_rh6]), kvm_stat idle: ~300

GRO=off
cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU
[BCD] 1950 - / - - 0hi0si/
[B-H] 1590 - 33/240 - 190000 - 0hi20si/(176-200)

GRO=on
cl# tput Mbs - host %CPU(us+sy)/qemu-kvm %CPU - kvm_stat - guest %CPU(hi&si)/cum.netserver %CPU
[BCD] 1930 - / - - 0hi0si/
[B-H] 1590 - 36/240 - 210000 - 0hi19si/(184-189)

Observations:
1. Test results are slightly better than with kernel-2.6.32-37.el6.x86_64 (see result set {5} in https://bugzilla.redhat.com/attachment.cgi?id=432417);
2. 30% performance drop with 3 clients [BCD] compared to bare metal;
3. An additional 20% performance drop with 7 clients compared to 3 clients;
4. GRO on/off does not seem to make a difference.

I guess kernel-2.6.32-44.2.el6.x86_64 does need a bnx2 GRO patch too (which probably needs some love compared to http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115)?

(In reply to comment #23)
> Observations :
> 1. Test results slightly better than with kernel-2.6.32-37.el6.x86_64 (see
> result set {5} in https://bugzilla.redhat.com/attachment.cgi?id=432417) ;
> 2. 30% performance drop with 3 clients [BCD] compared to bare metal ;
> 3. An additional 20% performance drop with 7 clients compared to 3 clients ;
> 4. GRO on/off does not seem to make a difference.
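Since the follow-up question is whether vhost was in use, it may help to show how that is checked and requested. A dry-run sketch, assuming the upstream qemu-kvm syntax (`-netdev tap,...,vhost=on`); the id and device names are illustrative and the exact flags on the RHEL 6 beta may differ:

```shell
#!/bin/sh
# Dry-run sketch: verify vhost-net is available and show the qemu-kvm
# flags that request it for a virtio NIC. Swap echo for "$@" to execute.
run() { echo "+ $*"; }

# On a live host:  lsmod | grep -q '^vhost_net' && echo "vhost-net loaded"
run lsmod

# Requesting vhost acceleration for a guest NIC (illustrative names):
run qemu-kvm -netdev tap,id=net0,vhost=on \
             -device virtio-net-pci,netdev=net0
```

Note that having the vhost_net module loaded (as shown in the lsmod output above) does not by itself mean a given guest's NIC is using it; the vhost option has to be requested on that guest's network device.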
> I guess kernel-2.6.32-44.2.el6.x86_64 does need a bnx2 GRO patch too (which
> probably needs some love compared to
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c67938a9e071fa51c91ed17a14382e128368d115
> ?

Stay tuned for this bug: 615118 (mst creates a vhost thread per device).

BZ #615118 changed state three weeks ago (patches in kernel-2.6.32-61.el6), but it appears the RHEL 6 beta channel has not been updated for a month. Any chance to test this kernel/patch through other means?

We decided to keep vhost disabled by default due to stability reasons in 6.0, so userspace is still the primary preferred option in 6.0 as well.

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

Closing for RHEL-5; testing this on RHEL-6 instead of RHEL-5 is recommended.