| Summary: | macvtap enabled KVMs recurrently slow down extremely <40kb/sec | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Kai Mosebach <redhat-bugzilla> |
| Component: | qemu-kvm | Assignee: | Michael S. Tsirkin <mst> |
| Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.2 | CC: | acathrow, juzhang, michen, mkenneth, rhod, tburke, virt-maint, vyasevic, wquan, xfu |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-07-21 20:11:58 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
- vhost_net is enabled on the KVM host
- ip link output looks similar for all VMs

The server interface:
6: macvhost0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether XXX brd ff:ff:ff:ff:ff:ff
One of the guest interfaces:
8: macvtap1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 500
link/ether XXX brd ff:ff:ff:ff:ff:ff
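
For reference, a minimal sketch (not taken from this report; the MAC address below is a placeholder) of how a macvtap device in bridge mode like macvtap1 above is created on top of eth0 with iproute2. libvirt creates these devices automatically from the <interface type='direct'> definition, so this is only to show what ends up on the host:

```sh
# Create a macvtap device in bridge mode on top of the physical NIC.
# "macvtap1" and the MAC address are placeholders, not values from this bug.
ip link add link eth0 name macvtap1 type macvtap mode bridge
ip link set macvtap1 address 52:54:00:12:34:56 up

# macvtap exposes a matching character device named after the interface index:
ls -l /dev/tap$(cat /sys/class/net/macvtap1/ifindex)
```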
On one Windows guest a "connectivity problem" was shown once on the (virtual) network interface, but the connection was still working (slowly).

Running "nload" on one KVM host against the Windows vtap interface repeatedly showed this traffic pattern while downloading a file in the corresponding guest: 100 kB/s -> 40 kB/s -> 0 kB/s -> 100 kB/s -> 40 kB/s -> 0 kB/s.
Wireshark reveals more info. Between a KVM (HTTP server) and a client we see a lot of

    HTTP [TCP Retransmission] Continuation or non-HTTP traffic
    HTTP [TCP Fast Retransmission] Continuation or non-HTTP traffic
    HTTP [TCP Out-Of-Order] Continuation or non-HTTP traffic

which explains why ICMP has no problems but TCP does. Will go and try the EL6.2 beta kernel next.

So to clarify: is it a single stream that is affected? Do other streams keep going? Would a certain number of packets getting reordered incorrectly before being transmitted out of the host explain the observed behaviour?

All streams are slow (and there is no "keep going", since they are all just slow); it is the whole traffic of a machine that is affected, as soon as TCP is involved (could not try UDP yet). I don't get the second question: do you mean that an initial disorder could keep the whole stream "disordered" by some packets?

Could it be that restarting a VM causes the problem? Does the problem happen if you don't restart any VMs? Could the problem be the host being out of memory? Could you try scripting it so we get /proc/meminfo and /proc/slabinfo on the host around the time traffic slows?

Note: the EL6.2 beta kernel does not change the behaviour. Memory: it is highly improbable; this machine has 72 GB, others where we saw the problem have 256 GB, and only one affected machine has 32 GB. Restart of the VM: no, a shutdown and clean start also brought the problem back. Migrating it away to another KVM host solved the issue, though. It also seems that under load machines sometimes "lose" their macvtap adapter, but that is another story we will investigate first and file a separate bug report for if we gather enough details; maybe it is related (i.e. the "disconnected network interface" in Windows could be related). What values of slabinfo/meminfo should I monitor? Anything else I could look at? Bridge states etc.?

MemFree and kmalloc-4096, I guess. Yes, could be related. What do you mean by "lose"?
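A minimal sketch of the kind of logging loop being asked for above, assuming a stock RHEL 6 host (where the slab caches are typically named size-4096 rather than kmalloc-4096, so the pattern below matches either naming):

```sh
#!/bin/sh
# Hypothetical helper: append MemFree and the 4096-byte slab cache line
# (size-4096 on RHEL 6 SLAB kernels, kmalloc-4096 on SLUB kernels) to a log
# every 30 seconds, so the values around a slowdown can be looked up later.
LOG=/var/tmp/macvtap-meminfo.log
while true; do
    {
        date
        grep MemFree /proc/meminfo
        grep -E 'size-4096|kmalloc-4096' /proc/slabinfo
    } >> "$LOG"
    sleep 30
done
```

Left running in the background on the host, the timestamps can then be correlated with the throughput drops seen in nload.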
By "lose" I mean:
- the guest is not pingable anymore
- the macvtap adapter is still visible on the KVM server
- login on the serial console fails (the accounts are LDAP-connected and time out, even for root)
- the log server does not reveal anything (presumably due to the network failure)
- /var/log/libvirt/qemu/<host>.log does not show anomalies
- the VM can be shut down cleanly with virsh shutdown <vmname>

There is no MemFree or kmalloc in either /proc/slabinfo or /proc/meminfo...

Additional info on the "lost" connection:
- virsh save on the one host and then
- virsh restore on another host
restores the network connectivity.

To comment 10: what about a restore on the same host after this?

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Restoring needinfo: comment 14.

Does this get QA ack? Can QA reproduce?

Hi Kai,

Thank you for taking the time to enter a bug report with us. We appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution.

If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

Thanks, Ronen.

Hi Michael,

Sorry for the long delays; at the moment we do not see the problem too often. I strongly assume, though, that it is related to http://comments.gmane.org/gmane.comp.emulators.libvirt.user/2706 and vhost checksum problems, since we saw dropped packets on the corresponding devices, i.e.

    RX packets:526 errors:84511 dropped:84511 overruns:0 frame:0

Next time we see the problem I will try to save and restore the machine.

Best, Kai

(In reply to comment #20)
> does this get qa ack?
> can qa reproduce?

Can't reproduce this issue. Below are the steps of the testing.

1. Set up 20 macvtap interfaces:

    i=11
    while [ $i -lt 21 ]
    do
        ip link add link eth0 dev macvtap$i type macvtap
        ip link set macvtap$i address 00:24:E8:81:14:5$i up
        i=$(($i+1))
    done
    i=10
    while [ $i -lt 21 ]
    do
        ip link add link eth0 dev macvtap$i type macvtap
        ip link set macvtap$i address 00:24:E8:81:14:$i up
        i=$(($i+1))
    done

2. Boot 20 guests on the same host with a command line like:

    /usr/libexec/qemu-kvm -cpu host -enable-kvm -smp 2 -m 2G -usb \
        -device usb-tablet,id=input0 -name test -uuid `uuidgen` \
        -drive file=/root/vm-images/rhel6.1,if=none,id=hd,format=qcow2,aio=native,cache=none,werror=stop,rerror=stop \
        -device ide-drive,drive=hd,id=blk_image,bootindex=1 \
        -netdev tap,id=netdev0,fd=42 \
        -device virtio-net-pci,netdev=netdev0,id=device-net0,mac=00:24:e8:81:14:19 \
        42<>/dev/tap42 -vnc :20 -balloon none -device sga \
        -chardev socket,id=serial0,path=/var/test1,server,nowait \
        -device isa-serial,chardev=serial0 &

3. Check each guest and transfer a 2 GB file to another machine by scp.

4. Repeat multiple times; the transfer rate is normal.

If I have missed anything in the above steps, please correct me.

Closing. For now, QE cannot reproduce, and the reporter also doesn't encounter it (or at least much less). Let's reopen if we have new data. Added Vlad to the CC list since it reminds me a little of Bug 795314 (GRO).

Thanks, Ronen.
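Given the GRO suspicion in the closing comment (Bug 795314), one hedged follow-up check on a host showing the slowdown would be to look at the offload settings of the physical NIC and temporarily turn GRO off there (eth0 assumed, as in the setups above), then re-test TCP throughput:

```sh
# Show current offload settings of the physical NIC (eth0 assumed).
ethtool -k eth0

# Temporarily disable generic receive offload and watch whether the
# affected guest's TCP throughput recovers; re-enable afterwards.
ethtool -K eth0 gro off
# ... re-run the scp / HTTP download test against the slow guest ...
ethtool -K eth0 gro on
```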
Description of problem:

We run several hundred KVMs on 12 Dell servers (Dell R710 / Dell R815); the physical network HW is always a Broadcom NetXtreme II BCM5709 Gigabit against Cisco switches. The KVM host servers are set up with eth0 bridged with a macvhost0 interface, which gets the IP of the KVM host. Each of the KVM guests (mostly EL6, some Windows XP/7) gets a macvtap interface of type 'bridge' with source 'eth0' and the virtio driver, e.g. in libvirt:

    <interface type='direct'>
      <mac address='xxxxxxxxx'/>
      <source dev='eth0' mode='bridge'/>
      <target dev='macvtap1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

Now the network connectivity of some of the machines becomes very slow from time to time (not reproducible by hand), which means down to 40 kB/s where they normally reach 40 MB/s. This is true for TCP traffic, but if I for instance run a flood ping I do not see any lost packets (even with a payload of >60 kB per ping). A reboot of the VM does not help (a reboot of the KVM host is not possible at the moment).

To fix it:
- migrating the machine to another KVM host usually helps
- sometimes the problem disappears the same way it came along

Version-Release number of selected component (if applicable):

Kernels tested:
- 2.6.32-131.17.1.el6.x86_64
- 2.6.32-131.12.1.el6.x86_64
- 2.6.32-131.6.1.el6.x86_64

qemu:
- 0.12.1.2-2.160.el6_1.8
- 0.12.1.2-2.160.el6_1.6

seabios:
- seabios-0.6.1.2-3.el6
- seabios-0.6.1.2-3.el6_1.1
- seabios-0.6.3-0 (public release)

How reproducible:
Recurrently, but sporadic.

Steps to Reproduce:
1.
2.
3.

Actual results:
Network performance (presumably TCP only) drops below 40 kB/s.

Expected results:
Stable network performance (normally ~40 MB/s).

Additional info:
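For illustration only (the guest address 192.0.2.10 and the macvtap device name are placeholders, not values from this report), a sketch of the quick check described above that separates ICMP from TCP behaviour, run on the KVM host as root:

```sh
# Flood ping with a large payload towards an affected guest; in this report
# ICMP showed no loss even while TCP throughput was down to ~40 kB/s.
ping -f -c 1000 -s 61000 192.0.2.10

# Per-interface counters of the guest's macvtap device on the host; the
# errors/dropped fields are where the RX drops quoted above were observed.
ifconfig macvtap1
```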