Created attachment 322433 [details]
tcpdump.log -i br0 (virt host)

Description of problem:
Network traffic to/from my bridged KVM guests stalls during large file transfers. This can be observed while scp'ing a DVD.iso to a guest, and often just by starting a network installation on a guest (stalls while transferring install.img).
I'm not clear on what component this should be assigned to. Please advise.

Version-Release number of selected component (if applicable):
libvirt-0.4.6-3.fc10.x86_64
kernel-2.6.27.4-68.fc10.x86_64
bridge-utils-1.2-6.fc10.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install F10 x86_64
2. Install F10 KVM x86_64 or i386 guest
3. SCP a large file from the F10 host to the F10 KVM guest

Actual results:
$ scp ~guest/Download/Fedora-10-Preview-i386-DVD.iso root.34.91:/iso/
root.34.91's password:
Fedora-10-Preview-i386-DVD.iso 0% 5088KB 26.2KB/s - stalled -

Expected results:
No network stall

Additional info:
* `service network restart` is required to get networking to guests running again.
* I've seen this occur while downloading install.img during installation of bridged guests. Rebooting eventually works around the issue.
Created attachment 322434 [details] dmesg
Created attachment 322435 [details] /var/log/messages
Stalled networking traffic isn't a libvirt problem - it's almost certainly a KVM device emulation problem, so changing to KVM component.
Created attachment 322451 [details] /var/log/libvirt/qemu/vguest1.log (i386 guest)
Created attachment 322452 [details] /var/log/libvirt/qemu/vguest2.log (x86_64 guest)
I experienced similar problems. This might be a regression, have a look at: http://article.gmane.org/gmane.comp.emulators.kvm.devel/21423
jlaska: could you try to reproduce with the e1000 and virtio NICs? The default is rtl8139.
You can do that by using "virsh edit MyDomain" to edit the guest's definition and changing the <interface> to add a <model> tag:
  <interface type='bridge'>
    ...
    <model type='virtio'/>
  </interface>
Also, note that if you tell virt-install (with e.g. --os-variant=fedora10) or virt-manager that you are installing F9/F10, it will use virtio automatically.
(It's not the timer issue I suggested in the link Fabian points to; that's fixed in F10.)
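For anyone who wants to script the <model> change rather than edit by hand, here is a minimal sketch in Python. It only rewrites an XML string; how you get the XML out of libvirt and back in (e.g. via "virsh dumpxml" and "virsh define") is left out. The function name set_nic_model and the SAMPLE domain are made up for illustration and are not part of libvirt.

```python
import xml.etree.ElementTree as ET


def set_nic_model(domain_xml: str, model: str) -> str:
    """Return domain XML with every bridged <interface> set to the given NIC model.

    Hypothetical helper for illustration; it only edits the XML text.
    """
    root = ET.fromstring(domain_xml)
    for iface in root.iter("interface"):
        if iface.get("type") != "bridge":
            continue  # leave non-bridged interfaces untouched
        node = iface.find("model")
        if node is None:
            # No <model> tag yet, so the hypervisor default would be used
            node = ET.SubElement(iface, "model")
        node.set("type", model)
    return ET.tostring(root, encoding="unicode")


# Made-up guest definition, trimmed to the parts that matter here
SAMPLE = """<domain type='kvm'>
  <name>MyDomain</name>
  <devices>
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
  </devices>
</domain>"""

if __name__ == "__main__":
    print(set_nic_model(SAMPLE, "virtio"))
```

Calling it a second time with "e1000" replaces the existing <model> tag instead of adding another one, so it is safe to use for switching back and forth while testing.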
Is the VM guest on a host connected to the network via a 1 Gbit link? If so, could you post the guest's /proc/net/softnet_stat after the stall?
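In case it helps when reading that file: each row of /proc/net/softnet_stat is a set of hexadecimal counters, and the first three columns are packets processed, packets dropped, and time_squeeze (the softirq ran out of budget). A rough Python sketch for decoding them follows; the function name is made up, and the meaning of the later columns varies by kernel version, so they are ignored here.

```python
from typing import Dict, List


def parse_softnet_stat(text: str) -> List[Dict[str, int]]:
    """Decode the first three hex columns of each softnet_stat row."""
    rows = []
    lines = [line for line in text.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        fields = line.split()
        if len(fields) < 3:
            continue  # not a softnet_stat row
        rows.append({
            "row": index,                          # usually one row per CPU
            "processed": int(fields[0], 16),       # packets handled
            "dropped": int(fields[1], 16),         # dropped, backlog queue full
            "time_squeeze": int(fields[2], 16),    # softirq budget exhausted
        })
    return rows


if __name__ == "__main__":
    with open("/proc/net/softnet_stat") as handle:
        for row in parse_softnet_stat(handle.read()):
            print(row)
```

A "dropped" or "time_squeeze" value that keeps growing between two runs taken during a stall would be the interesting thing to report.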
markmc: After making the suggested change noted in comment #7, I'm not able to reproduce this anymore.
Okay, so we need to do further debugging to figure out whether this is a qemu rtl8139 emulation issue or a problem with the driver in the guest. The fact that upstream 2.6.28-rc2 guest kernel worked for me suggests the latter, but it still needs further confirmation.
I think I'd still bet on a bug in QEMU emulation of rtl8139 - if the new kernel works, I'm more inclined to think it has merely changed somehow to avoid tickling a QEMU bug.
FYI I checked Xen's QEMU tree, which traditionally has a lot of rtl8139 fixes, but there's only one now that isn't in upstream QEMU, and the comment suggests it is only relevant for Windows:
changeset: 17420:40c0dda6eae6
user: Keir Fraser <keir.fraser>
date: Wed Apr 09 16:03:40 2008 +0100
files: tools/ioemu/hw/rtl8139.c
description:
ioemu: Fix rtl8139 emulation so that reboot works correctly in 64-bit Windows VMs.
Return an error if the guest OS tries to transmit a packet with the transmitter disabled, so that it doesn't spin forever waiting for it to complete.
Signed-off-by: Steven Smith <Steven.Smith.com>
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I've been seeing the network stall, causing net based virt-install's to hang too, while doing RHEL-5.3 and F-10 test installs inside a F-10 kvm guest. Today I've been doing some RHEL-4.8 installs and those outright oops as soon as the stage1 loader tries to download install.img .
I've also been seeing the virtual network sometimes being painfully slow (at least 10 times as slow as normal). Strangely enough, passing --sound to the virt-install command fixes this slowness (this was observed with rhel5.3 test installs), so this might be irq-routing related?
Note that I've only seen the network stalls in the fast case, i.e. in the case where I passed --sound (I've never bothered to finish a slow install).
I've done the following to work around the 100% reproducible rhel4 oops:
--- FullVirtGuest.py~ 2008-12-11 09:55:39.000000000 +0100
+++ FullVirtGuest.py 2008-12-11 09:55:39.000000000 +0100
@@ -68,7 +68,10 @@
     "rhel3": { "label": "Red Hat Enterprise Linux 3",
                "distro": "rhel" },
     "rhel4": { "label": "Red Hat Enterprise Linux 4",
-               "distro": "rhel" },
+               "distro": "rhel",
+               "devices" : {
+                 "net" : { "model" : [ (["kvm"], "e1000") ] }
+               }},
     "rhel5": { "label": "Red Hat Enterprise Linux 5",
                "distro": "rhel" },
     "fedora5": { "label": "Fedora Core 5", "distro": "fedora" },
So this definitely is an issue with the rtl8139 support. May I suggest changing the default to e1000 as a workaround until this is fixed?
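To make the intent of the "devices" entry in that patch clearer, here is a rough Python sketch of how such a table could be consulted. This is not virtinst's actual lookup code: OS_TYPES is a cut-down copy of the dictionary from the diff, and pick_net_model with its rtl8139 fallback is made up for illustration (rtl8139 being the default mentioned earlier in this bug).

```python
# Cut-down copy of the table from the patch above (two entries only)
OS_TYPES = {
    "rhel4": {"label": "Red Hat Enterprise Linux 4",
              "distro": "rhel",
              "devices": {
                  "net": {"model": [(["kvm"], "e1000")]},
              }},
    "rhel5": {"label": "Red Hat Enterprise Linux 5",
              "distro": "rhel"},
}


def pick_net_model(variant: str, hv_type: str, default: str = "rtl8139") -> str:
    """Hypothetical resolver: return the NIC model for an OS variant and hypervisor.

    Each rule is (list_of_hypervisor_types, model); the first match wins.
    """
    entry = OS_TYPES.get(variant, {})
    rules = entry.get("devices", {}).get("net", {}).get("model", [])
    for hv_types, model in rules:
        if hv_type in hv_types:
            return model
    return default  # no override for this variant/hypervisor pair


if __name__ == "__main__":
    print(pick_net_model("rhel4", "kvm"))
    print(pick_net_model("rhel5", "kvm"))
```

With a table like this, the override applies only to rhel4 guests on kvm; every other combination keeps the default model.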
(In reply to comment #13)
> I've been seeing the network stall, causing net based virt-install's to hang
> too, while doing RHEL-5.3 and F-10 test installs inside a F-10 kvm guest. Today
> I've been doing some RHEL-4.8 installs and those outright oops as soon as the
> stage1 loader tries to download install.img .
FYI; this last one is *probably* a bug in the 4.8 kernel; it's BZ 474479 (despite being called an ia64 issue, it affects all RHEL-4 kernels).
Chris Lalancette
(In reply to comment #14)
> FYI; this last one is *probably* a bug in the 4.8 kernel; it's BZ 474479
> (despite being called an ia64 issue, it affects all RHEL-4 kernels).
>
> Chris Lalancette
If that is the case, may I then advocate applying my workaround from comment 13 to python-virtinst?
Your workaround from Comment #13 will have no effect on the RHEL-4 bug; it's in the generic network stack, so it will affect all drivers.
Chris Lalancette
(In reply to comment #16)
> Your workaround from Comment #13 will have no effect on the RHEL-4 bug; it's in
> the generic network stack, so it will affect all drivers.
>
> Chris Lalancette
In that case that is not the bug I'm hitting, as I've just completed a RHEL-4.8 nightly install with my workaround, whereas without it, it wouldn't even start to download install.img.
Please get a backtrace of the RHEL-4.8 crash, so we can compare it with the other BZ. Chris Lalancette
Created attachment 326600 [details]
Screenshot showing the kernel panic when trying to install RHEL4.8 under kvm

OK, here is a screenshot of the kernel panic (not an oops, but a panic; sorry I wasn't clear before) I get when trying to install the latest rhel4.8 i386 nightly in kvm. I also have a different dump, although I believe the cause is the same; I'll attach that too.
Created attachment 326601 [details] Screenshot showing slightly different kernel panic when trying to install RHEL4.8 under kvm
OK, yeah. That is the same bug as BZ 474479. The thing is, it doesn't necessarily happen all of the time, and certain things tickle it more than others. In any case, we can't work around all guest bugs, especially ones in pre-released versions. The above bug will be fixed for 4.8 (there's already a patch pending), so there is no real need for the patch in Comment #13. Whether we change the default to e1000 to work around other slowness issues is up in the air. Chris Lalancette
*** Bug 476452 has been marked as a duplicate of this bug. ***
Right, so in Fedora 10 proper I have the same issue across the board with RHEL5.x installs being extremely slow. Essentially, watching from vty3 during install, it takes approximately 3 minutes and 10 seconds for url/images/updates.img to not be found before moving on to url/disc1/updates.img for another 3 minutes and 10 seconds, then on to url/images/product.img for yet another 3 minutes and 10 seconds, etc... This is killing me...
# virt-install -n guest -r 512 -s 5 --os-type=linux --os-variant=rhel5 --accelerate -l http://url/RHEL-5-Server/U1/i386/os -x "text" -f /var/lib/libvirt/images/guest.img
I am happy to provide whatever information is necessary.
james: in order to work around the issue, try hacking virt-install to use e1000 for 5.x guests like Hans did in comment #13.
If anyone can check whether this is reproducible with an F11 host, that would be very helpful.
I still see stalls using QEMU from an F11 host with the rtl8139 NIC. Sometimes it just gets stuck while anaconda is downloading the stage2 image, other times it gets stuck during download of the RPMs. This seems flakier than before, because rtl8139 used to work reasonably reliably in the past.
Just a note: I'm hitting the same problem but using the virtio driver instead of the rtl8139 driver while installing F10. I was able to work around it by switching it to e1000.
(In reply to comment #26)
> Just a note: I'm hitting the same problem but using the virtio driver instead
> of the rtl8139 driver while installing F10. I was able to work around it by
> switching it to e1000.
Please file a new bug about your virtio hang - they are quite likely to be different issues.
This message is a reminder that Fedora 10 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 10. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '10'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 10's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 10 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Well, it seems as if this issue is solved or a known restriction. Can we close this bug?
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.