I have two guests that I use for almost all of my testing of libvirt and qemu-kvm on a Fedora 19 host. After recently enabling the updates-testing repo and updating the kernel from 3.11.4-201 to 3.11.6-200, I found that both of these guests cause the host to lock up within a short time (sometimes shorter than others). The typical symptom is that everything except mouse movement stops after some time, and if I attempt to switch to a text console, the mouse is lost too. I have to hit the reset button to recover.

At least twice I've seen messages like this:

kernel:[...] BUG: soft lockup - CPU#7 stuck for 22s! [qemu-system-x86:12034]

This message would pop up in the console output several times, and the machine would finally lock up. Other times I didn't see this message. Once I was able to get a look at the ps output for the qemu process, and it showed the WCHAN was "futex_". Unfortunately I was unable to attach gdb to the qemu process (I don't know if it would have given any useful information).

I'm filing this as a kernel bug rather than a qemu bug because this problem *completely* disappears if I just reboot the system with kernel 3.11.4-201. This bug is *very easily* reproducible (100%), and I'd be happy to do any experimentation requested to try and find the root cause. (Because it's two different guests, I haven't bothered to post my guest config, but if you have trouble reproducing, I can do that too.)
Created attachment 819818 [details]
simple libvirt guest config that locks up the host

I've found that the trigger for the lockup is simply having a macvtap network device in the domain. I've attached a very simple libvirt config that reproduces the lockup 100%. When I replace the macvtap network device with a standard tap device, the guest starts up and runs normally.

BTW, I upgraded the kernel to the latest in updates-testing (3.11.6-201) and it still locks up the host.
There were no changes in macvtap specifically from 3.11.4 to 3.11.6 that I can remember. There was a change to virtio-net and tuntap:

commit 68ab7445f56cea1c3ad4de6777d31395323a74e2
Author: Jason Wang <jasowang>
Date:   Tue Oct 15 11:18:59 2013 +0800

    virtio-net: refill only when device is up during setting queues
    [ Upstream commit 35ed159bfd96a7547ec277ed8b550c7cbd9841b6 ]

commit cfc85a8e6612dbf742b518a6cd7ec3c637822d63
Author: Jason Wang <jasowang>
Date:   Tue Oct 15 11:18:58 2013 +0800

    virtio-net: don't respond to cpu hotplug notifier if we're not ready
    [ Upstream commit 3ab098df35f8b98b6553edc2e40234af512ba877 ]

commit 6fc265f7a86d81e052508d03da555188f5882c3e
Author: Jason Wang <jasowang>
Date:   Wed Sep 11 18:09:48 2013 +0800

    tuntap: correctly handle error in tun_set_iff()
    [ Upstream commit 662ca437e714caaab855b12415d6ffd815985bc0 ]

If you switch to using an emulated network device (e.g. e1000) in the guest, do the problems go away?
I switched the virtio device to e1000. For about a minute I thought it might help, but it turned out it just took longer for the same thing to happen (I was able to get the boot screen in VNC and it started to boot). When I tried the same thing a second time, it happened almost immediately - not even enough time for the blank black guest screen to come up. So it seems it's not related to virtio.
Hi Laine: Could you please provide the full call trace of the lockup? Thanks.
Created attachment 820911 [details]
dmesg output at time of lockup

dmesg output of the CPU backtraces at the time of the lockup (or "stall", as the dmesg output calls it) is attached.

I've found that if I'm at a text console at the time I start the guest, the text console itself remains responsive. All network communication is cut off, though, and if I switch to the X display, the keyboard becomes unresponsive and I can no longer switch back to the text console.

Also, if it makes any difference, the CPU is an AMD FX-8350.
(In reply to Laine Stump from comment #5)
> I've found that if I'm at a text console at the time I start the guest,
> the text console itself remains responsive.

Er... *sometimes*. I tried changing the network device back to virtio and starting from a text console, and the lockup was immediate and complete, so I wasn't even able to collect another dmesg.
Created attachment 822572 [details]
a different stack trace from dmesg

This is the complete dmesg output of the machine (from boot until the problem began occurring, just in case there is something of significance in the boot logs). In this case I had started the guest and then stopped it within a few seconds. The host didn't lock up, but host networking was immediately rendered unusable, and I repeatedly got the "CPU #n stuck for 22s!" message until I finally rebooted.

(Maybe or maybe not interesting, since there are too many differences between the two systems: I tried the same config and guest image on an F20 Intel i7 machine, and it runs with no problems. In that case the kernel is 3.11.6-302 and qemu is 1.6.0-10.fc20, so it's unclear whether the changed behavior is due to the change in kernel, qemu, or the different hardware.)
Same lockup is also present with kernel-3.11.7-200.fc19.x86_64 and kernel-3.11.8-200.fc19.x86_64. No issue with kernel-3.11.4-201.fc19.x86_64.
Paul (and anyone else experiencing this lockup) - can you provide some basic information about your hardware? In particular, since I could only test an F20 kernel on Intel hardware (my locking system is AMD), I'd like to find out whether that combination worked because of the different kernel or the different hardware. Also, just to verify - your host is also locking up only when a macvtap interface is present, and not otherwise, correct?
CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

Other (host) hardware:

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 2nd Generation Core Processor Family DRAM Controller [8086:0100] (rev 09)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port [8086:0101] (rev 09)
00:16.0 Communication controller [0780]: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 [8086:1c3a] (rev 04)
00:16.3 Serial controller [0700]: Intel Corporation 6 Series/C200 Series Chipset Family KT Controller [8086:1c3d] (rev 04)
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 [8086:1c2d] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller [8086:1c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 [8086:1c10] (rev b4)
00:1c.2 PCI bridge [0604]: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 [8086:1c14] (rev b4)
00:1c.4 PCI bridge [0604]: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 [8086:1c18] (rev b4)
00:1d.0 USB controller [0c03]: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 [8086:1c26] (rev 04)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a4)
00:1f.0 ISA bridge [0601]: Intel Corporation Q67 Express Chipset Family LPC Controller [8086:1c4e] (rev 04)
00:1f.2 SATA controller [0106]: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller [8086:1c02] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller [8086:1c22] (rev 04)
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450] [1002:6779]
01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
03:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express [14e4:1659] (rev 21)
04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450] [1002:6779] (rev ff)
04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98] (rev ff)

My guests do use macvtap, so it could be a factor. I haven't tested without macvtap to see if that's a workaround.
The first few lines in Laine's dump are these:

[350307.310660] ------------[ cut here ]------------
[350307.310674] WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:260 dev_watchdog+0x248/0x260()
[350307.310678] NETDEV WATCHDOG: p42p1 (r8169): transmit queue 0 timed out

so it could be a driver issue?
(In reply to Michael S. Tsirkin from comment #11)
> The first few lines in Laine's dump are these:
>
> [350307.310660] ------------[ cut here ]------------
> [350307.310674] WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:260
> dev_watchdog+0x248/0x260()
> [350307.310678] NETDEV WATCHDOG: p42p1 (r8169): transmit queue 0 timed out
>
> so it could be a driver issue?

Yeah, I meant to point out something strange about that line before, but forgot - there *isn't any* p42p1 netdev on the system, and never has been. There is a "p32p1" (which is handled by the r8169 driver), a "p13p1", and a "p13p2" (both handled by igb).

Back on the subject - after seeing your comment, I did try moving the macvtap device from the r8169 interface to one of the igbs, and unfortunately it still hangs. I haven't been able to collect any sort of dmesg/traceback, though, because the hang is immediate and absolute, providing no opportunity to see why it is occurring (even with a script continuously dumping the dmesg buffer to a file).
Tested using kernel-3.11.8-200.fc19.x86_64 to confirm the relevance of macvtap (as requested in comment #9).

Test 1
======
Guest without macvtap - PASSED

Guest has a single NIC using NAT (the default provided by the virt-manager GUI).

qemu command:

$ ps -ef | grep qemu
qemu 3370 1 99 13:41 ? 00:00:05 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name SLES_11_trunk -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 8192 -smp 4,sockets=4,cores=1,threads=1 -uuid 922efb6f-5e68-f743-2c89-5a103da60762 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/SLES_11_trunk.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/media/sata/VirtualMachines/SLES_11_trunk.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:b9:6a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5901,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

Note "-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25"

From "lsof -p 3370":
qemu-syst 3370 qemu 24u CHR 10,200 0t0 10961 /dev/net/tun
qemu-syst 3370 qemu 25u CHR 10,238 0t0 10963 /dev/vhost-net

Test 2
======
Guest with macvtap interfaces for direct access to the physical LAN - FAILED

Guest has three NICs, one with NAT, the other two with macvtap onto two physical LANs.

qemu command:

qemu 3767 1 20 13:47 ? 00:00:32 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name NVR -S -machine pc-0.14,accel=kvm,usb=off -m 2048 -smp 1,sockets=1,cores=1,threads=1 -uuid f2d0befb-3ab2-e01c-3418-d9200faf6077 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/NVR.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=cd,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x9 -drive file=/media/sata/VirtualMachines/NVR_Live_Installer-4.5.0.347.x86_64.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/media/sata/VirtualMachines/NVR.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f5:13:ca,bus=pci.0,addr=0x3 -netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:70:63:90,bus=pci.0,addr=0x8 -netdev tap,fd=29,id=hostnet2,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:87:8d:b7,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5902,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

From "lsof -p 3767":
qemu-syst 3767 qemu 25u CHR 10,200 0t0 10961 /dev/net/tun
qemu-syst 3767 qemu 26u CHR 10,238 0t0 10963 /dev/vhost-net
qemu-syst 3767 qemu 27u CHR 247,1 0t0 177695 /dev/tap8
qemu-syst 3767 qemu 28u CHR 10,238 0t0 10963 /dev/vhost-net
qemu-syst 3767 qemu 29u CHR 247,2 0t0 121573 /dev/tap9
qemu-syst 3767 qemu 30u CHR 10,238 0t0 10963 /dev/vhost-net

/dev/tap8 and /dev/tap9 are used by macvtap interfaces.

===================
Conclusion: Guests using macvtap cause F19 to lock up with kernels newer than kernel-3.11.4-201.fc19.x86_64. Kernel 3.11.4 works as expected.
I just installed kernel 3.11.9-200 and it still hangs reliably. Returning to 3.11.4-201 still solves the problem.
In comment #7 it's stated that the F20 kernel does not have this bug. What's different between the F19 and F20 kernels? What about the F20 kernel on an F19 system? What else can we do to collect more info? Would that help or is more kernel developer attention required instead?
I'm still seeing this bug on F20. Comment #7 happens to state that the F20 system they tested was not the same hardware as the F19 system that was seeing the problems.
And I just tried booting the latest rawhide kernel (kernel-3.13.0-0.rc2.git2.1.fc21.x86_64) on my F19 system and encountered the same problem. So this isn't a problem that is special to F19 and has been magically fixed in newer versions. It is an ongoing problem in *all* kernels (at least all that have been tested) beyond 3.11.4-201.fc19, starting with 3.11.6-201.fc19.

What would get this bug the best attention: 1) move it to F20, 2) move it to rawhide, 3) clone it to F20 or rawhide, or 4) change the priority or severity fields?

As far as I can tell, whatever the problem is, it renders the use of macvtap interfaces by virtual guests completely unusable (and in a very nasty way).
*** Bug 1038343 has been marked as a duplicate of this bug. ***
OK I have a theory. I noticed that br_stp_rcv is in these traces although packets come in from macvtap and so bridge is not handling these packets.

Looking at the code, I see this:

static const struct stp_proto br_stp_proto = {
	.rcv	= br_stp_rcv,
};

static int __init br_init(void)
{
	int err;

	err = stp_proto_register(&br_stp_proto);
	if (err < 0) {
		pr_err("bridge: can't register sap for STP\n");
		return err;
	}

so any STP packet will go to bridge. Now that one does:

	p = br_port_get_rcu(dev);
	if (!p)
		goto err;

and finally:

static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
{
	return rcu_dereference(dev->rx_handler_data);
}

So RX handler data is cast to bridge port, but we don't really know bridge is the rx handler. This results in a crash if macvtap is bound to the device, since it also creates an rx handler.
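To make the type confusion concrete, here is a minimal userspace sketch. This is an analogy, not the kernel code: the structs, fields (is_bridge_port, vlan_count, etc.) and helper names are invented for illustration. The point it demonstrates is that rx_handler_data is an opaque slot whose real type depends on who registered the rx handler, while br_stp_rcv() assumes it is always a bridge port.

#include <stdio.h>

struct net_device {
	void *rx_handler_data;   /* opaque; owned by whoever registered the rx handler */
	int   is_bridge_port;    /* stand-in for the "device is a bridge port" flag */
};

struct net_bridge_port { int port_no; int state; };
struct macvlan_port    { int vlan_count; };

/* bridge enslaves the device: rx_handler_data points at a net_bridge_port */
static void bridge_attach(struct net_device *dev, struct net_bridge_port *p)
{
	dev->rx_handler_data = p;
	dev->is_bridge_port = 1;
}

/* macvtap/macvlan attaches instead: rx_handler_data points at a macvlan_port */
static void macvtap_attach(struct net_device *dev, struct macvlan_port *port)
{
	dev->rx_handler_data = port;
	dev->is_bridge_port = 0;
}

/* analogue of br_stp_rcv(): runs for any STP frame on any device,
 * because the STP SAP is registered globally by the bridge module */
static void stp_rcv(struct net_device *dev)
{
	/* BUG: blind cast - nothing checks that bridge owns rx_handler_data */
	struct net_bridge_port *p = dev->rx_handler_data;
	printf("STP frame on \"bridge port\" %d, state %d\n", p->port_no, p->state);
}

int main(void)
{
	struct net_device dev = { 0 };
	struct net_bridge_port brp = { .port_no = 1, .state = 3 };
	struct macvlan_port mv = { .vlan_count = 1 };

	bridge_attach(&dev, &brp);
	stp_rcv(&dev);   /* fine: the data really is a bridge port */

	macvtap_attach(&dev, &mv);
	stp_rcv(&dev);   /* wrong: reads macvlan_port memory as a net_bridge_port */
	return 0;
}

In userspace the bogus read just prints garbage; in the kernel the equivalent is a pointer chase through the wrong structure under RCU, which would fit the soft lockups and wedged networking reported above.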
So if this is the problem, the following will help (though it is clearly not the correct fix - need to think about what the right fix is, using RCU properly):

diff --git a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c
index 8660ea3..e1e494d 100644
--- a/net/bridge/br_stp_bpdu.c
+++ b/net/bridge/br_stp_bpdu.c
@@ -153,6 +153,9 @@ void br_stp_rcv(const struct stp_proto *proto, struct sk_buff *skb,
 	if (buf[0] != 0 || buf[1] != 0 || buf[2] != 0)
 		goto err;
 
+	if (dev->rx_handler != br_handle_frame)
+		goto err;
+
 	p = br_port_get_rcu(dev);
 	if (!p)
 		goto err;
Please test the patch above.
Also, please try removing all bridges on your system and removing the bridge module, and see if the issue reproduces.
OK, if true, this was introduced by this patch:

commit f350a0a87374418635689471606454abc7beaa3a
Author: Jiri Pirko <jpirko>
Date:   Tue Jun 15 06:50:45 2010 +0000

    bridge: use rx_handler_data pointer to store net_bridge_port pointer

    Register net_bridge_port pointer as rx_handler data pointer. As br_port is
    removed from struct net_device, another netdev priv_flag is added to indicate
    the device serves as a bridge port. Also rcuized pointers are now correctly
    dereferenced in br_fdb.c and in netfilter parts.

    Signed-off-by: Jiri Pirko <jpirko>
    Signed-off-by: David S. Miller <davem>

and so a slightly cleaner hack:

diff --git a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c
index 8660ea3..4bb0255 100644
--- a/net/bridge/br_stp_bpdu.c
+++ b/net/bridge/br_stp_bpdu.c
@@ -153,6 +153,9 @@ void br_stp_rcv(const struct stp_proto *proto, struct sk_buff *skb,
 	if (buf[0] != 0 || buf[1] != 0 || buf[2] != 0)
 		goto err;
 
+	if (!br_port_exists(dev))
+		goto err;
+
 	p = br_port_get_rcu(dev);
 	if (!p)
 		goto err;

but it is still wrong: br_port_exists requires rtnl, and this is not called under the rtnl lock. Cc jpirko, who wrote the patch.
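For illustration only, here is what a checked accessor could look like in the userspace analogy from the earlier comment (it reuses the struct definitions given there). The names are invented and this is not the upstream patch; the real fix additionally has to make the check and the dereference safe against concurrent detach under RCU rather than relying on rtnl, which is exactly the open question above.

/* stand-in for a checked accessor: only treat rx_handler_data as a
 * bridge port after confirming the device really is a bridge port */
static struct net_bridge_port *br_port_get_checked(struct net_device *dev)
{
	if (!dev->is_bridge_port)        /* analogue of br_port_exists(dev) */
		return NULL;
	return dev->rx_handler_data;     /* now known to be a net_bridge_port */
}

static void stp_rcv_fixed(struct net_device *dev)
{
	struct net_bridge_port *p = br_port_get_checked(dev);

	if (!p)
		return;                  /* device not enslaved to a bridge: ignore frame */
	printf("STP frame on bridge port %d\n", p->port_no);
}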
I reproduced this using Laine's guest config with the current F20 kernel: the first 'virsh start' attempt hung, and syslog started firing off soft lockup warnings to my terminal. I applied the patch, built the rpm, and rebooted; the guest now seems to start fine, and ping inside the guest works fine. So I think your analysis was correct - thanks, Michael!
Yep. I was also finally able to test the patch in Comment 20 and it solves the hang. In addition to F19, F20, and rawhide, I just noticed that F18 is also at kernel 3.11.10 now, so presumably it has the same problem. If so, hopefully it will get the patch to fix it before it goes EOL.
I created rhel7 bz for this: bug 1039118
When a patch has been sent upstream, mst or Jiri, please reference it here so we can set this bug to POST.
patch posted upstream: http://patchwork.ozlabs.org/patch/297169/
Thanks Jiri. We'll grab this as soon as it hits Linus' tree.
Fixed in Fedora git.
kernel-3.12.5-302.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/FEDORA-2013-23445/kernel-3.12.5-302.fc20
kernel-3.12.5-200.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/kernel-3.12.5-200.fc19
I have tried the new kernel in the previous comment, and it does fix this problem. Thanks!
Package kernel-3.12.5-302.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:

# su -c 'yum update --enablerepo=updates-testing kernel-3.12.5-302.fc20'

as soon as you are able to, then reboot. Please go to the following url:

https://admin.fedoraproject.org/updates/FEDORA-2013-23445/kernel-3.12.5-302.fc20

then log in and leave karma (feedback).
Tested 3.12.5 on F19. Issue fixed, karma added. Haven't upgraded to F20 yet. Thanks everyone.
kernel-3.12.5-200.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.12.5-302.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.