Red Hat Bugzilla – Bug 651639
On AMD host, running an F14 guest with 2 cores assigned hangs for "a long time" (several 10's of minutes) at start of boot
Last modified: 2013-01-09 18:19:03 EST
+++ This bug was initially created as a clone of Bug #644973 +++

Cloning this and requesting dev ack for RHEL 6.1.

Created attachment 454618: kernel info from /var/log/messages

I have an AMD Thuban 1055T, which is a 6-core Phenom II, and have installed the F14 Beta from Sept 28, then updated it from the updates-testing repo to Oct 19.

When I use virt-manager to create a guest with 1 vcpu assigned and point it at the same F14 DVD ISO, it installs and then boots the OS with no issues. If I change the config for that guest to have 2 vcpus assigned, or create a new guest with 2 vcpus and try to boot the install ISO, it hangs just after the screen is cleared past the initial SeaBIOS POST screen (i.e., before the "shades of blue" progress bar begins). This "hang" continues for "a very long time" before the boot process finally continues. (I haven't yet caught the moment it continues, but I did see it hang for at least 30 minutes.) During this time, "top" on the host shows qemu using 124% CPU. (I've been unable to attach gdb to the qemu-kvm process to get a backtrace.)

None of my other guests have this problem (I've tried RHEL5, F13, and WinXP; they all work fine with 2 vcpus). I also installed the same F14 Beta (then updated to Oct 19) on Intel Xeon 8-core hardware (an IBM Thinkstation), and an install of F14 on a 2-core guest completed with no problems.

I can provide login credentials and exclusive access to this particular machine if necessary.

Note that I have tried installing the upstream qemu-kvm and kvm packages on this same AMD machine. Once I've done that, I'm unable to get any F13 or F14 guest to boot properly, even with a single vcpu assigned (RHEL5 and WinXP still work fine with single or multiple cpus), so a comparison or bisect would be of dubious value.
Here is the qemu command line issued by libvirt:

    LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm \
      -S -M pc-0.13 \
      -cpu phenom,+wdt,+skinit,+osvw,+3dnowprefetch,+misalignsse,+sse4a,+abm,+cr8legacy,+extapic,+cmp_legacy,+lahf_lm,+rdtscp,+pdpe1gb,+popcnt,+cx16,+ht,+vme \
      -enable-nesting -enable-kvm -m 2048 \
      -smp 2,sockets=1,cores=6,threads=1 \
      -name f14hvirt -uuid 24feae2d-3335-3dc3-297f-6ee826d6e634 \
      -nodefconfig -nodefaults \
      -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14hvirt.monitor,server,nowait \
      -mon chardev=monitor,mode=readline \
      -rtc base=utc -boot c \
      -drive file=/var/lib/libvirt/images/f14hvirt-1.img,if=none,id=drive-virtio-disk0,boot=on,format=raw \
      -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 \
      -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
      -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
      -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:b3:e5:43,bus=pci.0,addr=0x3 \
      -net tap,fd=45,vlan=0,name=hostnet0 \
      -chardev pty,id=serial0 -device isa-serial,chardev=serial0 \
      -usb -device usb-tablet,id=input0 \
      -vnc 127.0.0.1:2 -vga cirrus \
      -device AC97,id=sound0,bus=pci.0,addr=0x4 \
      -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
    char device redirected to /dev/pts/0

Note that a line like the following will also trigger the problem (i.e., with fewer CPU features specified):

    LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm \
      -S -M pc-0.13 -enable-kvm -m 900 \
      -smp 2,sockets=2,cores=1,threads=1 \
      -name f14alphatest -uuid 335e5300-e9a8-fd86-0986-748f02a8e69e \
      -nodefconfig -nodefaults \
      -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14alphatest.monitor,server,nowait \
      -mon chardev=monitor,mode=readline \
      -rtc base=utc -no-reboot -boot d \
      -drive file=/var/lib/libvirt/images/f14alphatest.img,if=none,id=drive-virtio-disk0,format=raw \
      -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 \
      -drive file=/dev/sr0,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
      -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
      -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:c0:62:6c,bus=pci.0,addr=0x3 \
      -net tap,fd=54,vlan=0,name=hostnet0 \
      -chardev pty,id=serial0 -device isa-serial,chardev=serial0 \
      -usb -device usb-tablet,id=input0 \
      -vnc 127.0.0.1:3 -vga cirrus \
      -device AC97,id=sound0,bus=pci.0,addr=0x4 \
      -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

--- Additional comment from avi@redhat.com on 2010-10-21 06:52:30 EDT ---

I'm interested in access.

--- Additional comment from laine@redhat.com on 2010-10-21 08:09:43 EDT ---

Avi, the necessary info is in a private IRC chat message I just sent. Contact me if it didn't show up.

--- Additional comment from avi@redhat.com on 2010-10-21 09:46:16 EDT ---

kvm.git with F14's qemu-kvm appears to work fine.

--- Additional comment from avi@redhat.com on 2010-10-21 09:47:41 EDT ---

qemu-kvm.git fails on kvm.git with a segmentation fault; this appears to be a different problem.

--- Additional comment from avi@redhat.com on 2010-10-21 11:01:02 EDT ---

Plain 2.6.35.6 fails, the same way.

--- Additional comment from avi@redhat.com on 2010-10-21 11:12:51 EDT ---

2.6.36 works. Bisecting.

--- Additional comment from avi@redhat.com on 2010-10-21 11:58:52 EDT ---

Works with -cpu ...,-kvmclock

Zachary, what do we need to backport to 2.6.35.6? Marcelo, did you already send it?

--- Additional comment from zamsden@redhat.com on 2010-10-21 22:13:48 EDT ---

The backport list is probably quite short, although it's unclear whether this is a host problem, a guest problem, or a mix of both. Avi, did you bisect the host's qemu or the guest?

--- Additional comment from avi@redhat.com on 2010-10-22 04:00:27 EDT ---

The bisection was irrelevant. The findings are:

  2.6.35.6:             fails
  2.6.35.6, -kvmclock:  works
  2.6.36:               works

Conclusion: 2.6.35.6 is missing some patches that went into 2.6.36.
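Avi's findings show that disabling kvmclock avoids the hang on 2.6.35.6. One plausible reason is that kvmclock derives guest time from the TSC, so a guest TSC that disagrees with the host's last time update feeds straight into the guest clock. The Python sketch below models the pvclock calculation that kvmclock guests perform; the field names follow the kernel's struct pvclock_vcpu_time_info, but the function is a simplified illustration, not kernel code and not a root-cause analysis from this thread. The 2 GHz TSC frequency is an assumption chosen for round numbers.

```python
U64_MASK = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def kvmclock_ns(tsc, tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift):
    """Simplified model of the pvclock time calculation used by kvmclock guests.

    Field names follow struct pvclock_vcpu_time_info; this is an illustration,
    not the kernel implementation.
    """
    # TSC ticks since the host last updated the shared time info. As an
    # unsigned 64-bit value, a TSC that is *behind* tsc_timestamp wraps around.
    delta = (tsc - tsc_timestamp) & U64_MASK
    if tsc_shift < 0:
        delta >>= -tsc_shift
    else:
        delta = (delta << tsc_shift) & U64_MASK
    # tsc_to_system_mul is a 32-bit fixed-point factor: ns = delta * mul / 2**32.
    ns = (delta * tsc_to_system_mul) >> 32
    return (system_time + ns) & U64_MASK


MUL_2GHZ = 1 << 31  # 0.5 ns per tick, assuming a 2 GHz TSC and tsc_shift = 0

# TSC 100 ticks ahead of the timestamp: the clock advances by 50 ns.
forward = kvmclock_ns(2_000_000_100, 2_000_000_000, 1_000, MUL_2GHZ, 0)
# TSC 100 ticks *behind* the timestamp (e.g. after a late TSC reset).
backward = kvmclock_ns(1_999_999_900, 2_000_000_000, 1_000, MUL_2GHZ, 0)
```

With these inputs `forward` is 1050 ns, while in the backward case the unsigned delta wraps and `backward` becomes 2**63 + 950 ns, a jump of roughly 292 years.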
--- Additional comment from avi@redhat.com on 2010-10-22 04:01:08 EDT ---

The kernel versions above refer to the host kernel.

--- Additional comment from zamsden@redhat.com on 2010-10-25 20:15:33 EDT ---

These two patches just went to stable:

    2.6.35-stable review patch. If anyone has any objections, please let us know.

    ------------------

    From: Marcelo Tosatti <mtosatti@redhat.com>

    commit 58877679fd393d3ef71aa383031ac7817561463d upstream.

    On reset, VMCB TSC should be set to zero. Instead, code was setting
    tsc_offset to zero, which passes through the underlying TSC.

    Signed-off-by: Zachary Amsden <zamsden@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

    2.6.35-stable review patch. If anyone has any objections, please let us know.

    ------------------

    From: Marcelo Tosatti <mtosatti@redhat.com>

    commit 47008cd887c1836bcadda123ba73e1863de7a6c4 upstream.

    The VMCB is reset whenever we receive a startup IPI, so Linux is setting
    TSC back to zero happens very late in the boot process and destabilizing
    the TSC. Instead, just set TSC to zero once at VCPU creation time.

    Why the separate patch? So git-bisect is your friend.

    Signed-off-by: Zachary Amsden <zamsden@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

--- Additional comment from laine@redhat.com on 2010-10-27 12:09:01 EDT ---

Zach, can you give me the right git incantation to get a kernel source tree with these two patches? I already have the kvm kernel tree, if that helps.

--- Additional comment from zamsden@redhat.com on 2010-10-27 12:18:37 EDT ---

I believe you want to git-cherry-pick
No, this is not a host issue. This is a KVM issue which needs to be fixed in the KVM module.
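The two stable patches quoted earlier both concern the VMCB's tsc_offset. On SVM, a guest RDTSC returns the host TSC plus the per-vCPU tsc_offset. The sketch below is a simplified Python model of that arithmetic; the class and function names are hypothetical and are not KVM's. It shows why zeroing the offset passes the host TSC through (first patch), and why resetting an AP's TSC again on its startup IPI leaves it out of step with the boot CPU (second patch, assuming the zero-TSC semantics of the first).

```python
class VCpu:
    """Hypothetical per-vCPU state: only the VMCB tsc_offset is modelled."""

    def __init__(self):
        # Python ints stand in for the 64-bit offset (two's-complement in hardware).
        self.tsc_offset = 0

    def rdtsc(self, host_tsc):
        # On SVM the guest-visible TSC is the host TSC plus the VMCB offset.
        return host_tsc + self.tsc_offset


def reset_offset_to_zero(vcpu, host_tsc):
    # Behaviour the first patch calls wrong: a zero offset means the guest
    # simply sees the host TSC (host_tsc is deliberately unused here).
    vcpu.tsc_offset = 0


def reset_tsc_to_zero(vcpu, host_tsc):
    # Behaviour the first patch asks for: choose the offset so that the
    # guest TSC reads zero at the moment of the reset.
    vcpu.tsc_offset = -host_tsc


def bsp_ap_skew(reset_ap_on_sipi, sipi_host_tsc=5_000_000, now=6_000_000):
    """Guest TSC difference between two vCPUs both created at host TSC 1_000."""
    bsp, ap = VCpu(), VCpu()
    reset_tsc_to_zero(bsp, 1_000)  # each vCPU is reset once at creation
    reset_tsc_to_zero(ap, 1_000)
    if reset_ap_on_sipi:
        # What the second patch removes: the AP's VMCB (and so its TSC) is
        # reset again when the startup IPI arrives, late in guest boot.
        reset_tsc_to_zero(ap, sipi_host_tsc)
    return bsp.rdtsc(now) - ap.rdtsc(now)
```

With the default numbers, `bsp_ap_skew(True)` returns 4999000 ticks (the AP lags the boot CPU by everything that elapsed between vCPU creation and the SIPI), while `bsp_ap_skew(False)` returns 0. The tick values are arbitrary; the point is that the skew grows with how late the SIPI arrives.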
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
    commit 9170170007aa8886583ada73c0ea1122231c708a
    Author: Zachary Amsden <zamsden@redhat.com>
    Date:   Thu Jan 27 16:57:31 2011 -0500

Patches are in the RHEL6 tree...
This is already committed and in the tree... Apparently I tagged it with the wrong Bugzilla number.

    commit 0479353816108b3a1da7e803621d193f8dbcb0a2
    Author: Zachary Amsden <zamsden@redhat.com>
    Date:   Thu Jan 27 16:57:23 2011 -0500

        [virt] KVM: backport of SVM TSC init fixes

        Message-id: <1296147453-27805-1-git-send-email-zamsden@redhat.com>
        Patchwork-id: 32943
        O-Subject: [RHEL6.1 01/11] KVM: backport of SVM TSC init fixes
        Bugzilla: 651635
        RH-Acked-by: Rik van Riel <riel@redhat.com>
        RH-Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
        RH-Acked-by: Glauber Costa <glommer@redhat.com>
I could not find a Phenom II CPU host, so I tried on a Dual-Core AMD Opteron(tm) Processor 1216 instead. I tried both kernel-2.6.32-71.el6 and kernel-2.6.32-130.el6 with the steps in comment #0.

Actual results: the guest works fine with -smp 2; I cannot reproduce this issue on either kernel.

bcao ----> ikent@redhat.com

Hi Ian, could you help verify this on a Phenom II CPU host?

Thanks,
Mike
I tested kernels 2.6.32-71.el6 and 2.6.32-130.el6 on F14. The test consisted of setting an F14 VM to use 2 CPUs, booting it, shutting it down and booting it again, then setting the VM to 1 CPU and booting it, timing each boot with date. "Booting" was measured from the moment the run triangle was pressed until the login screen appeared. The results (all times WST, Mon Apr 18 2011) were:

  Kernel                  CPUs   Start      Finish
  2.6.32-71.el6.x86_64    2      17:10:19   17:24:25
  2.6.32-71.el6.x86_64    1      17:30:36   17:31:14
  2.6.32-130.el6.x86_64   2      17:41:18   17:41:50
  2.6.32-130.el6.x86_64   1      17:43:09   17:43:41
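The boot durations implied by the start/finish timestamps in these measurements can be computed directly. A short Python sketch using only the standard datetime module follows; `looks_affected` and its 2x threshold are hypothetical, chosen for illustration, and are not a criterion used in this bug.

```python
from datetime import datetime

# Start/finish times from the measurements (all WST, Mon Apr 18 2011).
RUNS = {
    ("2.6.32-71.el6.x86_64", 2): ("17:10:19", "17:24:25"),
    ("2.6.32-71.el6.x86_64", 1): ("17:30:36", "17:31:14"),
    ("2.6.32-130.el6.x86_64", 2): ("17:41:18", "17:41:50"),
    ("2.6.32-130.el6.x86_64", 1): ("17:43:09", "17:43:41"),
}


def boot_seconds(start, finish):
    """Elapsed seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def looks_affected(kernel, threshold=2.0):
    """Hypothetical check: is the 2-CPU boot much slower than the 1-CPU boot?"""
    two = boot_seconds(*RUNS[(kernel, 2)])
    one = boot_seconds(*RUNS[(kernel, 1)])
    return two > threshold * one


if __name__ == "__main__":
    for (kernel, cpus), (start, finish) in RUNS.items():
        print(f"{kernel}  {cpus} CPU(s): {boot_seconds(start, finish)} s")
```

This yields 846 s (2 CPUs) versus 38 s (1 CPU) on 2.6.32-71.el6, and 32 s versus 32 s on 2.6.32-130.el6, so only the older kernel is flagged.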
Ian, thanks very much! Referring to comment #14, the issue was reproduced on kernel 2.6.32-71.el6.x86_64 and verified fixed on kernel 2.6.32-130.el6.x86_64. Based on the above, moving status to VERIFIED.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html