Bug 644973

Summary: On an AMD F14 host, running an F14 guest with 2 cores assigned hangs for "a long time" (several tens of minutes) at the start of boot
Product: [Fedora] Fedora
Reporter: Laine Stump <laine>
Component: qemu
Assignee: Zachary Amsden <zamsden>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 14
CC: amit.shah, berrange, clalance, digimer, dwmw2, ehabkost, extras-orphan, gcosta, ikent, itamar, jaswinder, jforbes, knoel, markmc, mtosatti, notting, ondrejj, quintela, scottt.tw, virt-maint, zamsden
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Clones: 651639
Last Closed: 2010-12-09 12:07:13 EST
Bug Blocks: 651639, 654912    
Attachments:
kernel info from /var/log/messages

Description Laine Stump 2010-10-20 13:19:08 EDT
Created attachment 454618
kernel info from /var/log/messages

I have an AMD Thuban 1055T, which is a 6-core Phenom II. I installed the F14 Beta from Sept 28, then updated it from the updates-testing repo to Oct 19.

When I use virt-manager to create a guest with 1 vcpu assigned and point it at the same F14 DVD ISO, it installs and then boots the OS with no issues.


If I change the config for that guest to have 2 vcpus assigned, or create a new guest with 2 vcpus and try to boot the install ISO, it hangs just after the screen is cleared following the initial SeaBIOS POST screen (i.e., before the "shades of blue" progress bar begins). This hang lasts "a very long time" (I haven't yet caught the moment it resumes, but I have watched it hang for at least 30 minutes) before the boot process finally continues. During this time, "top" on the host shows qemu using 124% CPU. (I've been unable to attach gdb to the qemu-kvm process to get a backtrace.)

None of my other guests have this problem (I've tried RHEL5, F13, and WinXP; they all work fine with 2 vcpus).

I also installed the same F14 Beta (then updated to Oct 19) on Intel Xeon 8-core hardware (an IBM Thinkstation), and an install of F14 on a 2-core guest completed with no problems.

I can provide login credentials and exclusive access to this particular machine if necessary.


Note that I have tried installing the upstream qemu-kvm and kvm packages on this same AMD machine, and once I've done that, I'm unable to get any F13 or F14 guest to boot properly, even with a single vcpu assigned (RHEL5 and WinXP still work fine with one or more vcpus), so a comparison or bisect is of dubious value.

Here is the qemu command line that libvirt issues:


LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -cpu phenom,+wdt,+skinit,+osvw,+3dnowprefetch,+misalignsse,+sse4a,+abm,+cr8legacy,+extapic,+cmp_legacy,+lahf_lm,+rdtscp,+pdpe1gb,+popcnt,+cx16,+ht,+vme -enable-nesting -enable-kvm -m 2048 -smp 2,sockets=1,cores=6,threads=1 -name f14hvirt -uuid 24feae2d-3335-3dc3-297f-6ee826d6e634 -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14hvirt.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive file=/var/lib/libvirt/images/f14hvirt-1.img,if=none,id=drive-virtio-disk0,boot=on,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:b3:e5:43,bus=pci.0,addr=0x3 -net tap,fd=45,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:2 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 
char device redirected to /dev/pts/0

Note that a command line like the following, which doesn't specify any CPU features, also triggers the problem:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -enable-kvm -m 900 -smp 2,sockets=2,cores=1,threads=1 -name f14alphatest -uuid 335e5300-e9a8-fd86-0986-748f02a8e69e -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14alphatest.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-reboot -boot d -drive file=/var/lib/libvirt/images/f14alphatest.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/dev/sr0,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:c0:62:6c,bus=pci.0,addr=0x3 -net tap,fd=54,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:3 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
Comment 1 Avi Kivity 2010-10-21 06:52:30 EDT
Am interested in access.
Comment 2 Laine Stump 2010-10-21 08:09:43 EDT
Avi - the necessary info is in an IRC private chat I just sent. Contact me if it didn't show up.
Comment 3 Avi Kivity 2010-10-21 09:46:16 EDT
kvm.git with F14's qemu-kvm appears to work fine.
Comment 4 Avi Kivity 2010-10-21 09:47:41 EDT
qemu-kvm.git fails on kvm.git with a segmentation fault; that appears to be a different problem.
Comment 5 Avi Kivity 2010-10-21 11:01:02 EDT
Plain 2.6.35.6 fails the same way.
Comment 6 Avi Kivity 2010-10-21 11:12:51 EDT
2.6.36 works.  Bisecting.
Comment 7 Avi Kivity 2010-10-21 11:58:52 EDT
Works with -cpu ...,-kvmclock

Zachary, what do we need to backport to 2.6.35.6?  Marcelo, did you already send it?
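For context on the -kvmclock workaround: kvmclock is KVM's paravirtual clock, and the guest derives its time from the TSC using scaling parameters the host publishes per vcpu, so a misbehaving guest TSC plausibly affects guest timekeeping only while kvmclock is in use. The sketch below shows that scaling step under stated assumptions: the struct is a simplified, hypothetical stand-in that borrows the field names of Linux's struct pvclock_vcpu_time_info (tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift), and the function names are illustrative, not the kernel's.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the time-related fields of Linux's
 * struct pvclock_vcpu_time_info (not the kernel's actual layout). */
struct pvclock_sketch {
    uint64_t tsc_timestamp;     /* guest TSC when the host last updated these values */
    uint64_t system_time;       /* guest time in ns at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* TSC-to-ns multiplier, 32.32 fixed point */
    int8_t   tsc_shift;         /* power-of-two pre-scale applied to the TSC delta */
};

/* Scale a TSC delta to ns: shift it, then multiply by mul / 2^32.
 * Splitting delta into 32-bit halves keeps each product within 64 bits;
 * the result is exact as long as hi * mul does not overflow. */
uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * mul + ((lo * mul) >> 32);
}

/* Guest clock reading in ns, derived from the current guest TSC value. */
uint64_t pvclock_read_ns(const struct pvclock_sketch *p, uint64_t tsc)
{
    return p->system_time +
           pvclock_scale_delta(tsc - p->tsc_timestamp,
                               p->tsc_to_system_mul, p->tsc_shift);
}
```

For example, a 2 GHz TSC corresponds to tsc_to_system_mul = 0x80000000 (0.5 ns per tick) with tsc_shift = 0, so pvclock_scale_delta(2000000000, 0x80000000, 0) yields 1000000000 ns.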
Comment 8 Zachary Amsden 2010-10-21 22:13:48 EDT
The backport list is probably quite short, although it's unclear whether this is a host problem, a guest problem, or a mix of both.

Avi, did you bisect the host's qemu or the guest?
Comment 9 Avi Kivity 2010-10-22 04:00:27 EDT
The bisection was irrelevant.  The findings are:

  2.6.35.6: fails
  2.6.35.6, -kvmclock: works
  2.6.36: works

Conclusion: 2.6.35.6 is missing some patches that went into 2.6.36.
Comment 10 Avi Kivity 2010-10-22 04:01:08 EDT
Kernel versions above are for host kernel.
Comment 11 Zachary Amsden 2010-10-25 20:15:33 EDT
These two patches just went to stable:


2.6.35-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Marcelo Tosatti <mtosatti@redhat.com>

commit 58877679fd393d3ef71aa383031ac7817561463d upstream.

On reset, VMCB TSC should be set to zero.  Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
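The first patch relies on how AMD SVM virtualizes the TSC: a guest RDTSC returns the host TSC plus the tsc_offset field of the VMCB, so an offset of zero exposes the raw host TSC rather than a TSC that starts at zero. A minimal model of that arithmetic follows; the vcpu_model struct and the helper names are hypothetical and are not KVM's code.

```c
#include <stdint.h>

/* Hypothetical vcpu model: on SVM, guest RDTSC = host TSC + VMCB tsc_offset,
 * computed modulo 2^64. */
struct vcpu_model {
    uint64_t tsc_offset;
};

/* What the guest would read for a given host TSC value. */
uint64_t guest_rdtsc(const struct vcpu_model *v, uint64_t host_tsc)
{
    return host_tsc + v->tsc_offset; /* unsigned wraparound is intentional */
}

/* The buggy reset described in the patch: zeroing the offset
 * passes the host TSC straight through to the guest. */
void reset_offset_buggy(struct vcpu_model *v)
{
    v->tsc_offset = 0;
}

/* The intended reset: pick the offset so the guest TSC reads zero
 * at host time host_tsc_now, i.e. offset = -host_tsc_now mod 2^64. */
void reset_guest_tsc_to_zero(struct vcpu_model *v, uint64_t host_tsc_now)
{
    v->tsc_offset = (uint64_t)0 - host_tsc_now;
}
```

For instance, with the host TSC at 123456789, the buggy reset leaves the guest reading 123456789, while the intended reset makes it read 0 and count up from there.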

2.6.35-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Marcelo Tosatti <mtosatti@redhat.com>

commit 47008cd887c1836bcadda123ba73e1863de7a6c4 upstream.

The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero very late in the boot process, destabilizing the TSC.
Instead, just set TSC to zero once at VCPU creation time.

Why the separate patch?  So git-bisect is your friend.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
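One way to picture why the second patch matters: if the VMCB reset on a startup IPI re-zeroes an application processor's TSC long after the boot processor's TSC was zeroed, the two vcpus no longer agree on the TSC. The sketch below models that skew under the same per-vcpu offset assumption (guest TSC = host TSC + offset); the struct, the function names, and the timestamps are hypothetical and only illustrate the arithmetic, not KVM's code.

```c
#include <stdint.h>

/* Hypothetical two-vcpu model: guest TSC = host TSC + per-vcpu offset (mod 2^64). */
struct vcpu_model {
    uint64_t tsc_offset;
};

uint64_t guest_tsc(const struct vcpu_model *v, uint64_t host_tsc)
{
    return host_tsc + v->tsc_offset;
}

/* Make this vcpu's guest TSC read zero at host time `now`. */
void zero_guest_tsc(struct vcpu_model *v, uint64_t now)
{
    v->tsc_offset = (uint64_t)0 - now;
}

/* Difference between the TSCs the two vcpus read at the same host instant. */
int64_t tsc_skew(const struct vcpu_model *bsp, const struct vcpu_model *ap,
                 uint64_t host_tsc)
{
    return (int64_t)(guest_tsc(bsp, host_tsc) - guest_tsc(ap, host_tsc));
}

/* Both vcpus are created (and zeroed) at t_create; the AP receives its
 * startup IPI at t_sipi. If reset_on_sipi is nonzero, the AP's TSC is
 * zeroed again at SIPI time (old behaviour); otherwise only at creation
 * (patched behaviour). Returns the BSP-minus-AP skew at host time t_sipi. */
int64_t simulate_boot(uint64_t t_create, uint64_t t_sipi, int reset_on_sipi)
{
    struct vcpu_model bsp, ap;
    zero_guest_tsc(&bsp, t_create);
    zero_guest_tsc(&ap, t_create);
    if (reset_on_sipi)
        zero_guest_tsc(&ap, t_sipi);
    return tsc_skew(&bsp, &ap, t_sipi);
}
```

With both vcpus created at host TSC 1000 and the AP started at 5000000, simulate_boot(1000, 5000000, 1) reports a skew of 4999000 ticks, while simulate_boot(1000, 5000000, 0) reports 0.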
Comment 12 Laine Stump 2010-10-27 12:09:01 EDT
Zach, can you give me the right git incantation to get a kernel source tree with these two patches? I already have the kvm kernel tree, if that helps.
Comment 13 Zachary Amsden 2010-10-27 12:18:37 EDT
I believe you want git cherry-pick.
Comment 14 digimer 2010-11-12 17:10:38 EST
*** Bug 652489 has been marked as a duplicate of this bug. ***
Comment 15 Justin M. Forbes 2010-12-09 12:07:13 EST
This should be resolved in kernel-2.6.35.9-64.fc14
Comment 16 Ian Kent 2010-12-14 22:40:39 EST
(In reply to comment #15)
> This should be resolved in kernel-2.6.35.9-64.fc14

Appears to have resolved the problem for me.
Thanks.