Bug 651639 - On AMD host, running an F14 guest with 2 cores assigned hangs for "a long time" (several 10's of minutes) at start of boot
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: All Linux
Priority: high  Severity: high
Target Milestone: beta
Target Release: 6.1
Assigned To: Zachary Amsden
QA Contact: Virtualization Bugs
Keywords: Triaged
Depends On: 644973
Blocks: Rhel6KvmTier1 654912 689904
Reported: 2010-11-09 18:48 EST by Zachary Amsden
Modified: 2013-01-09 18:19 EST (History)
Fixed In Version: kernel-2.6.32-115.el6
Doc Type: Bug Fix
Clone Of: 644973
Clones: 654912
Last Closed: 2011-05-23 16:28:42 EDT
Description Zachary Amsden 2010-11-09 18:48:17 EST
+++ This bug was initially created as a clone of Bug #644973 +++

Cloning this and requesting dev ack for RHEL 6.1

Created attachment 454618 [details]
kernel info from /var/log/messages

I have an AMD Thuban 1055T, a 6-core Phenom II, on which I installed the F14 Beta from Sept 28 and then updated from the updates-testing repo as of Oct 19.

When I use virt-manager to create a guest with 1 vcpu assigned, and point it at the same F14 DVD ISO, it installs, then boots the OS with no issues.


If I change the config for that guest to have 2 vcpus assigned, or create a new guest with 2 vcpus and try to boot the install ISO, it hangs just after the screen is cleared past the initial SeaBIOS POST screen (i.e., before the "shades of blue" progress bar begins). The hang lasts a very long time before the boot process finally continues: I haven't yet caught the moment it resumes, but I have watched it hang for at least 30 minutes. During this time, "top" on the host shows qemu using 124% of CPU time. (I've been unable to attach gdb to the qemu-kvm process to get a backtrace.)

None of my other guests have this problem (I've tried RHEL5, F13, and WinXP - they all work fine with 2 vcpus).

I also installed the same F14 Beta (then updated to Oct 19) on Intel Xeon 8-core hardware (an IBM Thinkstation), and an install of F14 on a 2-core guest completed with no problems.

I can provide login credentials and exclusive access to this particular machine if necessary.


Note that I have tried installing the upstream qemu-kvm and kvm packages on this same AMD machine. Once I've done that, I'm unable to get any F13 or F14 guest to boot properly, even with a single vcpu assigned (RHEL5 and WinXP still work fine with one or several vcpus), so the utility of a comparison or bisect is dubious.

Here is the qemu commandline that's issued by libvirt:


LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -cpu phenom,+wdt,+skinit,+osvw,+3dnowprefetch,+misalignsse,+sse4a,+abm,+cr8legacy,+extapic,+cmp_legacy,+lahf_lm,+rdtscp,+pdpe1gb,+popcnt,+cx16,+ht,+vme -enable-nesting -enable-kvm -m 2048 -smp 2,sockets=1,cores=6,threads=1 -name f14hvirt -uuid 24feae2d-3335-3dc3-297f-6ee826d6e634 -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14hvirt.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive file=/var/lib/libvirt/images/f14hvirt-1.img,if=none,id=drive-virtio-disk0,boot=on,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:b3:e5:43,bus=pci.0,addr=0x3 -net tap,fd=45,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:2 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 
char device redirected to /dev/pts/0

Note that a line like the following, which specifies fewer CPU features, also triggers the problem:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -enable-kvm -m 900 -smp 2,sockets=2,cores=1,threads=1 -name f14alphatest -uuid 335e5300-e9a8-fd86-0986-748f02a8e69e -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14alphatest.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-reboot -boot d -drive file=/var/lib/libvirt/images/f14alphatest.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/dev/sr0,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:c0:62:6c,bus=pci.0,addr=0x3 -net tap,fd=54,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:3 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

--- Additional comment from avi@redhat.com on 2010-10-21 06:52:30 EDT ---

Am interested in access.

--- Additional comment from laine@redhat.com on 2010-10-21 08:09:43 EDT ---

Avi - the necessary info is in an IRC private chat I just sent. Contact me if it didn't show up.

--- Additional comment from avi@redhat.com on 2010-10-21 09:46:16 EDT ---

kvm.git with F14's qemu-kvm appears to work fine.

--- Additional comment from avi@redhat.com on 2010-10-21 09:47:41 EDT ---

qemu-kvm.git fails on kvm.git with a segmentation fault; that appears to be a different problem.

--- Additional comment from avi@redhat.com on 2010-10-21 11:01:02 EDT ---

Plain 2.6.35.6 fails, same way.

--- Additional comment from avi@redhat.com on 2010-10-21 11:12:51 EDT ---

2.6.36 works.  Bisecting.

--- Additional comment from avi@redhat.com on 2010-10-21 11:58:52 EDT ---

Works with -cpu ...,-kvmclock

Zachary, what do we need to backport to 2.6.35.6?  Marcelo, did you already send it?

--- Additional comment from zamsden@redhat.com on 2010-10-21 22:13:48 EDT ---

Backport list is probably quite short, although it's unclear if this is a host or guest problem or a mix of both.

Avi, did you bisect the host's qemu or the guest?

--- Additional comment from avi@redhat.com on 2010-10-22 04:00:27 EDT ---

The bisection was irrelevant.  The findings are:

  2.6.35.6: fails
  2.6.35.6, -kvmclock: works
  2.6.36: works

Conclusion: 2.6.35.6 is missing some patches that went into 2.6.36.
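
For anyone who wants to try the -kvmclock workaround on an existing command line, here is a minimal Python sketch. The helper name disable_kvmclock and the shortened example argument list are hypothetical; only the ",-kvmclock" suffix on the -cpu value comes from the comment above. The helper assumes a -cpu option is already present and leaves the list untouched otherwise, since picking a CPU model would be a guess.

```python
def disable_kvmclock(argv):
    """Return a copy of argv with ',-kvmclock' appended to the -cpu value.

    Hypothetical helper for illustration only. If there is no -cpu option,
    the list is returned unchanged rather than inventing a CPU model.
    """
    out = list(argv)
    for i, arg in enumerate(out[:-1]):
        if arg == "-cpu":
            flags = out[i + 1].split(",")
            if "-kvmclock" not in flags:
                out[i + 1] += ",-kvmclock"
            break
    return out


# Shortened, illustrative argument list (not the full libvirt command line).
example = ["/usr/bin/qemu-kvm", "-M", "pc-0.13", "-cpu", "phenom,+rdtscp",
           "-enable-kvm", "-smp", "2"]
patched = disable_kvmclock(example)
print(" ".join(patched))
```

Applying the helper twice gives the same result as applying it once, so it is safe to run on a command line that already carries the flag.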

--- Additional comment from avi@redhat.com on 2010-10-22 04:01:08 EDT ---

Kernel versions above are for host kernel.

--- Additional comment from zamsden@redhat.com on 2010-10-25 20:15:33 EDT ---

These two patches just went to stable:


2.6.35-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Marcelo Tosatti <mtosatti@redhat.com>

commit 58877679fd393d3ef71aa383031ac7817561463d upstream.

On reset, VMCB TSC should be set to zero.  Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2.6.35-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Marcelo Tosatti <mtosatti@redhat.com>

commit 47008cd887c1836bcadda123ba73e1863de7a6c4 upstream.

The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero happens very late in the boot process and destabilizing
the TSC.  Instead, just set TSC to zero once at VCPU creation time.

Why the separate patch?  So git-bisect is your friend.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
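
The two commit messages above can be read through the relation "guest TSC = host TSC + tsc_offset". The following is a toy Python model of that arithmetic, not KVM code: the function names and the counter values are hypothetical, and it assumes both VCPUs are created at the same host time. It sketches why a zero tsc_offset passes the host counter through, and why re-zeroing at every VMCB reset (each startup IPI) leaves the second VCPU's TSC behind the first.

```python
# Toy model of the SVM TSC offset arithmetic; not KVM code.
# Assumption: the guest reads host_tsc + tsc_offset. All names are hypothetical.

def guest_tsc(host_tsc, tsc_offset):
    """Value the guest observes when it reads the TSC."""
    return host_tsc + tsc_offset


def offset_for_guest_tsc(host_tsc, wanted_guest_tsc):
    """Offset that makes the guest observe wanted_guest_tsc right now."""
    return wanted_guest_tsc - host_tsc


host_now = 5_000_000  # arbitrary host counter value when the VCPUs are created

# First bug: tsc_offset = 0 passes the host TSC straight through.
passthrough = guest_tsc(host_now, 0)

# First fix: pick the offset so that the guest TSC itself reads zero.
zeroed = guest_tsc(host_now, offset_for_guest_tsc(host_now, 0))

# Second bug: zeroing at every VMCB reset. VCPU 1 is reset again when it
# receives its startup IPI, after VCPU 0 has already been counting.
vcpu0_offset = offset_for_guest_tsc(host_now, 0)
host_at_sipi = host_now + 300_000
vcpu1_offset = offset_for_guest_tsc(host_at_sipi, 0)
skew_if_reset_on_sipi = (guest_tsc(host_at_sipi, vcpu0_offset)
                         - guest_tsc(host_at_sipi, vcpu1_offset))

# Second fix: zero once at VCPU creation, so VCPU 1 keeps the creation-time offset.
vcpu1_offset_fixed = offset_for_guest_tsc(host_now, 0)
skew_if_zeroed_once = (guest_tsc(host_at_sipi, vcpu0_offset)
                       - guest_tsc(host_at_sipi, vcpu1_offset_fixed))

print(passthrough, zeroed, skew_if_reset_on_sipi, skew_if_zeroed_once)
```

In this model the skew between the two VCPUs equals the host time that elapsed before the startup IPI, and it disappears when the offset is set only once.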

--- Additional comment from laine@redhat.com on 2010-10-27 12:09:01 EDT ---

Zach - can you give me the right git incantation to get a kernel source tree with these two patches? I have the kvm kernel tree already, if that helps.

--- Additional comment from zamsden@redhat.com on 2010-10-27 12:18:37 EDT ---

I believe you want to git-cherry-pick
Comment 3 Zachary Amsden 2010-11-18 12:53:10 EST
No, this is not a host issue.  This is a KVM issue which needs to be fixed in the KVM module.
Comment 5 RHEL Product and Program Management 2010-11-18 14:19:37 EST
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 6 Zachary Amsden 2011-03-28 17:13:44 EDT
commit 9170170007aa8886583ada73c0ea1122231c708a
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Jan 27 16:57:31 2011 -0500

Patches are in RHEL6 tree...
Comment 9 Zachary Amsden 2011-03-29 15:18:33 EDT
This is already committed and in the tree... I tagged it with the wrong Bugzilla number, apparently.

commit 0479353816108b3a1da7e803621d193f8dbcb0a2
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Jan 27 16:57:23 2011 -0500

    [virt] KVM: backport of SVM TSC init fixes
    
    Message-id: <1296147453-27805-1-git-send-email-zamsden@redhat.com>
    Patchwork-id: 32943
    O-Subject: [RHEL6.1 01/11] KVM: backport of SVM TSC init fixes
    Bugzilla: 651635
    RH-Acked-by: Rik van Riel <riel@redhat.com>
    RH-Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
    RH-Acked-by: Glauber Costa <glommer@redhat.com>
Comment 11 Mike Cao 2011-04-12 05:32:14 EDT
Could not find a Phenom II CPU host.
Tried on a Dual-Core AMD Opteron(tm) Processor 1216 instead.

Tried both kernel-2.6.32-71.el6 and kernel-2.6.32-130.el6 with the steps in comment #0.

Actual Results:
The guest works fine with -smp 2; I cannot reproduce this issue on either kernel.

bcao---->ikent@redhat.com

Hi, Ian

Could you help verify it on a Phenom II CPU host?

thanks,
Mike
Comment 14 Ian Kent 2011-04-18 06:06:57 EDT
I tested kernels 2.6.32-71.el6 and 2.6.32-130.el6 on F14.

The test consisted of setting an F14 VM to use 2 CPUs and then
booting the VM, shutting it down followed by booting it again,
then setting the CPUs to 1 and booting the VM, timing each by
using date. The definition of "booting" was from the time the
run triangle was pressed until the login screen was presented.

The results were:

Kernel: 2.6.32-71.el6.x86_64

2 CPUs:
Start: Mon Apr 18 17:10:19 WST 2011
Finish: Mon Apr 18 17:24:25 WST 2011

1 CPU:
Start: Mon Apr 18 17:30:36 WST 2011
Finish: Mon Apr 18 17:31:14 WST 2011

Kernel: 2.6.32-130.el6.x86_64

2 CPUs:
Start: Mon Apr 18 17:41:18 WST 2011
Finish: Mon Apr 18 17:41:50 WST 2011

1 CPU:
Start: Mon Apr 18 17:43:09 WST 2011
Finish: Mon Apr 18 17:43:41 WST 2011
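
The boot durations implied by these timestamps can be worked out with a short Python sketch. The times are copied from the results above; the helper name boot_seconds is hypothetical, and the code assumes every run starts and finishes on the same day.

```python
from datetime import datetime

# (kernel, CPUs, start, finish): times copied from the results above,
# all on Mon Apr 18 2011, WST.
runs = [
    ("2.6.32-71.el6.x86_64", 2, "17:10:19", "17:24:25"),
    ("2.6.32-71.el6.x86_64", 1, "17:30:36", "17:31:14"),
    ("2.6.32-130.el6.x86_64", 2, "17:41:18", "17:41:50"),
    ("2.6.32-130.el6.x86_64", 1, "17:43:09", "17:43:41"),
]


def boot_seconds(start, finish):
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


durations = {(kernel, cpus): boot_seconds(s, f) for kernel, cpus, s, f in runs}
for (kernel, cpus), secs in durations.items():
    print(f"{kernel}  {cpus} CPU(s): {secs} s")
```

On 2.6.32-71.el6 the 2-CPU boot takes 846 seconds (about 14 minutes) against 38 seconds with 1 CPU; on 2.6.32-130.el6 both configurations boot in 32 seconds.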
Comment 15 Mike Cao 2011-04-18 06:22:23 EDT
Ian, thanks very much!

Referring to comment #14, the issue was reproduced on kernel 2.6.32-71.el6.x86_64
and verified as fixed on kernel 2.6.32-130.el6.x86_64.

Based on the above, moving status to VERIFIED.
Comment 16 errata-xmlrpc 2011-05-23 16:28:42 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
