

Bug 651639

Summary:	On AMD host, running an F14 guest with 2 cores assigned hangs for "a long time" (several tens of minutes) at start of boot
Product:	Red Hat Enterprise Linux 6	Reporter:	Zachary Amsden <zamsden>
Component:	kernel	Assignee:	Zachary Amsden <zamsden>
Status:	CLOSED ERRATA	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.1	CC:	amit.shah, bcao, berrange, bsarathy, clalance, dwmw2, ehabkost, gcosta, ikent, itamar, jaswinder, jforbes, knoel, laine, lihuang, markmc, mtosatti, notting, ondrejj, quintela, scottt.tw, syeghiay, tburke, zamsden
Target Milestone:	beta	Keywords:	Triaged
Target Release:	6.1
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	kernel-2.6.32-115.el6	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	644973
Clones:	654912		Environment:
Last Closed:	2011-05-23 20:28:42 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	644973
Bug Blocks:	580951, 654912, 689904

Description Zachary Amsden 2010-11-09 23:48:17 UTC

+++ This bug was initially created as a clone of Bug #644973 +++

Cloning this and requesting dev ack for RHEL 6.1

Created attachment 454618
kernel info from /var/log/messages

I have an AMD Thuban 1055T, which is a 6 core Phenom II, and have installed the F14 Beta from Sept 28, then updated from the updates-testing repo to Oct 19.

When I use virt-manager to create a guest with 1 vcpu assigned, and point it at the same F14 DVD ISO, it installs, then boots the OS with no issues.


If I change the config for that guest to have 2 vcpus assigned, or create a new guest with 2 vcpus and try to boot the install ISO, it hangs just after the screen is cleared past the initial SeaBIOS POST screen (i.e., before the "shades of blue" progress bar begins). This "hang" continues for "a very long time" (I haven't yet caught the moment it finally continues, but I did witness it hanging for at least 30 minutes) before the boot process finally resumes. During this time, "top" on the host shows qemu using 124% CPU. (I've been unable to attach gdb to the qemu-kvm process to get a backtrace.)

None of my other guests have this problem (I've tried RHEL5, F13, and WinXP - they all work fine with 2 vcpus).

I also installed the same F14 Beta (then updated to Oct 19) on Intel Xeon 8-core hardware (an IBM Thinkstation), and an install of F14 on a 2-core guest completed with no problems.

I can provide login credentials and exclusive access to this particular machine if necessary.


Note that I have tried installing the upstream qemu-kvm and kvm packages on this same AMD machine. Once I've done that, I'm unable to get any F13 or F14 guest to boot properly, even with a single vcpu assigned (RHEL5 and WinXP still work fine with one or more CPUs), so the utility of a comparison/bisect is dubious.

Here is the qemu commandline that's issued by libvirt:


LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -cpu phenom,+wdt,+skinit,+osvw,+3dnowprefetch,+misalignsse,+sse4a,+abm,+cr8legacy,+extapic,+cmp_legacy,+lahf_lm,+rdtscp,+pdpe1gb,+popcnt,+cx16,+ht,+vme -enable-nesting -enable-kvm -m 2048 -smp 2,sockets=1,cores=6,threads=1 -name f14hvirt -uuid 24feae2d-3335-3dc3-297f-6ee826d6e634 -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14hvirt.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive file=/var/lib/libvirt/images/f14hvirt-1.img,if=none,id=drive-virtio-disk0,boot=on,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:b3:e5:43,bus=pci.0,addr=0x3 -net tap,fd=45,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:2 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 
char device redirected to /dev/pts/0

Note that a line like the following, which specifies fewer CPU features, will also trigger the problem:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -enable-kvm -m 900 -smp 2,sockets=2,cores=1,threads=1 -name f14alphatest -uuid 335e5300-e9a8-fd86-0986-748f02a8e69e -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f14alphatest.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-reboot -boot d -drive file=/var/lib/libvirt/images/f14alphatest.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/dev/sr0,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:c0:62:6c,bus=pci.0,addr=0x3 -net tap,fd=54,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:3 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

--- Additional comment from avi on 2010-10-21 06:52:30 EDT ---

Am interested in access.

--- Additional comment from laine on 2010-10-21 08:09:43 EDT ---

Avi - the necessary info is in an IRC private chat I just sent. Contact me if it didn't show up.

--- Additional comment from avi on 2010-10-21 09:46:16 EDT ---

kvm.git with F14's qemu-kvm appears to work fine.

--- Additional comment from avi on 2010-10-21 09:47:41 EDT ---

qemu-kvm.git on kvm.git fails with a segmentation fault; that appears to be a different problem.

--- Additional comment from avi on 2010-10-21 11:01:02 EDT ---

Plain 2.6.35.6 fails, same way.

--- Additional comment from avi on 2010-10-21 11:12:51 EDT ---

2.6.36 works.  Bisecting.

--- Additional comment from avi on 2010-10-21 11:58:52 EDT ---

Works with -cpu ...,-kvmclock

Zachary, what do we need to backport to 2.6.35.6?  Marcelo, did you already send it?

--- Additional comment from zamsden on 2010-10-21 22:13:48 EDT ---

Backport list is probably quite short, although it's unclear if this is a host or guest problem or a mix of both.

Avi, did you bisect the host's qemu or the guest?

--- Additional comment from avi on 2010-10-22 04:00:27 EDT ---

The bisection was irrelevant.  The findings are:

  2.6.35.6: fails
  2.6.35.6, -kvmclock: works
  2.6.36: works

Conclusion: 2.6.35.6 is missing some patches that went into 2.6.36.

--- Additional comment from avi on 2010-10-22 04:01:08 EDT ---

Kernel versions above are for host kernel.

--- Additional comment from zamsden on 2010-10-25 20:15:33 EDT ---

These two patches just went to stable:


2.6.35-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Marcelo Tosatti <mtosatti>

commit 58877679fd393d3ef71aa383031ac7817561463d upstream.

On reset, VMCB TSC should be set to zero.  Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.

Signed-off-by: Zachary Amsden <zamsden>
Signed-off-by: Marcelo Tosatti <mtosatti>
Signed-off-by: Greg Kroah-Hartman <gregkh>
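A minimal sketch of the arithmetic behind this first fix follows. It assumes the SVM rule that the TSC a guest reads equals the host TSC plus the VMCB's tsc_offset field, and it ignores 64-bit wraparound. The function names and the host TSC value are hypothetical, not KVM code. Setting tsc_offset to 0 passes the host TSC straight through to the guest. Making the guest TSC read 0 at reset requires an offset equal to minus the current host TSC.

```python
# Toy model (assumption): guest-visible TSC = host TSC + VMCB tsc_offset.
# Python ints do not wrap, so 64-bit overflow is ignored here.

def guest_tsc(host_tsc: int, tsc_offset: int) -> int:
    """TSC value the guest would read via RDTSC under this model."""
    return host_tsc + tsc_offset


def offset_for_guest_tsc(target: int, host_tsc: int) -> int:
    """Offset that makes the guest TSC read `target` at host time `host_tsc`."""
    return target - host_tsc


host_now = 5_000_000_000  # hypothetical host TSC value at vCPU reset

# Behaviour the commit describes as wrong: tsc_offset = 0 exposes the host TSC.
buggy = guest_tsc(host_now, 0)

# Intended behaviour: pick the offset so the guest TSC starts at zero.
fixed = guest_tsc(host_now, offset_for_guest_tsc(0, host_now))

print(buggy, fixed)  # 5000000000 0
```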

2.6.35-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Marcelo Tosatti <mtosatti>

commit 47008cd887c1836bcadda123ba73e1863de7a6c4 upstream.

The VMCB is reset whenever we receive a startup IPI, so Linux setting the
TSC back to zero happens very late in the boot process and destabilizes
the TSC.  Instead, just set TSC to zero once at VCPU creation time.

Why the separate patch?  So git-bisect is your friend.

Signed-off-by: Zachary Amsden <zamsden>
Signed-off-by: Marcelo Tosatti <mtosatti>
Signed-off-by: Greg Kroah-Hartman <gregkh>
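A toy model of why the second patch moves the reset follows. It uses the same assumed rule (guest TSC = host TSC + tsc_offset). The VCpu class, the boot() helper, and the timestamps are hypothetical. Both vCPUs are zeroed when they are created. If the AP's TSC is zeroed again when it receives the startup IPI, it lags the BSP by however long the BSP ran before sending that IPI.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    """Hypothetical vCPU that only tracks its TSC offset."""
    tsc_offset: int = 0

    def set_guest_tsc(self, value: int, host_tsc: int) -> None:
        # Choose the offset so the guest reads `value` at this host time.
        self.tsc_offset = value - host_tsc

    def read_tsc(self, host_tsc: int) -> int:
        return host_tsc + self.tsc_offset


def boot(reset_tsc_on_sipi: bool, create_at: int, sipi_at: int, observe_at: int) -> int:
    """Return BSP TSC minus AP TSC at host time `observe_at`."""
    bsp, ap = VCpu(), VCpu()
    # Fixed behaviour: zero the TSC once, when the vCPUs are created.
    for cpu in (bsp, ap):
        cpu.set_guest_tsc(0, create_at)
    if reset_tsc_on_sipi:
        # Old behaviour: the VMCB reset on startup IPI zeroes the AP's TSC again.
        ap.set_guest_tsc(0, sipi_at)
    return bsp.read_tsc(observe_at) - ap.read_tsc(observe_at)


print(boot(True, 1_000, 1_000_000, 2_000_000))   # 999000
print(boot(False, 1_000, 1_000_000, 2_000_000))  # 0
```

With these made-up numbers, resetting on SIPI leaves a skew equal to sipi_at - create_at. Setting the TSC only at creation keeps both vCPUs in step.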

--- Additional comment from laine on 2010-10-27 12:09:01 EDT ---

Zach - can you give me the right git incantation to get a kernel source tree with these two patches? I have the kvm kernel tree already, if that helps.

--- Additional comment from zamsden on 2010-10-27 12:18:37 EDT ---

I believe you want to git-cherry-pick them.

Comment 3 Zachary Amsden 2010-11-18 17:53:10 UTC

No, this is not a host issue.  This is a KVM issue which needs to be fixed in the KVM module.

Comment 5 RHEL Program Management 2010-11-18 19:19:37 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Zachary Amsden 2011-03-28 21:13:44 UTC

commit 9170170007aa8886583ada73c0ea1122231c708a
Author: Zachary Amsden <zamsden>
Date:   Thu Jan 27 16:57:31 2011 -0500

Patches are in the RHEL6 tree.

Comment 9 Zachary Amsden 2011-03-29 19:18:33 UTC

This is already committed and in the tree. I apparently tagged it with the wrong Bugzilla number.

commit 0479353816108b3a1da7e803621d193f8dbcb0a2
Author: Zachary Amsden <zamsden>
Date:   Thu Jan 27 16:57:23 2011 -0500

    [virt] KVM: backport of SVM TSC init fixes
    
    Message-id: <1296147453-27805-1-git-send-email-zamsden>
    Patchwork-id: 32943
    O-Subject: [RHEL6.1 01/11] KVM: backport of SVM TSC init fixes
    Bugzilla: 651635
    RH-Acked-by: Rik van Riel <riel>
    RH-Acked-by: Marcelo Tosatti <mtosatti>
    RH-Acked-by: Glauber Costa <glommer>

Comment 11 Mike Cao 2011-04-12 09:32:14 UTC

Cannot find a Phenom II CPU host.
Tried on a Dual-Core AMD Opteron(tm) Processor 1216 instead.

Tried both kernel-2.6.32-71.el6 and kernel-2.6.32-130.el6 with the steps in comment #0.

Actual Results:
The guest works fine with -smp 2; I cannot reproduce this issue on either kernel.

bcao---->ikent

Hi Ian,

Could you help verify it on a Phenom II CPU host?

thanks,
Mike

Comment 14 Ian Kent 2011-04-18 10:06:57 UTC

I tested kernels 2.6.32-71.el6 and 2.6.32-130.el6 on F14.

The test consisted of setting an F14 VM to use 2 CPUs, booting
the VM, shutting it down, and booting it again, then setting the
CPUs to 1 and booting the VM, timing each boot with the date
command. "Booting" was measured from the time the run triangle
was pressed until the login screen was presented.

The results were:

Kernel: 2.6.32-71.el6.x86_64

2 CPUs:
Start: Mon Apr 18 17:10:19 WST 2011
Finish: Mon Apr 18 17:24:25 WST 2011

1 CPU:
Start: Mon Apr 18 17:30:36 WST 2011
Finish: Mon Apr 18 17:31:14 WST 2011

Kernel: 2.6.32-130.el6.x86_64

2 CPUs:
Start: Mon Apr 18 17:41:18 WST 2011
Finish: Mon Apr 18 17:41:50 WST 2011

1 CPU:
Start: Mon Apr 18 17:43:09 WST 2011
Finish: Mon Apr 18 17:43:41 WST 2011
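The results are easier to compare as elapsed times. The short sketch below only parses the start/finish times listed above (all on the same day, WST) and prints each boot duration in seconds. The RUNS table and the boot_seconds() helper are illustrative names and were not part of the original test.

```python
from datetime import datetime

# Start/finish times copied from the results above (Mon Apr 18 2011, WST).
RUNS = {
    ("2.6.32-71.el6", "2 CPUs"): ("17:10:19", "17:24:25"),
    ("2.6.32-71.el6", "1 CPU"): ("17:30:36", "17:31:14"),
    ("2.6.32-130.el6", "2 CPUs"): ("17:41:18", "17:41:50"),
    ("2.6.32-130.el6", "1 CPU"): ("17:43:09", "17:43:41"),
}


def boot_seconds(start: str, finish: str) -> int:
    """Elapsed seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


for (kernel, cpus), (start, finish) in RUNS.items():
    print(f"{kernel:15} {cpus:6} {boot_seconds(start, finish):4d} s")
```

For these timestamps the script reports 846 s (about 14 minutes) for the 2-CPU boot on 2.6.32-71.el6, against 32 to 38 s for every other run.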

Comment 15 Mike Cao 2011-04-18 10:22:23 UTC

Ian, thanks very much!

Referring to comment #14, the issue was reproduced on kernel 2.6.32-71.el6.x86_64
and verified on kernel 2.6.32-130.el6.x86_64.

Based on the above, moving status to VERIFIED.

Comment 16 errata-xmlrpc 2011-05-23 20:28:42 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html