526862 – [RHEL5 Xen]: Mask out CPU features by default

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Andrew Jones
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	711322
Depends On:
Blocks:	514489 514490 711070

Reported:	2009-10-02 07:08 UTC by Chris Lalancette
Modified:	2013-01-08 13:38 UTC
CC List:	13 users
Fixed In Version:	kernel-2.6.18-294.el5
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-02-21 03:26:24 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
cpuid whitelist function (8.21 KB, patch) 2011-10-07 15:57 UTC, Andrew Jones
Do not expose X86_FEATURE_POPCNT feature to avoid crash on migration to a host that doesn't have it (706 bytes, patch) 2011-10-18 13:11 UTC, Igor Mammedov

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2012:0150	0	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise Linux 5.8 kernel update	2012-02-21 07:35:24 UTC

Description Chris Lalancette 2009-10-02 07:08:15 UTC

Description of problem:
Trying to boot Xen guests on newer hardware is always an adventure.  One of the reasons for this is that, by default, the Xen hypervisor only masks out features it knows it can't support.  However, it can't know about support for newer features until the features are there.  So what it should instead do is to mask out *all* features, and then selectively enable the ones that it knows it can support.  Upstream currently does not work like this (nor does RHEL-5), so we will have to submit patches there.
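
To make the distinction concrete, here is a minimal sketch in C of the two filtering policies applied to the ECX register of CPUID leaf 1. The function names, the KNOWN_BAD and KNOWN_GOOD lists, and the use of bit 30 as a stand-in for a feature the hypervisor has never heard of are all assumptions made for illustration; none of it is taken from the Xen or RHEL-5 source.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_SSE3    (1u << 0)
#define CPUID1_ECX_VMX     (1u << 5)
#define CPUID1_ECX_SSSE3   (1u << 9)
/* Assumption: bit 30 stands in for a feature newer than the hypervisor. */
#define CPUID1_ECX_UNKNOWN (1u << 30)

/* Illustrative lists only, not the real hypervisor policy. */
#define KNOWN_BAD  (CPUID1_ECX_VMX)
#define KNOWN_GOOD (CPUID1_ECX_SSE3 | CPUID1_ECX_SSSE3)

static_assert((KNOWN_GOOD & KNOWN_BAD) == 0, "lists must not overlap");

/* Blacklist: hide only what is known to be unsupported.
 * Anything the hypervisor does not know about leaks through. */
static uint32_t filter_blacklist(uint32_t host_ecx)
{
    return host_ecx & ~KNOWN_BAD;
}

/* Whitelist: expose only what is known to be supported.
 * Anything the hypervisor does not know about is hidden. */
static uint32_t filter_whitelist(uint32_t host_ecx)
{
    return host_ecx & KNOWN_GOOD;
}
```

With a host that reports SSE3, VMX and the unknown bit, the blacklist variant passes the unknown bit through to the guest, while the whitelist variant drops it.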

Comment 1 Paolo Bonzini 2010-06-22 16:58:36 UTC

Chris,

considering upstream's handling of CPUID is completely different and done in the tools, the solution to this issue would likely have to be done twice for RHEL-5 and upstream.

It seems more palatable (if anything) to backport upstream's userspace handling of CPUID.  The patches are relatively large but quite self-contained, and it would make it easier to tweak the defaults without requiring kernel upgrades.  What do you think?

Comment 2 Chris Lalancette 2010-06-29 20:43:12 UTC

Hey Paolo,
     I'm OK with going with upstream's userspace implementation, though I'm not quite sure how it works.  In particular, does it *always* send a list of supported flags down to the hypervisor when starting a guest?  As long as there is always a whitelist (that will mask out things like GB pages, etc), then I think doing the userspace version would be just fine.  The only thing we'll have to be careful of is that since this is (probably?) a change to the hypervisor/tools ABI, we'll have to have a compat mode so that a new userspace could run on an older hypervisor.

Chris Lalancette

Comment 4 Andrew Jones 2011-06-14 07:52:08 UTC

Some of the current features I'd like to mask out haven't necessarily caused problems, but one never knows what the future holds, and of course the idea behind this bug is to guard against features that don't exist yet.

One current feature I'd like to mask is X86_FEATURE_HT. This hasn't caused problems yet, but it does cause a warning to be output on every boot of RHEL6 PV guests.

CPU: Unsupported number of siblings

This is output from detect_ht(). After that warning, the guest kernel decides to forget the whole thing and carries on fine. The warning could be avoided by simply masking the HT feature, though.
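
As a rough illustration of why masking helps: detect_ht() returns early when the HT feature flag is not set, so a guest that never sees the bit never reaches the sibling-count check that prints the warning. The sketch below models this in C. The names pv_mask_leaf1() and guest_probes_siblings() are hypothetical, and the guest-side check is a deliberate simplification, not the kernel's actual code. The only hardware fact relied on is that the HTT flag is bit 28 of CPUID leaf 1 EDX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID1_EDX_HTT (1u << 28) /* CPUID.1:EDX bit 28 */

struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical hypervisor-side hook: hide HTT from a PV guest. */
static void pv_mask_leaf1(struct cpuid_regs *regs)
{
    assert(regs != NULL);
    regs->edx &= ~CPUID1_EDX_HTT;
}

/* Simplified guest-side model: the sibling count is only examined
 * when the HTT flag is advertised. */
static bool guest_probes_siblings(const struct cpuid_regs *regs)
{
    return (regs->edx & CPUID1_EDX_HTT) != 0;
}
```

Only the HTT bit is cleared; every other bit in the leaf is left as the host reported it.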

Comment 5 Andrew Jones 2011-10-07 15:57:23 UTC

Created attachment 526923 [details]
cpuid whitelist function

Comment 7 Konrad Rzeszutek Wilk 2011-10-07 18:35:58 UTC

It looks like this could be quite useful in upstream Xen. Why not post it there as well?

Comment 8 Laszlo Ersek 2011-10-07 19:48:44 UTC

Hello Konrad,

it was our understanding (... any inaccuracy in representing my colleagues' understanding is my fault ...) that upstream Xen "has a mix of white and black listing depending on guest type and does its cpuid management in userspace".

The set of whitelisted features might be useful for upstream, but then it should be specified somewhere in the VM configs or as another default setting in userspace, shouldn't it? (E.g., tools/libxc/xc_cpuid_x86.c, amd_xc_cpuid_policy() / intel_xc_cpuid_policy().)

Comment 9 Igor Mammedov 2011-10-10 12:50:52 UTC

*** Bug 711070 has been marked as a duplicate of this bug. ***

Comment 10 Igor Mammedov 2011-10-18 13:11:57 UTC

Created attachment 528802 [details]
Do not expose X86_FEATURE_POPCNT feature to avoid crash on migration to a host that doesn't have it

An FC16 HVM guest will crash with an invalid opcode after migration if it was started on a host with the X86_FEATURE_POPCNT feature and then migrated to a host without it.

The attached patch, applied on top of white-listing-V2, fixes this issue.
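
The failure mode can be expressed as a subset check: a migration is only safe if every feature the guest was shown also exists on the target host. A minimal sketch in C follows. The helper names missing_on_target() and migration_safe() are hypothetical and are not part of the attached patch; the only hardware fact used is that POPCNT is bit 23 of CPUID leaf 1 ECX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_SSE3   (1u << 0)
#define CPUID1_ECX_POPCNT (1u << 23)

static_assert(CPUID1_ECX_POPCNT == 0x00800000u, "POPCNT is ECX bit 23");

/* Features the guest was told about that the target host lacks. */
static uint32_t missing_on_target(uint32_t guest_visible, uint32_t target_host)
{
    return guest_visible & ~target_host;
}

/* Safe only if the guest-visible set is a subset of the target's set. */
static bool migration_safe(uint32_t guest_visible, uint32_t target_host)
{
    return missing_on_target(guest_visible, target_host) == 0;
}
```

If the source host has SSE3 and POPCNT and the target has only SSE3, exposing the raw host features makes the check fail with POPCNT as the missing bit. Masking POPCNT before the guest ever sees it makes the check pass.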

Comment 14 Jarod Wilson 2011-10-27 13:09:32 UTC

Patch(es) available in kernel-2.6.18-294.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 16 Andrew Jones 2011-11-06 10:27:34 UTC

*** Bug 711322 has been marked as a duplicate of this bug. ***

Comment 22 Qin Guan 2012-01-18 08:24:06 UTC

Testing of this problem is covered by running acceptance/functional tests with
several Snapshot builds (from Snapshot1 to Snapshot4) on different CPU models.

No problems were found during the testing, so this bug is marked as Verified:SanityOnly.

Comment 24 errata-xmlrpc 2012-02-21 03:26:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html
