Bug 526862

Summary: [RHEL5 Xen]: Mask out CPU features by default
Product: Red Hat Enterprise Linux 5
Reporter: Chris Lalancette <clalance>
Component: kernel-xen
Assignee: Andrew Jones <drjones>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 5.4
CC: bgollahe, drjones, imammedo, jzheng, ketuzsezr, leiwang, lersek, pbonzini, pcao, qguan, qwan, xen-maint, zliu
Target Milestone: rc
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.18-294.el5
Doc Type: Enhancement
Last Closed: 2012-02-21 03:26:24 UTC
Bug Blocks: 514489, 514490, 711070    
Attachments:
  cpuid whitelist function (flags: none)
  Do not expose X86_FEATURE_POPCNT feature to avoid crash on migration to a host that doesn't have it (flags: none)

Description Chris Lalancette 2009-10-02 07:08:15 UTC
Description of problem:
Trying to boot Xen guests on newer hardware is always an adventure.  One of the reasons for this is that, by default, the Xen hypervisor only masks out features it knows it can't support.  However, it can't know about support for newer features until the features are there.  So what it should instead do is to mask out *all* features, and then selectively enable the ones that it knows it can support.  Upstream currently does not work like this (nor does RHEL-5), so we will have to submit patches there.
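
The difference between the two policies described above can be sketched as follows. This is only an illustration under stated assumptions, not the RHEL-5 or upstream Xen code: a feature word is modeled as a 32-bit integer, FEATURE_FPU and FEATURE_SSE2 use their real CPUID leaf 1 EDX bit positions, and FEATURE_KNOWN_BAD, FEATURE_FUTURE and both function names are hypothetical stand-ins.

```python
MASK32 = 0xFFFFFFFF

FEATURE_FPU = 1 << 0         # real position: CPUID leaf 1, EDX bit 0
FEATURE_SSE2 = 1 << 26       # real position: CPUID leaf 1, EDX bit 26
FEATURE_KNOWN_BAD = 1 << 22  # hypothetical: a feature the hypervisor knows it cannot support
FEATURE_FUTURE = 1 << 20     # hypothetical: a feature introduced after the hypervisor shipped


def blacklist_filter(host_features, blacklist):
    """Current behaviour: hide only the features known to be unsupported."""
    return host_features & ~blacklist & MASK32


def whitelist_filter(host_features, whitelist):
    """Proposed behaviour: hide everything, then expose only known-good features."""
    return host_features & whitelist & MASK32


host = FEATURE_FPU | FEATURE_SSE2 | FEATURE_KNOWN_BAD | FEATURE_FUTURE

blacklisted = blacklist_filter(host, FEATURE_KNOWN_BAD)
whitelisted = whitelist_filter(host, FEATURE_FPU | FEATURE_SSE2)

# The unknown future feature leaks through the blacklist but not the whitelist.
print(bool(blacklisted & FEATURE_FUTURE))
print(bool(whitelisted & FEATURE_FUTURE))
```

With a whitelist, a feature that did not exist when the hypervisor was written stays hidden until someone explicitly enables it.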

Comment 1 Paolo Bonzini 2010-06-22 16:58:36 UTC
Chris,

considering upstream's handling of CPUID is completely different and done in the tools, the solution to this issue would likely have to be done twice for RHEL-5 and upstream.

It seems more palatable (if anything) to backport upstream's userspace handling of CPUID.  The patches are relatively large but quite self-contained, and it would make it easier to tweak the defaults without requiring kernel upgrades.  What do you think?

Comment 2 Chris Lalancette 2010-06-29 20:43:12 UTC
Hey Paolo,
     I'm OK with going with upstream's userspace implementation, though I'm not quite sure how it works.  In particular, does it *always* send a list of supported flags down to the hypervisor when starting a guest?  As long as there is always a whitelist (that will mask out things like GB pages, etc), then I think doing the userspace version would be just fine.  The only thing we'll have to be careful of is that since this is (probably?) a change to the hypervisor/tools ABI, we'll have to have a compat mode so that a new userspace could run on an older hypervisor.

Chris Lalancette

Comment 4 Andrew Jones 2011-06-14 07:52:08 UTC
Some of the features I'd currently like to mask out haven't necessarily caused problems, but one never knows what the future holds, and of course the idea behind this bug is to guard against features that don't exist yet.

One current feature I'd like to mask is X86_FEATURE_HT. This hasn't caused problems yet, but it does cause a warning to be output on every boot of RHEL6 PV guests.

CPU: Unsupported number of siblings

This is output from detect_ht(). After that warning, the guest kernel decides to forget the whole thing and carries on fine. The warning could be avoided simply by masking the HT feature, though.
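
For illustration, masking HT amounts to clearing one bit in the value the guest sees. The sketch below assumes the flag lives in CPUID leaf 1, EDX bit 28 (which is where X86_FEATURE_HT maps in Linux); the function names and the sample EDX value are hypothetical, and this is not the code from the attached patch.

```python
HT_BIT = 28  # CPUID leaf 1, EDX bit 28


def mask_ht(edx):
    """Return the EDX value with the HT flag cleared."""
    return edx & ~(1 << HT_BIT) & 0xFFFFFFFF


def guest_sees_ht(edx):
    """True if the HT flag is set in the EDX value handed to the guest."""
    return bool(edx & (1 << HT_BIT))


host_edx = 0x178BFBFF  # illustrative host value with bit 28 set
guest_edx = mask_ht(host_edx)

print(hex(guest_edx))
print(guest_sees_ht(host_edx), guest_sees_ht(guest_edx))
```

A guest that never sees the flag has no reason to enter the sibling-detection path that prints the warning.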

Comment 5 Andrew Jones 2011-10-07 15:57:23 UTC
Created attachment 526923
cpuid whitelist function

Comment 7 Konrad Rzeszutek Wilk 2011-10-07 18:35:58 UTC
It looks like this could be quite useful in upstream Xen. Why not post it there as well?

Comment 8 Laszlo Ersek 2011-10-07 19:48:44 UTC
Hello Konrad,

it was our understanding (... any inaccuracy in representing my colleagues' understanding is my fault ...) that upstream Xen "has a mix of white and black listing depending on guest type and does its cpuid management in userspace".

The set of whitelisted features might be useful for upstream, but then it should be specified somewhere in the vm configs or another default setting in userspace, shouldn't it? (Eg. tools/libxc/xc_cpuid_x86.c, amd_xc_cpuid_policy() / intel_xc_cpuid_policy().)

Comment 9 Igor Mammedov 2011-10-10 12:50:52 UTC
*** Bug 711070 has been marked as a duplicate of this bug. ***

Comment 10 Igor Mammedov 2011-10-18 13:11:57 UTC
Created attachment 528802
Do not expose X86_FEATURE_POPCNT feature to avoid crash on migration to a host that doesn't have it

An FC16 HVM guest will crash with an invalid opcode after migration if it was started on a host with the X86_FEATURE_POPCNT feature and then migrated to a host without it.

The attached patch, applied on top of white-listing-V2, fixes this issue.
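
The failure mode can be sketched as a simple bit check. This is a minimal illustration, assuming POPCNT is reported in CPUID leaf 1, ECX bit 23 and SSE3 in bit 0; the helper names and the two host values are hypothetical, and the code is not taken from the attached patch.

```python
POPCNT = 1 << 23  # CPUID leaf 1, ECX bit 23
SSE3 = 1 << 0     # CPUID leaf 1, ECX bit 0


def missing_on_destination(guest_ecx, dest_ecx):
    """Features the guest was told about that the destination host lacks."""
    return guest_ecx & ~dest_ecx & 0xFFFFFFFF


def hide_popcnt(ecx):
    """Return the ECX value with the POPCNT flag cleared."""
    return ecx & ~POPCNT & 0xFFFFFFFF


source_host = SSE3 | POPCNT  # illustrative host that has POPCNT
dest_host = SSE3             # illustrative host that lacks it

unsafe_guest = source_host               # guest sees POPCNT and may use it
safe_guest = hide_popcnt(source_host)    # guest never learns about POPCNT

print(bool(missing_on_destination(unsafe_guest, dest_host)))
print(bool(missing_on_destination(safe_guest, dest_host)))
```

A guest that was never shown the flag will not execute the instruction, so landing on a host without it is harmless.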

Comment 14 Jarod Wilson 2011-10-27 13:09:32 UTC
Patch(es) available in kernel-2.6.18-294.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 16 Andrew Jones 2011-11-06 10:27:34 UTC
*** Bug 711322 has been marked as a duplicate of this bug. ***

Comment 22 Qin Guan 2012-01-18 08:24:06 UTC
Testing of this problem was covered by running acceptance/functional tests against
several Snapshot builds (Snapshot1 through Snapshot4) on different CPU models.

No problems were found during testing, so this is marked as Verified:SanityOnly.

Comment 24 errata-xmlrpc 2012-02-21 03:26:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html