Bug 524719 - Xen hypervisor doesn't mask xsave feature from the guest; Fedora 11 PV domU kernel crashes
Status: CLOSED DUPLICATE of bug 502826
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.4
Hardware: All Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Xen Maintainance List
QA Contact: Virtualization Bugs
Reported: 2009-09-21 17:04 EDT by Pasi Karkkainen
Modified: 2009-12-14 16:27 EST
Doc Type: Bug Fix
Last Closed: 2009-11-06 12:56:35 EST
Description Pasi Karkkainen 2009-09-21 17:04:47 EDT
Description of problem:
The F11 Xen PV domU kernel crashes early in boot, during CPU initialization, with an invalid opcode exception. The crash happens in xsave_cntxt_init() when the xsetbv instruction is executed.

The crash happens because the Xen hypervisor does not mask the CPU xsave feature from the guest.

Version-Release number of selected component (if applicable):
Dom0 is RHEL 5.4, 2.6.18-164.el5xen.

How reproducible:
Always, on hardware whose CPU supports xsave.

Steps to Reproduce:
1. Use virt-install (or virt-manager) to start an F11 Xen PV domU installation.
2. The guest kernel crashes.
  
Actual results:
The guest kernel crashes early in boot with an invalid opcode.

Expected results:
The guest kernel boots and works normally.

Additional info:

Analysis and the bugfix changeset number from Jeremy Fitzhardinge:
https://www.redhat.com/archives/fedora-xen/2009-September/msg00112.html

F11 guest kernel boot/crash logs:
http://v6.fi/misc/f11_64_kernel.debug2.txt
http://v6.fi/misc/f11_64_kernel.debug4.txt

gdb analysis:
http://v6.fi/misc/gdb_f11_64_kernel.debug.txt
Comment 1 Chris Lalancette 2009-09-22 02:34:56 EDT
Cool, thanks for the workaround in the domU kernel.  Note that I have a patch pending for RHEL-5.5 to actually properly do the masking in the RHEL-5 dom0 kernel (https://bugzilla.redhat.com/show_bug.cgi?id=502826), so either way we should be fixed.

Chris Lalancette
Comment 2 Pasi Karkkainen 2009-09-22 07:56:15 EDT
Chris: do you need testing for that -164.el5virttest17 kernel?

Another solution from Jeremy here:
https://www.redhat.com/archives/fedora-xen/2009-September/msg00114.html

But I guess RHEL5 Xen doesn't support custom cpuid masking per domU. It would be nice to be able to do that as well :)
Comment 3 Chris Lalancette 2009-09-22 08:04:38 EDT
(In reply to comment #2)
> Chris: do you need testing for that -164.el5virttest17 kernel?

Additional testing is welcome, especially since I haven't tested it specifically to mask out xsave (although I did test that it properly masks out GBpages).

> Another solution from Jeremy here:
> https://www.redhat.com/archives/fedora-xen/2009-September/msg00114.html
> 
> But I guess RHEL5 Xen doesn't support custom cpuid masking per domU. It would be
> nice to be able to do that as well :)

Right.  I'm not sure how invasive that is, given that RHEL-5 is getting a bit long in the tooth.  Nevertheless, if you feel it is a worthwhile feature, open up a bug against the RHEL-5 xen package and we'll see what we can do.

Chris Lalancette
Comment 4 Pasi Karkkainen 2009-09-23 08:17:31 EDT
It looks like -164.el5virttest17 doesn't help. The F11 GA kernel still crashes in the same way as before:

(early) Initializing CPU#0
(early) invalid opcode: 0000 [#1] (early) SMP (early)
Comment 5 Chris Lalancette 2009-09-23 08:46:19 EDT
Oh, yuck.  I didn't port that part back.  OK, I'll have to respin the patch for BZ 502826 with the NOXSAVE part backported.  I'll keep you informed.

Chris Lalancette
Comment 6 Chris Lalancette 2009-09-24 03:55:54 EDT
Pasi,
     OK, I've now uploaded a new RHEL-5 dom0 kernel that should properly mask xsave.  You can get it from:

http://people.redhat.com/clalance/virttest

Please let me know if that works for you.

Thanks,
Chris Lalancette
Comment 7 Pasi Karkkainen 2009-09-24 07:00:23 EDT
Chris: -166.el5virttest18 fixed the problem! F11 GA PV domU boots/starts OK now.

Thanks!
Comment 8 Chris Lalancette 2009-09-24 07:42:31 EDT
(In reply to comment #7)
> Chris: -166.el5virttest18 fixed the problem! F11 GA PV domU boots/starts OK
> now.

Excellent, thanks for the testing.

Chris Lalancette
Comment 9 Pasi Karkkainen 2009-09-26 10:20:45 EDT
Chris: I also opened a bug against rhel5 xen per your suggestion.

https://bugzilla.redhat.com/show_bug.cgi?id=525873
Comment 10 Pasi Karkkainen 2009-11-06 12:06:34 EST
Is this fix already included in the -170 kernel?
Comment 11 Chris Lalancette 2009-11-06 12:56:35 EST
Ah, I forgot all about this.  I'm not 100% certain which kernel it went into, but it definitely went into the -172 kernel available here:

http://people.redhat.com/dzickus/el5/

I'm actually going to close this as a dup since the fix for this went in along with the rest of the fixes for BZ 502826.

Thanks,
Chris Lalancette

*** This bug has been marked as a duplicate of bug 502826 ***