Bug 524719

Summary: Xen hypervisor doesn't mask xsave feature from the guest; Fedora 11 PV domU kernel crashes
Product: Red Hat Enterprise Linux 5
Component: xen
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: low
Target Milestone: rc
Target Release: ---
Reporter: Pasi Karkkainen <pasik>
Assignee: Xen Maintainance List <xen-maint>
QA Contact: Virtualization Bugs <virt-bugs>
CC: clalance, xen-maint
Doc Type: Bug Fix
Last Closed: 2009-11-06 17:56:35 UTC

Description Pasi Karkkainen 2009-09-21 21:04:47 UTC
Description of problem:
The F11 Xen PV domU kernel crashes early in boot, during CPU initialization, with an invalid opcode error. The crash happens in xsave_cntxt_init() when the xsetbv instruction is executed.

The crash happens because the Xen hypervisor does not mask the CPU's xsave feature from the guest.
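
To make the masking idea concrete, here is a minimal sketch in Python. It is an illustrative model only, not Xen or Linux source code, and the function names are hypothetical. It relies on two facts from the x86 architecture: CPUID leaf 1 reports XSAVE support in ECX bit 26 and OSXSAVE in ECX bit 27. The assumption (based on the analysis linked below) is that the guest decides whether to run xsetbv from the XSAVE bit it sees, so a hypervisor that clears the bit keeps the guest off that code path.

```python
# Illustrative model only; not Xen or Linux source code.
XSAVE_BIT = 1 << 26    # CPUID leaf 1, ECX bit 26: CPU supports XSAVE
OSXSAVE_BIT = 1 << 27  # CPUID leaf 1, ECX bit 27: OS has enabled XSAVE


def mask_xsave(ecx: int) -> int:
    """Return the leaf 1 ECX value with the XSAVE and OSXSAVE bits hidden."""
    return ecx & ~(XSAVE_BIT | OSXSAVE_BIT) & 0xFFFFFFFF


def guest_would_try_xsetbv(ecx_seen_by_guest: int) -> bool:
    """Assumption: the guest runs its xsave setup only if it sees the XSAVE bit."""
    return bool(ecx_seen_by_guest & XSAVE_BIT)
```

With an unmasked value the guest takes the xsave path; after mask_xsave() it does not, and all other feature bits are left untouched.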

Version-Release number of selected component (if applicable):
Dom0 is RHEL 5.4, 2.6.18-164.el5xen.

How reproducible:
Always, if you have the "correct" hardware (a CPU that supports xsave).
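
One rough way to check whether a machine has the affected hardware is to look for the xsave flag in /proc/cpuinfo. The sketch below is a hedged example, not part of the original report: the helper names are hypothetical, and it assumes a kernel recent enough to list the flag. An older kernel may not report it at all, so a missing flag does not prove the CPU lacks the feature.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_xsave(cpuinfo_text: str) -> bool:
    """True if the flags line lists 'xsave' as a whole word."""
    return "xsave" in cpu_flags(cpuinfo_text)
```

On a Linux host this could be called as has_xsave(open("/proc/cpuinfo").read()).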

Steps to Reproduce:
1. Use virt-install (or virt-manager) to start an F11 Xen PV domU installation.
2. The guest kernel crashes.
  
Actual results:
The guest kernel crashes early with an invalid opcode error.

Expected results:
The guest kernel starts and works correctly.

Additional info:

Analysis and the bugfix changeset number from Jeremy Fitzhardinge:
https://www.redhat.com/archives/fedora-xen/2009-September/msg00112.html

F11 guest kernel boot/crash logs:
http://v6.fi/misc/f11_64_kernel.debug2.txt
http://v6.fi/misc/f11_64_kernel.debug4.txt

gdb analysis:
http://v6.fi/misc/gdb_f11_64_kernel.debug.txt

Comment 1 Chris Lalancette 2009-09-22 06:34:56 UTC
Cool, thanks for the workaround in the domU kernel.  Note that I have a patch pending for RHEL-5.5 to actually properly do the masking in the RHEL-5 dom0 kernel (https://bugzilla.redhat.com/show_bug.cgi?id=502826), so either way we should be fixed.

Chris Lalancette

Comment 2 Pasi Karkkainen 2009-09-22 11:56:15 UTC
Chris: do you need testing for that -164.el5virttest17 kernel?

Another solution from Jeremy here:
https://www.redhat.com/archives/fedora-xen/2009-September/msg00114.html

But I guess RHEL5 Xen doesn't support custom cpuid masking per domU... it would be nice to be able to do that as well :)
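
As a rough illustration of what per-domU cpuid masking means, here is a small Python model. It is a sketch under stated assumptions, not the Xen implementation or its configuration syntax: the policy table, the guest names, and the function name are all hypothetical. The only architectural facts used are the XSAVE and OSXSAVE bit positions in CPUID leaf 1 ECX.

```python
XSAVE_BIT = 1 << 26    # CPUID leaf 1, ECX bit 26
OSXSAVE_BIT = 1 << 27  # CPUID leaf 1, ECX bit 27

# Hypothetical per-guest policy: guest name -> leaf 1 ECX bits to hide.
EXAMPLE_POLICY = {
    "f11-pv": XSAVE_BIT | OSXSAVE_BIT,
}


def ecx_for_guest(host_ecx: int, guest_name: str, policy: dict) -> int:
    """Return the leaf 1 ECX value a given guest would see under the policy."""
    bits_to_clear = policy.get(guest_name, 0)  # guests without an entry see everything
    return host_ecx & ~bits_to_clear & 0xFFFFFFFF
```

The point of the per-guest table is that one guest can have xsave hidden while another on the same host still sees the full feature set.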

Comment 3 Chris Lalancette 2009-09-22 12:04:38 UTC
(In reply to comment #2)
> Chris: do you need testing for that -164.el5virttest17 kernel?

Additional testing is welcome, especially since I haven't tested it specifically to mask out xsave (although I did test it to properly mask out GBpages).

> Another solution from Jeremy here:
> https://www.redhat.com/archives/fedora-xen/2009-September/msg00114.html
> 
> But I guess RHEL5 Xen doesn't support custom cpuid masking per domU... it would be
> nice to be able to do that as well :)

Right.  I'm not sure how invasive that is, given that RHEL-5 is getting a bit long in the tooth.  Nevertheless, if you feel it is a worthwhile feature, open up a bug against the RHEL-5 xen package and we'll see what we can do.

Chris Lalancette

Comment 4 Pasi Karkkainen 2009-09-23 12:17:31 UTC
It looks like -164.el5virttest17 doesn't help. The F11 GA kernel still crashes in the same way as before.

(early) Initializing CPU#0
(early) invalid opcode: 0000 [#1] (early) SMP (early)

Comment 5 Chris Lalancette 2009-09-23 12:46:19 UTC
Oh, yuck.  I didn't port that part back.  OK, I'll have to respin the patch for BZ 502826 with the NOXSAVE part backported.  I'll keep you informed.

Chris Lalancette

Comment 6 Chris Lalancette 2009-09-24 07:55:54 UTC
Pasi,
     OK, I've now uploaded a new RHEL-5 dom0 kernel that should properly mask xsave.  You can get it from:

http://people.redhat.com/clalance/virttest

Please let me know if that works for you.

Thanks,
Chris Lalancette

Comment 7 Pasi Karkkainen 2009-09-24 11:00:23 UTC
Chris: -166.el5virttest18 fixed the problem! F11 GA PV domU boots/starts OK now.

Thanks!

Comment 8 Chris Lalancette 2009-09-24 11:42:31 UTC
(In reply to comment #7)
> Chris: -166.el5virttest18 fixed the problem! F11 GA PV domU boots/starts OK
> now.

Excellent, thanks for the testing.

Chris Lalancette

Comment 9 Pasi Karkkainen 2009-09-26 14:20:45 UTC
Chris: I also opened a bug against rhel5 xen per your suggestion.

https://bugzilla.redhat.com/show_bug.cgi?id=525873

Comment 10 Pasi Karkkainen 2009-11-06 17:06:34 UTC
Is this fix already included in the -170 kernel?

Comment 11 Chris Lalancette 2009-11-06 17:56:35 UTC
Ah, I forgot all about this.  I'm not 100% certain which kernel it went into, but it definitely went into the -172 kernel available here:

http://people.redhat.com/dzickus/el5/

I'm actually going to close this as a dup since the fix for this went in along with the rest of the fixes for BZ 502826.

Thanks,
Chris Lalancette

*** This bug has been marked as a duplicate of bug 502826 ***