Description of problem:
The F11 Xen PV domU kernel crashes early during boot, in CPU initialization, with an invalid opcode. The crash happens in xsave_cntxt_init() when the xsetbv instruction is executed. It is caused by the Xen hypervisor not masking the CPU XSAVE feature from the guest.

Version-Release number of selected component (if applicable):
Dom0 is RHEL 5.4, 2.6.18-164.el5xen.

How reproducible:
Always, if you have the "correct" hardware.

Steps to Reproduce:
1. Use virt-install (or virt-manager) to start an F11 Xen PV domU installation.
2. The guest kernel crashes.

Actual results:
The guest kernel crashes early with an invalid opcode.

Expected results:
The guest kernel starts and works OK.

Additional info:
Analysis and the bugfix changeset number from Jeremy Fitzhardinge:
https://www.redhat.com/archives/fedora-xen/2009-September/msg00112.html
F11 guest kernel boot/crash logs:
http://v6.fi/misc/f11_64_kernel.debug2.txt
http://v6.fi/misc/f11_64_kernel.debug4.txt
gdb analysis:
http://v6.fi/misc/gdb_f11_64_kernel.debug.txt
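To illustrate the mechanism described above: a guest kernel decides whether to run its XSAVE setup (and execute xsetbv) based on CPUID leaf 1, ECX bit 26 (XSAVE), while bit 27 (OSXSAVE) reports whether the OS has enabled XSAVE in CR4. If the hypervisor passes the host's XSAVE bit through without enabling XSAVE for the guest, the guest's xsetbv faults, matching the invalid opcode in the logs. The minimal C sketch below assumes that the fix amounts to clearing both bits in what a PV guest sees. The function names and example ECX values are hypothetical; this is not the actual RHEL-5 patch from BZ 502826 or Jeremy Fitzhardinge's changeset.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits (Intel SDM). */
#define CPUID1_ECX_XSAVE   (1u << 26) /* XSAVE/XRSTOR/XSETBV/XGETBV supported */
#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has set CR4.OSXSAVE */

/*
 * Hypothetical hypervisor/dom0-side helper: the leaf 1 ECX value a PV guest
 * should see when XSAVE is not supported for guests. Both XSAVE-related bits
 * are cleared; every other feature bit is passed through unchanged.
 */
uint32_t pv_mask_leaf1_ecx(uint32_t host_ecx)
{
    return host_ecx & ~(CPUID1_ECX_XSAVE | CPUID1_ECX_OSXSAVE);
}

/*
 * Hypothetical guest-side check: only attempt XSAVE setup (and therefore
 * xsetbv) when CPUID advertises XSAVE.
 */
int guest_may_use_xsetbv(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_XSAVE) != 0;
}
```

For example, with a host ECX of 0x0C000001 (XSAVE, OSXSAVE and SSE3 set), pv_mask_leaf1_ecx() returns 0x00000001, so guest_may_use_xsetbv() is true for the unmasked value but false for the masked one, and the guest skips the xsave path.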
Cool, thanks for the workaround in the domU kernel. Note that I have a patch pending for RHEL-5.5 to actually properly do the masking in the RHEL-5 dom0 kernel (https://bugzilla.redhat.com/show_bug.cgi?id=502826), so either way we should be fixed. Chris Lalancette
Chris: do you need testing for that -164.el5virttest17 kernel? Jeremy suggested another solution here: https://www.redhat.com/archives/fedora-xen/2009-September/msg00114.html But I guess RHEL5 Xen doesn't support custom CPUID masking per domU... it would be nice to be able to do that as well :)
(In reply to comment #2)
> Chris: do you need testing for that -164.el5virttest17 kernel?

Additional testing is welcome, especially since I haven't specifically tested that it masks out xsave (although I did test that it properly masks out GBpages).

> Another solution from Jeremy here:
> https://www.redhat.com/archives/fedora-xen/2009-September/msg00114.html
>
> But I guess RHEL5 Xen doesn't support custom cpuid masking per domU.. would be
> nice to be able to do that aswell :)

Right. I'm not sure how invasive that would be, given that RHEL-5 is getting a bit long in the tooth. Nevertheless, if you feel it is a worthwhile feature, open a bug against the RHEL-5 xen package and we'll see what we can do.

Chris Lalancette
It looks like -164.el5virttest17 doesn't help. The F11 GA kernel still crashes in the same way as before:
(early) Initializing CPU#0
(early) invalid opcode: 0000 [#1]
(early) SMP
(early)
Oh, yuck. I didn't port that part back. OK, I'll have to respin the patch for BZ 502826 with the NOXSAVE part backported. I'll keep you informed. Chris Lalancette
Pasi, OK, I've now uploaded a new RHEL-5 dom0 kernel that should properly mask xsave. You can get it from: http://people.redhat.com/clalance/virttest Please let me know if that works for you. Thanks, Chris Lalancette
Chris: -166.el5virttest18 fixed the problem! F11 GA PV domU boots/starts OK now. Thanks!
(In reply to comment #7)
> Chris: -166.el5virttest18 fixed the problem! F11 GA PV domU boots/starts OK
> now.

Excellent, thanks for the testing.

Chris Lalancette
Chris: I also opened a bug against rhel5 xen per your suggestion. https://bugzilla.redhat.com/show_bug.cgi?id=525873
Is this fix already included in the -170 kernel?
Ah, I forgot all about this. I'm not 100% certain which kernel it went into, but it definitely went into the -172 kernel available here: http://people.redhat.com/dzickus/el5/ I'm actually going to close this as a dup since the fix for this went in along with the rest of the fixes for BZ 502826. Thanks, Chris Lalancette *** This bug has been marked as a duplicate of bug 502826 ***