Bug 468083 - kernel-xen doesn't boot on Dell Optiplex GX280
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Chris Lalancette
QA Contact: Martin Jenner
Keywords: ZStream
Duplicates: 469237 470535
Depends On:
Blocks: RHEL5u3_relnotes 470040
Reported: 2008-10-22 13:40 EDT by Jakub Hrozek
Modified: 2011-01-24 18:26 EST

Doc Type: Bug Fix
Doc Text:
In this beta release, the virtualized kernel may not boot on some older x86 systems that lack the EFER MSR. Refer to Red Hat Bugzilla #468083 for more information on this issue.
Last Closed: 2009-01-20 14:35:20 EST

Attachments
grub config file (1.17 KB, text/plain)
2008-10-23 05:05 EDT, Jakub Hrozek
dmidecode output (17.54 KB, text/plain)
2008-10-23 05:07 EDT, Jakub Hrozek
/proc/cpuinfo output (1.04 KB, text/plain)
2008-10-23 05:09 EDT, Jakub Hrozek
Patch that will probably fix this issue (3.23 KB, patch)
2008-10-24 07:42 EDT, Chris Lalancette


External Trackers
Tracker ID Priority Status Summary Last Updated
CentOS 3230 None None None Never

Description Jakub Hrozek 2008-10-22 13:40:26 EDT
Description of problem:
After upgrading to the latest 5.3 kernel-xen, the machine fails to boot: it hangs immediately after the kernel is selected at the grub prompt. No messages are printed, even with the rhgb and quiet options removed; there is just a black screen and a hang.

Version-Release number of selected component (if applicable):
Reproduced with kernel-xen-2.6.18-120.el5 and -118. I might have time to bisect the build that caused this tomorrow.

How reproducible:
always

Steps to Reproduce:
1. install a recent 5.3 kernel
2. boot

  
Actual results:
hang 

Expected results:
clean boot

Additional info:
Regular non-xen kernel works fine on the same machine (-120). kernel-xen that was shipped in 5.2 (-92) also works OK.
Comment 1 Bill Burns 2008-10-22 20:50:29 EDT
What kind of system is this happening on? Architecture, how many CPUs, how much memory, etc? Can you provide the grub file? Thanks.
Comment 2 Bill Burns 2008-10-22 20:52:07 EDT
Ahh, system type in the subject line...still please provide the other info.
Comment 3 Jakub Hrozek 2008-10-23 05:05:16 EDT
Created attachment 321258 [details]
grub config file
Comment 4 Jakub Hrozek 2008-10-23 05:07:40 EDT
Created attachment 321259 [details]
dmidecode output

The system is x86 (32 bit), 1 CPU, 2GB of memory. Attaching output of dmidecode.
Comment 5 Jakub Hrozek 2008-10-23 05:09:34 EDT
Created attachment 321260 [details]
/proc/cpuinfo output
Comment 6 Bill Burns 2008-10-23 07:46:53 EDT
Any chance you can capture serial console? (Add "com1=115200,8n1 console=com1"
to the hv line in grub and "console=ttyS0,115200,n8" to the kernel line.)
You have not put on an x86_64 kernel or something by mistake, have you?
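
For reference, a grub stanza with the serial console options requested above might look roughly like the fragment below. This is only a sketch: the title, the (hd0,0) partition, the file names and the root= device are assumptions, not values from the attached grub config file. The "hv line" is the kernel line that loads xen.gz, and the "kernel line" is the first module line. Note that the Linux kernel documents the serial console option as console=ttyS0,115200n8, with no comma before n8, so that form is used here.

```
title Red Hat Enterprise Linux Server (2.6.18-120.el5xen)
        root (hd0,0)
        # hypervisor ("hv") line: Xen itself logs to the first serial port
        kernel /xen.gz-2.6.18-120.el5 com1=115200,8n1 console=com1
        # dom0 kernel line: rhgb and quiet are left out so messages are visible
        module /vmlinuz-2.6.18-120.el5xen ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200n8
        module /initrd-2.6.18-120.el5xen.img
```

A second machine attached with a null-modem cable and a terminal program set to 115200 baud, 8 data bits, no parity, 1 stop bit would then capture whatever the hypervisor prints before the hang.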
Comment 7 Jakub Hrozek 2008-10-23 10:37:07 EDT
(In reply to comment #6)
> Any chance you can capture serial console? (add "com1=115200,8n1 console=com1"
> to the hv line in grub and  "console=ttyS0,115200,n8" to the kernel line.

OK, I'll try. 

> You have not put on an x86_64 kernel or something by mistake have you?

No, everything is i686:
# rpm -q kernel-xen --queryformat '%{name}-%{version}-%{release}.%{arch}\n'
kernel-xen-2.6.18-118.el5.i686
kernel-xen-2.6.18-120.el5.i686
kernel-xen-2.6.18-92.el5.i686
Comment 8 Chris Lalancette 2008-10-23 11:05:08 EDT
I mentioned to Jakub in IRC that we haven't done a whole lot of mucking with early initialization code between 5.2 and 5.3, so this is a little surprising.  However, looking briefly through the kernel changelog, the two biggest possibilities seem to be:

1)  EPT/2MB stuff
2)  GDT expansion stuff

I've asked Jakub to try -111 (which is right before the EPT stuff went in) to see if that did it.  I also asked him to try -106, which is right before the GDT changes went in.  If neither of those works, then we'll have to do a full bisection.  I'll leave this in needinfo until Jakub gets a chance to do the test.

Chris Lalancette
Comment 9 Jakub Hrozek 2008-10-23 12:32:17 EDT
So I ended up doing almost full bisection and oddly enough, the breakage appears to be between -115 and -116. IOW, kernel-xen-2.6.18-115.el5 boots, kernel-xen-2.6.18-116.el5 does not boot.
Comment 10 Chris Lalancette 2008-10-23 16:19:20 EDT
Ug.  OK, well, that really only leaves a single hypervisor patch, which is one of mine.  So it must be that patch, although I have a hard time seeing how it could cause a problem that early in boot.  I'll have to get on there at some point (or find another similar machine) and try some things.

Chris Lalancette
Comment 11 Chris Lalancette 2008-10-24 05:38:45 EDT
Arg!  I think I see the problem in the patch.  I'll spin a test kernel with a fix for you to try.

Chris Lalancette
Comment 12 RHEL Product and Program Management 2008-10-24 06:19:13 EDT
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 14 Chris Lalancette 2008-10-24 07:42:03 EDT
Created attachment 321397 [details]
Patch that will probably fix this issue

OK, I missed something when I did the backport of the CR4 TSC hiding patches.  One of the things the CR4 TSC patches do is to add a read of the EFER MSR in the boot path for the boot processor and all other processors.  Unfortunately, older processors (like the P-4) do not have the EFER MSR, so the read generates a fault very early in boot.  Upstream c/s 16378 addresses this by checking for the existence of the EFER MSR before accessing it.  The attached patch is a backport of that changeset and should solve the problem.  I'm building a test kernel now with this patch; I'll give download details once it is done building.
Comment 15 Chris Lalancette 2008-10-24 10:08:18 EDT
OK.  I've built a kernel with the patch.  It's available here:

http://people.redhat.com/clalance/bz468083/

Please download it and give it a try, and report back the results.  I need to have testing results by early next week to make sure we can get this in as soon as possible.

Thanks!
Chris Lalancette
Comment 16 Jakub Hrozek 2008-10-27 06:36:07 EDT
(In reply to comment #15)
> OK.  I've built a kernel with the patch.  It's available here:
> 
> http://people.redhat.com/clalance/bz468083/
> 
> Please download it and give it a try, and report back the results.  I need to
> have testing results by early next week to make sure we can get this in as soon
> as possible.
>

Seems like it does the trick - boots and runs just fine! Thanks, Chris!
Comment 17 Chris Lalancette 2008-10-27 06:39:21 EDT
OK, thanks for the bug report and the testing.  I'll get this queued up for 5.3

Chris Lalancette
Comment 18 Bill Burns 2008-10-27 09:50:24 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The Xen kernel will not boot on some older i686 systems that lack the EFER MSR. This issue will be fixed in a beta snapshot.
Comment 21 Ryan Lerch 2008-10-28 01:10:52 EDT
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-The Xen kernel will not boot on some older i686 systems that lack the EFER MSR. This issue will be fixed in a beta snapshot.
+In this beta release, the virtualized kernel may not boot on some older x86 systems that lack the EFER MSR. Refer to Red Hat Bugzilla #468083 for more on this issue.
Comment 22 Ryan Lerch 2008-10-28 01:11:49 EDT
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-In this beta release, the virtualized kernel may not boot on some older x86 systems that lack the EFER MSR. Refer to Red Hat Bugzilla #468083 for more on this issue.
+In this beta release, the virtualized kernel may not boot on some older x86 systems that lack the EFER MSR. Refer to Red Hat Bugzilla #468083 for more information on this issue.
Comment 23 Chris Lalancette 2008-10-31 09:32:47 EDT
*** Bug 469237 has been marked as a duplicate of this bug. ***
Comment 24 Don Zickus 2008-11-04 11:51:13 EST
The fix is included in kernel-2.6.18-122.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 28 Bill Burns 2008-11-07 11:59:13 EST
*** Bug 470535 has been marked as a duplicate of this bug. ***
Comment 29 Andreas Thienemann 2008-11-09 18:57:04 EST
This bug also applies to 2.6.18-92.1.17.el5xen which is being pushed as a current RH5.2 update.
Please take note that this is hitting production systems as well and not just lab systems using the beta.
Comment 30 Alexander Lindqvist 2008-11-12 03:00:36 EST
(In reply to comment #29)
> This bug also applies to 2.6.18-92.1.17.el5xen which is being pushed as a
> current RH5.2 update.
> Please take note that this is hitting production systems as well and not just
> lab systems using the beta.

This is why I filed bug 470535 against RHEL 5.2:

https://bugzilla.redhat.com/show_bug.cgi?id=470535

Can Red Hat confirm the problem and tell us when we can expect a fixed Xen kernel?
Comment 32 Issue Tracker 2008-11-17 10:01:34 EST
------- Comment From santwana.samantray@in.ibm.com 2008-11-16 12:57 EDT-------
Hi,

I was able to boot successfully on a 32-bit machine installed with a Xen
kernel in RHEL5.3-Snap2.
[root@x360a ~]# uname -a
Linux x360a.in.ibm.com 2.6.18-122.el5xen #1 SMP Mon Nov 3 18:49:46 EST 2008 i686 i686 i386 GNU/Linux

Thanks,
Santwana


This event sent from IssueTracker by jkachuck 
 issue 234219
Comment 36 errata-xmlrpc 2009-01-20 14:35:20 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
