Bug 471060

Summary: PAE enabled kernels can be installed on non-PAE enabled hardware
Product: Red Hat Enterprise Linux 5
Reporter: Neil Horman <nhorman>
Component: kernel
Assignee: Don Zickus <dzickus>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: medium
Version: 5.2
CC: dzickus, jtluka, lwang, mikeda, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-08-03 19:25:45 UTC
Bug Blocks: 533192    
Attachments:
sample patch to test for pae availability (flags: none)

Description Neil Horman 2008-11-11 16:10:11 UTC
Description of problem:
PAE enabled kernels (kernel-PAE and kernel-xen) install on non-PAE enabled hardware. Since PAE enabled kernels can't run on non-PAE hardware, these kernels are guaranteed to panic consistently once they are rolled out.

Version-Release number of selected component (if applicable):
all i686 PAE enabled kernels (kernel-PAE-<version>-<release>.i686)

How reproducible:
always

Steps to Reproduce:
1. Install kernel-xen or kernel-PAE-2.6.18-*.i686 on a non-PAE system (I used a ThinkPad T42).
2. Reboot into the above kernel.
  
Actual results:
panic during boot

Expected results:
failed install indicating hardware incompatibility

Additional info:

Comment 1 Neil Horman 2008-11-11 16:15:50 UTC
Created attachment 323191 [details]
sample patch to test for pae availability

I've not tested it, but I think something like this would solve the problem pretty easily.
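The attachment itself is not reproduced in this report. As a rough illustration of the kind of test being proposed, the sketch below assumes the check looks for the `pae` flag on the `flags` line of /proc/cpuinfo. The function name `cpu_has_pae` and its optional path argument are hypothetical and are not taken from the attached patch.

```shell
#!/bin/sh
# Hypothetical helper (not the attached patch): succeed (exit 0) if the
# "flags" line of a cpuinfo-style file lists "pae" as a whole word.
# The optional path argument exists only so the function can be tried
# against a saved copy; it defaults to /proc/cpuinfo.
cpu_has_pae() {
    cpuinfo=${1:-/proc/cpuinfo}
    # A missing or unreadable file makes grep fail, which is reported
    # here the same way as "flag not present".
    grep -Eq '^flags[[:space:]]*:.*[[:space:]]pae([[:space:]]|$)' "$cpuinfo" 2>/dev/null
}
```

A scriptlet could call `cpu_has_pae || exit 1` after printing a message; whether a non-zero exit at that point is the right way to refuse the install is exactly what the rest of this thread debates.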

Comment 2 Neil Horman 2008-11-11 20:27:03 UTC
Jeremy, Don suggested that we ping you on this issue, in case the proposed patch above has any adverse effects on PackageKit, etc. Thoughts/opinions welcome. Thanks!

Comment 3 Jeremy Katz 2008-11-12 16:34:42 UTC
Something like this is going to be problematic in cases like the installer or a variety of other places. There's no guarantee that the environment in which the package is being put on disk has /proc mounted, or that it is even the system that will later be used; this is much more the case as we start doing things like appliances.

Also, errors like this are never going to be seen by the user if they're using a graphical app, so it's just going to appear that yum/PackageKit/whatever didn't do what they said, leading to other bug reports.

Comment 4 Don Zickus 2008-11-12 16:59:54 UTC
Jeremy,

Do you have another way of doing this? The kernel team is slowly coming under pressure to be able to block things like this (and the whole x86 on x86_64 distros case too). We understand the concerns of the GUI package apps, which is why we were soliciting feedback from you on this approach. If there is another way to do so, I don't think we will have any issues with it (assuming it doesn't include patching the kernel binary :) ).

Comment 5 Jeremy Katz 2008-11-12 17:36:56 UTC
There's no real way of doing it sanely in RPM right now.  

It's what the idea of rpmarch is supposed to handle (since you can't install for a "wrong" arch unless you do so with --ignorearch), but a proliferation of arches for things which really aren't architectures is unlikely to happen with upstream rpm.

Comment 6 Neil Horman 2008-11-12 18:46:04 UTC
I understand your point, but I think saying that there's no sane way to do this is overstating things. All we really need to do is check (in those instances where we are able) whether a new kernel will boot on a given piece of hardware. In some cases we won't be able to do that, like if we're building an NFS root filesystem, or if we're building an appliance (although I've yet to see an appliance that doesn't mount proc). Perhaps all we need to do is enhance the above check to determine if the rpm is being installed to a location that the local system will use to find bootable kernels. If we're installing there, then perform the check for PAE. If not, let it pass. It lets the non-local boot cases slide, but there really is no solution (in rpm or otherwise) that can guarantee remote hardware is compatible with a local kernel.
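One way to picture the "only check local installs" idea is the sketch below. It is an assumption-laden illustration, not something from the thread: the comment does not say how a scriptlet would learn its install root, so `target_root` and `running_root` are passed in as hypothetical arguments, and "local" is taken to mean that both resolve to the same physical directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if target_root resolves to the same
# physical directory as running_root (default "/"), i.e. the package is
# landing on the filesystem this machine boots from.
is_local_install() {
    target_root=$1
    running_root=${2:-/}
    # Resolve symlinks with pwd -P inside a subshell so the caller's
    # working directory is untouched. An unreadable path counts as
    # "not local".
    target_real=$(CDPATH= cd -- "$target_root" 2>/dev/null && pwd -P) || return 1
    running_real=$(CDPATH= cd -- "$running_root" 2>/dev/null && pwd -P) || return 1
    [ "$target_real" = "$running_real" ]
}
```

With a helper like this, the PAE test would run only when `is_local_install` succeeds, and installs into an NFS root or an appliance image would pass through unchecked.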

As for making the GUI clients play nice with this, that seems like a separate problem. Generally speaking, it seems to me that if we have an error during the installation of the transaction set, we should develop a mechanism whereby any error messages that the rpms being installed report get propagated back up. Perhaps a redirection of the rpm output to an error log for display by the GUI in the event that something goes awry?
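The redirection idea can be sketched generically as follows. This is only an illustration under assumptions: `run_logged` is a hypothetical wrapper name, and nothing here reflects how yum or PackageKit actually collect scriptlet output.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, append its stdout and stderr to a
# log file, and on failure replay the log on stderr so a front end has
# something to display. Returns the command's own exit status.
run_logged() {
    log=$1
    shift
    "$@" >>"$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        cat "$log" >&2
    fi
    return "$status"
}
```

A front end could wrap the install command this way, for example `run_logged /tmp/install.log rpm -ivh <package>`, and show the log contents only when the exit status is non-zero.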

Comment 7 Don Zickus 2008-11-18 20:20:35 UTC
Jeremy,

Alright, so here I am in this weird situation where I read bug report after bug report from the QE team and developers about how certain kernels hang on booting. After spending a week or so investigating, we learn either the developer or the QE tester put some sort of incorrect combination of i686/i686PAE/i686xen/x86_64/x86_64xen on their i686/x86_64 distro.

It's getting to the point where we are spinning our wheels here. As much as I respect Jeremy's opinion on the other side of the coin, I am going to argue that those problems aren't as painful as we might think and don't outweigh the cost of not putting code like Neil's patch in the kernel spec file.

* the GUI tools - IMHO the GUI tools are smart enough to prevent the users from having the wrong choices to begin with and thus the end user will never get himself into a situation where they are trying to install a PAE kernel on a non-PAE box.  So far the majority of the people who get into trouble are the ones who use rpm manually.  So I am going to consider that argument a non-issue for now.

* /proc may not be mounted - well looking at the kernel-2.6.spec file, I have noticed that the %post section contains references to the /proc fs and uname.  Considering I haven't seen any bzs complaining about them, I am going to assume we haven't run into any 'special' install where not having /proc mounted caused issues.  Now granted in the spec file the code was surrounded with 'test -e /proc/foo' and I think Neil's patch can be reworked to contain the same for safety.

* appliances - this one I don't have a good argument for, except to ask how new-kernel-pkg gets away with it. I assume appliances have to be created in a chroot'd environment on the same arch for which they will be used (otherwise the kernel and its use of uname will create other havoc).
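The `test -e` guard mentioned in the /proc bullet above could wrap the PAE check roughly as follows. This is a sketch under assumptions: the function name `pae_check_guarded` and the wording of the message are hypothetical, and the existing spec file code is not reproduced here.

```shell
#!/bin/sh
# Hypothetical guarded check. Returns 0 (proceed) when the cpuinfo file
# is absent, because nothing can be concluded without it, or when "pae"
# is listed on the flags line. Returns 1 only when the file exists and
# does not list "pae". The path argument defaults to /proc/cpuinfo.
pae_check_guarded() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Same style of guard as 'test -e /proc/foo': skip quietly when
    # /proc is not mounted in the install environment.
    test -e "$cpuinfo" || return 0
    if grep -Eq '^flags[[:space:]]*:.*[[:space:]]pae([[:space:]]|$)' "$cpuinfo"; then
        return 0
    fi
    echo "CPU does not report the pae flag; a PAE kernel is unlikely to boot here" >&2
    return 1
}
```

The design choice is to fail open: an environment without /proc, such as an installer or image build, is never blocked, which addresses the second bullet at the cost of not protecting those cases.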

Like I said earlier, despite rpm's limitations, I need to find some way to prevent overworked QE testers and developers from installing the wrong kernel on the wrong distro. The bzs are piling up at about one every week now. As I stated above, I can't really see Neil's modified patch (and others like it) causing the issues above in practice, so I would like to move ahead with it for 5.4.

Thoughts?

Don

Comment 9 RHEL Program Management 2009-04-07 20:29:06 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Mike Gahagan 2009-04-07 20:59:02 UTC
When I hit this a few months ago, I seem to recall that not even all T42s behaved the same way. Some panicked or otherwise halted like they should; others would go into an infinite reboot loop. So even if we do manage to fix it in rpm, we still might want to release note it, since there is no guarantee how the hardware will behave if someone manages to get the kernel installed.

Comment 12 RHEL Program Management 2009-11-25 23:30:21 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Chris Lumens 2010-02-25 15:24:48 UTC
*** Bug 568083 has been marked as a duplicate of this bug. ***

Comment 15 Don Zickus 2010-08-03 19:25:45 UTC
With newer machines all supporting PAE and older machines slowly collecting dust, this issue is starting to fade away into a dark corner.

Considering there is no easy way to solve this and fewer people have complained about it, I am going to close this as WONTFIX.

If people are still being bitten by this, please re-open and I will try to re-investigate a better solution.

Cheers,
Don