Bug 198657 - RHEL4-U4-B2:Kernel Panic while boot-up during installation on PE6800.
Summary: RHEL4-U4-B2:Kernel Panic while boot-up during installation on PE6800.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jason Baron
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-07-12 15:31 UTC by Raghavendra Biligiri
Modified: 2018-10-19 19:06 UTC (History)

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 02:37:24 UTC
Target Upstream Version:
Embargoed:


Attachments
Serial console output of Kernel panic (6.52 KB, text/plain)
2006-07-12 15:38 UTC, Raghavendra Biligiri
Serial console output of kernel panic on kernel-2.6.9-42.7 (6.32 KB, text/plain)
2006-09-07 15:23 UTC, Raghavendra Biligiri
apic patch (5.64 KB, patch)
2006-10-17 21:26 UTC, Jason Baron
patch that resolves this issue (886 bytes, patch)
2006-10-20 18:58 UTC, Jason Baron


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0304 0 normal SHIPPED_LIVE Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5 2007-04-28 18:58:50 UTC

Description Raghavendra Biligiri 2006-07-12 15:31:40 UTC
Description of problem:
When we PXE-install RHEL4-U4-B2 x86_64 (kernel 2.6.9-40) on a PE6800 (with
2GB memory), the kernel panics during boot-up.

Version-Release number of selected component (if applicable):
kernel 2.6.9-40

How reproducible:
Every time

Steps to Reproduce:
1. Install RHEL4-U4-B2 (kernel 2.6.9-40).
2. While trying to boot up, the kernel panics.
  
Actual results:
The kernel fails to boot up properly and a kernel panic is observed.

Expected results:
The kernel should boot up fine and the installation should complete successfully.

Additional info:
1. Passing the noapic parameter during boot-up does not help; the kernel still
panics.
2. RHEL4-U3 boots up fine on the same machine.

Comment 1 Raghavendra Biligiri 2006-07-12 15:38:57 UTC
Created attachment 132313 [details]
Serial console output of Kernel panic

Comment 2 Jason Baron 2006-07-12 18:39:31 UTC
Can you please try 'nolapic' too?

Comment 3 Jeff Burke 2006-07-12 19:54:53 UTC
We don't have this hardware in house. Can you please tell us which RHEL4-U4
kernel worked last, if any? Once we have that data we can try to narrow down
when this issue was introduced.

Comment 5 Jim Paradis 2006-07-12 22:23:49 UTC
This crash happens because phys_cpu_present_map does not include an entry for
the APIC ID of the CPU that's running at boot.  Access to hardware will make
this easier to track down.


Comment 7 Jim Paradis 2006-07-13 01:14:25 UTC
We located the PE6800 in Westford, but apparently it installed RHEL4U4 just
fine.  I believe I know what's going on, though:

The console output in Comment 1 indicates that the system on which this problem
occurs (a) has only 4 CPU cores while our system has 16, and (b) does not
enumerate them from 0 (it has 8/9/14/15).

What the console dump does not show is which of these CPUs is flagged by the
ACPI tables as the boot CPU.  It turns out that if the first CPU detected by
Linux (i.e. the first one listed in the MADT) is *not* flagged as the boot CPU,
we could have trouble booting the install kernel.  The reason is that the
install kernel is limited to 1 CPU (NR_CPUS=1).  This means that it will only
use the first CPU it finds in the MADT and will scan (but not use) the others. 
The kernel sets the "boot_cpu_id" variable based on the CPU_BOOTPROCESSOR flag
in the MADT, though.  This means that it's possible for the system to be running
on a CPU other than the boot CPU at boot time.  If only one CPU is allowed, then
the bit for "boot_cpu_id" will not be found in phys_cpu_present_map.  I think
this is why we're tripping over this bug.
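
The hypothesis above can be illustrated with a small toy model. This is a sketch in Python, not kernel code: `MadtEntry`, `scan_madt`, and `boot_cpu_is_present` are hypothetical names, and which CPU the firmware flags as the boot processor is an assumption (the console dump does not show it). The APIC IDs are the ones reported in this bug. The model only restates the reasoning in this comment; the thread does not confirm it as the root cause.

```python
# Toy model of the hypothesis in this comment; not kernel code.
from dataclasses import dataclass


@dataclass
class MadtEntry:
    apic_id: int
    boot_processor: bool  # stands in for the CPU_BOOTPROCESSOR flag


def scan_madt(entries, nr_cpus):
    """Return (present_map, boot_cpu_id) for a list of MADT entries.

    Only the first nr_cpus entries are recorded as usable, but every entry
    is scanned, so the boot-processor flag is honored wherever it appears.
    """
    present = set()       # stands in for phys_cpu_present_map
    boot_cpu_id = None
    for entry in entries:
        if len(present) < nr_cpus:
            present.add(entry.apic_id)
        if entry.boot_processor:
            boot_cpu_id = entry.apic_id
    return present, boot_cpu_id


def boot_cpu_is_present(entries, nr_cpus):
    """True if the flagged boot CPU made it into the present map."""
    present, boot_cpu_id = scan_madt(entries, nr_cpus)
    return boot_cpu_id in present


# APIC IDs from the report; flagging ID 14 as the boot CPU is an ASSUMPTION.
hypothetical_madt = [
    MadtEntry(8, False),
    MadtEntry(14, True),
    MadtEntry(9, False),
    MadtEntry(15, False),
]

print(boot_cpu_is_present(hypothetical_madt, nr_cpus=1))  # install kernel case
print(boot_cpu_is_present(hypothetical_madt, nr_cpus=4))  # SMP kernel case
```

With a limit of one CPU, the flagged boot CPU is missing from the present map whenever it is not the first MADT entry; with room for all four CPUs the mismatch cannot occur in this model.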

Questions for Dell: 

(1) under what circumstances would a PE6800 start enumerating CPUs from other
than 0, and is this a supported configuration?  

(2) Under what circumstances would a CPU other than the first one in the ACPI
MADT table be flagged as the boot processor?

(3) how common and/or likely are (1) and (2) under normal operating conditions?



Comment 8 Jeff Burke 2006-07-13 01:32:49 UTC
Jim,
 Sorry for the misinformation in comment #3. For some reason the system is not
in the inventory database; both Jason and I looked. I will ask Matt about that.
Thanks, Jeff

Comment 9 Jeff Burke 2006-07-13 01:41:30 UTC
OK, it actually is in the database, but it is different from 99% of the other Dell
systems. Using the search in the inventory database, I looked for:
dmi rh_bios_vendor = Dell Computer Corporation
or 
dmi rh_bios_vendor = Dell inc.
That query picks up 99% of the systems. For some reason the dmi data for this
system is just "Dell".

Thanks,
Jeff
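
One way to avoid missing a system whose vendor string is just "Dell" is to match on the first word instead of the full string. The sketch below is illustrative only: `is_dell_vendor` is a hypothetical helper, not part of the inventory database, and it assumes the DMI BIOS vendor value is available as a plain string.

```python
def is_dell_vendor(bios_vendor):
    """Return True if a DMI BIOS vendor string names Dell.

    Matches on the first whitespace-separated word, case-insensitively,
    so "Dell Computer Corporation", "Dell inc." and a bare "Dell" all match.
    """
    words = bios_vendor.strip().lower().split()
    return bool(words) and words[0] == "dell"


for vendor in ("Dell Computer Corporation", "Dell inc.", "Dell "):
    print(repr(vendor), is_dell_vendor(vendor))
```

Comparing the first word rather than using a prefix test keeps unrelated vendors that merely start with the same letters from matching.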

Comment 10 Raghavendra Biligiri 2006-07-13 09:22:23 UTC
When we pass both the "noapic" and "nolapic" parameters, the kernel boots up fine.


Comment 11 Raghavendra Biligiri 2006-07-13 10:17:22 UTC
Passing only "noapic" or only "nolapic" does not solve the problem.




Comment 12 Raghavendra Biligiri 2006-07-13 10:42:18 UTC
The issue is also seen on kernel-2.6.9-39.

Comment 13 Raghavendra Biligiri 2006-07-13 13:40:08 UTC
When we pass both the "noapic" and "nolapic" parameters, the kernel boots up fine
and the installation completes successfully.
After the installation, /proc/cmdline shows only noapic.
When we removed the noapic parameter and rebooted, the system came up
fine.
So the problem is with the boot kernel; the normal kernel
boots up fine.
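
A quick way to repeat the /proc/cmdline check described above is sketched below. `apic_flags` and `read_cmdline` are hypothetical helper names, and the sample command line is made up for illustration; it is not taken from the affected system.

```python
def apic_flags(cmdline):
    """Report whether noapic and nolapic appear as whole boot parameters."""
    tokens = cmdline.split()
    return {flag: flag in tokens for flag in ("noapic", "nolapic")}


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line for illustration only.
sample = "ro root=LABEL=/ noapic console=ttyS0,115200"
print(apic_flags(sample))
```

On a live system, `apic_flags(read_cmdline())` gives the same report for the running kernel. Splitting into whole tokens avoids counting a longer option that merely begins with the same letters.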

Comment 16 Charles Rose 2006-07-25 03:53:23 UTC
Does Red Hat have plans to document this issue as a KB article?

Comment 19 RHEL Program Management 2006-08-16 19:40:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this enhancement by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This enhancement is not yet committed for inclusion in an Update
release.

Comment 22 Charles Rose 2006-09-05 15:37:16 UTC
Does Red Hat have a fix for this issue? Can we have a commitment for RHEL 4.5?

Comment 23 Jason Baron 2006-09-05 15:57:48 UTC
committed in stream U5 build 42.6. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 24 Raghavendra Biligiri 2006-09-07 15:16:43 UTC
The issue is reproducible with test kernel-2.6.9-42.7.EL.x86_64.
When we try to do a PXE install of kernel-2.6.9-42.7.EL x86_64, the kernel panics.
I have attached the serial console output.

Comment 25 Raghavendra Biligiri 2006-09-07 15:23:53 UTC
Created attachment 135778 [details]
Serial console output of kernel panic on kernel-2.6.9-42.7

Comment 30 Jim Paradis 2006-09-13 19:39:11 UTC
Question for Dell:

Can you reproduce this problem by installing RHEL4-U3 on the system, then
installing the UP (Non-SMP) RHEL4-U4 kernel and booting that?  I'd like to see
if we can reproduce this on something other than the bootstrap kernel.

If you can reproduce it this way I may have a kernel for you to test.


Comment 31 Larry Troan 2006-09-14 12:32:07 UTC
Per comment #30, changing status to NEEDINFO.

Comment 32 Raghavendra Biligiri 2006-09-14 15:30:05 UTC
After installing RHEL4-U3 (2.6.9-34) on the PE6800, we installed the RHEL4-U4
(2.6.9-42.EL) UP (non-SMP) kernel; booting into the RHEL4-U4 kernel results in a
kernel panic.

Comment 33 Jim Paradis 2006-09-14 15:36:09 UTC
Re Comment #32: Thank you for testing this.  I will shortly provide a test
kernel that you can try out on this system to see if it fixes the problem.


Comment 34 Jim Paradis 2006-09-15 23:24:54 UTC
Try the kernel at http://people.redhat.com/jparadis/bz198657 and let me know if
it boots successfully on this hardware...


Comment 35 Raghavendra Biligiri 2006-09-18 11:38:40 UTC
(In reply to comment #34)
> Try the kernel at http://people.redhat.com/jparadis/bz198657 and let me know if
> it boots successfully on this hardware...

The issue is reproducible with the test kernel provided in comment #34.
We installed the test kernel (kernel-2.6.9-42.10.bz198657.EL); booting
into it results in a kernel panic.

Comment 37 Jim Paradis 2006-10-05 22:44:17 UTC
I have been trying to reproduce this issue in-house.  The problem I have is that
the crash shows that the only CPUs online are APIC IDs 8, 14, 9, and 15.  My
system shows 16 logical processors: 0-15.  I tried pulling Processors 1 and 2
from my box to get a similar configuration (one where the boot processor is not
APIC ID 0), but now it just won't come up at all.  What am I supposed to do to
replicate the originator's configuration?


Comment 38 Raghavendra Biligiri 2006-10-07 11:05:15 UTC
I am able to reproduce this issue on a PE6800 and a PE6850 with only 2 CPUs.
The issue is reproducible on both single-core and dual-core processor machines.
It occurs when only 2 CPUs are installed in a 4-CPU machine. If all 4 CPUs are
installed, the issue is not seen.
Reply to comment #37: If processors 3 and 4 are removed from a 4-CPU machine,
the issue can be reproduced.

Comment 39 Robert Hentosh 2006-10-12 19:53:43 UTC
If you are running a 2-processor system, then the system must have processors 3
and 4 removed. (Only 1-, 2-, and 4-processor configurations are supported.) What
BIOS versions are the systems at Red Hat and at Dell running? BIOS A04 is the
latest available (it can be found at support.dell.com).


Comment 40 Jeff Burke 2006-10-12 20:25:33 UTC
Our current BIOS version is A02. I am pulling down the latest version from Dell's
website now.

Comment 41 Jeff Burke 2006-10-12 23:16:33 UTC
OK, after reconfiguring the system and installing the latest BIOS, we are able
to reproduce this error in house on the PE6800 located in the Westford office.
The panic posted in this bz and the one I now have are identical.

Jeff

Comment 42 Jason Baron 2006-10-17 21:26:13 UTC
Created attachment 138731 [details]
apic patch

This is the patch, added in U4, that causes this problem. It resolved bugs
176612 and 174627.

Comment 43 Jason Baron 2006-10-20 18:58:52 UTC
Created attachment 139019 [details]
patch that resolves this issue

Comment 44 Jason Baron 2006-10-25 17:55:22 UTC
committed in stream U5 build 42.21. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 45 Raghavendra Biligiri 2006-10-30 12:09:45 UTC
The issue is fixed with the test kernel (kernel-2.6.9-42.21.EL.x86_64.rpm).
We installed the test kernel on the PE6800 and the kernel boots up fine.
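
For checking whether a given RHEL4 2.6.9 kernel build is at or past the 42.21 build in which the fix was committed and verified, a simplified comparison can be sketched as follows. `build_number` and `has_fix` are hypothetical helpers; the comparison looks only at the leading numeric parts of the release string and is an assumption, not RPM's own version comparison. A build below 42.21 simply lacks this patch, which does not by itself mean it is affected (RHEL4-U3, 2.6.9-34, boots fine per the description).

```python
FIXED_BUILD = (42, 21)  # "committed in stream U5 build 42.21"


def build_number(kernel_release):
    """Extract the leading numeric release parts.

    Examples: "2.6.9-42.21.EL" -> (42, 21), "kernel-2.6.9-50.EL" -> (50,).
    """
    _, sep, release = kernel_release.rpartition("-")
    if not sep:
        raise ValueError("no release part in %r" % kernel_release)
    parts = []
    for piece in release.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    if not parts:
        raise ValueError("no numeric build in %r" % kernel_release)
    return tuple(parts)


def has_fix(kernel_release):
    """True if the build is at or past the build carrying the fix."""
    return build_number(kernel_release) >= FIXED_BUILD


for release in ("2.6.9-42.7.EL", "2.6.9-42.10.bz198657.EL",
                "2.6.9-42.21.EL", "2.6.9-50.EL"):
    print(release, has_fix(release))
```

The sample release strings are the builds named in this report.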

Comment 46 RHEL Program Management 2006-12-12 17:13:02 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being marked as a blocker for this release.  

Please resolve ASAP.

Comment 47 RHEL Program Management 2006-12-12 17:13:03 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being marked as a blocker for this release.  

Please resolve ASAP.

Comment 48 Charles Rose 2007-01-11 14:20:38 UTC
We were told that the workaround for this issue would be documented in the RH
knowledgebase. We reviewed the content a few months ago, but we do not see the KB
entry yet. We have seen many instances of this issue on Dell mailing lists.
Please publish the KB entry ASAP. Otherwise we will have to wait until April 2007
for RHEL 4.5 GA.

Comment 49 Samuel Benjamin 2007-01-25 00:40:02 UTC
KB information : 
Why does a Dell PE6800 system encounter a kernel panic when doing a PXE-install
with Red Hat Enterprise Linux 4 Update 4 on an x86-64 kernel?

http://kbase.redhat.com/faq/FAQ_46_8755.shtm

Article ID: 8755
Last update: 12-19-06

Comment 52 Mike Gahagan 2007-03-20 18:44:42 UTC
Patch is in -50.EL and has been verified by two partners.

Comment 54 Red Hat Bugzilla 2007-05-08 02:37:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

