Description of problem: When we do a PXE install of RHEL4-U4-B2 x86_64 (kernel 2.6.9-40) on a PE6800 (with 2GB memory), the kernel panics during boot-up.
Version-Release number of selected component (if applicable): kernel 2.6.9-40
How reproducible: Every time
Steps to Reproduce:
1. Install RHEL4-U4-B2 (kernel 2.6.9-40).
2. When trying to boot up, the kernel panics.
Actual results: The kernel fails to boot up properly and a kernel panic is observed.
Expected results: The kernel should boot up fine and install successfully.
Additional info:
1. Tried passing the noapic parameter during boot-up, but the kernel still panics.
2. RHEL4-U3 on the same machine boots up fine.
Created attachment 132313: Serial console output of kernel panic
Can you please try 'nolapic' too?
We don't have this hardware in house. Can you please tell us which RHEL4-U4 kernel worked last, if any. Once we have that data we can try and narrow down when this issue was introduced.
This crash happens because phys_cpu_present_map does not include an entry for the APIC ID of the CPU that's running at boot. Access to hardware will make this easier to track down.
We located the PE6800 in Westford, but apparently it installed RHEL4U4 just fine. I believe I know what's going on, though. The console output in Comment 1 indicates that the system on which this problem occurs (a) has only 4 CPU cores while ours has 16, and (b) does not enumerate them from 0 (its APIC IDs are 8, 9, 14, and 15). What the console dump does not show is which of these CPUs the ACPI tables flag as the boot CPU. It turns out that if the first CPU detected by Linux (i.e. the first one listed in the MADT) is *not* flagged as the boot CPU, we could have trouble booting the install kernel. The reason is that the install kernel is limited to 1 CPU (NR_CPUS=1), so it uses only the first CPU it finds in the MADT and scans (but does not use) the others. The kernel, however, sets the "boot_cpu_id" variable from the CPU_BOOTPROCESSOR flag in the MADT. This means the system can be running on a CPU other than the flagged boot CPU at boot time, and if only one CPU is allowed, the bit for "boot_cpu_id" will not be set in phys_cpu_present_map. I think this is why we're tripping over this bug. Questions for Dell: (1) Under what circumstances would a PE6800 start enumerating CPUs from other than 0, and is this a supported configuration? (2) Under what circumstances would a CPU other than the first one in the ACPI MADT table be flagged as the boot processor? (3) How common and/or likely are (1) and (2) under normal operating conditions?
Jim, Sorry for the misinformation in comment #3; for some reason it is not in the inventory database. Both Jason and I looked. I will ask Matt about that. Thanks Jeff
OK, it actually is in the database, but it is different than 99% of the other Dell systems. Using the search in the inventory database I looked for: dmi rh_bios_vendor = Dell Computer Corporation or dmi rh_bios_vendor = Dell inc. That query picks up 99% of the systems. For some reason the dmi data for this system is just "Dell". Thanks, Jeff
When we pass both the "noapic" and "nolapic" parameters, the kernel boots up fine.
Passing only "noapic" or only "nolapic" does not solve the problem.
Issue is seen even on kernel-2.6.9-39.
When we pass both the "noapic" and "nolapic" parameters, the kernel boots up fine and the installation completes successfully. After the installation, /proc/cmdline shows only noapic. When we removed the noapic parameter and rebooted, the system came up fine. So the problem is with the boot (install) kernel; the normal kernel boots up fine.
Does Red Hat have plans to document this issue as a KB article?
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this enhancement by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This enhancement is not yet committed for inclusion in an Update release.
Does Red Hat have a fix for this issue? Can we have a commitment for RHEL 4.5?
committed in stream U5 build 42.6. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
The issue is still reproducible with test kernel-2.6.9-42.7.EL.x86_64. When we try a PXE install with kernel-2.6.9-42.7.EL x86_64, the kernel panics. I have attached the serial console output.
Created attachment 135778: Serial console output of kernel panic on kernel-2.6.9-42.7
Question for Dell: Can you reproduce this problem by installing RHEL4-U3 on the system, then installing the UP (Non-SMP) RHEL4-U4 kernel and booting that? I'd like to see if we can reproduce this on something other than the bootstrap kernel. If you can reproduce it this way I may have a kernel for you to test.
Per comment #30, changing status to NEEDINFO.
After installing RHEL4-U3 (2.6.9-34) on the PE6800 and then installing the RHEL4-U4 (2.6.9-42.EL) UP (non-SMP) kernel, booting into the RHEL4-U4 kernel results in a kernel panic.
Re Comment #32: Thank you for testing this. I will shortly provide a test kernel that you can try out on this system to see if it fixes the problem.
Try the kernel at http://people.redhat.com/jparadis/bz198657 and let me know if it boots successfully on this hardware...
(In reply to comment #34) > Try the kernel at http://people.redhat.com/jparadis/bz198657 and let me know if > it boots successfully on this hardware... The issue is reproducible with the test kernel provided in comment #34. After installing the test kernel (kernel-2.6.9-42.10.bz198657.EL), booting into it results in a kernel panic.
I have been trying to reproduce this issue in-house. The problem I have is that the crash shows that the only CPUs online are APIC IDs 8, 14, 9, and 15. My system shows 16 logical processors: 0-15. I tried pulling Processors 1 and 2 from my box to get a similar configuration (one where the boot processor is not APIC ID 0), but now it just won't come up at all. What am I supposed to do to replicate the originator's configuration?
I am able to reproduce this issue on a PE6800 and a PE6850 with only 2 CPUs. This issue is reproducible on both single-core and dual-core processor machines. It is reproducible when only 2 CPUs are installed in a 4-CPU machine; with all 4 CPUs installed, the issue is not seen. Reply to comment #37: If processors 3 and 4 are removed from a 4-CPU machine, the issue can be reproduced.
If you are running a 2-processor system, then the system must have processors 3 and 4 removed. (Only 1-, 2-, and 4-processor configurations are supported.) What BIOS versions are the systems at Red Hat and at Dell running? BIOS A04 is the latest available (it can be found at support.dell.com).
Our current BIOS version is A02. I am pulling down the latest version from Dell's website now.
OK, after reconfiguring the system and putting the latest BIOS on it, we are able to reproduce this error in house with the PE6800 located in the Westford office. The panic posted in this bz and the one I now have are identical. Jeff
Created attachment 138731: apic patch. This is the patch, added in U4, that causes this problem. It resolved bugs 176612 and 174627.
Created attachment 139019: patch that resolves this issue
committed in stream U5 build 42.21. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
The issue is fixed with the test kernel (kernel-2.6.9-42.21.EL.x86_64.rpm). After installing the test kernel on the PE6800, the kernel boots up fine.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being marked as a blocker for this release. Please resolve ASAP.
We were told that the workaround for this issue would be documented in the Red Hat knowledgebase. We reviewed the content a few months ago, but we do not see the KB entry yet. We have seen many instances of this issue on Dell mailing lists. Please publish the KB entry as soon as possible; otherwise we will have to wait until April 2007 for RHEL 4.5 GA.
KB information: Why does a Dell PE6800 system encounter a kernel panic when doing a PXE-install with Red Hat Enterprise Linux 4 Update 4 on an x86-64 kernel? http://kbase.redhat.com/faq/FAQ_46_8755.shtm Article ID: 8755 Last update: 12-19-06
Patch is in -50.EL and has been verified by two partners.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html