Red Hat Bugzilla – Bug 118564
[PATCH] RHEL3 cannot boot on 8-way Opteron systems
Last modified: 2013-08-05 21:04:55 EDT
Description of problem:
/usr/src/linux/arch/x86_64/mm/k8topology.c uses the wrong mask to get the number of nodes when booting on an 8-way system. The kernel will crash if loaded on such a system.

Version-Release number of selected component (if applicable):
2.4.21-11.EL

Additional info:
A simple two-change patch to fix the problem is attached.
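To illustrate the class of bug described here (this is not the attached patch, whose exact mask values are not quoted in this report): a minimal sketch, assuming the node count is stored as "count minus one" in a 3-bit field at bits 6:4 of a node ID register value. The helper name `decode_node_count`, the shift, and both masks are assumptions made for illustration only.

```c
#include <stdint.h>

/* Assumed layout: (node count - 1) lives in bits 6:4 of the register value. */
#define NODE_CNT_SHIFT 4

/*
 * Hypothetical helper: decode the node count from a register value using
 * the given field mask. A mask narrower than the field silently drops the
 * high bit, so larger systems are reported with too few nodes.
 */
static unsigned decode_node_count(uint32_t reg, uint32_t field_mask)
{
    return ((reg >> NODE_CNT_SHIFT) & field_mask) + 1;
}
```

Under these assumptions, a register value of 0x70 (field value 7, meaning eight nodes) decodes to 4 nodes with a 2-bit mask (0x3) and to 8 nodes with a 3-bit mask (0x7), while a 4-node value such as 0x30 decodes identically with either mask. That would explain why a too-narrow mask goes unnoticed on smaller systems.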
Created attachment 98611: patch to correctly read processor masks for 8p system

This patch has been tested by AMD and verified to work.
Patch submitted, should be included in U3.
Jim's patch has been posted to our internal review list. When it is finally checked in to our CVS patch pool (for U3), I'll change the status of this bug back to "modified" to confirm this.
Further testing found an additional bug in weird memory situations (i.e., some but not all processors have memory). Patch below:

diff -u linux/arch/x86_64/mm/k8topology.c-o linux/arch/x86_64/mm/k8topology.c
--- linux/arch/x86_64/mm/k8topology.c-o 2004-01-29 16:15:07.000000000 +0100
+++ linux/arch/x86_64/mm/k8topology.c 2004-04-10 01:20:41.000000000 +0200
@@ -196,7 +196,7 @@
             continue;
         if ((nodes_present >> rr) == 0)
             rr = 0;
-        rr = ffz(~nodes_present >> rr);
+        rr += ffz(~nodes_present >> rr);
         PLAT_NODE_DATA(i) = PLAT_NODE_DATA(rr);
         rr++;
     }
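The effect of changing `=` to `+=` can be reproduced in user space. The sketch below is an illustration, not kernel code: `ffz32` is a stand-in for the kernel's `ffz()`, and `next_present` with its `fixed` flag is a hypothetical helper that mirrors only the three lines touched by the diff. It assumes a 32-bit `nodes_present` bitmap with at least one bit set and `rr` below 32.

```c
#include <stdint.h>

/*
 * User-space stand-in for the kernel's ffz(): index of the lowest zero bit.
 * Returns 32 if x has no zero bit.
 */
static unsigned ffz32(uint32_t x)
{
    unsigned i = 0;
    while (x & 1u) {
        x >>= 1;
        i++;
    }
    return i;
}

/*
 * Hypothetical helper: pick the next node that has memory, searching from
 * index rr and wrapping to 0 when no present node remains at or above rr.
 * ffz32(~nodes_present >> rr) is an offset relative to rr, so the absolute
 * node index is rr + offset. fixed != 0 uses the "+=" form from the patch;
 * fixed == 0 uses the original "=" form, which returns the bare offset.
 */
static unsigned next_present(uint32_t nodes_present, unsigned rr, int fixed)
{
    if ((nodes_present >> rr) == 0)
        rr = 0;
    unsigned off = ffz32(~nodes_present >> rr);
    return fixed ? rr + off : off;
}
```

With `nodes_present = 0x90` (only nodes 4 and 7 have memory) and `rr = 5`, the `+=` form yields node 7, while the `=` form yields 2, a node with no memory. Starting from `rr = 0` both forms agree on node 4, which is consistent with the bug only showing up when some but not all processors have memory.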
The patch in comment #1 has just been committed to the RHEL3 U3 patch pool (in kernel version 2.4.21-15.2.EL). Jim, could you please validate/test/post the additional patch in comment #4? If/when that is committed to U3, I'll change the state of this Bugzilla report to "modified". Thanks. -ernie
Mark, the patch in comment #4 doesn't apply to the current (in-progress) RHEL3 U3 patch pool nor to the (released) RHEL3 U2 source tree. I'm not sure where it's from, but let me just ask: has the original problem described in this report been resolved for you? Thanks. -ernie
Mark - Have you been able to try a recent RHEL3 kernel on an 8-way Opteron to see if this issue is indeed resolved for you? --jim
Sorry for the delay in answering. We've only got a few of the relevant systems, so scheduling tests takes time. I tried 2.4.21-15.0.3 (RHEL3-U2) with the following grub.conf boot entry and the system failed to load the kernel:

title Red Hat Enterprise Linux AS (2.4.21-15.0.3.ELsmp)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-15.0.3.ELsmp ro root=LABEL=/ apm=power-off hdc=ide-scsi noexec=off
        initrd /initrd-2.4.21-15.0.3.ELsmp.img

Adding numa=off allowed the system to boot into run level 3 as expected. I can retest on RHEL3 U3 later this week, but we're still getting back from OLS.
2.4.21-18 (U3 beta kernel) works on the 8-way systems we have available, with both NUMA enabled and disabled. There is a 60% performance boost with NUMA enabled on the 8-way systems. Thanks for resolving this issue.
I'm reverting this to MODIFIED state. The Errata System will automatically change the state to CLOSED/ERRATA when U3 is released (most likely tomorrow).
An erratum has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html