Bug 118564 - [PATCH] RHEL3 cannot boot on 8-way Opteron systems
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: x86_64 Linux
Priority: medium  Severity: high
Assigned To: Jim Paradis
Reported: 2004-03-17 14:57 EST by Mark Langsdorf
Modified: 2013-08-05 21:04 EDT (History)
Doc Type: Bug Fix
Last Closed: 2004-09-02 00:31:10 EDT

Attachments
patch to correctly read processor masks for 8p system (685 bytes, patch)
2004-03-17 14:59 EST, Mark Langsdorf

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2004:433
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 3
Last Updated: 2004-09-02 00:00:00 EDT

Description Mark Langsdorf 2004-03-17 14:57:21 EST
Description of problem:
/usr/src/linux/arch/x86_64/mm/k8topology.c uses the wrong mask to get
the number of nodes when booting on an 8-way system.  The kernel
crashes when booted on such a system.
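
The attached patch is not reproduced in this report, so the following is only a rough sketch of the general failure mode, not the actual change. It assumes a register field that stores "node count minus one"; the helper name decode_node_count, the shift, and the mask values are all hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from k8topology.c): decode a "node count
 * minus one" field from a register value. The shift and mask are
 * illustrative assumptions, not values taken from the attached patch.
 */
unsigned int decode_node_count(uint32_t reg, unsigned int shift, uint32_t mask)
{
	assert(shift < 32);
	/* A mask narrower than the field silently drops the high bits. */
	return ((reg >> shift) & mask) + 1;
}
```

Under these assumptions, a field value of 7 (eight nodes) read through a 2-bit mask decodes as 4 nodes, while a 3-bit mask decodes it as 8. A 4-node value decodes identically with either mask, which would explain why a too-narrow mask goes unnoticed on smaller systems.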

Additional info:

A simple two-change patch that fixes the problem is attached.
Comment 1 Mark Langsdorf 2004-03-17 14:59:48 EST
Created attachment 98611
patch to correctly read processor masks for 8p system

This patch has been tested by AMD and verified to work.
Comment 2 Jim Paradis 2004-03-31 17:35:47 EST
Patch submitted, should be included in U3.
Comment 3 Ernie Petrides 2004-03-31 18:24:38 EST
Jim's patch has been posted to our internal review list.  When it
is finally checked in to our CVS patch pool (for U3), I'll change
the status of this bug back to "modified" to confirm this.
Comment 4 Mark Langsdorf 2004-04-14 15:12:34 EDT
Further testing found an additional bug in unusual memory configurations
(i.e., some but not all processors have memory).  Patch below:

diff -u linux/arch/x86_64/mm/k8topology.c-o 
--- linux/arch/x86_64/mm/k8topology.c-o	2004-01-29 16:15:07.000000000 
+++ linux/arch/x86_64/mm/k8topology.c	2004-04-10 01:20:41.000000000 
@@ -196,7 +196,7 @@
 		if ((nodes_present >> rr) == 0) 
 			rr = 0; 
-		rr = ffz(~nodes_present >> rr); 
+		rr += ffz(~nodes_present >> rr);
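
For illustration only: a user-space sketch of why the one-character change in the hunk above matters. The shifted mask makes ffz() return an offset relative to rr, so the result has to be added to rr rather than assigned to it. ffz_sketch is a stand-in written for this example (it is not the kernel's ffz() implementation), and the two next_node_* wrappers are hypothetical names that isolate just the lines shown in the diff.

```c
#include <assert.h>

/* Stand-in for the kernel's ffz(): index of the lowest clear bit. */
static unsigned int ffz_sketch(unsigned long word)
{
	unsigned int bit = 0;

	assert(word != ~0UL);	/* result is undefined if no bit is clear */
	while (word & 1UL) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Both wrappers assume nodes_present != 0 and rr smaller than the
 * number of bits in unsigned long.
 */

/* Pre-patch logic: the offset from ffz() overwrites rr. */
unsigned int next_node_buggy(unsigned long nodes_present, unsigned int rr)
{
	if ((nodes_present >> rr) == 0)
		rr = 0;
	rr = ffz_sketch(~nodes_present >> rr);
	return rr;
}

/* Post-patch logic: the offset from ffz() is added to rr. */
unsigned int next_node_fixed(unsigned long nodes_present, unsigned int rr)
{
	if ((nodes_present >> rr) == 0)
		rr = 0;
	rr += ffz_sketch(~nodes_present >> rr);
	return rr;
}
```

With nodes_present = 0x31 (nodes 0, 4 and 5 present) and rr = 1, the pre-patch version returns 3, a node that is not in the mask, while the post-patch version returns 4. When rr is 0 the two versions agree, so the difference only shows up once the search starts past node 0.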
Comment 5 Ernie Petrides 2004-05-04 19:09:57 EDT
The patch in comment #1 has just been committed to the RHEL3 U3
patch pool (in kernel version 2.4.21-15.2.EL).

Jim, could you please validate/test/post the additional patch in
comment #4?  If/when that is committed to U3, I'll change the state
of this Bugzilla report to "modified".

Thanks.  -ernie
Comment 6 Ernie Petrides 2004-06-09 17:20:08 EDT
Mark, the patch in comment #4 doesn't apply to the current
(in-progress) RHEL3 U3 patch pool nor to the (released) RHEL3
U2 source tree.  I'm not sure where it's from, but let me just
ask: has the original problem described in this report been
resolved for you?

Thanks.  -ernie
Comment 7 Jim Paradis 2004-07-06 16:33:01 EDT
Mark - Have you been able to try a recent RHEL3 kernel on an 8-way
Opteron to see if this issue is indeed resolved for you?

Comment 8 Mark Langsdorf 2004-07-26 14:50:39 EDT
Sorry for the delay in answering.  We've only got a few of the
relevant systems so scheduling tests takes time.

I tried 2.4.21-15.0.3 (RHEL3-U2) with the following grub.conf boot 
line and the system failed to load the kernel:

title Red Hat Enterprise Linux AS (2.4.21-15.0.3.ELsmp)
         root (hd0,0)
         kernel /vmlinuz-2.4.21-15.0.3.ELsmp ro root=LABEL=/ apm=power-off hdc=ide-scsi noexec=off
         initrd /initrd-2.4.21-15.0.3.ELsmp.img

Adding numa=off allowed the system to boot into run level 3 as 

I can retest on RHEL3 U3 later this week, but we're still getting 
back from OLS.
Comment 9 Mark Langsdorf 2004-08-16 12:00:09 EDT
2.4.21-18 (U3 beta kernel) works on the 8-way systems we have
available, with both NUMA enabled and disabled.  There is a 60%
performance boost with NUMA enabled on the 8-way systems.

Thanks for resolving this issue.
Comment 10 Ernie Petrides 2004-09-01 16:03:07 EDT
I'm reverting this to MODIFIED state.  The Errata System will
automatically change the state to CLOSED/ERRATA when U3 is
released (most likely tomorrow).
Comment 11 John Flanagan 2004-09-02 00:31:10 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

