Bug 118564

Summary: [PATCH] RHEL3 cannot boot on 8-way Opteron systems
Product: Red Hat Enterprise Linux 3
Reporter: Mark Langsdorf <mark.langsdorf>
Component: kernel
Assignee: Jim Paradis <jparadis>
Status: CLOSED ERRATA
Severity: high
Priority: medium
Version: 3.0
CC: peterm, petrides, riel
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-02 04:31:10 UTC
Attachments:
  patch to correctly read processor masks for 8p system (flags: none)

Description Mark Langsdorf 2004-03-17 19:57:21 UTC
Description of problem:
/usr/src/linux/arch/x86_64/mm/k8topology.c uses the wrong mask to read
the number of nodes when booting on an 8-way system, so the kernel
crashes when loaded on such a system.

Version-Release number of selected component (if applicable):
2.4.21-11.EL

Additional info:

A simple two-change patch that fixes the problem is attached.

Comment 1 Mark Langsdorf 2004-03-17 19:59:48 UTC
Created attachment 98611
patch to correctly read processor masks for 8p system

This patch has been tested by AMD and verified to work.

Comment 2 Jim Paradis 2004-03-31 22:35:47 UTC
Patch submitted; it should be included in U3.


Comment 3 Ernie Petrides 2004-03-31 23:24:38 UTC
Jim's patch has been posted to our internal review list.  When it
is finally checked in to our CVS patch pool (for U3), I'll change
the status of this bug back to "modified" to confirm this.


Comment 4 Mark Langsdorf 2004-04-14 19:12:34 UTC
Further testing found an additional bug in unusual memory
configurations (i.e., when some, but not all, processors have
memory attached).  Patch below:

diff -u linux/arch/x86_64/mm/k8topology.c-o linux/arch/x86_64/mm/k8topology.c
--- linux/arch/x86_64/mm/k8topology.c-o	2004-01-29 16:15:07.000000000 +0100
+++ linux/arch/x86_64/mm/k8topology.c	2004-04-10 01:20:41.000000000 +0200
@@ -196,7 +196,7 @@
 			continue;		
 		if ((nodes_present >> rr) == 0) 
 			rr = 0; 
-		rr = ffz(~nodes_present >> rr); 
+		rr += ffz(~nodes_present >> rr);
 		PLAT_NODE_DATA(i) = PLAT_NODE_DATA(rr); 
 		rr++; 
 	}


Comment 5 Ernie Petrides 2004-05-04 23:09:57 UTC
The patch in comment #1 has just been committed to the RHEL3 U3
patch pool (in kernel version 2.4.21-15.2.EL).

Jim, could you please validate/test/post the additional patch in
comment #4?  If/when that is committed to U3, I'll change the state
of this Bugzilla report to "modified".

Thanks.  -ernie


Comment 6 Ernie Petrides 2004-06-09 21:20:08 UTC
Mark, the patch in comment #4 doesn't apply to the current
(in-progress) RHEL3 U3 patch pool nor to the (released) RHEL3
U2 source tree.  I'm not sure where it's from, but let me just
ask: has the original problem described in this report been
resolved for you?

Thanks.  -ernie


Comment 7 Jim Paradis 2004-07-06 20:33:01 UTC
Mark - Have you been able to try a recent RHEL3 kernel on an 8-way
Opteron to see if this issue is indeed resolved for you?

--jim


Comment 8 Mark Langsdorf 2004-07-26 18:50:39 UTC
Sorry for the delay in answering.  We've only got a few of the
relevant systems so scheduling tests takes time.

I tried 2.4.21-15.0.3 (RHEL3-U2) with the following grub.conf boot 
line and the system failed to load the kernel:

title Red Hat Enterprise Linux AS (2.4.21-15.0.3.ELsmp)
         root (hd0,0)
         kernel /vmlinuz-2.4.21-15.0.3.ELsmp ro root=LABEL=/ apm=power-off hdc=ide-scsi noexec=off
         initrd /initrd-2.4.21-15.0.3.ELsmp.img

Adding numa=off allowed the system to boot into run level 3 as 
expected.

I can retest on RHEL3 U3 later this week, but we're still getting 
back from OLS.


Comment 9 Mark Langsdorf 2004-08-16 16:00:09 UTC
2.4.21-18 (U3 beta kernel) works on the 8-way systems we have
available, with NUMA both enabled and disabled.  Enabling NUMA gives
a 60% performance boost on the 8-way systems.

Thanks for resolving this issue.

Comment 10 Ernie Petrides 2004-09-01 20:03:07 UTC
I'm reverting this to MODIFIED state.  The Errata System will
automatically change the state to CLOSED/ERRATA when U3 is
released (most likely tomorrow).


Comment 11 John Flanagan 2004-09-02 04:31:10 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html