Bug 167672

Summary: GART error during bootup
Product: Red Hat Enterprise Linux 3 Reporter: Linda Wang <lwang>
Component: kernel Assignee: Jim Paradis <jparadis>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: aspanke, bmaly, jnansi, martinez, netllama, peterm, petrides, syeghiay, tkincaid
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0437 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-07-20 13:29:09 UTC Type: ---
Bug Depends On:    
Bug Blocks: 181405    

Description Linda Wang 2005-09-06 20:31:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050512 Red Hat/1.0.4-1.4.1 Firefox/1.0.4

Description of problem:
Reopening this, as the following seems to be the behaviour of the system.

With 2.4.21-35.EL the system does not panic with numa=off. However, booting
with numa=off shows the following errors just after the filesystems have
been mounted:

 -----------------------------------------------
  CPU 0: Silent Northbridge MCE
  Northbridge status a6000001:0005001b
      Error gart error
      GART TLB error generic level generic
      err cpu1
      processor context corrupt
      error uncorrected
      previous error lost
      NB error address 0000000037ff0000
 -----------------------------------------------

The error address 0000000037ff0000 is the same as that reported by
2.4.21-34.EL when it panics on rebooting with numa=off alone.

After rebooting, the same messages keep appearing at random intervals
for the remaining CPUs (CPU 1, CPU 2 and CPU 3).

However, the system does not show any other noticeably abnormal behaviour.

Version-Release number of selected component (if applicable):
kernel-2.4.21-35.EL

How reproducible:
Always

Steps to Reproduce:
1. Install the 2.4.21-35.EL kernel
2. Boot the box
3. View the console log
  

Actual Results: the following messages appear:

 -----------------------------------------------
  CPU 0: Silent Northbridge MCE
  Northbridge status a6000001:0005001b
      Error gart error
      GART TLB error generic level generic
      err cpu1
      processor context corrupt
      error uncorrected
      previous error lost
      NB error address 0000000037ff0000
 -----------------------------------------------

Expected Results: none of these messages should appear.

Additional info:
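Note on reading the status word: the low half of the Northbridge status (0005001b) appears consistent with the decoded text in the log. Below is a minimal Python sketch. It assumes the field layout AMD documents for the K8 northbridge machine-check status: the extended error code in bits 16-19, and a TLB error code of the form 0000 0000 0001 TTLL in the low 16 bits. The helper name and lookup tables are illustrative and are not taken from the kernel source.

```python
# Sketch only: field layout assumed from AMD's K8 documentation.
# decode_nb_status_low and the tables below are hypothetical names.
EXT_ERRORS = {0x5: "GART error"}  # only the code seen in this report
TRANSACTION = ["instruction", "data", "generic", "reserved"]   # TT field
CACHE_LEVEL = ["level 0", "level 1", "level 2", "level generic"]  # LL field


def decode_nb_status_low(low):
    """Split the low 32 bits of the NB status into its error-code fields."""
    ext = (low >> 16) & 0xF      # extended error code, bits 16-19
    errcode = low & 0xFFFF       # architectural error code, bits 0-15
    return {
        "extended_error": ext,
        "extended_name": EXT_ERRORS.get(ext, "unknown"),
        "is_tlb_error": (errcode >> 4) == 0x001,  # 0000 0000 0001 TTLL
        "transaction": TRANSACTION[(errcode >> 2) & 0x3],
        "cache_level": CACHE_LEVEL[errcode & 0x3],
    }


# Status string exactly as printed in the console log above.
high, low = (int(part, 16) for part in "a6000001:0005001b".split(":"))
fields = decode_nb_status_low(low)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Under these assumptions, the extracted fields line up with the "gart error" and "GART TLB error generic level generic" lines. The high word (a6000001) is deliberately left undecoded here, since the labels the kernel attaches to those bits are not confirmed in this report.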

Comment 9 Lonni J Friedman 2005-12-07 16:31:01 UTC
This looks like a duplicate of bug 163210.

Comment 11 Ernie Petrides 2006-01-27 22:43:10 UTC
Removing ITs 73360 and 86498, which are not about GART errors during boot.

Comment 12 Ernie Petrides 2006-01-27 22:49:02 UTC
How much physical memory was on the system exhibiting this problem?

Specifically, I'm wondering if the 37ff0000 address is the base of
the last page of physical memory.
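
The arithmetic behind that question can be checked in a few lines. This is a rough sketch, assuming 4 KiB pages; the function name is hypothetical and the installed memory size is not stated in this report, so it is left as a parameter.

```python
# Sketch only: assumes 4 KiB pages; total memory size is not known from the report.
PAGE_SIZE = 4096
MIB = 1024 * 1024

addr = 0x37FF0000  # NB error address from the console log


def is_last_page(addr, total_bytes, page_size=PAGE_SIZE):
    """True if addr is the base of the final page below total_bytes."""
    return addr % page_size == 0 and addr + page_size == total_bytes


print("page aligned:", addr % PAGE_SIZE == 0)
print("address in MiB:", addr / MIB)
print("bytes below 896 MiB:", 896 * MIB - addr)
print("last page if RAM ends at 896 MiB:", is_last_page(addr, 896 * MIB))
```

The address is page aligned and sits 64 KiB below the 896 MiB boundary (0x38000000). With 4 KiB pages it would therefore not be the last page of a memory region ending exactly at 896 MiB. Whether it marks the end of usable memory on the affected system depends on the memory map, which the boot log would show.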

Comment 13 Jim Paradis 2006-01-27 23:04:36 UTC
Can you attach a boot log (either serial capture or dmesg) from a system that
shows this behavior?  There may be some clues in there.


Comment 15 Ernie Petrides 2006-02-02 03:17:46 UTC
Marizol Martinez, could we please get some help trying to reproduce this
problem on RHEL3 U7?  Could you also post the data requested in comment #13?

Thanks in advance.  -ernie


Comment 22 Ernie Petrides 2006-04-20 01:23:34 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.7.EL).


Comment 25 Dan Carpenter 2006-05-09 02:49:48 UTC
Is there a test kernel available for this?  I have hardware that can reproduce
the GART errors with RHEL3u7.



Comment 26 Ernie Petrides 2006-05-09 20:16:52 UTC
Yes, but it is in internal beta at the moment.  Watch for it in the
RHN beta channels in a couple of weeks or so.  The latest kernel
version (1st U8 beta respin) is 2.4.21-42.EL (built on Friday).

Comment 27 Joshua Giles 2006-06-01 04:19:39 UTC
A kernel has been released that contains a patch for this problem.  Please
verify if your problem is fixed with the latest available kernel from the RHEL3
public beta channel at rhn.redhat.com and report your test results.

Comment 29 Red Hat Bugzilla 2006-07-20 13:29:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html