From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050512 Red Hat/1.0.4-1.4.1 Firefox/1.0.4

Description of problem:
Reopening this because the system behaves as follows. With 2.4.21-35.EL the system does not panic with numa=off. However, booting with numa=off shows the following errors just after the filesystems have finished mounting:

-----------------------------------------------
CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037ff0000
-----------------------------------------------

The error address 0000000037ff0000 is the same as the one reported by 2.4.21-34.EL when it panics on rebooting with numa=off alone. After rebooting, the same messages keep appearing at random intervals for the remaining CPUs (CPU 1, CPU 2 and CPU 3). However, the system does not show any other noticeable abnormal behaviour.

Version-Release number of selected component (if applicable):
kernel-2.4.21-35.EL

How reproducible:
Always

Steps to Reproduce:
1. Install the 2.4.21-35.EL kernel
2. Boot the box up
3. View the console log

Actual Results:
The following messages appear:

-----------------------------------------------
CPU 0: Silent Northbridge MCE
Northbridge status a6000001:0005001b
Error gart error
GART TLB error generic level generic
err cpu1
processor context corrupt
error uncorrected
previous error lost
NB error address 0000000037ff0000
-----------------------------------------------

Expected Results:
None of these messages should appear.

Additional info:
This looks like a duplicate of bug 163210.
Removing ITs 73360 and 86498, which are not about GART errors during boot.
How much physical memory was on the system exhibiting this problem? Specifically, I'm wondering if the 37ff0000 address is the base of the last page of physical memory.
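For reference, the arithmetic behind that question can be sketched in a few lines of Python. This is only an illustration: the 4 KiB page size is an assumption, and `mem_top` is a hypothetical value that would have to be read from the machine's boot log. One thing the numbers do show is that 0x37ff0000 lies exactly 64 KiB below the 896 MiB boundary (0x38000000).

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def page_base(addr, page_size=PAGE_SIZE):
    """Round an address down to the start of the page that contains it."""
    return addr & ~(page_size - 1)


def is_last_page(addr, mem_top, page_size=PAGE_SIZE):
    """True if addr falls in the final page below mem_top.

    mem_top is the first byte past the end of usable physical memory
    (hypothetical here; take the real value from the boot log).
    """
    return page_base(addr, page_size) == mem_top - page_size


nb_error_address = 0x37FF0000       # address from the MCE message
boundary_896_mib = 896 * 1024 * 1024

# Distance from the reported address up to the 896 MiB boundary, in bytes.
gap = boundary_896_mib - nb_error_address
print(hex(page_base(nb_error_address)), gap)
```

If memory ended at 0x37ff1000, `is_last_page(nb_error_address, 0x37FF1000)` would be true; with memory ending at 0x38000000 it would be false, since the address is sixteen 4 KiB pages below that boundary.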
Can you attach a boot log (either serial capture or dmesg) from a system that shows this behavior? There may be some clues in there.
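Once such a log is captured, the Northbridge MCE records could be pulled out with something like the sketch below, to check whether every CPU reports the same error address. The regular expression is written against the message text quoted in this report only; the function and field names are made up for illustration, and other kernel versions may print the record differently.

```python
import re

# Matches one record as quoted in this report. Whitespace between the
# fields may be spaces or newlines, depending on how the log was captured.
MCE_RE = re.compile(
    r"CPU\s+(?P<cpu>\d+):\s+Silent Northbridge MCE\s+"
    r"Northbridge status\s+(?P<hi>[0-9a-f]{8}):(?P<lo>[0-9a-f]{8})"
    r"(?P<detail>.*?)"
    r"NB error address\s+(?P<addr>[0-9a-f]{16})",
    re.DOTALL | re.IGNORECASE,
)


def find_nb_mces(log_text):
    """Return one dict per Northbridge MCE record found in log_text."""
    records = []
    for m in MCE_RE.finditer(log_text):
        records.append({
            "cpu": int(m.group("cpu")),
            "status": (int(m.group("hi"), 16), int(m.group("lo"), 16)),
            "address": int(m.group("addr"), 16),
            # Flag records whose decoded text mentions the GART.
            "gart": "gart" in m.group("detail").lower(),
        })
    return records


def addresses_by_cpu(records):
    """Group the reported error addresses by CPU number."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["cpu"], set()).add(rec["address"])
    return grouped
```

Usage would be along the lines of `addresses_by_cpu(find_nb_mces(open("dmesg.txt").read()))`, where `dmesg.txt` is a placeholder for the captured log. A record that is missing its address line would be merged into the next one, so the output is worth a quick manual check.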
Marizol Martinez, could we please get some help trying to reproduce this problem on RHEL3 U7? Could you also post the data requested in comment #13? Thanks in advance. -ernie
A fix for this problem was committed to the RHEL3 U8 patch pool this evening (in kernel version 2.4.21-40.7.EL).
Is there a test kernel available for this? I have hardware that can reproduce the GART errors with RHEL3u7.
Yes, but it is in internal beta at the moment. Watch for it in the RHN beta channels in a couple of weeks or so. The latest kernel version (1st U8 beta respin) is 2.4.21-42.EL (built on Friday).
A kernel has been released that contains a patch for this problem. Please verify whether your problem is fixed with the latest available kernel from the RHEL3 public beta channel at rhn.redhat.com and report your test results.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0437.html