Bug 112436
Summary: | aacraid + kernel2.4.21-4.0.1.ELsmp + x86_64 == crash | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Jeff Thomas <jeff> |
Component: | kernel | Assignee: | Tom Coughlan <coughlan> |
Status: | CLOSED CANTFIX | Severity: | medium |
Priority: | medium | Version: | 3.0 |
Hardware: | x86_64 | OS: | Linux |
CC: | ckloiber, jparadis, petrides, riel, stakagi, tao | Doc Type: | Bug Fix |
Last Closed: | 2005-09-19 13:56:00 UTC | | |
Description
Jeff Thomas
2003-12-19 17:49:04 UTC
I had also opened service request 277987 on this; a tech there responded suggesting adding "nomce" to the kernel command line. This is effective in eliminating the crash, and the systems appear stable. I am not familiar with the details of the machine check exception, but if this is a valid fix then please close this ticket.

My customer has also reported a similar case since updating the kernel to 2.4.21-20.EL.x86_64 and using Opteron64.

Sep 7 04:02:23 opteron kernel: Northbridge status a60000010005001b
Sep 7 04:02:23 opteron kernel: GART error 11
Sep 7 04:02:23 opteron kernel: Lost an northbridge error
Sep 7 04:02:23 opteron kernel: NB status: unrecoverable
Sep 7 04:02:23 opteron kernel: NB error address 00000000fbf61258
Sep 7 04:02:23 opteron kernel: Error uncorrected

Documentation for the AMD Opteron MCE architecture may be found at http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF

This appears to decode as a GART TLB error with a valid cause address of 00000000fbf61258. Given the address (it is very near where I would expect MMIO space to be allocated), I would look in /proc/iomem and see whether the controller in question has memory near this address.

Thanks for the information, David. Shinya, Jeff: please check /proc/iomem to see whether the Adaptec 2200 has memory at the address shown in the machine check. Also, please check with Adaptec and make sure you have the latest firmware for that board.

Since we have not received the feedback we requested, we will assume the problem was not reproducible or has been fixed in a later update for this product.
Users who have experienced this problem are encouraged to upgrade to the latest update release. If the issue is still reproducible, please visit the Red Hat Global Support Services page on our website for technical support options: https://www.redhat.com/support
If you have a telephone-based support contract, you may contact Red Hat at 1-888-GO-REDHAT for technical support for the problem you are experiencing.
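
For anyone who hits this again, the /proc/iomem check requested above can be scripted. The following is only a minimal sketch, not a tool from this ticket: the helper names parse_iomem and regions_containing are made up for illustration, and it assumes the usual /proc/iomem layout of "start-end : name" lines, with nested resources indented by two spaces per level. The only value taken from the report is the NB error address.

```python
import re

# One /proc/iomem line looks like "  fbf60000-fbf6ffff : some device".
# Nested resources are assumed to be indented two spaces per level.
_RANGE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Parse /proc/iomem text into (depth, start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        m = _RANGE.match(line)
        if m is None:
            continue  # skip anything that is not a resource line
        indent, start, end, name = m.groups()
        regions.append(
            (len(indent) // 2, int(start, 16), int(end, 16), name.strip())
        )
    return regions


def regions_containing(regions, addr):
    """Return every region (outermost first) whose range includes addr."""
    return [r for r in regions if r[1] <= addr <= r[2]]


if __name__ == "__main__":
    # Address from the "NB error address" line in the machine check log.
    nb_error_addr = 0x00000000FBF61258
    try:
        with open("/proc/iomem") as f:
            regions = parse_iomem(f.read())
    except OSError as exc:
        print("could not read /proc/iomem:", exc)
    else:
        hits = regions_containing(regions, nb_error_addr)
        if not hits:
            print("no resource covers %#x" % nb_error_addr)
        for depth, start, end, name in hits:
            print("%s%08x-%08x : %s" % ("  " * depth, start, end, name))
```

If the innermost resource printed belongs to the RAID controller, that would support the suspicion that the GART error points at the controller's memory window. On newer kernels the script may need to run as root, since unprivileged reads of /proc/iomem can show zeroed addresses.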