Bug 129359 - [RHEL3] uncorrectable ECC memory errors do NOT halt the system
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Dave Anderson
QA Contact: Brian Brock
Depends On:
Blocks: 123574
Reported: 2004-08-06 16:47 EDT by Alexandre Oliva
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-12-20 15:55:51 EST

Description Alexandre Oliva 2004-08-06 16:47:58 EDT
Linux does not protect itself from memory corruption: on encountering
an uncorrectable memory error it does not panic, and instead leaves
the system running in a somewhat usable state.

Arjan van de Ven wrote:

Yes, this is an oversight that I'll be correcting in the RHEL4 kernel.

Frank Hirtz wrote:

Is this something that we can get addressed from within the context of
RHEL 2.1 and 3?
Comment 3 Dave Anderson 2004-08-18 13:19:32 EDT
It looks OK to me, although I'd prefer that the "if (mem_nmi_panic)"
clutter be moved inside the mem_parity_error() function.  An AS2.1
version would require a bit more since there's no die_nmi() function
but could be easily done.  I can put together a couple patches for
both kernels, but I'd also prefer to follow Arjan's lead in how he
would implement it in RHEL4.
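
A rough user-space sketch of the restructuring suggested in this comment, with the tunable check placed inside mem_parity_error() rather than at its call site. This is not the posted patch: die_nmi_mock(), handle_nmi(), the system_halted flag, and the NMI_REASON_MEMPAR constant are hypothetical stand-ins added for illustration, and only the names mem_nmi_panic, mem_parity_error(), and die_nmi() come from the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed bit marking a memory parity/ECC error in the NMI reason byte. */
#define NMI_REASON_MEMPAR 0x80

/* Stand-in for the proposed sysctl; 0 keeps the old "try to continue" behavior. */
static int mem_nmi_panic = 0;

/* Hypothetical flag so the mock can record that the system would have halted. */
static bool system_halted = false;

/* Mock of die_nmi(): the real kernel function would not return. */
static void die_nmi_mock(const char *msg)
{
    printf("NMI: %s, halting\n", msg);
    system_halted = true;
}

/* The tunable check lives inside the handler, keeping the caller uncluttered. */
static void mem_parity_error(unsigned char reason)
{
    printf("NMI: memory error reported (reason 0x%02x)\n", reason);
    if (mem_nmi_panic) {
        die_nmi_mock("uncorrectable memory error");
        return;
    }
    printf("NMI: trying to continue\n");
}

/* Hypothetical dispatcher: the call site no longer tests mem_nmi_panic itself. */
static void handle_nmi(unsigned char reason)
{
    if (reason & NMI_REASON_MEMPAR)
        mem_parity_error(reason);
}
```

With mem_nmi_panic left at 0, handle_nmi(NMI_REASON_MEMPAR) only logs the error and carries on; setting it to 1 routes the same event into the halt path.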
Comment 4 Dave Anderson 2004-08-18 13:53:21 EDT
I see now he's just followed the lead of the "unknown_nmi_panic"
sysctl check above it, which raises the question of whether it
makes sense to put both that sysctl and the proposed mem_nmi_panic
sysctl into AS2.1 and RHEL4 to maintain consistency.
Comment 5 Alexandre Oliva 2004-08-19 07:20:34 EDT
The feature request is for both 2.1 and 3, and it should definitely be
carried over to RHEL4 to avoid a regression.
Comment 6 Dave Anderson 2004-09-14 16:46:24 EDT
RHEL3 patch posted today.

I'll start on an AS2.1 version tomorrow, noting as before
that it is not as simple because there's no die_nmi() function
in AS2.1. 
Comment 8 Dave Anderson 2004-09-15 14:58:24 EDT
I am just about to post the AS2.1 patch.
Comment 9 Dave Anderson 2004-09-15 15:14:19 EDT
AS2.1 patch posted today.

Note that the patch also adds the "unknown_nmi_panic" tuneable in
addition to the requested "mem_nmi_panic", making it consistent with
RHEL3.  RHEL4 already has "unknown_nmi_panic", and Arjan has indicated
that he will be adding "mem_nmi_panic".  
Comment 10 Ernie Petrides 2004-09-20 02:53:17 EDT
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.8.EL).
Comment 11 John Flanagan 2004-12-13 15:06:27 EST
An erratum has been issued which should help with the problem
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-505.html
Comment 12 Ernie Petrides 2004-12-13 16:41:41 EST
This bug was inappropriate listed in the above 2.1 Erratum (listed above),
and thus should not have yet been closed.  I'm reverting it to MODIFIED
state until the RHEL3 Erratum is released (which should be in a week).
I'm also removing it from the RHEL2.1 blocker list.
Comment 13 Ernie Petrides 2004-12-13 16:45:09 EST
I meant to write "inappropriately".
Comment 14 John Flanagan 2004-12-20 15:55:51 EST
An erratum has been issued which should help with the problem
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html