|Summary:||[RHEL3] uncorrectable ECC memory errors do NOT halt the system|
|Product:||Red Hat Enterprise Linux 3||Reporter:||Alexandre Oliva <aoliva>|
|Component:||kernel||Assignee:||Dave Anderson <anderson>|
|Status:||CLOSED ERRATA||QA Contact:||Brian Brock <bbrock>|
|Version:||3.0||CC:||barryn, jbaron, jparadis, peterm, petrides, riel|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2004-12-20 20:55:51 UTC||Type:||---|
Description Alexandre Oliva 2004-08-06 20:47:58 UTC
Linux does not protect itself from memory corruption: when it encounters a serious (uncorrectable) memory error, it does not panic, but instead keeps running in a somewhat usable state.
Arjan van de Ven wrote: "Yes, this is an oversight that I'll be correcting in the RHEL4 kernel."
Frank Hirtz wrote: "Is this something that we can get addressed from within the context of RHEL 2.1 and 3?"
Comment 3 Dave Anderson 2004-08-18 17:19:32 UTC
It looks OK to me, although I'd prefer that the "if (mem_nmi_panic)" clutter be moved inside the mem_parity_error() function. An AS2.1 version would require a bit more work, since AS2.1 has no die_nmi() function, but it could easily be done. I can put together patches for both kernels, but I'd also prefer to follow Arjan's lead on how he implements it in RHEL4.
Comment 4 Dave Anderson 2004-08-18 17:53:21 UTC
I see now that he has simply followed the lead of the "unknown_nmi_panic" sysctl check above it. That raises the question of whether it makes sense to add both that sysctl and the proposed mem_nmi_panic sysctl to AS2.1 and RHEL4 to maintain consistency.
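To illustrate the two points above (gating each NMI class by its own tunable, and keeping the mem_nmi_panic test inside the memory-error handler rather than at the call site), here is a minimal user-space model. It is not the RHEL3 or AS2.1 patch: the `model_*` functions and the `nmi_action` enum are hypothetical, the real kernel handlers take a `struct pt_regs *` and halt via die_nmi()/panic() instead of returning a value, and the reason-byte layout (bit 7 = memory parity, bit 6 = I/O check) is assumed from the i386 convention of reading port 0x61.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the two sysctl tunables discussed above.
 * 0 = log and continue (the old behavior), nonzero = panic. */
static int unknown_nmi_panic = 0;
static int mem_nmi_panic = 0;

/* What this model decides to do with an NMI. */
enum nmi_action {
    NMI_CONTINUE, /* log the event and keep running */
    NMI_PANIC     /* halt; stands in for die_nmi()/panic() in the kernel */
};

/* Memory parity/ECC NMI. The mem_nmi_panic check lives inside the
 * handler, so the dispatcher below needs no extra "if (mem_nmi_panic)". */
static enum nmi_action model_mem_parity_error(unsigned char reason)
{
    fprintf(stderr, "NMI: memory parity error, reason %02x\n", reason);
    return mem_nmi_panic ? NMI_PANIC : NMI_CONTINUE;
}

/* NMI with no recognized source, gated by unknown_nmi_panic. */
static enum nmi_action model_unknown_nmi_error(unsigned char reason)
{
    fprintf(stderr, "NMI: unknown reason %02x\n", reason);
    return unknown_nmi_panic ? NMI_PANIC : NMI_CONTINUE;
}

/* Dispatch on the (assumed) reason byte. If neither bit 7 nor bit 6 is
 * set, the source is unknown. I/O-check NMIs (bit 6 only) are not
 * modeled here and simply continue. */
static enum nmi_action model_handle_nmi(unsigned char reason)
{
    if (!(reason & 0xc0))
        return model_unknown_nmi_error(reason);
    if (reason & 0x80)
        return model_mem_parity_error(reason);
    return NMI_CONTINUE;
}
```

With both tunables left at 0, every NMI in this model is logged and execution continues, which is the behavior this bug complains about; setting mem_nmi_panic to 1 turns a memory-parity NMI into a halt without changing how unknown NMIs are treated.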
Comment 5 Alexandre Oliva 2004-08-19 11:20:34 UTC
The feature request is for both 2.1 and 3, and it should definitely be carried over to RHEL4 to avoid a regression.
Comment 6 Dave Anderson 2004-09-14 20:46:24 UTC
RHEL3 patch posted today. I'll start on an AS2.1 version tomorrow, noting as before that it is not as simple because there's no die_nmi() function in AS2.1.
Comment 8 Dave Anderson 2004-09-15 18:58:24 UTC
I am just about to post the AS2.1 patch.
Comment 9 Dave Anderson 2004-09-15 19:14:19 UTC
AS2.1 patch posted today. Note that the patch also adds the "unknown_nmi_panic" tunable in addition to the requested "mem_nmi_panic", making it consistent with RHEL3. RHEL4 already has "unknown_nmi_panic", and Arjan has indicated that he will be adding "mem_nmi_panic".
Comment 10 Ernie Petrides 2004-09-20 06:53:17 UTC
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.8.EL).
Comment 11 John Flanagan 2004-12-13 20:06:27 UTC
An erratum has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-505.html
Comment 12 Ernie Petrides 2004-12-13 21:41:41 UTC
This bug was inappropriately listed in the 2.1 Erratum above, and thus should not have been closed yet. I'm reverting it to MODIFIED state until the RHEL3 Erratum is released (which should be in a week). I'm also removing it from the RHEL2.1 blocker list.
Comment 13 Ernie Petrides 2004-12-13 21:45:09 UTC
I meant to write "inappropriately".
Comment 14 John Flanagan 2004-12-20 20:55:51 UTC
An erratum has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html