Bug 220530

Summary: kernel: EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, ...
Product: [Fedora] Fedora
Reporter: Jerry Quinn <jlquinn>
Component: kernel
Assignee: Aristeu Rozanski <arozansk>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: low
Priority: medium
Version: 6
CC: jarod, wtogami
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-02-25 18:25:54 UTC

Description Jerry Quinn 2006-12-21 22:25:27 UTC
Description of problem:

The kernel writes messages like the following to xterm windows:
smaug kernel: EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, row 0, labels "": i82860 UE

They occur 1-2 times per minute.

/var/log/messages doesn't show anything more enlightening.

Version-Release number of selected component (if applicable):

kernel-2.6.18-1.2868.fc6

How reproducible:

Install the system, apply all updates (as of 12/22).  Open an xterm and sit back.


Comment 1 Jarod Wilson 2006-12-21 22:32:41 UTC
Erm, that's actually EDAC doing exactly what it's supposed to be doing. It's
telling you that one of your DIMMs is constantly hitting uncorrectable errors.
In other words, you have some memory that has gone bad and should be replaced
(because uncorrectable errors can lead to data corruption).

Comment 2 Jerry Quinn 2006-12-21 23:41:09 UTC
I'm reopening as an enhancement request, but bugzilla doesn't let me change the
priority.

Some quick googling didn't tell me that I've got bad memory, so please consider
teaching the kernel to output a message that is useful to non-kernel hackers.

For bonus points, tell me which DIMM is suspect :-)



Comment 3 Jarod Wilson 2006-12-22 02:43:05 UTC
(In reply to comment #2)
> I'm reopening as an enhancement request, but bugzilla doesn't let me change
> the priority.

Not sure what the proper channel for an enhancement request like this is,
especially for Fedora...

> Some quick googling didn't tell me that I've got bad memory, so please consider
> teaching the kernel to output a message that is useful to non-kernel hackers.

It's probably worth adding some info on EDAC to the Fedora Project wiki, but this
is upstream code, not something we wrote. You'd likely need to take this up with
the EDAC maintainers.

> For bonus points, tell me which DIMM is suspect :-)

There's actually a facility in EDAC for doing that, but unfortunately, it's on a
per-board basis. We have to know the exact memory layout, how
banks/rows/channels are mapped across DIMMs and how that corresponds to
silk-screened DIMM info on the motherboard. Unfortunately, very few boards are
properly documented to that level. However, when they are, EDAC will tell you
exactly which DIMM is bad (note the empty "" after "labels" in your output). If
you're lucky, your board could already be supported... The edac tarball from
http://bluesmoke.sourceforge.net/ contains some utilities that might help.
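
As a rough illustration of where those labels live, here is a minimal Python sketch that lists whatever DIMM labels the kernel currently exposes. It assumes the csrow-based EDAC sysfs layout (/sys/devices/system/edac/mc/mcN/csrowN/chN_dimm_label); the function name read_dimm_labels is made up for this example, and the layout may differ on other kernels. An empty label corresponds to the empty "" in the log message.

```python
from pathlib import Path


def read_dimm_labels(edac_root="/sys/devices/system/edac/mc"):
    """Return {(mc, csrow, channel): label} for each label file found.

    Assumes the csrow-based EDAC sysfs layout; read_dimm_labels is a
    hypothetical helper written for this example, not an EDAC tool.
    """
    labels = {}
    root = Path(edac_root)
    if not root.is_dir():
        # EDAC not loaded, or a different sysfs layout: nothing to report.
        return labels
    for path in sorted(root.glob("mc*/csrow*/ch*_dimm_label")):
        mc = path.parent.parent.name            # e.g. "mc0"
        csrow = path.parent.name                # e.g. "csrow0"
        channel = path.name.split("_", 1)[0]    # e.g. "ch0"
        labels[(mc, csrow, channel)] = path.read_text().strip()
    return labels


if __name__ == "__main__":
    for key, label in read_dimm_labels().items():
        print(*key, repr(label))
```

If every label comes back empty, the board's layout has not been registered, which is the situation described above.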

In a prior lifetime, I actually worked on large clusters where we had EDAC
configured on all nodes to report specific DIMMs, complete with cron jobs that
parsed logs looking for EDAC events, raising alerts over certain thresholds, etc...
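
A minimal sketch of that kind of log check, in Python, might look like the following. It is not the original cron job: the regular expression is modelled only on the UE line quoted in this report (the CE variant is an assumption), and count_events, over_threshold and the threshold value are hypothetical names chosen for the example.

```python
import re
from collections import Counter

# Modelled on the line from this report:
#   EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, row 0, labels "": i82860 UE
# Matching "CE" (correctable) lines the same way is an assumption.
EDAC_RE = re.compile(
    r'EDAC MC(?P<mc>\d+): (?P<kind>UE|CE) page (?P<page>0x[0-9a-fA-F]+),'
    r'.*?\brow (?P<row>\d+),.*?labels? "(?P<label>[^"]*)"'
)


def count_events(lines):
    """Count EDAC events per (memory controller, row, kind)."""
    counts = Counter()
    for line in lines:
        m = EDAC_RE.search(line)
        if m:
            counts[(int(m["mc"]), int(m["row"]), m["kind"])] += 1
    return counts


def over_threshold(counts, threshold):
    """Return only the entries seen at least `threshold` times."""
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    import sys
    # Usage sketch: python edac_check.py < /var/log/messages
    for key, n in over_threshold(count_events(sys.stdin), threshold=10).items():
        print("MC%d row %d: %d %s events" % (key[0], key[1], n, key[2]))
```

Grouping by controller and row rather than by label keeps the check useful on boards where the label is empty, as in the output above.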

Comment 4 Aristeu Rozanski 2007-08-20 16:35:45 UTC
The edac-utils package that was added to Fedora Extras should allow you to label
the memory module slots. Please try it and report how it goes.


Comment 5 Aristeu Rozanski 2008-01-07 16:35:16 UTC
Jerry, did you try edac-utils yet?


Comment 6 Aristeu Rozanski 2008-02-13 20:22:11 UTC
Jerry, any updates on this one?


Comment 7 Jerry Quinn 2008-02-15 16:24:30 UTC
Unfortunately, I no longer have the machine that was giving me this problem.


Comment 8 Aristeu Rozanski 2008-02-25 18:25:54 UTC
OK, I'll close this one. If you hit the same problem and edac-utils isn't
enough to solve it, please reopen.