Bug 220530 - kernel: EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, ...
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: All
OS: Linux
Priority: medium
Severity: low
Assigned To: Aristeu Rozanski
QA Contact: Brian Brock
Keywords: Reopened
Reported: 2006-12-21 17:25 EST by Jerry Quinn
Modified: 2008-02-25 13:25 EST (History)
Last Closed: 2008-02-25 13:25:54 EST
Description Jerry Quinn 2006-12-21 17:25:27 EST
Description of problem:

The kernel writes messages like the following to xterm windows:
smaug kernel: EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, row 0, labels "": i82860 UE

They occur once or twice per minute.

/var/log/messages doesn't show anything more enlightening.

Version-Release number of selected component (if applicable):

kernel-2.6.18-1.2868.fc6

How reproducible:

Install the system, apply all updates (as of 12/22).  Open an xterm and sit back.

Comment 1 Jarod Wilson 2006-12-21 17:32:41 EST
Erm, that's actually EDAC doing exactly what it's supposed to be doing. It's
telling you that one of your DIMMs is constantly hitting uncorrectable errors.
In other words, you have some memory that has gone bad and should be replaced
(because uncorrectable errors can lead to data corruption).
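
For illustration, the fields in a message like the one reported can be pulled
apart with a short script. This is only a sketch: the pattern is modelled on
the single UE line quoted in this bug, the function name parse_edac_line is
made up for the example, and other drivers or kernel versions may format the
line differently (corrected-error lines, for instance, may carry extra fields
this pattern does not handle).

```python
import re

# Pattern modelled on the one UE message quoted in this report (an
# assumption; it is not taken from the kernel sources).
EDAC_UE_RE = re.compile(
    r'EDAC MC(?P<mc>\d+): (?P<kind>UE) '
    r'page 0x(?P<page>[0-9a-fA-F]+), offset 0x(?P<offset>[0-9a-fA-F]+), '
    r'grain (?P<grain>\d+), row (?P<row>\d+), '
    r'labels "(?P<labels>[^"]*)"'
)


def parse_edac_line(line):
    """Return the fields of an EDAC UE log line as a dict, or None."""
    m = EDAC_UE_RE.search(line)
    if m is None:
        return None
    return {
        "mc": int(m.group("mc")),             # memory controller index
        "kind": m.group("kind"),              # "UE" = uncorrectable error
        "page": int(m.group("page"), 16),     # page number as logged
        "offset": int(m.group("offset"), 16), # offset within that page
        "grain": int(m.group("grain")),       # reporting granularity, bytes
        "row": int(m.group("row")),           # chip-select row
        "labels": m.group("labels"),          # DIMM label, "" if unset
    }
```

An empty "labels" value, as in the reported line, means no DIMM label has been
registered for that row.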
Comment 2 Jerry Quinn 2006-12-21 18:41:09 EST
I'm reopening as an enhancement request, but bugzilla doesn't let me change the
priority.

Some quick googling didn't tell me that I've got bad memory, so please consider
teaching the kernel to output a message that is useful to non-kernel hackers.

For bonus points, tell me which DIMM is suspect :-)

Comment 3 Jarod Wilson 2006-12-21 21:43:05 EST
(In reply to comment #2)
> I'm reopening as an enhancement request, but bugzilla doesn't let me change
> the priority.

Not sure what the proper channel for an enhancement request like this is,
especially for Fedora...

> Some quick googling didn't tell me that I've got bad memory, so please consider
> teaching the kernel to output a message that is useful to non-kernel hackers.

It's probably worth adding some info on EDAC to the Fedora Project wiki, but this
is upstream code, not something we wrote. You'd likely need to take this up with
the EDAC maintainers.

> For bonus points, tell me which DIMM is suspect :-)

There's actually a facility in EDAC for doing that, but unfortunately, it's on a
per-board basis. We have to know the exact memory layout, how
banks/rows/channels are mapped across DIMMs and how that corresponds to
silk-screened DIMM info on the motherboard. Unfortunately, very few boards are
properly documented to that level. However, when they are, EDAC will tell you
exactly which DIMM is bad (note the empty "" after "labels" in your output). If
you're lucky, your board could already be supported... The edac tarball from
http://bluesmoke.sourceforge.net/ contains some utilities that might help.
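
As a rough way to see which rows are accumulating errors and whether any
labels are set, the per-row counters can be read from sysfs. The sketch below
assumes the csrow-based EDAC sysfs layout (mc*/csrow*/ue_count, ce_count and
ch*_dimm_label under /sys/devices/system/edac/mc); which files actually exist
depends on the kernel version and driver, and the function name
read_edac_counts is invented for this example.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Collect per-csrow error counts and DIMM labels from EDAC sysfs.

    The directory layout is an assumption (csrow-based EDAC sysfs);
    files that are missing are simply skipped.
    """
    results = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        entry = {}
        # ce_count = corrected errors, ue_count = uncorrectable errors
        for name in ("ce_count", "ue_count"):
            counter = csrow / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        # One label file per channel; empty if the board is not labelled.
        entry["labels"] = {
            label.name: label.read_text().strip()
            for label in sorted(csrow.glob("ch*_dimm_label"))
        }
        results[f"{csrow.parent.name}/{csrow.name}"] = entry
    return results
```

A row with a growing ue_count and an empty label matches the situation in this
report: the failing row is known, but mapping it to a physical slot still
requires the board's layout.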

In a prior lifetime, I actually worked on large clusters where we had EDAC
configured on all nodes to report specific DIMMs, complete with cron jobs that
parsed logs looking for EDAC events, raising alerts over certain thresholds, etc...
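
A log-scanning job of the kind described above might look roughly like this.
It is a minimal sketch, not the setup that was actually used on those
clusters: the regular expression is modelled on the message quoted in this
bug, and the names count_edac_events and over_threshold, as well as the
threshold semantics, are made up for the example.

```python
import re
from collections import Counter

# Matches the controller, error kind and row of an EDAC message. Modelled
# on the line quoted in this report (an assumption about the format).
EDAC_EVENT_RE = re.compile(r'EDAC MC(\d+): (UE|CE) .*?row (\d+)')


def count_edac_events(lines):
    """Count EDAC events per (controller, kind, row) in an iterable of lines."""
    counts = Counter()
    for line in lines:
        m = EDAC_EVENT_RE.search(line)
        if m:
            key = (int(m.group(1)), m.group(2), int(m.group(3)))
            counts[key] += 1
    return counts


def over_threshold(counts, threshold):
    """Return the sorted keys whose event count is at least `threshold`."""
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Run periodically from cron, a wrapper could open the system log, pass the file
object to count_edac_events, and mail or page on whatever over_threshold
returns.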
Comment 4 Aristeu Rozanski 2007-08-20 12:35:45 EDT
The edac-utils package that was added to Fedora Extras should allow you to label
the memory module slots. Please try it and report how it goes.
Comment 5 Aristeu Rozanski 2008-01-07 11:35:16 EST
Jerry, did you try edac-utils yet?
Comment 6 Aristeu Rozanski 2008-02-13 15:22:11 EST
Jerry, any updates on this one?
Comment 7 Jerry Quinn 2008-02-15 11:24:30 EST
Unfortunately, I no longer have the machine that was giving me this problem.
Comment 8 Aristeu Rozanski 2008-02-25 13:25:54 EST
OK, I'll close this one. If you hit the same problem and edac-utils isn't
enough to solve it, please reopen.
