Red Hat Bugzilla – Bug 220530
kernel: EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, ...
Last modified: 2008-02-25 13:25:54 EST
Description of problem:
The kernel writes messages like the following to xterm windows:
smaug kernel: EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, row 0, labels "":
They occur 1-2 times per minute.
/var/log/messages doesn't show anything more enlightening.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Install the system, apply all updates (as of 12/22). Open an xterm and sit back.
Erm, that's actually EDAC doing exactly what it's supposed to be doing. It's
telling you that one of your DIMMs is constantly hitting uncorrectable errors.
In other words, you have some memory that has gone bad and should be replaced
(because uncorrectable errors can lead to data corruption).
I'm reopening as an enhancement request, but bugzilla doesn't let me change the
priority.
Some quick googling didn't tell me that I've got memory, so please consider
teaching the kernel to output a message that is useful to non-kernel hackers.
For bonus points, tell me which DIMM is suspect :-)
(In reply to comment #2)
> I'm reopening as an enhancement request, but bugzilla doesn't let me change
> the priority.
Not sure what the proper channel for an enhancement request like this is,
especially for Fedora...
> Some quick googling didn't tell me that I've got memory, so please consider
> teaching the kernel to output a message that is useful to non-kernel hackers.
It's probably worth adding some info on EDAC to the Fedora Project wiki, but this
is upstream code, not something we wrote. You'd likely need to take this up with
the EDAC maintainers.
> For bonus points, tell me which DIMM is suspect :-)
There's actually a facility in EDAC for doing that, but unfortunately, it's on a
per-board basis. We have to know the exact memory layout, how
banks/rows/channels are mapped across DIMMs and how that corresponds to
silk-screened DIMM info on the motherboard. Unfortunately, very few boards are
properly documented to that level. However, when they are, EDAC will tell you
exactly which DIMM is bad (note the empty "" after "labels" in your output). If
you're lucky, your board could already be supported... The edac tarball from
http://bluesmoke.sourceforge.net/ contains some utilities that might help.
In a prior lifetime, I actually worked on large clusters where we had EDAC
configured on all nodes to report specific DIMMs, complete with cron jobs that
parsed logs looking for EDAC events, raising alerts over certain thresholds, etc...
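For anyone wanting to try the same kind of log watching, here is a minimal sketch in Python. It is not the tooling we used back then. The pattern is modeled only on the single message quoted in this report, so other kernels or EDAC drivers may need a different expression, and the names parse_edac_line and rows_over_threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the message layout matches the line quoted in this report.
# Fields between "grain" and "row" are skipped, since some messages may
# carry extra information there.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<kind>UE|CE) "
    r"page (?P<page>0x[0-9a-fA-F]+), "
    r"offset (?P<offset>0x[0-9a-fA-F]+), "
    r"grain (?P<grain>\d+),.*?row (?P<row>\d+)"
)


def parse_edac_line(line):
    """Return a dict describing one EDAC event, or None if the line has none."""
    m = EDAC_RE.search(line)
    if m is None:
        return None
    return {
        "mc": int(m.group("mc")),
        "kind": m.group("kind"),          # "UE" or "CE"
        "page": int(m.group("page"), 16),
        "offset": int(m.group("offset"), 16),
        "grain": int(m.group("grain")),
        "row": int(m.group("row")),
    }


def rows_over_threshold(lines, threshold):
    """Count events per (controller, row, kind); keep those at or above threshold."""
    counts = Counter()
    for line in lines:
        event = parse_edac_line(line)
        if event is not None:
            counts[(event["mc"], event["row"], event["kind"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

A cron job could feed the lines of the system log to rows_over_threshold and send mail whenever the result is non-empty. The threshold itself is a site policy choice.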
The edac-utils package that was added to Fedora Extras should allow you to label
the memory module slots. Please try it and report how it goes.
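To check whether any labels are actually set, and which rows are accumulating errors, the EDAC sysfs files can be read directly. The sketch below assumes the csrow-based layout under /sys/devices/system/edac/mc (files ue_count, ce_count and ch*_dimm_label in each csrow directory); that layout depends on the kernel version and driver, so treat the paths as an assumption. The function name read_csrow_counts is invented for this example.

```python
import glob
import os


def read_csrow_counts(root="/sys/devices/system/edac/mc"):
    """Collect error counts and DIMM labels for every csrow found under root."""
    results = []
    for csrow in sorted(glob.glob(os.path.join(root, "mc*", "csrow*"))):
        entry = {"csrow": os.path.relpath(csrow, root)}
        for name in ("ue_count", "ce_count"):
            try:
                with open(os.path.join(csrow, name)) as f:
                    entry[name] = int(f.read().strip())
            except (OSError, ValueError):
                # File missing or unreadable on this kernel/driver.
                entry[name] = None
        labels = []
        for path in sorted(glob.glob(os.path.join(csrow, "ch*_dimm_label"))):
            try:
                with open(path) as f:
                    labels.append(f.read().strip())
            except OSError:
                labels.append("")
        entry["labels"] = labels
        results.append(entry)
    return results


if __name__ == "__main__":
    for entry in read_csrow_counts():
        print(entry)
```

An empty label string here corresponds to the empty "" after "labels" in the kernel message.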
Jerry, did you try edac-utils yet?
Jerry, any updates on this one?
Unfortunately, I no longer have the machine that was giving me this problem.
OK, I'll close this one. If you hit the same problem and edac-utils isn't
enough to solve it, please reopen.