Bug 220530
Summary: | kernel: EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, ... | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jerry Quinn <jlquinn> |
Component: | kernel | Assignee: | Aristeu Rozanski <arozansk> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | 6 | CC: | jarod, wtogami |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Doc Type: | Bug Fix | Last Closed: | 2008-02-25 18:25:54 UTC |
Description
Jerry Quinn
2006-12-21 22:25:27 UTC
Erm, that's actually EDAC doing exactly what it's supposed to be doing. It's telling you that one of your DIMMs is constantly hitting uncorrectable errors. In other words, you have some memory that has gone bad and should be replaced (because uncorrectable errors can lead to data corruption).

I'm reopening as an enhancement request, but Bugzilla doesn't let me change the priority. Some quick googling didn't tell me that I've got bad memory, so please consider teaching the kernel to output a message that is useful to non-kernel hackers. For bonus points, tell me which DIMM is suspect :-)

(In reply to comment #2)
> I'm reopening as an enhancement request, but Bugzilla doesn't let me change
> the priority.

Not sure what the proper channel for an enhancement request like this is, especially for Fedora...

> Some quick googling didn't tell me that I've got bad memory, so please consider
> teaching the kernel to output a message that is useful to non-kernel hackers.

It's probably worth adding some info on EDAC to the Fedora Project wiki, but this is upstream code, not something we wrote. You'd likely need to take this up with the EDAC maintainers.

> For bonus points, tell me which DIMM is suspect :-)

There's actually a facility in EDAC for doing that, but unfortunately, it's on a per-board basis. We have to know the exact memory layout, how banks/rows/channels are mapped across DIMMs, and how that corresponds to the silk-screened DIMM info on the motherboard. Unfortunately, very few boards are properly documented to that level. However, when they are, EDAC will tell you exactly which DIMM is bad (note the empty "" after "labels" in your output). If you're lucky, your board could already be supported... The edac tarball from http://bluesmoke.sourceforge.net/ contains some utilities that might help.
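To check whether any DIMM labels are currently assigned on a given machine, something along these lines might help. It is only a sketch: it assumes the kernel exposes the per-csrow EDAC sysfs layout (`mc*/csrow*/ch*_dimm_label` and `ue_count` under `/sys/devices/system/edac/mc`), which depends on kernel version and configuration, and the helper names `list_dimm_labels` and `list_ue_counts` are made up for illustration.

```python
from pathlib import Path

# Usual location of the EDAC sysfs tree. It is a parameter everywhere below so
# the functions can be pointed at a copy of the tree (assumption: the kernel
# exposes the per-csrow layout; newer kernels may need the legacy sysfs option).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_dimm_labels(root=EDAC_MC_ROOT):
    """Return (controller, csrow, channel, label) for every channel found.

    An empty label means nothing has been assigned to that slot, which is
    what shows up as labels "" in the kernel message.
    """
    found = []
    for label_file in sorted(Path(root).glob("mc*/csrow*/ch*_dimm_label")):
        csrow_dir = label_file.parent
        channel = label_file.name[: -len("_dimm_label")]
        label = read_text(label_file) or ""
        found.append((csrow_dir.parent.name, csrow_dir.name, channel, label))
    return found


def list_ue_counts(root=EDAC_MC_ROOT):
    """Return {(controller, csrow): uncorrectable error count}."""
    counts = {}
    for count_file in sorted(Path(root).glob("mc*/csrow*/ue_count")):
        text = read_text(count_file)
        if text is not None and text.isdigit():
            csrow_dir = count_file.parent
            counts[(csrow_dir.parent.name, csrow_dir.name)] = int(text)
    return counts


if __name__ == "__main__":
    for mc, csrow, channel, label in list_dimm_labels():
        print(mc, csrow, channel, repr(label))
    for (mc, csrow), count in list_ue_counts().items():
        print(mc, csrow, "ue_count =", count)
```

If every label comes back empty, the board's layout is probably not known to the tools, and the row and channel numbers are the most specific information available.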
In a prior lifetime, I actually worked on large clusters where we had EDAC configured on all nodes to report specific DIMMs, complete with cron jobs that parsed logs looking for EDAC events, raising alerts over certain thresholds, etc...

The edac-utils package that was added to Fedora Extras should allow you to label the memory module slots. Please try it and report how it goes.

Jerry, did you try edac-utils yet?

Jerry, any updates on this one?

Unfortunately, I no longer have the machine that was giving me this problem.

OK, I'll close this one. If you hit the same problem and edac-utils isn't enough to solve it, please reopen.
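For anyone who wants the kind of log-scanning cron job mentioned earlier in this thread, a rough sketch follows. It parses only the fields visible in this bug's summary line (controller, UE/CE, page, offset, grain) plus an optional `labels "..."` field. The rest of the message is not parsed, and the function names and threshold values are hypothetical, not taken from any existing tool.

```python
import re
from collections import Counter

# Matches the fields shown in this bug's summary. Anything after "grain N" is
# ignored, except an optional labels "..." field (assumed format).
EDAC_EVENT = re.compile(
    r'EDAC MC(?P<mc>\d+): (?P<kind>UE|CE) page (?P<page>0x[0-9a-fA-F]+), '
    r'offset (?P<offset>0x[0-9a-fA-F]+), grain (?P<grain>\d+)'
    r'(?:.*?labels "(?P<labels>[^"]*)")?'
)


def parse_event(line):
    """Return a dict describing one EDAC event, or None for unrelated lines."""
    m = EDAC_EVENT.search(line)
    if m is None:
        return None
    return {
        "mc": int(m["mc"]),
        "kind": m["kind"],             # "UE" = uncorrectable, "CE" = correctable
        "page": int(m["page"], 16),
        "offset": int(m["offset"], 16),
        "grain": int(m["grain"]),
        "labels": m["labels"],         # None if absent, "" if no DIMM label is set
    }


def over_threshold(lines, limits):
    """Count events per (controller, kind) and return those at or over a limit.

    `limits` maps "UE"/"CE" to a count; kinds without a limit never alert.
    """
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event is not None:
            counts[(event["mc"], event["kind"])] += 1
    return {
        key: n
        for key, n in counts.items()
        if n >= limits.get(key[1], float("inf"))
    }


if __name__ == "__main__":
    import sys

    # Hypothetical thresholds: any uncorrectable error alerts immediately,
    # correctable errors only after many occurrences.
    alerts = over_threshold(sys.stdin, {"UE": 1, "CE": 100})
    for (mc, kind), n in sorted(alerts.items()):
        print(f"MC{mc}: {n} {kind} event(s)")
```

Run from cron with the kernel log piped to standard input, this prints one line per memory controller that crossed a threshold. Turning that into an actual alert is left to whatever notification mechanism is in use.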