Bug 191506
Summary: | i82875p_edac spitting EDAC errors |
---|---|
Product: | [Fedora] Fedora |
Reporter: | Daniel Roesen <dr> |
Component: | kernel |
Assignee: | Aristeu Rozanski <arozansk> |
Status: | CLOSED WONTFIX |
QA Contact: | Brian Brock <bbrock> |
Severity: | high |
Priority: | medium |
Version: | 11 |
CC: | arozansk, bolek-vendor, bughunt, davej, jesse, matt, raffaele.recalcati, saymanzzy, triage, wtogami |
Keywords: | Reopened |
Hardware: | i686 |
OS: | Linux |
Whiteboard: | bzcl34nup |
Doc Type: | Bug Fix |
Last Closed: | 2010-06-28 10:22:03 UTC |
Bug Blocks: | 513462 |
Description
Daniel Roesen
2006-05-12 15:59:10 UTC
Sidenote: machine works rock-solid with kernel build 1833. No crashes, no data corruption or any other oddities observed yet.

Someone else is reporting exactly the same problem with multiple machines with the same mainboard:
http://lkml.org/lkml/2006/5/16/79
http://lkml.org/lkml/2006/5/16/40
And someone else had similar problems with the RHEL4 beta in February:
http://www.redhat.com/archives/nahant-beta-list/2006-February/msg00000.html

Should be fixed in the latest errata (the previous update had a debug option left on).

Still happening with the latest FC4 and FC5 kernels (2111 and 2122). As far as I can see it's not the debug message problem.

It is still occurring: Dell PE 2850 - Upgraded from FC4 to FC5.
Sep 9 12:44:07 XXX kernel: Non-Fatal Error DRAM Controler
Sep 9 12:44:07 XXX kernel: Non-Fatal Error DRAM Controler
Sep 9 12:44:07 XXX kernel: EDAC MC0: CE page 0x1d9, offset 0x0, grain 0, syndrome 0xefd0, row 7, channel 1, label "": e752x CE
Sep 9 12:44:07 XXX kernel: EDAC MC0: CE page 0x7ced7, offset 0x0, grain 0, syndrome 0x9120, row 7, channel 1, label "": e752x CE
Linux XXX 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux

Can you confirm this Dell box is using the ASUS P4C800(-E Deluxe) as well?

Nope, it does not. Please see:
http://www.dell.com/content/products/productdetails.aspx/pedge_2850?c=us&cs=555&l=en&s=biz

[This comment added as part of a mass-update to all open FC4 kernel bugs]
FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.

Still a problem with FC5 (as stated already 4 months ago) and most probably FC6 too.

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.
This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.
In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

I'm having a problem similar to "Daniel Roesen on 2006-05-12 11:59".
NEC Express5800 TM700. Fedora Core 5, updated about three days ago.
uname -r
2.6.18-1.2200.fc5smp
Every second syslog prints the following warnings. If I shut down the syslog daemon, the messages stop. I have checked the RAM with memtest86-3.2 for 17 hours and it is OK. The PC has worked well for about a year with Red Hat Enterprise (2.4.x kernel, perhaps 2.4.31). I want to use this system as a server but I absolutely don't trust it.
Message from syslogd@dhcppc7 at Thu Nov 2 14:37:28 2006 ...
dhcppc7 kernel: EDAC MC0: UE page 0x19a8, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
Message from syslogd@dhcppc7 at Thu Nov 2 14:37:28 2006 ...
dhcppc7 kernel: EDAC MC0: UE page 0x37e01, offset 0x0, grain 4096, row 1, labels ":": i82875p UE
Message from syslogd@dhcppc7 at Thu Nov 2 14:37:29 2006 ...
dhcppc7 kernel: EDAC MC0: UE page 0x2ce08, offset 0x0, grain 4096, row 1, labels ":": i82875p UE
Message from syslogd@dhcppc7 at Thu Nov 2 14:37:30 2006 ...
dhcppc7 kernel: EDAC MC0: UE page 0x36a78, offset 0x0, grain 4096, row 1, labels ":": i82875p UE

dmesg
EDAC MC0: UE page 0x2e4e4, offset 0x0, grain 4096, row 1, labels ":": i82875p UEEDAC MC0: UE page 0x36b5b, offset 0x0, grain 4096, row 1, labels ":": i82875p UEEDAC MC0: UE page 0x37ed8, offset 0x0, grain 4096, row 1, labels ":": i82875p UEEDAC MC0: UE page 0x37746, offset 0x0, grain 4096, row 1, labels ":": i82875p UEEDAC MC0: UE page 0x374ec, offset 0x0, grain 4096, row 1, labels ":": i82875p UEEDAC MC0: UE page 0x36b06, offset 0x0, grain 4096, row 1, labels ":": i82875p UEEDAC MC0: UE page 0x36a91, offset 0x0, grain 4096, row 1, labels ":": i82875p

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 3
cpu MHz : 2998.654
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid
bogomips : 6001.85

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 3
cpu MHz : 2998.654
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid
bogomips : 5997.58

[root@dhcppc7 ~]# lsmod | grep i82
i82875p_edac 10821 0
edac_mc 27465 1 i82875p_edac
[root@dhcppc7 ~]# cat /proc/misc
63 device-mapper
183 hw_random
175 agpgart
144 nvram
228 hpet
135 rtc

Alan, any ideas?

Code appears to be working correctly. Something is causing UEs to be flagged: could be board, PSU, RAM, whatever.

Daniel, I need an lspci -vvv and some details on your memory modules (ECC, how many, sizes, etc). Can you get these for me? Thanks

Ugh, just noticed you already provided the memory module info. But I still need the lspci output. Thanks

Created attachment 147719 [details]
lspci -vvv output of my ASUS P4C800-E Deluxe mainboard based system
lspci -vvv output attached. Thanks for following up!
Just FYI: problem still exists with recent FC6 kernels. Doesn't go away by
itself, unfortunately. :-)
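To see whether the reported errors cluster on one row, the EDAC lines quoted in this report can be tallied mechanically. The following is a minimal Python sketch, not anything shipped with EDAC: the regular expression is an assumption derived only from the message formats pasted above (the i82875p UE lines and the e752x CE lines), and the names `EDAC_RE`, `parse_edac` and `tally` are made up for illustration.

```python
import re
from collections import Counter

# Assumed format, taken from the lines quoted in this report, e.g.
#   EDAC MC0: UE page 0x19a8, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
# CE lines carry an extra "syndrome 0x..." field before "row".
EDAC_RE = re.compile(
    r"MC(?P<mc>\d+): (?P<kind>CE|UE) page 0x(?P<page>[0-9a-fA-F]+), "
    r"offset 0x(?P<offset>[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"(?:syndrome 0x[0-9a-fA-F]+, )?row (?P<row>\d+)"
)


def parse_edac(text):
    """Return one dict per EDAC CE/UE report found anywhere in text."""
    events = []
    for m in EDAC_RE.finditer(text):
        events.append({
            "mc": int(m.group("mc")),
            "kind": m.group("kind"),
            "page": int(m.group("page"), 16),
            "offset": int(m.group("offset"), 16),
            "grain": int(m.group("grain")),
            "row": int(m.group("row")),
        })
    return events


def tally(events):
    """Count reports per (kind, memory controller, row)."""
    return Counter((e["kind"], e["mc"], e["row"]) for e in events)
```

Because `finditer` scans the whole text rather than going line by line, this also copes with the run-together dmesg paste above, where one message ends in `UE` and the next begins with `EDAC` without a newline. Feeding it saved syslog or dmesg text and printing `tally(parse_edac(text))` gives a per-row count.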
I am having the same issue with FC6. 00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c) Subsystem: Intel Corporation Unknown device 3439 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Capabilities: [40] Vendor Specific Information 00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 0c) Subsystem: Intel Corporation Unknown device 3439 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- 00:01.0 System peripheral: Intel Corporation E7520 DMA Controller (rev 0c) Subsystem: Intel Corporation Unknown device 3439 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin A routed to IRQ 10 Region 0: Memory at fcdff000 (32-bit, non-prefetchable) [size=4K] Capabilities: [b0] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- Address: fee00000 Data: 0000 00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Bus: primary=00, secondary=01, subordinate=03, sec-latency=0 I/O behind bridge: 0000d000-0000dfff Memory behind bridge: fce00000-fcffffff Prefetchable memory behind bridge: 00000000fb800000-00000000fbf00000 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [50] Power Management version 2 
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- Address: fee00000 Data: 0000 Capabilities: [64] Express Root Port (Slot-) IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 256 bytes, MaxReadReq 512 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 2 Link: Latency L0s <4us, L1 unlimited Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x8 Root: Correctable+ Non-Fatal+ Fatal+ PME+ 00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c) (prog-if 00 [Normal decode]) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 I/O behind bridge: 0000f000-00000fff Memory behind bridge: fff00000-000fffff Prefetchable memory behind bridge: 00000000fff00000-0000000000000000 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- Address: fee00000 Data: 0000 Capabilities: [64] Express Root Port (Slot-) IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 
128 bytes, MaxReadReq 128 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 4 Link: Latency L0s <4us, L1 unlimited Link: ASPM Disabled RCB 64 bytes Disabled CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x0 Root: Correctable- Non-Fatal- Fatal- PME- 00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 0c) (prog-if 00 [Normal decode]) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Bus: primary=00, secondary=05, subordinate=05, sec-latency=0 I/O behind bridge: 0000f000-00000fff Memory behind bridge: fff00000-000fffff Prefetchable memory behind bridge: 00000000fff00000-0000000000000000 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity+ SERR- NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- Address: fee00000 Data: 0000 Capabilities: [64] Express Root Port (Slot-) IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 128 bytes, MaxReadReq 128 bytes Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s, Port 5 Link: Latency L0s <4us, L1 unlimited Link: ASPM Disabled RCB 64 bytes Disabled CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x0 Root: Correctable- Non-Fatal- Fatal- PME- 00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c) (prog-if 00 [Normal decode]) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- 
DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Bus: primary=00, secondary=06, subordinate=06, sec-latency=0 I/O behind bridge: 0000f000-00000fff Memory behind bridge: fff00000-000fffff Prefetchable memory behind bridge: 00000000fff00000-0000000000000000 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- Address: fee00000 Data: 0000 Capabilities: [64] Express Root Port (Slot-) IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 128 bytes, MaxReadReq 128 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 6 Link: Latency L0s <4us, L1 unlimited Link: ASPM Disabled RCB 64 bytes Disabled CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x0 Root: Correctable- Non-Fatal- Fatal- PME- 00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02) (prog-if 00 [UHCI]) Subsystem: Intel Corporation Unknown device 3439 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 16 Region 4: I/O ports at c880 [size=32] 00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02) (prog-if 00 [UHCI]) Subsystem: Intel Corporation Unknown device 3439 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ 
ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin B routed to IRQ 18 Region 4: I/O ports at cc00 [size=32] 00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02) (prog-if 00 [UHCI]) Subsystem: Intel Corporation Unknown device 3439 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin C routed to IRQ 17 Region 4: I/O ports at cc80 [size=32] 00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02) (prog-if 20 [EHCI]) Subsystem: Intel Corporation Unknown device 3439 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin D routed to IRQ 19 Region 0: Memory at fcdfec00 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Debug port 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Bus: primary=00, secondary=07, subordinate=07, sec-latency=32 I/O behind bridge: 0000e000-0000efff Memory behind bridge: fd000000-febfffff Prefetchable memory behind bridge: e2000000-e20fffff Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B- 00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02) Control: I/O+ 
Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02) (prog-if 8a [Master SecP PriP]) Subsystem: Intel Corporation Unknown device 3439 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 17 Region 0: I/O ports at <unassigned> Region 1: I/O ports at <unassigned> Region 2: I/O ports at <unassigned> Region 3: I/O ports at <unassigned> Region 4: I/O ports at fc00 [size=16] Region 5: Memory at e2100000 (32-bit, non-prefetchable) [size=1K] 00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02) Subsystem: Intel Corporation Unknown device 3439 Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin B routed to IRQ 21 Region 4: I/O ports at 0540 [size=32] 01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09) (prog-if 00 [Normal decode]) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Bus: primary=01, secondary=02, subordinate=02, sec-latency=64 I/O behind bridge: 0000f000-00000fff Memory behind bridge: fff00000-000fffff Prefetchable memory behind bridge: 00000000fff00000-0000000000000000 Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0 Device: 
Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: AtnBtn- AtnInd- PwrInd- Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 256 bytes, MaxReadReq 512 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 0 Link: Latency L0s unlimited, L1 unlimited Link: ASPM Disabled CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x8 Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- Address: 0000000000000000 Data: 0000 Capabilities: [6c] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [d8] PCI-X bridge device Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz Status: Dev=01:00.0 64bit- 133MHz- SCD- USC- SCO- SRD- Upstream: Capacity=65535 CommitmentLimit=65535 Downstream: Capacity=65535 CommitmentLimit=65535 01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09) (prog-if 20 [IO(X)-APIC]) Subsystem: Intel Corporation Unknown device 3439 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Region 0: Memory at fcefe000 (32-bit, non-prefetchable) [size=4K] Capabilities: [44] Express Endpoint IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: AtnBtn- AtnInd- PwrInd- Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 256 bytes, MaxReadReq 128 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 0 Link: Latency L0s unlimited, L1 unlimited Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x8 Capabilities: [6c] Power Management version 2 Flags: PMEClk- DSI- D1- 
D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Bus: primary=01, secondary=03, subordinate=03, sec-latency=48 I/O behind bridge: 0000d000-0000dfff Memory behind bridge: fcf00000-fcffffff Prefetchable memory behind bridge: 00000000fb800000-00000000fbf00000 Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: AtnBtn- AtnInd- PwrInd- Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 256 bytes, MaxReadReq 512 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 0 Link: Latency L0s unlimited, L1 unlimited Link: ASPM Disabled CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x8 Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- Address: 0000000000000000 Data: 0000 Capabilities: [6c] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [d8] PCI-X bridge device Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=conv Status: Dev=01:00.2 64bit- 133MHz- SCD- USC- SCO- SRD- Upstream: Capacity=65535 CommitmentLimit=65535 Downstream: Capacity=65535 CommitmentLimit=65535 01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09) (prog-if 20 [IO(X)-APIC]) Subsystem: Intel Corporation Unknown device 3439 
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Region 0: Memory at fceff000 (32-bit, non-prefetchable) [size=4K] Capabilities: [44] Express Endpoint IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 <1us Device: AtnBtn- AtnInd- PwrInd- Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 256 bytes, MaxReadReq 128 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 0 Link: Latency L0s unlimited, L1 unlimited Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- Link: Speed 2.5Gb/s, Width x8 Capabilities: [6c] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 03:01.0 RAID bus controller: 3ware Inc 9xxx-series SATA-RAID Subsystem: 3ware Inc 9xxx-series SATA-RAID Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (2250ns min), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 20 Region 0: I/O ports at dc00 [size=256] Region 1: Memory at fcfffc00 (64-bit, non-prefetchable) [size=256] Region 3: Memory at fb800000 (64-bit, prefetchable) [size=8M] Expansion ROM at fcfe0000 [disabled] [size=64K] Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 07:04.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05) Subsystem: Intel Corporation Unknown device 3439 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium 
>TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (63750ns min), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K] Region 2: I/O ports at ec80 [size=64] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=1 PME- Capabilities: [e4] PCI-X non-bridge device Command: DPERE- ERO+ RBC=512 OST=1 Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz- 07:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA]) Subsystem: Intel Corporation Unknown device 3439 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (2000ns min), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 11 Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M] Region 1: I/O ports at e800 [size=256] Region 2: Memory at febdb000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at e2000000 [disabled] [size=128K] Capabilities: [5c] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- KERNEL Version: 2.6.19-1.2895.fc6PAE Thanks Daniel, John. While I check this problem, can you make sure if the BIOS version you're running is the latest? Thanks, Daniel, John, I'll also need as much information on memory modules you can get. Or at least the brand, model, etc so I can get the information I need. Thanks I'm in the process moving across Germany, I'll get back to you with info as soon as I can. I have the same problem with RHEL4 Update4. It's very strange that my box panics every __64__ minutes, does that mean there is a counter for the UE error checking? 
Oops message:
MC0: CE page 0x41a18, offset 0x0, grain 4096, syndrome 0x8420, row 1,E
MC0: CE - no information available: e7xxx CE log register overflow
MC0: UE page 0x41a19, offset 0x0, grain 4096, row 1, labels ":": e7xxx UE
Kernel panic - not syncing: MC0: UE page 0x41a19, offset 0x0, grain 4096, row 1E
Badness in smp_call_function at arch/i386/kernel/smp.c:577
[<c0116ba4>] smp_call_function+0x50/0xc9
[<c01229d5>] vprintk+0x136/0x14a
[<c0116c5d>] smp_send_stop+0x13/0x1c
[<c01220b6>] panic+0x5b/0x147
[<f8826d55>] edac_mc_handle_ue+0x160/0x176 [edac_mc]
[<c01204f2>] autoremove_wake_function+0xd/0x2d
[<c011e7c6>] __wake_up_common+0x36/0x51
[<c011e80a>] __wake_up+0x29/0x3c
[<c0122aec>] release_console_sem+0xa4/0xa9
[<f88480b4>] process_ue+0x22/0x27 [e7xxx_edac]
[<f884825d>] e7xxx_process_error_info+0x81/0x88 [e7xxx_edac]
[<f8848280>] e7xxx_check+0x1c/0x22 [e7xxx_edac]
[<f8826dca>] check_mc_devices+0x24/0x2d [edac_mc]
[<f8826ddc>] check_mc+0x9/0x1c4 [edac_mc]
[<f8826fe0>] edac_kernel_thread+0x49/0x86 [edac_mc]
[<f8826f97>] edac_kernel_thread+0x0/0x86 [edac_mc]
[<c01041f5>] kernel_thread_helper+0x5/0xb

Daniel, if you remove the EDAC modules, is the machine stable? Did you check your RAM modules with memtest86? Just trying to make sure it's not the hardware's fault.

Yes, the machine is rock stable without the EDAC modules. And the RAM modules have been checked (one-by-one and together) with memtest86, no problems. The BIOS is almost the newest one, but as this is an "old" board, the BIOS version I have is probably to be considered "mature". I didn't have the time yet to dismantle the box, as I'm in the middle of moving.
Best regards, Daniel

Sorry, comment #22 is addressed to Daniel Chou. Didn't notice there are two Daniels on this bug :)

I tested the RAM modules with memtest86 last Friday, and there was a mess of errors. The test passed in another box. I don't know whether the RAM modules in the two boxes are different. I will go further next week and share more information.
Nice to meet you, Daniel:)

Daniel Chou, on the other box, where memtest86 runs fine, do you get the same errors from EDAC?

No, the other box works without any noise from EDAC. I checked the RAM modules in both boxes and they were of the same type. Today an engineer from the vendor took all the RAM away to do more tests. I will post the result once I get feedback. Thanks

Daniel Chou, any news about the RAM? Daniel Roesen, any news on your front?

Sorry for replying so late. The engineers believed that there's something wrong with the RAM, but they gave no details. There is no further information. Thanks

Daniel Chou, so the EDAC behavior is as expected in your case, only printing messages while using the bad memory modules, right?

I think so. My vendor said the parity-check chip on the NIC was bad for sure. The message output and the panic problem were gone after changing to another NIC. Thanks

Created attachment 159283 [details]
lspci -vvv output from Abit IC7-G
Same problem here with both FC6 and RHEL5 on Abit IC7-G (875P chipset, same as P4C800). 3GB of ECC RAM tests perfectly.

Boleslaw, please attach here the messages you're getting and, just in case, check whether you have the latest BIOS version for that mainboard. Thanks

Unfortunately my hardware has changed and I cannot test this anymore. However, I did have the latest BIOS. Also the messages were not as numerous as the OP's (only occasionally single messages).

(In reply to comment #35)
> Unfortunately my hardware has changed and I cannot test this anymore. However, I
> did have the latest BIOS. Also the messages were not as numerous as the OP's
> (only occasionally single messages).
In my NEC Express5800 TM700 the problem is still present. The installation of Ubuntu 7.04 is impossible. After boot the system starts to beep. The console output is EDAC errors as usual. I have tried the RHEL5 rescue disk, 2.6.18-8.1.8.el5xen, and the EDAC problem is not present. Is there any possibility that the Xen kernel solves the problem?

Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.
If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL
If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.
Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers

This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.

I have seen this again with RHEL 5.2, kernel:

Linux gator 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5 07:41:53 EDT 2008 i686 i686 i386 GNU/Linux

The exact message was:

EDAC MC0: UE page 0x2b8, offset 0x0, grain 4096, row 0, labels ":": i82875p UE

It is not easily reproducible. Hardware is as described above, latest BIOS.

Boleslaw, this can be a memory module problem. Can you run two or three full memtest86 passes on this machine to be sure?

I ran 15 full passes of memtest86 with no errors.

I am observing this under Fedora 9:

EDAC MC0: UE page 0x3XXXX, offset 0x0, grain 128, row 1, labels ":": i82975x UE

at 1-s intervals on an ASUS P5W DH Deluxe motherboard (Intel 975X, Intel ICH7R) which has 2 GB of ECC RAM. But not at every boot! Sometimes there are no messages, sometimes the 1-s ticker appears. Memtest shows no problems with ECC enabled or disabled. A similar report for that motherboard on Debian: http://www.gossamer-threads.com/lists/linux/kernel/962382

Bug should be reopened I guess.
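For anyone comparing reports, the EDAC messages quoted in these comments can be tallied per error type and row with a short script. This is only a minimal sketch: the regular expression is inferred from the lines pasted in this bug, the function names are made up for illustration, and other kernel versions or EDAC drivers may format the messages differently.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this bug (an assumption,
# not taken from the kernel source). The "syndrome" field is optional
# because only some of the quoted lines carry it.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<kind>CE|UE) "
    r"page (?P<page>0x[0-9a-fA-F]+), "
    r"offset (?P<offset>0x[0-9a-fA-F]+), "
    r"grain (?P<grain>\d+), "
    r"(?:syndrome (?P<syndrome>0x[0-9a-fA-F]+), )?"
    r"row (?P<row>\d+)"
)


def parse_edac_line(line):
    """Return a dict of fields for an EDAC CE/UE line, or None if no match."""
    match = EDAC_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("mc", "grain", "row"):
        fields[key] = int(fields[key])
    for key in ("page", "offset"):
        fields[key] = int(fields[key], 16)
    return fields  # "syndrome" stays a hex string, or None if absent


def count_by_kind_and_row(lines):
    """Count events per (kind, row) pair, e.g. ('UE', 0)."""
    counts = Counter()
    for line in lines:
        fields = parse_edac_line(line)
        if fields is not None:
            counts[(fields["kind"], fields["row"])] += 1
    return counts
```

Feeding it the lines of a saved `dmesg` or `/var/log/messages` file shows whether the errors always hit the same row, which may help tell a spurious driver report from a failing module. Lines with a redacted page number such as 0x3XXXX are skipped.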
OK, this is a bit of a coincidence, but in addition to the IC7-G on which I reported above I also have a P5W DH (as in comment #42), and I am seeing the same error as David with both Fedora 8 (2.6.26.3-14.fc8.x86_64) and Fedora 9 (2.6.26.3-29.fc9.x86_64). However, it does not happen with 2.6.25.11-60.fc8.x86_64. The messages also appear at roughly 1 sec intervals (but the rate speeds up and slows down occasionally). This is with 8GB ECC RAM, fully tested. I am attaching lspci -vvv.

Created attachment 317287
lspci -vvv output from Asus P5W DH Deluxe
The workaround for P5W DH is to add a file to /etc/modprobe.d with this in it:

blacklist i82975x_edac

This prevents the EDAC module from loading. AFAICT, it is only needed for error reporting, but if it's broken then it's no better than none. I got the idea from http://faq.aslab.com/index.php?sid=7662&lang=en&action=artikel&cat=93&id=154&artlang=en

Reopening as the problem still persists in F9 and RHEL5.

This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Problem persists in Fedora 11 (kernel-PAE-2.6.29.4-167.fc11.i686) when manually loading the i82875p_edac module. Not sure why it's not being loaded anymore; it seems autoloading of the proper EDAC module was removed in F10 or F11.
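The blacklist workaround and the question of whether the EDAC module is loaded at all can both be checked with a few helper functions. This is a rough sketch, not a tested tool: the function names and the file name `edac-blacklist.conf` are hypothetical, and it assumes the usual layout of `/proc/modules`, where the module name is the first field on each line.

```python
def blacklist_entry(module):
    """Return the modprobe.d line that stops `module` from autoloading."""
    return f"blacklist {module}\n"


def is_blacklisted(module, conf_text):
    """True if conf_text (contents of a modprobe.d file) blacklists module."""
    for raw in conf_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated words.
        words = raw.split("#", 1)[0].split()
        if len(words) == 2 and words[0] == "blacklist" and words[1] == module:
            return True
    return False


def is_loaded(module, proc_modules_text):
    """True if module appears in text read from /proc/modules."""
    return any(
        entry.split()[0] == module
        for entry in proc_modules_text.splitlines()
        if entry.strip()
    )


# Example (needs root; the file name below is hypothetical):
# with open("/etc/modprobe.d/edac-blacklist.conf", "w") as fh:
#     fh.write(blacklist_entry("i82975x_edac"))
```

Passing the contents of `/proc/modules` to `is_loaded("i82875p_edac", ...)` after boot shows whether the driver was autoloaded on a given release. The same check with `i82975x_edac` applies to the P5W DH boards.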
Hello, I currently have RHEL5.4 WS running stable on an Asus P5WDG2-WS motherboard. Attempting to boot a Fedora 12 Live CD generates a constant stream of EDAC errors such that the system becomes unresponsive. After rebooting back to RHEL5.4, all is fine. The memory tests fine in memtest86 and is workstation class (Kingston ECC w/ Thermal Monitor). No memory errors are reported within RHEL 5.4. Regards, M. Marlowe

It is possible that we are not seeing errors within RHEL5.4 because the kernel 2.6.18-164.11.1.el5PAE appears not to have a module for the i82975. Manually modprobing EDAC_MC provides no real information and I cannot see any other EDAC modules automatically loaded. If there is a hardware issue, I'd certainly like to know about it, although I suspect we are just seeing a spurious error resulting somehow from the brain deadness of the legacy Asus motherboard here. I suspect that to boot the Fedora 12 Live CD, I'll need to find the right option to pass at the GRUB boot prompt to disable the EDAC module(s).

This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version.
If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.