Description of problem:
When running under moderate load, a dual Xeon 2.8 GHz server has a Machine Check Exception. It is a new Dell PowerEdge 1600SC.

Machine Check Exception 0000000000000004

The machine has HyperThreading enabled.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.20-18.9

How reproducible:
Always

Steps to Reproduce:
We are able to reproduce this sometimes within minutes of starting our servers, but other times it takes an hour or more to happen. Our servers create about 300 processes, each with 4 threads (about 1200 threads from our servers, plus whatever the system is already running).

Actual Results:
A Machine Check Exception.

Expected Results:
No Machine Check Exception.

Additional info:
The Dell Memory Diagnostics say the RAM is good. Memtest86 version 2.9 also says the RAM is good.

Here is an lspci -v -x:

00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
        Flags: fast devsel
00: 66 11 17 00 00 00 00 00 32 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
        Flags: fast devsel
00: 66 11 17 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
        Subsystem: Dell Computer Corporation: Unknown device 0135
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 16
        Memory at fe100000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at ecc0 [size=64]
        Capabilities: [dc] Power Management version 2
        Capabilities: [e4] PCI-X non-bridge device.
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
00: 86 80 0e 10 17 01 30 02 02 00 00 02 10 20 00 00
10: 00 00 10 fe 00 00 00 00 c1 ec 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 35 01
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 ff 00

00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: Dell Computer Corporation: Unknown device 0135
        Flags: bus master, VGA palette snoop, stepping, medium devsel, latency 32
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        I/O ports at e800 [size=256]
        Memory at fe121000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
00: 02 10 52 47 a7 00 90 02 27 00 00 03 10 20 00 00
10: 00 00 00 fd 01 e8 00 00 00 10 12 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 35 01
30: 00 00 00 00 5c 00 00 00 00 00 00 00 ff 00 08 00

00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
        Subsystem: ServerWorks CSB5 South Bridge
        Flags: bus master, medium devsel, latency 32
00: 66 11 01 02 47 01 00 22 93 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 01 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 8a [Master SecP PriP])
        Subsystem: Dell Computer Corporation: Unknown device 4135
        Flags: bus master, medium devsel, latency 64
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at 08b0 [size=16]
00: 66 11 12 02 45 01 00 02 93 8a 01 01 08 40 80 00
10: f1 01 00 00 f5 03 00 00 71 01 00 00 75 03 00 00
20: b1 08 00 00 00 00 00 00 00 00 00 00 28 10 35 41
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05) (prog-if 10 [OHCI])
        Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
        Flags: bus master, medium devsel, latency 32, IRQ 10
        Memory at fe120000 (32-bit, non-prefetchable) [size=4K]
00: 66 11 20 02 57 01 80 82 05 10 03 0c 00 20 80 00
10: 00 00 12 fe 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 20 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 50

00:0f.3 ISA bridge: ServerWorks GCLE Host Bridge
        Subsystem: ServerWorks: Unknown device 0230
        Flags: bus master, medium devsel, latency 0
00: 66 11 25 02 44 01 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 30 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

01:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
        Subsystem: Dell Computer Corporation: Unknown device 0135
        Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 29
        I/O ports at dc00 [size=256]
        Memory at fcf10000 (64-bit, non-prefetchable) [size=64K]
        Memory at fcf00000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fce00000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
00: 00 10 30 00 17 01 30 02 07 00 00 01 10 48 00 00
10: 01 dc 00 00 04 00 f1 fc 00 00 00 00 04 00 f0 fc
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 35 01
30: 00 00 e0 fc 50 00 00 00 00 00 00 00 05 01 11 12
A machine check is raised by the CPU when the hardware decides something is not right. I need the rest of the numbers to do anything about it, but it almost always points to hardware problems.
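For reference, here is a minimal sketch of how the one number quoted so far can be read. It assumes the value printed after "Machine Check Exception:" is the global machine-check status register (IA32_MCG_STATUS), which is how the 2.4 i386 handler appears to print it. The helper name decode_mcg_status is made up for illustration.

```python
from typing import List

# Low bits of IA32_MCG_STATUS (assumed to be the value the kernel printed).
MCG_STATUS_BITS = {
    0: "RIPV",  # restart IP valid: execution can resume reliably
    1: "EIPV",  # error IP valid: saved IP is tied to the error
    2: "MCIP",  # machine check in progress
}


def decode_mcg_status(value: int) -> List[str]:
    """Return the names of the flag bits set in a global status value."""
    return [name for bit, name in sorted(MCG_STATUS_BITS.items())
            if (value >> bit) & 1]


if __name__ == "__main__":
    # Value taken from the report: 0000000000000004
    print(decode_mcg_status(0x0000000000000004))
```

For 0000000000000004 only MCIP is set. RIPV is clear, so the CPU does not claim it can safely restart. The value says nothing about which unit reported the error; that information is in the per-bank status lines.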
This overwrote a running 'top' program when it appeared. This is what I could read off the screen:

CPU 0: Machine Check Exception: 0000000000000004
CPU 3: Machine Check Exception: 0000000000000004
CPU 1: Machine Check Exception: 0000000000000004
Bank 0: be000000108081f<00>
Kernel panic: Unable to continue
Kernel panic: Unable to continue
In interrupt handler - not syncing
Panicked this time with a clear screen:

CPU 2: Machine Check Exception: 0000000000000004
CPU 0: Machine Check Exception: 0000000000000004
Kernel panic: CPU context corrupt
Kernel panic: Unable to continue
Using a vanilla 2.4.21 kernel that we built, we are unable to cause a kernel panic after 14 hours of stressing, compared to the Red Hat kernel, where we normally have panics within minutes of starting our servers. This is only anecdotal evidence pointing to a problem in the Red Hat 9 kernel. It is hard to blame the kernel for a seemingly random "hardware" problem, but the fact is that with a vanilla 2.4.21 kernel we are unable to cause these panics. This suggests that there is a problem with some of the patches Red Hat applies to their kernel.
Bank 0: be000000108081f [VALID][UNCORRECTABLE][ENABLED]
        Bus/Interconnect error
        Local processor originated request
        Request did not time out
        Generic Read
        Other transaction
        Generic encoding for level

There are basically only two ways that can occur: one is a CPU erratum, the other is a fault on the system. The PIV has only one erratum I can find that can cause a spurious MCE, and that involves executing code in slow memory (such as on the PCI bus) while hyperthreading. That is not something Red Hat Linux (or any other OS) actually does.
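The decode above can be reproduced by hand from the architectural layout of a bank status word. What follows is a rough sketch, not the tool that produced that decode. It assumes the top byte (be) and the low 16 bits (081f) of the value read off the screen are intact. The printed value has only 15 hex digits, so one digit was presumably lost in transcription, and the middle of the word is deliberately not interpreted. All function and table names are hypothetical.

```python
# Flag bits in the top byte of a bank status word (bit 63 down to bit 57).
STATUS_FLAGS = [(63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
                (59, "MISCV"), (58, "ADDRV"), (57, "PCC")]

# Fields of a bus/interconnect error code: 0000 1PPT RRRR IILL
PARTICIPATION = {0: "local processor originated request",
                 1: "local processor responded to request",
                 2: "local processor observed error as third party",
                 3: "generic"}
REQUEST = {0: "generic error", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "eviction", 8: "snoop"}
TRANSACTION = {0: "memory access", 1: "reserved", 2: "I/O",
               3: "other transaction"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic level"}


def decode_status_flags(status):
    """Names of the flag bits set in the top byte of a 64-bit status."""
    return [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]


def decode_bus_error(code):
    """Decode the low 16 bits if they are a bus/interconnect error code."""
    code &= 0xFFFF
    if code & 0xF800 != 0x0800:
        return None  # some other class of error code, not handled here
    return {
        "participation": PARTICIPATION[(code >> 9) & 0x3],
        "timed_out": bool((code >> 8) & 0x1),
        "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
        "transaction": TRANSACTION[(code >> 2) & 0x3],
        "level": LEVEL[code & 0x3],
    }


if __name__ == "__main__":
    # Only the two ends of the reported value are used (see assumptions).
    print(decode_status_flags(0xBE << 56))
    print(decode_bus_error(0x081F))
```

Under those assumptions the flags come out as VAL, UC, EN, MISCV, ADDRV and PCC, and the error code as a locally originated generic read that did not time out, with "other transaction" and the generic level. This matches the decode above.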
Same here, running GNU/Linux 2.6.8.1 with MCE enabled on an IBM XSeries 235. After a day or so the system is halted with the message:

>> CPU 0: Machine Check Exception: 0000000000000004
>> CPU 1: Machine Check Exception: 0000000000000004
>> CPU 2: Machine Check Exception: 0000000000000004
>> CPU 3: Machine Check Exception: 0000000000000004
>> System halted

/proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 9
cpu MHz         : 2795.625
cache size      : 512 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 5505.02

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 9
cpu MHz         : 2795.625
cache size      : 512 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 5570.56

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 9
cpu MHz         : 2795.625
cache size      : 512 KB
physical id     : 3
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 5570.56

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 9
cpu MHz         : 2795.625
cache size      : 512 KB
physical id     : 3
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 5570.56

lspci -vx:

0000:00:00.0 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset) (rev 33)
        Flags: fast devsel
00: 66 11 14 00 00 00 00 00 33 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:00.1 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset)
        Flags: fast devsel
00: 66 11 14 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:00.2 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset)
        Flags: fast devsel
00: 66 11 14 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: IBM: Unknown device 0240
        Flags: bus master, stepping, medium devsel, latency 64, IRQ 153
        Memory at fd000000 (32-bit, non-prefetchable)
        I/O ports at 2200 [size=256]
        Memory at febff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [5c] Power Management version 2
00: 02 10 52 47 87 00 90 02 27 00 00 03 08 40 00 00
10: 00 00 00 fd 01 22 00 00 00 f0 bf fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 40 02
30: 00 00 00 00 5c 00 00 00 00 00 00 00 0a 01 08 00

0000:00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
        Subsystem: ServerWorks CSB5 South Bridge
        Flags: bus master, medium devsel, latency 64
00: 66 11 01 02 47 01 00 22 93 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 01 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 82 [Master PriP])
        Subsystem: ServerWorks CSB5 IDE Controller
        Flags: bus master, medium devsel, latency 64
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at 0700 [size=16]
00: 66 11 12 02 55 01 00 02 93 82 01 01 08 40 80 00
10: f1 01 00 00 f5 03 00 00 71 01 00 00 75 03 00 00
20: 01 07 00 00 00 00 00 00 00 00 00 00 66 11 12 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05) (prog-if 10 [OHCI])
        Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
        Flags: bus master, medium devsel, latency 64, IRQ 97
        Memory at febfe000 (32-bit, non-prefetchable)
00: 66 11 20 02 57 01 80 02 05 10 03 0c 08 40 80 00
10: 00 e0 bf fe 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 20 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 50

0000:00:0f.3 ISA bridge: ServerWorks CSB5 LPC bridge
        Subsystem: ServerWorks: Unknown device 0230
        Flags: bus master, medium devsel, latency 0
00: 66 11 25 02 44 01 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 30 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:10.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
        Flags: 66Mhz, medium devsel
        Capabilities: [60]
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:00:10.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
        Flags: 66Mhz, medium devsel
        Capabilities: [60]
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:00:11.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
        Flags: 66Mhz, medium devsel
        Capabilities: [60]
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:00:11.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
        Flags: 66Mhz, medium devsel
        Capabilities: [60]
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
        Subsystem: IBM: Unknown device 026f
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169
        Memory at fbff0000 (64-bit, non-prefetchable)
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
00: e4 14 a7 16 46 01 b0 02 02 00 00 02 08 40 00 00
10: 04 00 ff fb 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6f 02
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 40 00

0000:05:07.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
        Subsystem: IBM: Unknown device 026c
        Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 97
        I/O ports at 2300
        Memory at f9ff0000 (64-bit, non-prefetchable) [size=64K]
        Memory at f9fe0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [68]
00: 00 10 30 00 57 01 30 02 07 00 00 01 08 48 80 00
10: 01 23 00 00 04 00 ff f9 00 00 00 00 04 00 fe f9
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6c 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 09 01 11 12

0000:05:07.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
        Subsystem: IBM: Unknown device 026c
        Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 161
        I/O ports at 2400
        Memory at f9fd0000 (64-bit, non-prefetchable) [size=64K]
        Memory at f9fc0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [68]
00: 00 10 30 00 57 01 30 02 07 00 00 01 08 48 80 00
10: 01 24 00 00 04 00 fd f9 00 00 00 00 04 00 fc f9
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6c 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 09 02 11 12
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/