Bug 97959
Summary: | (MCE HW?)Machine Check Exception on Dual Xeon with HT | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Paul Querna <chip> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 9 | CC: | alan |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2004-09-30 15:41:11 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Paul Querna
2003-06-24 16:54:35 UTC
A machine check is raised by the CPU when the hardware decides something is not right. I need the rest of the numbers to do anything about it, but it almost always points to hardware problems.

This overwrote a running 'top' program when it came. This is what I could read off the screen:

CPU 0: Machine Check Exception: 0000000000000004
CPU 3: Machine Check Exception: 0000000000000004
CPU 1: Machine Check Exception: 0000000000000004
Bank 0: be000000108081f<00>
Kernel panic: Unable to continue
Kernel panic: Unable to continue
In interrupt handler - not syncing

It panicked this time with a clear screen:

CPU 2: Machine Check Exception: 0000000000000004
CPU 0: Machine Check Exception: 0000000000000004
Kernel panic: CPU context corrupt
Kernel panic: Unable to continue

Using a vanilla 2.4.21 kernel that we built, we are unable to cause a kernel panic after 14 hours of stressing, whereas with the Red Hat kernel we normally get panics within minutes of starting our servers. This is only anecdotal evidence of a problem in the Red Hat 9 kernel. It is hard to blame the kernel for a seemingly random "hardware" problem, but the fact is that with a vanilla 2.4.21 kernel we cannot reproduce these panics, which suggests a problem with some of the patches Red Hat applies to its kernel.

Bank 0: be000000108081f [VALID][UNCORRECTABLE][ENABLED]
Bus/Interconnect error
Local processor originated request
Request did not time out
Generic Read
Other transaction
Generic encoding for level

There are basically only two ways that can occur: one is a CPU erratum, the other is a fault in the system. The PIV has only one erratum I can find that can cause a spurious MCE, and that involves executing code from slow memory (such as across the PCI bus) while Hyper-Threading is enabled. That is not something Red Hat Linux (or any other OS) actually does.

Same here, running GNU/Linux 2.6.8.1 with MCE enabled on an IBM XSeries 235. After a day or so the system is halted with the message:
>> CPU 0: Machine Check Exception: 0000000000000004
>> CPU 1: Machine Check Exception: 0000000000000004
>> CPU 2: Machine Check Exception: 0000000000000004
>> CPU 3: Machine Check Exception: 0000000000000004
>> System halted
/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 9
cpu MHz : 2795.625
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5505.02
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 9
cpu MHz : 2795.625
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5570.56
processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 9
cpu MHz : 2795.625
cache size : 512 KB
physical id : 3
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5570.56
processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 9
cpu MHz : 2795.625
cache size : 512 KB
physical id : 3
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5570.56
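The cpuinfo dump lists four logical CPUs but only two distinct "physical id" values (0 and 3), each with "siblings : 2", which is what a dual Xeon with Hyper-Threading looks like. Below is a small sketch of a helper that tallies logical CPUs per physical package from cpuinfo text in the "key : value" format shown above. `cpu_topology` is a hypothetical name, not part of any existing tool.

```python
from collections import Counter


def cpu_topology(cpuinfo_text):
    """Map each "physical id" found in /proc/cpuinfo text to the number of
    logical CPUs (processor entries) that report it."""
    packages = Counter()
    for line in cpuinfo_text.splitlines():
        # Lines without a colon (e.g. wrapped flag lists) are skipped.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "physical id":
            packages[value.strip()] += 1
    return dict(packages)


if __name__ == "__main__":
    # Only the relevant lines from the dump above
    sample = "\n".join([
        "processor : 0", "physical id : 0", "siblings : 2",
        "processor : 1", "physical id : 0", "siblings : 2",
        "processor : 2", "physical id : 3", "siblings : 2",
        "processor : 3", "physical id : 3", "siblings : 2",
    ])
    print(cpu_topology(sample))
```

For the dump above this prints `{'0': 2, '3': 2}`: two physical packages with two logical CPUs each, matching "siblings : 2".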
lspci -vx:
0000:00:00.0 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset) (rev 33)
Flags: fast devsel
00: 66 11 14 00 00 00 00 00 33 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000:00:00.1 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset)
Flags: fast devsel
00: 66 11 14 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000:00:00.2 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset)
Flags: fast devsel
00: 66 11 14 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000:00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
Subsystem: IBM: Unknown device 0240
Flags: bus master, stepping, medium devsel, latency 64, IRQ 153
Memory at fd000000 (32-bit, non-prefetchable)
I/O ports at 2200 [size=256]
Memory at febff000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [5c] Power Management version 2
00: 02 10 52 47 87 00 90 02 27 00 00 03 08 40 00 00
10: 00 00 00 fd 01 22 00 00 00 f0 bf fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 40 02
30: 00 00 00 00 5c 00 00 00 00 00 00 00 0a 01 08 00
0000:00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
Subsystem: ServerWorks CSB5 South Bridge
Flags: bus master, medium devsel, latency 64
00: 66 11 01 02 47 01 00 22 93 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 01 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000:00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 82 [Master PriP])
Subsystem: ServerWorks CSB5 IDE Controller
Flags: bus master, medium devsel, latency 64
I/O ports at <ignored>
I/O ports at <ignored>
I/O ports at <ignored>
I/O ports at <ignored>
I/O ports at 0700 [size=16]
00: 66 11 12 02 55 01 00 02 93 82 01 01 08 40 80 00
10: f1 01 00 00 f5 03 00 00 71 01 00 00 75 03 00 00
20: 01 07 00 00 00 00 00 00 00 00 00 00 66 11 12 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05) (prog-if 10 [OHCI])
Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
Flags: bus master, medium devsel, latency 64, IRQ 97
Memory at febfe000 (32-bit, non-prefetchable)
00: 66 11 20 02 57 01 80 02 05 10 03 0c 08 40 80 00
10: 00 e0 bf fe 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 20 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 50
0000:00:0f.3 ISA bridge: ServerWorks CSB5 LPC bridge
Subsystem: ServerWorks: Unknown device 0230
Flags: bus master, medium devsel, latency 0
00: 66 11 25 02 44 01 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 30 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000:00:10.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
Flags: 66Mhz, medium devsel
Capabilities: [60]
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00
0000:00:10.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
Flags: 66Mhz, medium devsel
Capabilities: [60]
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00
0000:00:11.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
Flags: 66Mhz, medium devsel
Capabilities: [60]
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00
0000:00:11.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
Flags: 66Mhz, medium devsel
Capabilities: [60]
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00
0000:02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
Subsystem: IBM: Unknown device 026f
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169
Memory at fbff0000 (64-bit, non-prefetchable)
Capabilities: [40] PCI-X non-bridge device.
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
00: e4 14 a7 16 46 01 b0 02 02 00 00 02 08 40 00 00
10: 04 00 ff fb 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6f 02
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 40 00
0000:05:07.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
Subsystem: IBM: Unknown device 026c
Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 97
I/O ports at 2300
Memory at f9ff0000 (64-bit, non-prefetchable) [size=64K]
Memory at f9fe0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Capabilities: [68]
00: 00 10 30 00 57 01 30 02 07 00 00 01 08 48 80 00
10: 01 23 00 00 04 00 ff f9 00 00 00 00 04 00 fe f9
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6c 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 09 01 11 12
0000:05:07.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
Subsystem: IBM: Unknown device 026c
Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 161
I/O ports at 2400
Memory at f9fd0000 (64-bit, non-prefetchable) [size=64K]
Memory at f9fc0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Capabilities: [68]
00: 00 10 30 00 57 01 30 02 07 00 00 01 08 48 80 00
10: 01 24 00 00 04 00 fd f9 00 00 00 00 04 00 fc f9
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6c 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 09 02 11 12
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/