Bug 97959 - (MCE HW?)Machine Check Exception on Dual Xeon with HT
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-06-24 16:54 UTC by Paul Querna
Modified: 2007-04-18 16:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:41:11 UTC
Embargoed:



Description Paul Querna 2003-06-24 16:54:35 UTC
Description of problem:
When running under moderate load, a dual Xeon 2.8 GHz server raises a Machine Check
Exception.  It is a new Dell PowerEdge 1600SC.

Machine Check Exception 0000000000000004

The Machine has HyperThreading enabled.




Version-Release number of selected component (if applicable):
kernel-smp-2.4.20-18.9

How reproducible:
Always

Steps to Reproduce:
We are able to reproduce this sometimes within minutes of starting our servers,
but other times it takes an hour or more to happen.  Our servers create
about 300 processes, each with 4 threads (about 1200 threads from our servers,
plus whatever the system is already running).

Actual Results:  A Machine Check Exception.

Expected Results:  No Machine Check Exception.

Additional info:

The Dell Memory Diagnostics say the RAM is good.
Memtest86 version 2.9 also says the RAM is good.

Here is the output of lspci -v -x:

00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
	Flags: fast devsel
00: 66 11 17 00 00 00 00 00 32 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
	Flags: fast devsel
00: 66 11 17 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller
(rev 02)
	Subsystem: Dell Computer Corporation: Unknown device 0135
	Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 16
	Memory at fe100000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at ecc0 [size=64]
	Capabilities: [dc] Power Management version 2
	Capabilities: [e4] PCI-X non-bridge device.
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
00: 86 80 0e 10 17 01 30 02 02 00 00 02 10 20 00 00
10: 00 00 10 fe 00 00 00 00 c1 ec 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 35 01
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 ff 00

00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
(prog-if 00 [VGA])
	Subsystem: Dell Computer Corporation: Unknown device 0135
	Flags: bus master, VGA palette snoop, stepping, medium devsel, latency 32
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	I/O ports at e800 [size=256]
	Memory at fe121000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [5c] Power Management version 2
00: 02 10 52 47 a7 00 90 02 27 00 00 03 10 20 00 00
10: 00 00 00 fd 01 e8 00 00 00 10 12 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 35 01
30: 00 00 00 00 5c 00 00 00 00 00 00 00 ff 00 08 00

00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
	Subsystem: ServerWorks CSB5 South Bridge
	Flags: bus master, medium devsel, latency 32
00: 66 11 01 02 47 01 00 22 93 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 01 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 8a
[Master SecP PriP])
	Subsystem: Dell Computer Corporation: Unknown device 4135
	Flags: bus master, medium devsel, latency 64
	I/O ports at <ignored>
	I/O ports at <ignored>
	I/O ports at <ignored>
	I/O ports at <ignored>
	I/O ports at 08b0 [size=16]
00: 66 11 12 02 45 01 00 02 93 8a 01 01 08 40 80 00
10: f1 01 00 00 f5 03 00 00 71 01 00 00 75 03 00 00
20: b1 08 00 00 00 00 00 00 00 00 00 00 28 10 35 41
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
(prog-if 10 [OHCI])
	Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
	Flags: bus master, medium devsel, latency 32, IRQ 10
	Memory at fe120000 (32-bit, non-prefetchable) [size=4K]
00: 66 11 20 02 57 01 80 82 05 10 03 0c 00 20 80 00
10: 00 00 12 fe 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 20 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 50

00:0f.3 ISA bridge: ServerWorks GCLE Host Bridge
	Subsystem: ServerWorks: Unknown device 0230
	Flags: bus master, medium devsel, latency 0
00: 66 11 25 02 44 01 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 30 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
	Flags: 66Mhz, medium devsel
	Capabilities: [60] PCI-X non-bridge device.
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
	Flags: 66Mhz, medium devsel
	Capabilities: [60] PCI-X non-bridge device.
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

01:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
	Subsystem: Dell Computer Corporation: Unknown device 0135
	Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 29
	I/O ports at dc00 [size=256]
	Memory at fcf10000 (64-bit, non-prefetchable) [size=64K]
	Memory at fcf00000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at fce00000 [disabled] [size=1M]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
00: 00 10 30 00 17 01 30 02 07 00 00 01 10 48 00 00
10: 01 dc 00 00 04 00 f1 fc 00 00 00 00 04 00 f0 fc
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 35 01
30: 00 00 e0 fc 50 00 00 00 00 00 00 00 05 01 11 12

Comment 1 Alan Cox 2003-06-24 21:21:21 UTC
A machine check is raised by the CPU when the hardware decides something is not
right. I need the rest of the numbers to do anything about it, but it almost
always points to hardware problems.
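
For readers following along, here is a minimal sketch of how the value reported so far can be read. It assumes the 16-digit number printed after "Machine Check Exception" is the contents of the IA32_MCG_STATUS register; the bit names follow the Intel architecture manuals, and the function name decode_mcg_status is made up for this illustration.

```python
# Flag bits of IA32_MCG_STATUS (bit positions per the Intel SDM).
# Assumption: the number after "Machine Check Exception:" is this register.
MCG_STATUS_BITS = {
    0: "RIPV (restart IP valid)",
    1: "EIPV (error IP valid)",
    2: "MCIP (machine check in progress)",
}


def decode_mcg_status(value):
    """Return the names of the flags that are set in an MCG_STATUS value."""
    return [name for bit, name in MCG_STATUS_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    # Value reported in this bug.
    for flag in decode_mcg_status(0x0000000000000004):
        print(flag)
```

Under that assumption, 0x4 only says that a machine check is in progress and that the restart pointer is not marked valid. It does not say what failed, which is why the per-bank status values are needed.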

Comment 2 Paul Querna 2003-06-25 00:08:21 UTC
This overwrote a running 'top' display when it occurred.  This is what I could read
off the screen:

CPU 0: Machine Check Exception: 0000000000000004
CPU 3: Machine Check Exception: 0000000000000004
CPU 1: Machine Check Exception: 0000000000000004
Bank 0: be000000108081f<00>
Kernel panic: Unable to continue
Kernel panic: Unable to continue
In interrupt handler - not syncing



Comment 3 Paul Querna 2003-06-25 00:35:30 UTC
Panicked this time with a clear screen:

CPU 2: Machine Check Exception: 0000000000000004
CPU 0: Machine Check Exception: 0000000000000004
Kernel panic: CPU context corrupt
Kernel panic: Unable to continue


Comment 4 Paul Querna 2003-06-25 17:33:18 UTC
Using a vanilla 2.4.21 kernel that we built, we are unable to cause a kernel
panic after 14 hours of stress testing.  With the Red Hat kernel, by contrast, we
normally see panics within minutes of starting our servers.

This is only anecdotal evidence pointing to a problem in the Red Hat 9 kernel.

It is hard to blame the kernel for a seemingly random "hardware" problem,
but the fact is that with a vanilla 2.4.21 kernel we are unable to cause these
panics.  This suggests that there is a problem with some of the patches Red Hat
applies to its kernel.

Comment 5 Alan Cox 2003-06-25 22:28:32 UTC
Bank 0: be000000108081f

[VALID][UNCORRECTABLE][ENABLED]

Bus/Interconnect error
  Local processor originated request
  Request did not time out
  Generic Read
  Other transaction
  Generic encoding for level

There are basically only two ways that can occur: one is a CPU erratum, the other
is a fault in the system. The PIV has only one erratum I can find that can cause
spurious MCEs, and that involves executing code in slow memory (such as on the PCI
bus) while hyperthreading. That is not something Red Hat Linux (or any other OS)
actually does.
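
A rough sketch of how a decode like the one above can be reproduced by hand. It assumes the "Bank 0" value is an IA32_MCi_STATUS register, with flag bits in the top byte and a 16-bit MCA error code at the bottom, using the bus/interconnect layout 0000 1PPT RRRR IILL from the Intel manuals. The value as transcribed has only 15 hex digits, so the example keeps only the leading "be" and trailing "081f" and leaves the middle bits at zero; that reconstruction and the function names are assumptions made for this illustration.

```python
# Flag bits in the top byte of IA32_MCi_STATUS (per the Intel SDM).
STATUS_FLAGS = [
    (63, "VALID"), (62, "OVERFLOW"), (61, "UNCORRECTABLE"), (60, "ENABLED"),
    (59, "MISCV"), (58, "ADDRV"), (57, "PCC"),
]

# Sub-fields of a bus/interconnect error code: 0000 1PPT RRRR IILL.
PP = {0: "Local processor originated request",
      1: "Local processor responded to request",
      2: "Local processor observed error as third party",
      3: "Generic participation"}
T = {0: "Request did not time out", 1: "Request timed out"}
RRRR = {0: "Generic error", 1: "Generic Read", 2: "Generic Write",
        3: "Data read", 4: "Data write", 5: "Instruction fetch",
        6: "Prefetch", 7: "Eviction", 8: "Snoop"}
II = {0: "Memory access", 1: "Reserved", 2: "I/O", 3: "Other transaction"}
LL = {0: "Level 0", 1: "Level 1", 2: "Level 2", 3: "Generic encoding for level"}


def decode_flags(status):
    """Return the names of the status flags set in bits 63..57."""
    return [name for bit, name in STATUS_FLAGS if status & (1 << bit)]


def decode_bus_error(status):
    """Decode bits 15..0 as a bus/interconnect error, or return None."""
    code = status & 0xFFFF
    if (code & 0xF800) != 0x0800:  # bits 15..11 must read 00001
        return None
    return [
        PP[(code >> 9) & 0x3],
        T[(code >> 8) & 0x1],
        RRRR.get((code >> 4) & 0xF, "Reserved request type"),
        II[(code >> 2) & 0x3],
        LL[code & 0x3],
    ]


if __name__ == "__main__":
    # Assumed reconstruction: top byte 0xbe, MCA error code 0x081f,
    # middle (model-specific) bits left at zero because they are uncertain.
    status = (0xBE << 56) | 0x081F
    print(" ".join("[%s]" % f for f in decode_flags(status)))
    for line in decode_bus_error(status) or ["Not a bus/interconnect error"]:
        print("  " + line)
```

With those assumptions the low 16 bits give the same five lines as listed above. The top byte 0xbe also has the MISCV, ADDRV and PCC bits set, which the summary above does not list.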



Comment 6 Sascha Willuweit 2004-08-30 08:10:59 UTC
Same here, running GNU/Linux 2.6.8.1 with MCE enabled on an IBM
XSeries 235. After a day or so the system is halted with the message:
>> CPU 0: Machine Check Exception: 0000000000000004
>> CPU 1: Machine Check Exception: 0000000000000004
>> CPU 2: Machine Check Exception: 0000000000000004
>> CPU 3: Machine Check Exception: 0000000000000004
>> System halted


/proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.80GHz
stepping	: 9
cpu MHz		: 2795.625
cache size	: 512 KB
physical id	: 0
siblings	: 2
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 5505.02

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.80GHz
stepping	: 9
cpu MHz		: 2795.625
cache size	: 512 KB
physical id	: 0
siblings	: 2
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 5570.56

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.80GHz
stepping	: 9
cpu MHz		: 2795.625
cache size	: 512 KB
physical id	: 3
siblings	: 2
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 5570.56

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.80GHz
stepping	: 9
cpu MHz		: 2795.625
cache size	: 512 KB
physical id	: 3
siblings	: 2
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips	: 5570.56

lspci -vx:
0000:00:00.0 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE
chipset) (rev 33)
	Flags: fast devsel
00: 66 11 14 00 00 00 00 00 33 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:00.1 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset)
	Flags: fast devsel
00: 66 11 14 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:00.2 Host bridge: ServerWorks CMIC-LE Host Bridge (GC-LE chipset)
	Flags: fast devsel
00: 66 11 14 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL
(rev 27) (prog-if 00 [VGA])
	Subsystem: IBM: Unknown device 0240
	Flags: bus master, stepping, medium devsel, latency 64, IRQ 153
	Memory at fd000000 (32-bit, non-prefetchable)
	I/O ports at 2200 [size=256]
	Memory at febff000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [5c] Power Management version 2
00: 02 10 52 47 87 00 90 02 27 00 00 03 08 40 00 00
10: 00 00 00 fd 01 22 00 00 00 f0 bf fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 40 02
30: 00 00 00 00 5c 00 00 00 00 00 00 00 0a 01 08 00

0000:00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
	Subsystem: ServerWorks CSB5 South Bridge
	Flags: bus master, medium devsel, latency 64
00: 66 11 01 02 47 01 00 22 93 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 01 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
(prog-if 82 [Master PriP])
	Subsystem: ServerWorks CSB5 IDE Controller
	Flags: bus master, medium devsel, latency 64
	I/O ports at <ignored>
	I/O ports at <ignored>
	I/O ports at <ignored>
	I/O ports at <ignored>
	I/O ports at 0700 [size=16]
00: 66 11 12 02 55 01 00 02 93 82 01 01 08 40 80 00
10: f1 01 00 00 f5 03 00 00 71 01 00 00 75 03 00 00
20: 01 07 00 00 00 00 00 00 00 00 00 00 66 11 12 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller
(rev 05) (prog-if 10 [OHCI])
	Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
	Flags: bus master, medium devsel, latency 64, IRQ 97
	Memory at febfe000 (32-bit, non-prefetchable)
00: 66 11 20 02 57 01 80 02 05 10 03 0c 08 40 80 00
10: 00 e0 bf fe 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 20 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 50

0000:00:0f.3 ISA bridge: ServerWorks CSB5 LPC bridge
	Subsystem: ServerWorks: Unknown device 0230
	Flags: bus master, medium devsel, latency 0
00: 66 11 25 02 44 01 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 30 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:10.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
	Flags: 66Mhz, medium devsel
	Capabilities: [60]
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:00:10.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
	Flags: 66Mhz, medium devsel
	Capabilities: [60]
00: 66 11 01 01 42 01 30 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:00:11.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
	Flags: 66Mhz, medium devsel
	Capabilities: [60]
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:00:11.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
	Flags: 66Mhz, medium devsel
	Capabilities: [60]
00: 66 11 01 01 42 01 b0 22 05 00 00 06 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

0000:02:08.0 Ethernet controller: Broadcom Corporation NetXtreme
BCM5703X Gigabit Ethernet (rev 02)
	Subsystem: IBM: Unknown device 026f
	Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169
	Memory at fbff0000 (64-bit, non-prefetchable)
	Capabilities: [40] PCI-X non-bridge device.
	Capabilities: [48] Power Management version 2
	Capabilities: [50] Vital Product Data
	Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
00: e4 14 a7 16 46 01 b0 02 02 00 00 02 08 40 00 00
10: 04 00 ff fb 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6f 02
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 40 00

0000:05:07.0 SCSI storage controller: LSI Logic / Symbios Logic
53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
	Subsystem: IBM: Unknown device 026c
	Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 97
	I/O ports at 2300
	Memory at f9ff0000 (64-bit, non-prefetchable) [size=64K]
	Memory at f9fe0000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
	Capabilities: [68]
00: 00 10 30 00 57 01 30 02 07 00 00 01 08 48 80 00
10: 01 23 00 00 04 00 ff f9 00 00 00 00 04 00 fe f9
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6c 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 09 01 11 12

0000:05:07.1 SCSI storage controller: LSI Logic / Symbios Logic
53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
	Subsystem: IBM: Unknown device 026c
	Flags: bus master, 66Mhz, medium devsel, latency 72, IRQ 161
	I/O ports at 2400
	Memory at f9fd0000 (64-bit, non-prefetchable) [size=64K]
	Memory at f9fc0000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
	Capabilities: [68]
00: 00 10 30 00 57 01 30 02 07 00 00 01 08 48 80 00
10: 01 24 00 00 04 00 fd f9 00 00 00 00 04 00 fc f9
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6c 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 09 02 11 12

Comment 7 Bugzilla owner 2004-09-30 15:41:11 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


