Bug 31038

Summary: kernel does a 'machine check' on bootup, hangs
Product: [Retired] Red Hat Linux Reporter: Elliot Lee <sopwith>
Component: kernel Assignee: Phil Copeland <copeland>
Status: CLOSED WORKSFORME QA Contact: Brock Organ <borgan>
Severity: high Docs Contact:
Priority: medium    
Version: 7.3 CC: jimhllrn
Target Milestone: ---   
Target Release: ---   
Hardware: alpha   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2001-04-08 00:12:45 UTC Type: ---
Attachments:
Description: Some sort of boot log (Flags: none)

Description Elliot Lee 2001-03-08 00:44:20 UTC
During kernel startup, machine check happens and things hang.

This happens with 2.4 kernels only; the kernel in 7.0 works.

This is a Digital Ultimate Workstation box (ostrich-deluxe.labs.redhat.com).

Comment 1 Arjan van de Ven 2001-03-08 09:38:04 UTC
A Machine Check Exception (I assume that is what you mean) is the way for the
CPU to indicate that it is toast.

Sounds like broken hardware to me.

If this is not what you mean, please clarify what you mean.

Comment 2 Elliot Lee 2001-03-11 20:25:59 UTC
I do not know whether this is what you call a machine check exception. I do know
that a dump of a bunch of register-type things happens. Here's the last section
from the output:

IOD 1 Register Subpacket - Bridge Base Address fbe0000000
WHOAMI = 2fa
PCI_REV = 6002032
CAP_CTRL = 6480ff1
HAE_MEM = 0
HAE_IO = 0
INT_CTL = 3
INT_REG = 0
INT_MASK0 = fe0000
INT_MASK1 = 0
MC_ERR0 = e0000000
MC_ERR1 = e88ff
CAP_ERR = 0
PCI_ERR1 = 0
MDPA_STAT = 0
MDPA_SYN = 0
MDPB_STAT = 0
MDPB_SYN = 0
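
To compare dumps like the one above between boots, it can help to turn the "NAME = value" lines into numbers. The following is a minimal sketch, assuming the values are hexadecimal without a 0x prefix (they contain digits such as e and f); the function name parse_register_dump is made up for illustration, and the sample is only an excerpt of the dump.

```python
def parse_register_dump(text):
    """Parse 'NAME = hexvalue' lines into a dict of name -> int.

    Assumption: values are hexadecimal without a 0x prefix.
    """
    registers = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line, e.g. the subpacket header
        try:
            registers[name.strip()] = int(value.strip(), 16)
        except ValueError:
            continue  # value was not a hex number; skip the line
    return registers


# Excerpt of the dump quoted in this comment.
dump = """\
IOD 1 Register Subpacket - Bridge Base Address fbe0000000
WHOAMI = 2fa
MC_ERR0 = e0000000
MC_ERR1 = e88ff
CAP_ERR = 0
"""

regs = parse_register_dump(dump)
# Show only the registers that are non-zero.
nonzero = {name: hex(value) for name, value in regs.items() if value}
print(nonzero)
```

This only extracts the numbers; it does not interpret what any bit in MC_ERR0 or MC_ERR1 means.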


Comment 3 Jason Duerstock 2001-03-11 20:43:22 UTC
I have an Alphastation 4/233 (Avanti) running ARC firmware and MILO
2.0.35.  I only get a screen-wide dotted line, a pause, and then
the screen flickers and goes back to MILO.

Comment 4 Elliot Lee 2001-03-12 18:57:44 UTC
When I boot up using serial console (to try to get the full messages), I don't
get the machine check spewage. Instead, the last two lines on the screen are:

SMP: Total of 2 processors activated (1855.17 BogoMIPS)
  got res[8000:801f] for resource 0 of Creative Labs SB Live! EMU10000

The last line doesn't get logged to the serial console. This happens with
2.4.2-0.1.22smp.

With bryce's generic.img on serial console, which I think is 2.4.2-0.1.25smp,
the error is different:
<2>MCPCIA machine check: vector=0x670 pc=0xfffffc00008203f4 code=0x980001

and after repeating that for a while,
<1>Unable to handle kernel paging request at virtual address fffff8f980b15420

(this doesn't show up on serial console either, only the main screen).
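
The <2> and <1> prefixes on the lines above are printk log levels (KERN_CRIT and KERN_ALERT respectively). As a rough sketch for anyone sifting through captured console output, the prefix can be split off like this; the helper name split_printk_level is hypothetical.

```python
import re

# Standard Linux printk log levels, as written in the "<N>" prefix.
PRINTK_LEVELS = {
    0: "KERN_EMERG",
    1: "KERN_ALERT",
    2: "KERN_CRIT",
    3: "KERN_ERR",
    4: "KERN_WARNING",
    5: "KERN_NOTICE",
    6: "KERN_INFO",
    7: "KERN_DEBUG",
}


def split_printk_level(line):
    """Return (level_name, message); level_name is None if no prefix."""
    match = re.match(r"<(\d)>(.*)", line)
    if match is None:
        return None, line
    level = int(match.group(1))
    return PRINTK_LEVELS.get(level), match.group(2)


for raw in (
    "<2>MCPCIA machine check: vector=0x670 pc=0xfffffc00008203f4 code=0x980001",
    "<1>Unable to handle kernel paging request at virtual address fffff8f980b15420",
):
    print(split_printk_level(raw))
```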

Comment 5 Elliot Lee 2001-03-12 23:54:56 UTC
Created attachment 12502 [details]
Some sort of boot log

Comment 6 Elliot Lee 2001-03-13 21:27:38 UTC
2.4.2-ac20 with a very minimal configuration shows the same behaviour.

Comment 7 Elliot Lee 2001-03-13 22:43:04 UTC
(minimal config includes no SMP, so it does seem to happen with or without SMP)

When I use addr2line to find the code that is generating the mcpcia check, it
shows up as line 231 of include/asm-alpha/core_mcpcia.h - the next-to-last line
of the mcpcia_inb routine.
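
For anyone repeating the addr2line lookup, here is a small sketch that pulls the pc value out of a machine check line and builds the matching addr2line command. The vmlinux path is an assumption: it has to be the unstripped image of the exact kernel that produced the message, built with debug info if line numbers are wanted. The sample line is the one from the generic.img boot earlier in this report, not from the 2.4.2-ac20 build, and the function name is made up for illustration.

```python
import re

# Matches lines such as:
#   MCPCIA machine check: vector=0x670 pc=0xfffffc00008203f4 code=0x980001
MCHECK_RE = re.compile(
    r"machine check: vector=(0x[0-9a-fA-F]+) "
    r"pc=(0x[0-9a-fA-F]+) code=(0x[0-9a-fA-F]+)"
)


def addr2line_command(log_line, vmlinux="vmlinux"):
    """Build an addr2line argv for the pc in a machine check log line.

    Assumption: `vmlinux` is the unstripped image of the running kernel.
    """
    match = MCHECK_RE.search(log_line)
    if match is None:
        raise ValueError("not a machine check line: %r" % log_line)
    pc = match.group(2)
    # -e selects the executable, -f also prints the enclosing function.
    return ["addr2line", "-f", "-e", vmlinux, pc]


sample = "<2>MCPCIA machine check: vector=0x670 pc=0xfffffc00008203f4 code=0x980001"
print(" ".join(addr2line_command(sample)))
```

The resulting list can be handed to subprocess.run on a machine that has the matching vmlinux and an addr2line that understands Alpha binaries.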

Comment 8 Elliot Lee 2001-03-14 19:49:58 UTC
It happens in the initialize_kbd function (drivers/char/pc_keyb.c), which does
the I/O that would call mcpcia_inb where the mcheck actually happens...

about to do initialize_kbd
MCPCIA machine check: vector=0x670 pc=0xfffffc00008d74a4 code=0x980001
machine check type: unknown

It might be useful to find out what the machine check code means... Anyone know
where to get that info?

Comment 9 Elliot Lee 2001-04-29 17:50:29 UTC
This appears to be fixed in the latest kernel (the one in wolverine-alpha2).
Other RAID problems happen, but that is probably not an Alpha bug.

Comment 10 jim halloran 2003-12-29 01:24:21 UTC
I have tried 3 installs of RH-9 and each install is successful, but on
bootup it hangs at a line that says:  INIT: version 2.84 booting.

That is as far as it gets.  I'm not a programmer, but I would appreciate
any idea of what might be wrong.

I have a Gateway computer with an Athlon 700 MHz, 128 MB RAM and a Western
Digital 75 GB HD.  RH9 is the only software on the HD.

jimhllrn