Bug 45612
Summary: | kernel-2.4.3-12 does not boot on AST P5/90 | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Ed McKenzie <eem12> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED NOTABUG | QA Contact: | Brock Organ <borgan> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.1 | CC: | alan |
Hardware: | i386 | ||
OS: | Linux | ||
Last Closed: | 2001-06-24 22:00:29 UTC | Type: | --- |
Description
Ed McKenzie
2001-06-23 21:32:53 UTC
The error message was ...

CPU#0: Machine Check Exception: 0x 235EEC (type 0x D).

Correcting my earlier report, the kernel prints about half to two-thirds of a screen of messages before blowing up. (I assume the above fault could happen at any point, and knowing what the kernel is trying to init would be useful? It scrolls by way too fast to see.)

Machine Check Exception is a hardware fault...

By 'hardware fault' do you mean broken hardware or a triple fault in software? This machine has never to my knowledge failed to boot any other Linux kernel in this way.

A machine check exception is raised when the board or processor decides something bad might be happening. In some cases the machine will run fine because the threshold on the check seems to be tighter than an actual crash (which I guess makes sense). This can also include overheat/fan faults but in your case the fault is

Arjan - I am curious why it then blew up rather than continuing. I would be very interested to know what a kernel patched in arch/i386/kernel/entry.S

ENTRY(machine_check)
pushl $0

(remove the pushl $0) does on this problem box.

Removing the push instruction causes the kernel to panic with a traceback that happily fills the screen. "Aiee, killing interrupt handler!" seems to be the favorite dying words for this particular kernel. With 2.4.3-12 pristine, an actual panic occurs only occasionally. I compiled with gcc 2.96-85.

Ok, so the fault not recovering is real. I suspect your CPU is absolutely borderline if it wasn't showing faults before the machine check. You can certainly disable the check (mcheck_init in arch/i386/kernel/bluesmoke.c) but I am not sure that is wise given that it is an integrity check for the system.

So, to recap and resolve this report, a CPU machine check was added sometime circa 2.4.3, and it may crash on borderline machines that formerly booted anyway, correct? I also notice there's no explicit check for the AMD K6 in that code.
Are such processors treated as Pentium-compatible, or do they not support MCE at all?

The arch/i386/kernel/setup.c code only calls the mcheck init for processors it knows about. Currently that is:

- AMD Athlon/Duron (basically Intel compatible)
- VIA/Cyrix MIII / VIA C3 (limited features)
- Intel (Pentium or higher)
- Winchip/WinchipII/WinchipIII (limited features)

The older Cyrix processors and the K5/K6 apparently don't have the functionality.

Hello list;

I recently picked up an IBM Thinkpad 755CX, this is an older model (1995). Pentium-75, 40M ram 3.2G. I also have a "Dock II" docking station. This has a built-in SCSI controller; I added a 2G Fireball drive and a NEC 24X SCSI CD-ROM. I ran the on-board diags - all fine, and the floppy-based diags - all fine.

Ok - so it works fine under DOS and W95. I installed W98SE to test it - ok, then formatted back to DOS with CD support.

When I try to install RH7.1 (or Roswell-1) it fails. I tried both CD-ROM (via autoboot from DOS) and a boot disk. It gets to the point of "running /sbin/loader", then dies with a black screen:

"CPU#0: Machine Check Exception: 0x 1234 (type 0x D)."

This scrolls forever and you have to power off.

The Dock II has an Adaptec AIC-6360 in it - what is the correct line to use it? I have tried various combinations of "linux dd text aha152x=0x340,11,7,1" and "linux dd text aic6x60="0x340.11,7,1". None of which worked. It uses the "Adaptec 620/6360/6370" driver for DOS.

Both the 7.1 and Roswell CD sets are fine. (It does the same thing for RH 7.0 and RH 6.0.) I searched for info on the web and found out how to set up the MWAVE and power management, but could not find the install info.

NOTE ************ I tried Tom's rootdisk and it worked fine!!!!! **************

Tom's rootdisk (1.7.218) output of dmesg:

<snip>
Intel Pentium with F0 0F bug - workaround enabled.
alias mapping IDT readonly ... ... done
Linux version 2.0.37 (root@6M) (gcc version 2.7.2.3) #13 Fri Oct 15
<snip>
scsi : 0 hosts.
scsi : detected total.
<snip>
aha152x: BIOS test: passed, auto configuration: ok, detected 1 controller(s)
aha152x0: vital data: PORTBASE=0x340, IRQ=11, SCSI ID=7, reconnect=enabled, parity=enabled, synchronous=disabled, delay=100, extended translation=disabled
aha152x: trying software interrupt, ok.
scsi0 : Adaptec 152x SCSI driver; $Revision: 1.18 $
scsi : 1 host.
Vendor: QUANTUM Model: FIREBALL_TM2110S Rev: 300N
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: NEC Model: CD-ROM DRIVE:464 Rev: 1.04
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
SCSI device sda: hdwr sector= 512 bytes. Sectors= 4124736 [2014 MB] [2.0 GB]
sda: sda1
scsi : 1 host.
--------------------------------
contents of /etc/mtab
<snip>
/dev/sda1 /mnt/vfat vfat rw 0 0
/dev/sr0 /mnt/cdrom iso9660 ro 0 0
------------------------------

Tom's found the controller, hard drive and CD-ROM. I was able to mount and move files around (iso9660 and vfat).

So what do I have to do to get RH7+ onto this thing? Any ideas?

Also, does anyone know what the video chipset is? W98 just says 'Digital'. It's an SVGA 1M 800x600x16bit TFT.

Thank you
Mick

Chris Cloiber suggested 'linux nomce'. I will try it later.

This bug is obsoleted by bug 55097 and the errata 2.4.9-13 kernel.
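Both reports in this bug quote the same console message, with the hex digits separated from the `0x` prefix by spaces. When comparing such messages across machines it can help to pull the numbers out mechanically. The following is a minimal Python sketch, not anything taken from the kernel: the function name `parse_mce_line` and the field names `cpu`, `code` and `type` are hypothetical, and the report does not say what the first number represents, so it is returned under the neutral name `code`. The tolerance for whitespace after `0x` is an assumption based only on how the message appears in this bug.

```python
import re

# Matches lines shaped like the ones quoted in this bug, e.g.
#   "CPU#0: Machine Check Exception: 0x 235EEC (type 0x D)."
# Whitespace between "0x" and the digits is allowed because that is how the
# message appears in the reports (assumed to be field padding).
MCE_RE = re.compile(
    r"CPU#(?P<cpu>\d+):\s*Machine Check Exception:\s*"
    r"0x\s*(?P<code>[0-9A-Fa-f]+)\s*"
    r"\(type\s*0x\s*(?P<type>[0-9A-Fa-f]+)\s*\)"
)


def parse_mce_line(line):
    """Return cpu, code and type as integers, or None if the line does not match."""
    m = MCE_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "code": int(m.group("code"), 16),   # first hex value in the message
        "type": int(m.group("type"), 16),   # value printed after "type"
    }


if __name__ == "__main__":
    sample = "CPU#0: Machine Check Exception: 0x 235EEC (type 0x D)."
    print(parse_mce_line(sample))
```

Lines that do not contain the message, such as ordinary dmesg output, return `None`, so the function can be run over a whole captured boot log.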