Bug 39191

Summary: hex dump during boot, and then system hangs
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <hobbs>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Version: 7.1
CC: alan, robatino
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-06-29 09:26:21 UTC
Attachments:
Description                                    Flags
capture of boot messages via serial console    none
output of dmidecode                            none

Description Need Real Name 2001-05-05 06:16:43 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.61 [en] (Win98; U)

Description of problem:
After installing 7.1, my eMachines eTower 333cs dumps hex output and hangs on every subsequent boot.


How reproducible:
Always

Steps to Reproduce:
1. Install Red Hat Linux 7.1 on an eMachines eTower 333cs (at least on mine)
2. Boot up the computer

	

Actual Results:  After the graphical LILO screen and a second or so of normal-looking boot output, the computer dumps
a few pages of hex and then just stops. It happens so fast, and the hex scrolls past the last line, so I do not know exactly where it
failed or which line last ran successfully.


Expected Results:  Linux boots successfully.

Additional info:


The computer has been running Red Hat Linux 6.2 for months with no problems.

I have reproduced the bug in the following ways:

   - A straight *upgrade* from 6.2 to 7.1
   - A fresh Workstation install of 7.1
   - A custom install of 7.1

Also tried booting with the following commands (as suggested by Red Hat tech support):

    linux nodma
    linux nodma noapm
    linux nodma noapm nousb

I also booted into rescue mode, and the file system at least looked normal.

The machine has a Cyrix processor.

Please note that I then reinstalled 6.2, and that loads and seems to be working fine.

Comment 1 Bill Nottingham 2001-05-07 13:48:25 UTC
If you could copy down the oops text (including all the magic numbers) it
could be helpful.

Assigning to kernel.

Comment 2 Andre Robatino 2001-06-24 14:32:12 UTC
  I have the same problem. It first happened when I attempted an upgrade, and
again after a clean install. I also have an eTower 333cs; the only
modifications are a bigger hard drive, more memory, an ISA modem, and an
Ethernet card. A few extra details: I managed to see the message "calibrating
delay loop" before it crashes, so the crash comes shortly after that. The same
problem happens whether I boot from the hard drive or from the boot floppy I
made during installation, but booting from the Seawolf install CD in rescue
mode succeeds. The same thing happens with the original 2.4.2 kernel, the
newest 2.4.3-12 kernel, and the Mandrake 2.4.3 kernel from a month or so back.
The call trace numbers for both the 2.4.2 and 2.4.3-12 kernels follow a dual
arithmetic progression, for example:

<fc70c14f> <fc70c14f> <fc78c14f> <fc78c14f> <fc80c14f> <fc80c14f> ...

where each number is repeated twice and the difference between successive
numbers alternates between 80000 and 20000 (hex). The only difference between
the two kernel versions is an overall additive constant; the differences of
80000 and 20000 are the same. There is more than one screenful of these
numbers, so I can't tell how they start, but they repeat about once every 5
seconds. The keyboard is completely disabled, so Ctrl-Alt-Del doesn't work,
and neither does the power-off button; the only way to turn the machine off is
to cut power to it. I salvaged the Seawolf installation by downgrading the
kernel to 2.2.19, together with a few packages that depended on 2.4. There are
still minor problems at shutdown, including failure messages for NFS lockd and
for the final halt after the disks are unmounted. These are probably
undocumented dependencies on the 2.4 kernel, so they should go away once the
2.4 kernel is fixed.

Comment 3 Arjan van de Ven 2001-06-24 15:08:44 UTC
We released an updated 2.4.3-12 kernel last Friday.
Also, how much memory do you have, and does this go away when you boot with
"mem=xxxM" (where "xxx" is the amount of RAM in megabytes minus 2)?


Comment 4 Andre Robatino 2001-06-25 02:11:56 UTC
  I have 256M of RAM.  I tried booting the new RH 2.4.3-12 kernel with the
"mem=254M" boot option.  No luck.  The behavior is subtly different - before, it
looked like it was in an infinite loop, with the same numbers appearing over and
over about every 5 seconds.  Now, it just shows the numbers and appears to
hang.  The numbers form the same pattern as before, i.e. each repeated twice,
and differences of 80000 and 20000 hex.
  If you have a general idea of what's wrong, I could experiment with different
boot options.  Any suggestions?

Comment 5 Arjan van de Ven 2001-06-25 07:03:12 UTC
It would be interesting to try "no_hlt" or "no-hlt" as a boot option.

Comment 6 Andre Robatino 2001-06-25 14:31:05 UTC
  I tried no-hlt.  No change.  By the way, this machine uses a VIA chipset and a
Cyrix M2 CPU.  The more detailed info that used to be at http://www.e4me.com
seems to be gone.  The maximum memory for this machine was originally advertised
as 256M.  Would that be a motherboard limitation or just the maximum you could
pop into the 2 memory slots given the size of DIMMs at that time?

Comment 7 Andre Robatino 2001-06-28 02:59:28 UTC
Created attachment 22006 [details]
capture of boot messages via serial console

Comment 8 Andre Robatino 2001-06-28 03:02:05 UTC
  I hope this capture file will help. I also forgot to mention that although
this machine is an i686, I tried the i386 version of the newest kernel too,
with the same problem, so the i686 optimizations are not the cause.

Comment 9 Alan Cox 2001-06-28 21:09:18 UTC
I'm running 2.4.x fine on a VIA chipset cyrix MII quite similar to yours. The
trace is a pretty clear jump to nowhere. 

You say "  I have the same problem.  Originally this happened upon attempting an
update. then after doing a clean install" 

Does that mean the installation kernel ran correctly, and it was only after
the install that it died?


Comment 10 Andre Robatino 2001-06-28 21:16:12 UTC
  First I upgraded from 7.0 to 7.1.  The upgrade itself went fine, but as soon
as I tried to boot the new kernel it crashed.  After a while I gave up and
decided to try a clean install of 7.1. The same problem occurred: the install
itself went fine, but the new kernel crashed when I tried to boot it. This
happens if I boot either from the HD or the boot floppy containing the 2.4
kernel made during installation.  However, I can boot fine from the 7.1 install
CD in rescue mode.

Comment 11 Alan Cox 2001-06-28 21:24:18 UTC
Ok that is very important info.

The BOOT kernel is built for absolute maximal compatibility with anything the
world can throw at it, and with as few features as possible to keep the size
down so it fits on the boot disk.

The first obvious thing it lacks is APM. Can you try booting with the additional
option to disable APM? (Unfortunately I forget what it is. Arjan?)



Comment 12 Andre Robatino 2001-06-28 23:44:30 UTC
  By using the boot option "apm=off", it now works!  I upgraded the packages I
had downgraded previously due to their dependence on the 2.4 kernel, and
everything seems to work fine, except that when I halt the machine, not only
does it not power down automatically, but the power button doesn't work either,
so I have to cut power to the machine to turn it off.  I had to do the same
thing anyway while running the 2.2.19 kernel with Seawolf.  Is this normal
behavior with APM disabled?

Comment 13 Alan Cox 2001-06-29 09:02:29 UTC
Yes, in many cases that is the normal behaviour with APM disabled.

OK, can you grab

ftp.linux.org.uk:/pub/linux/alan/DMI/dmiscan.c

compile it, run it as root and attach the output to the bug. That will let me
add the box to the various internal tables so we know APM is to be avoided on
it.

You might also look for BIOS updates, by the way; a BIOS update may fix this
problem.


Comment 14 Andre Robatino 2001-06-29 09:26:18 UTC
Created attachment 22186 [details]
output of dmidecode

Comment 15 Alan Cox 2001-06-29 16:01:44 UTC
Thanks. I've added that entry to my codebase so that at some future point
kernels will automatically avoid APM on that box. Do let me know if you ever
find that a BIOS upgrade exists, and whether it cures the problem.
 
Marked NOTABUG because it is a BIOS bug. I'll see about getting the block entry
in though.
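The mechanism described above, a table of known machine identities checked against the DMI data at boot that disables APM when an entry matches, can be sketched in user-space Python. This is a hypothetical illustration only, not the kernel's dmi_scan.c code: the table layout, the `board_name` field key, the substring-matching rule, and the names `DmiQuirk`, `entry_matches`, and `apm_blacklist_hit` are all assumptions. Only the Delhi3 board name comes from this report.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DmiQuirk:
    """One hypothetical blacklist entry: a label plus the DMI fields to match."""
    ident: str
    # (field, substring) pairs; every pair must match for the entry to apply.
    matches: Tuple[Tuple[str, str], ...]


# Hypothetical table. The "board_name" key and the substring rule are
# assumptions; only the "Delhi3" board name is taken from the bug report.
APM_BLACKLIST: Tuple[DmiQuirk, ...] = (
    DmiQuirk("eMachines eTower 333cs (Delhi3 board)", (("board_name", "Delhi3"),)),
)


def entry_matches(quirk: DmiQuirk, dmi: Dict[str, str]) -> bool:
    """True if every (field, substring) pair appears in the DMI data."""
    return all(want in dmi.get(field, "") for field, want in quirk.matches)


def apm_blacklist_hit(dmi: Dict[str, str]) -> Optional[str]:
    """Return the ident of the first matching entry, or None if APM may be used."""
    for quirk in APM_BLACKLIST:
        if entry_matches(quirk, dmi):
            return quirk.ident
    return None


if __name__ == "__main__":
    # Made-up DMI data for illustration; a real system would read it from the BIOS tables.
    sample = {"board_name": "Delhi3"}
    hit = apm_blacklist_hit(sample)
    if hit:
        print(f"{hit} detected. Disabling APM.")
    else:
        print("No blacklist entry matched; APM left enabled.")
```

A real entry would more likely match on several DMI strings at once (for example vendor plus product) to reduce false positives; the single-field entry here is kept minimal for illustration.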


Comment 16 Andre Robatino 2002-07-29 03:07:12 UTC
  I updated the BIOS from version 1.11 to 1.20 using the file E120.exe on the
eMachines Help Site (<http://www.e4all.info>), and then built a custom kernel,
editing out the part of dmi_scan.c that disables APM for the Delhi3
motherboard. No luck; the problem still exists.
  My machine dual-boots RH 7.3 and Win98. Although there were APM problems with
Win98 when I bought the machine (3 years ago), they went away after a year or
so, and APM now works fine in Win98 (with both version 1.11 and 1.20 of the
BIOS). Apparently Microsoft implemented a software workaround. Maybe the BIOS
uses an outdated version of APM. Is it possible to guess what MS did, and is
there any utility I could run in Win98 to get information on the state of APM?
Device Manager indicates that it is normal.

Comment 17 Andre Robatino 2004-05-29 09:56:38 UTC
  After installing Fedora Core 2 on this machine, I find that it can use ACPI
successfully instead of APM, though there are warning messages:

ACPI: IRQ9 SCI: Level Trigger.
    ACPI-0179: *** Warning: The ACPI AML in your computer contains errors, please nag the manufacturer to correct it.
    ACPI-0182: *** Warning: Allowing relaxed access to fields; turn on CONFIG_ACPI_DEBUG for details.

  To use ACPI it has to be enabled in the BIOS (set "ACPI aware OS" to
"yes").  The machine appears to run normally and even powers off on
shutdown.