Bug 39191
Summary: | hex dump during boot, and then system hangs
---|---
Product: | [Retired] Red Hat Linux
Component: | kernel
Version: | 7.1
Hardware: | i386
OS: | Linux
Status: | CLOSED NOTABUG
Severity: | high
Priority: | medium
Reporter: | Need Real Name <hobbs>
Assignee: | Arjan van de Ven <arjanv>
CC: | alan, robatino
Doc Type: | Bug Fix
Last Closed: | 2001-06-29 09:26:21 UTC

Description
Need Real Name
2001-05-05 06:16:43 UTC
If you could copy down the oops text (including all the magic numbers), it would be helpful. Assigning to kernel.

I have the same problem. Originally this happened upon attempting an update, then again after doing a clean install. I also have an etower 333cs; the only modifications are a bigger HD, more memory, an ISA modem, and an ethernet card.

A few extra details: I managed to see the message "calibrating delay loop" before it crashes, so it's shortly after that. The same problem happens whether booting from the HD or the boot floppy I made during installation, but booting from the Seawolf install CD in rescue mode is successful. The same thing happens with the original 2.4.2 kernel, the newest 2.4.3-12 kernel, and the Mandrake 2.4.3 kernel from a month or so back.

The call trace numbers for both the 2.4.2 and 2.4.3-12 kernels follow a dual arithmetic progression, for example <fc70c14f> <fc70c14f> <fc78c14f> <fc78c14f> <fc80c14f> <fc80c14f> ..., where each number is repeated twice and the difference between successive numbers alternates between 80000 and 20000 (hex). The only difference between the two versions of the kernel is an overall additive constant; the differences of 80000 and 20000 are the same. There is more than one screenful of these numbers so I can't tell how they start, but they repeat about once every 5 seconds. The keyboard is completely disabled, so Ctrl-Alt-Del doesn't work, and neither does the power-off button; the only way to turn the machine off is by cutting power to it.

I salvaged the Seawolf installation by downgrading the kernel to 2.2.19 together with a few packages that depended on 2.4. But there are minor problems on shutting down, including a fail message for NFS lockd and for halting at the very end, after the disks are unmounted. These are probably undocumented dependencies on the 2.4 kernel, so they should go away if the 2.4 kernel is fixed.

We just released a 2.4.3-12 updated kernel last Friday.
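To make the call trace pattern described above easier to check against a longer capture, here is a minimal Python sketch. It assumes the addresses have been copied out as text tokens in the `<hex>` form shown in the report; the helper name `trace_deltas` is hypothetical and not part of any kernel tooling.

```python
def trace_deltas(addresses):
    """Return the differences between successive distinct call trace values.

    Each address is a hex string, optionally wrapped in angle brackets,
    as printed in the oops call trace. Immediate repeats are collapsed,
    since the report says every value appears twice in a row.
    """
    distinct = []
    for text in addresses:
        value = int(text.strip("<>"), 16)
        if not distinct or distinct[-1] != value:
            distinct.append(value)
    return [later - earlier for earlier, later in zip(distinct, distinct[1:])]


# The six values quoted in the report (each one appears twice).
sample = [
    "<fc70c14f>", "<fc70c14f>",
    "<fc78c14f>", "<fc78c14f>",
    "<fc80c14f>", "<fc80c14f>",
]

print([hex(delta) for delta in trace_deltas(sample)])
```

On the three distinct values quoted in the report this yields only steps of 0x80000; the 0x20000 steps the reporter mentions would only show up in a longer capture than the excerpt given here.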
Also, how much memory do you have, and does this go away when you type "mem=xxxM" (with "xxx" being the amount of RAM in megabytes minus 2)?

I have 256M of RAM. I tried booting the new RH 2.4.3-12 kernel with the "mem=254M" boot option. No luck. The behavior is subtly different: before, it looked like it was in an infinite loop, with the same numbers appearing over and over about every 5 seconds. Now, it just shows the numbers and appears to hang. The numbers form the same pattern as before, i.e. each repeated twice, with differences of 80000 and 20000 hex. If you have a general idea of what's wrong, I could experiment with different boot options. Any suggestions?

"no_hlt" or "no-hlt" would be an interesting boot option to try.

I tried no-hlt. No change. By the way, this machine uses a VIA chipset and a Cyrix M2 CPU. The more detailed info that used to be at http://www.e4me.com seems to be gone. The maximum memory for this machine was originally advertised as 256M. Would that be a motherboard limitation, or just the maximum you could pop into the 2 memory slots given the size of DIMMs at that time?

Created attachment 22006: capture of boot messages via serial console

I hope this capture file will help. I also forgot to mention that even though this machine is an i686, I also tried the i386 version of the newest kernel with the same problem, so it's not those optimizations.

I'm running 2.4.x fine on a VIA chipset Cyrix MII quite similar to yours. The trace is a pretty clear jump to nowhere. You say "I have the same problem. Originally this happened upon attempting an update, then after doing a clean install". Does that mean the installation kernel ran correctly, and it was only after the install that it died?

First I upgraded from 7.0 to 7.1. The upgrade itself went fine, but as soon as I tried to boot the new kernel it crashed. After a while I gave up and decided to try a clean install of 7.1. The same problem occurred: the install itself goes fine, but the new kernel crashed when I tried to boot it. This happens if I boot either from the HD or the boot floppy containing the 2.4 kernel made during installation. However, I can boot fine from the 7.1 install CD in rescue mode.

OK, that is very important info. The BOOT kernel is built for absolute maximal compatibility with anything the world can throw at it, and also with as few feature sets as possible to keep the size down so it fits on the boot disk. The first obvious thing it lacks is APM. Can you try booting with the additional option to disable APM? (I forget what it is, unfortunately. Arjan?)

By using the boot option "apm=off", it now works! I upgraded the packages I had downgraded previously due to their dependence on the 2.4 kernel, and everything seems to work fine, except that when I halt the machine, not only does it not power down automatically, but the power button doesn't work either, so I have to cut power to the machine to turn it off. I had to do the same thing anyway while running the 2.2.19 kernel with Seawolf. Is this normal behavior with APM disabled?
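The boot options tried so far in this thread follow simple rules that can be written down. The Python sketch below builds the "mem=" option (installed RAM in megabytes minus 2, as suggested above) and checks whether a given option such as "apm=off" is present on a kernel command line. The function names and the sample command line are hypothetical; reading `/proc/cmdline` assumes a running Linux system that exposes that file.

```python
def mem_option(total_ram_mb, reserve_mb=2):
    """Build the mem= boot option: installed RAM in MB minus a small reserve."""
    return f"mem={total_ram_mb - reserve_mb}M"


def has_option(cmdline, option):
    """Return True if `option` appears as a whole word on the command line."""
    return option in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# 256 MB of RAM, as reported in this thread.
print(mem_option(256))

# Hypothetical command line, for illustration only.
example_cmdline = "ro root=/dev/hda1 apm=off"
print(has_option(example_cmdline, "apm=off"))
```

For the 256M machine in this report, `mem_option(256)` gives the same "mem=254M" value the reporter used. On a live system, `has_option(read_cmdline(), "apm=off")` would confirm that the option actually reached the kernel.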
That is the normal behaviour with APM disabled in many cases, yes.

OK, can you grab ftp.linux.org.uk:/pub/linux/alan/DMI/dmiscan.c, compile it, run it as root, and attach the output to the bug? That will let me add the box to the various internal tables so we know APM is to be avoided on it. You might also, by the way, look for BIOS updates; you may find a BIOS update fixes this problem.

Created attachment 22186: output of dmidecode

Thanks. I've added that entry to my codebase so that at some future point kernels will automatically avoid APM on that box. Do let me know if you ever find that a BIOS upgrade exists and whether it cures it.

Marked NOTABUG because it is a BIOS bug. I'll see about getting the block entry in, though.

I updated the BIOS from version 1.11 to 1.20 using the file E120.exe on the eMachines Help Site (<http://www.e4all.info>), and then built a custom kernel, editing out the part in dmi_scan.c that disables APM for the Delhi3 motherboard. No luck; the problem still exists. My machine is a dual boot RH 7.3/Win98. Although there were APM problems with Win98 when I bought the machine (3 years ago), after a year or so they went away, and APM works fine now in Win98 (with both versions 1.11 and 1.20 of the BIOS). Apparently Microsoft did a software workaround. Maybe the BIOS uses an outdated version of APM. Is it possible to make a guess as to what MS did, and is there any utility I could run in Win98 to get info on the state of the APM? Device Manager indicates that it is normal.

After installing Fedora 2 on this machine, it is able to use ACPI successfully instead of APM, though there are warning messages:

ACPI: IRQ9 SCI: Level Trigger.
ACPI-0179: *** Warning: The ACPI AML in your computer contains errors, please nag the manufacturer to correct it.
ACPI-0182: *** Warning: Allowing relaxed access to fields; turn on CONFIG_ACPI_DEBUG for details.

To use ACPI it has to be enabled in the BIOS (set "ACPI aware OS" to "yes"). The machine appears to run normally and even powers off on shutdown.
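The DMI-based blocking discussed in this thread (adding the box to internal tables so that APM is avoided, and the dmi_scan.c entry for the Delhi3 board) can be pictured as a small lookup table keyed on the board name reported by DMI. The sketch below is a Python illustration of that idea only; it is not the kernel's code, and the table layout, the `quirk_for` helper, and the `"disable_apm"` action label are all hypothetical. Only the board name "Delhi3" comes from the thread.

```python
# Hypothetical quirk table: DMI board-name substring -> action to take.
# Only the "Delhi3" name is taken from this bug report.
APM_QUIRKS = {
    "Delhi3": "disable_apm",
}


def quirk_for(board_name, table=APM_QUIRKS):
    """Return the action for the first table entry matching the board name.

    Matching is a case-insensitive substring test, which is an assumption
    made for this sketch. Returns None when no entry applies.
    """
    lowered = board_name.lower()
    for needle, action in table.items():
        if needle.lower() in lowered:
            return action
    return None


print(quirk_for("Delhi3"))
print(quirk_for("SomeOtherBoard"))
```

Editing the entry out of the table, as the reporter did in a custom kernel, corresponds to `quirk_for` returning `None` for this board, so APM would be tried again.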