Red Hat Bugzilla – Bug 80809
SMP crash with dual Athlon, 2.4.18-19.7.xsmp kernel
Last modified: 2007-04-18 12:49:24 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2) Gecko/20021126
Description of problem:
I have a dual Athlon (AuthenticAMD AMD Athlon(tm) MP Processor 1800+
1532 MHz) on a Tyan motherboard. Video is an MGA G550 AGP. The system uses
Adaptec RAID-10 with several external disk drives.
Since upgrading to the last 2 or 3 Red Hat kernels, including the one
referenced above, the machine hangs hard (no ctrl-alt-del, blank
screen) after a few hours of uptime. The load on this system is not
heavy, and it typically fails when no one is using it.
dmesg tells me: AMD errata #22 may apply: Add "noapic" to the command
line if system instability. I have tried adding "noapic" to the boot
parameters, and it had no effect. In fact, the message about "noapic"
is still shown.
After several hours of googling, I'm confused about the status of this
problem. I have seen recommendations for adding "mem=nopentium" to
the boot command line, and I've also seen a statement that it's not
necessary with recent kernels. AMD's site says that 'noapic' is only
necessary with older kernels.
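One way to confirm which of these options the running kernel actually received is to read /proc/cmdline. The following is a minimal Python sketch, not a definitive tool: the function names boot_params and read_cmdline are illustrative, and the parser assumes options are separated by whitespace, with no quoted values containing spaces.

```python
def boot_params(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as "noapic" map to None; "mem=nopentium" maps
    "mem" to "nopentium". Assumes whitespace-separated tokens.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the running kernel's command line, or "" if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    params = boot_params(read_cmdline())
    # Report the two options discussed in this bug.
    for name in ("noapic", "mem"):
        state = "present" if name in params else "absent"
        print(name, state, params.get(name))
```

If "noapic" is reported as absent, the boot loader configuration is not passing the option to the kernel, which is a separate problem from the option having no effect.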
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot and use the system
2. Wait for hang
Actual Results: System hangs after 10-36 hours
Expected Results: System continues to run
See attached dmesg output
Created attachment 89012
dmesg output from the affected machine
Note that "noapic" option is selected, and does not fix the problem.
Do you happen to see the following error messages in the logs:
APIC error on CPU0: 08(08) ?
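A quick way to answer this is to scan the system log for that message. The sketch below is a rough Python example under stated assumptions: the log path /var/log/messages is an assumption about where kernel messages are written, the function names are illustrative, and the pattern is modelled only on the single message line quoted above.

```python
import re
from collections import Counter

# Matches lines such as "APIC error on CPU0: 08(08)".
APIC_ERROR = re.compile(
    r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)"
)


def count_apic_errors(lines):
    """Count APIC error messages per CPU number in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = APIC_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def scan_log(path="/var/log/messages"):
    """Scan a log file; return an empty Counter if it cannot be read."""
    try:
        with open(path, errors="replace") as log:
            return count_apic_errors(log)
    except OSError:
        return Counter()


if __name__ == "__main__":
    counts = scan_log()
    if not counts:
        print("no APIC error messages found (or log not readable)")
    for cpu, n in sorted(counts.items()):
        print(f"CPU{cpu}: {n} APIC error message(s)")
```

Reading the system log usually requires root privileges, so an empty result from an unprivileged run does not prove the message is absent.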
I've built new Red Hat 7.3 CDs with the latest errata applied, including kernel
2.4.18-19.7.x. I can't install on a Tyan Tiger MPX (S2466N-4M, beta BIOS
2466403m) with dual 2200+ CPUs: the 2.4.18-19.7.xBOOT kernel won't stop printing
the error message posted above. The machine won't accept CTRL-ALT-DEL; only the
reset button helps.
Installing with the original kernel that came with the installer CDs
(2.4.18-3BOOT) works, and so does 2.4.18-4BOOT.
I tried adding boot options like nosmp and noapic, to no avail.
In the end I stumbled upon this message:
Check the answer of Jack F. Vogel.
I don't know if this has anything to do with the hangs you see, I just thought
I'd add the problems I have right now with Athlon MP and kernel 2.4.18-19.7.x.
I don't see the message you noted in the log. It appears
(not confirmed yet) that the problem I was seeing may have
been a hardware issue caused by heating. We added cooling
to the machine and it now has a 6-day uptime. I'm willing
to declare my problem solved if I get 10-day uptime.
OK, with over a month of uptime, I'm officially declaring this one a false
alarm. The problem appears to have been cooling.