Bug 80809 - SMP crash with dual Athlon, 2.4.18-19.7.xsmp kernel
Summary: SMP crash with dual Athlon, 2.4.18-19.7.xsmp kernel
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: athlon
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2002-12-31 16:42 UTC by Al Hadsell
Modified: 2007-04-18 16:49 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2003-02-11 13:53:48 UTC
Embargoed:


Attachments
dmesg output from the affected machine (9.90 KB, text/plain)
2002-12-31 16:46 UTC, Al Hadsell

Description Al Hadsell 2002-12-31 16:42:39 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2) Gecko/20021126

Description of problem:
I have a dual Athlon (AuthenticAMD AMD Athlon(tm) MP Processor 1800+
1532 MHz) on a Tyan motherboard.  Video is MGA G550 AGP.  Storage is
Adaptec RAID-10 with several external disk drives.

Since upgrading to the last 2 or 3 Red Hat kernels, including the one
referenced above, the machine hangs hard (no ctrl-alt-del, blank
screen) after a few hours of uptime.  The load on this system is not
heavy, and it typically fails when no one is using it.

dmesg tells me: AMD errata #22 may apply: Add "noapic" to the command
line if system instability.  I have tried adding "noapic" to the boot
parameters, and it had no effect.  In fact, the message about "noapic"
is still shown.
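
One way to confirm whether "noapic" actually reached the running kernel is to inspect the kernel command line. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the helper names has_boot_param and read_cmdline are hypothetical and not part of any tool mentioned in this report.

```python
def has_boot_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare flag or as name=value."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    # Assumption: run on the affected machine after booting with "noapic".
    print("noapic present:", has_boot_param(read_cmdline(), "noapic"))
```

If the flag is present and the warning still appears, the message is likely printed regardless of the option, which would match the behaviour described above.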

After several hours of googling, I'm confused about the status of this
problem.  I have seen recommendations for adding "mem=nopentium" to
the boot command line, and I've also seen a statement that it's not
necessary with recent kernels.  AMD's site says that 'noapic' is only
necessary with older kernels.

Version-Release number of selected component (if applicable):
2.4.18-19.7.xsmp

How reproducible:
Always

Steps to Reproduce:
1. Boot and use the system
2. Wait for hang
    

Actual Results:  System hangs after 10-36 hours

Expected Results:  System continues to run

Additional info:

See attached dmesg output

Comment 1 Al Hadsell 2002-12-31 16:46:39 UTC
Created attachment 89012
dmesg output from the affected machine

Note that "noapic" option is selected, and does not fix the problem.

Comment 2 Marc Schmitt 2003-01-08 16:25:09 UTC
Al,
Arjan,

Do you happen to see the following error messages in the logs:

APIC error on CPU0: 08(08) ?
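
The logs can also be searched for this message mechanically instead of by eye. A rough sketch, assuming the log is available as plain text (for example saved dmesg output); the function name find_apic_errors is hypothetical, and the pattern is based only on the single message quoted above.

```python
import re

# Matches lines such as "APIC error on CPU0: 08(08)".
APIC_ERROR = re.compile(
    r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)"
)


def find_apic_errors(log_text):
    """Return a list of (cpu, first_value, second_value) tuples."""
    return [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in APIC_ERROR.finditer(log_text)
    ]
```

An empty result only means the pattern was not found in the text supplied; it says nothing about messages that never reached the log before a hard hang.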


I've built new Red Hat 7.3 CDs with the latest errata applied, including kernel
2.4.18-19.7.x. I can't install on a Tyan Tiger MPX (S2466N-4M, beta BIOS
2466403m) with dual 2200+ CPUs: the 2.4.18-19.7.xBOOT kernel won't stop printing
the error message posted above. The machine won't accept CTRL-ALT-DEL; only the
reset button helps.
Installing with the original kernel that came with the installer CDs
(2.4.18-3BOOT) works, and so does 2.4.18-4BOOT.

I tried adding boot options like nosmp and noapic, to no avail.
In the end I stumbled upon this message:
http://hypermail.idiosynkrasia.net/linux-kernel/archived/2002/week29/0281.html
Check the answer of Jack F. Vogel.

I don't know if this has anything to do with the hangs you see, I just thought
I'd add the problems I have right now with Athlon MP and kernel 2.4.18-19.7.x.

Greetz
     Marc

Comment 3 Al Hadsell 2003-01-08 19:53:07 UTC
I don't see the message you noted in the log.  It appears
(not confirmed yet) that the problem I was seeing may have
been a hardware issue caused by overheating.  We added cooling
to the machine and it now has a 6-day uptime.  I'm willing
to declare my problem solved if I get a 10-day uptime.

Comment 4 Al Hadsell 2003-02-11 13:53:48 UTC
OK, with over a month of uptime, I'm officially declaring this one a false
alarm.  The problem appears to have been cooling.

