Bug 71204 - SMP system crash monthly
Summary: SMP system crash monthly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-08-09 21:57 UTC by Trevor Cordes
Modified: 2008-08-01 16:22 UTC (History)
CC List: 0 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:39:50 UTC
Embargoed:


Attachments
picture of screen when it crashed (131.21 KB, image/jpeg)
2002-08-09 21:58 UTC, Trevor Cordes

Description Trevor Cordes 2002-08-09 21:57:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (X11; U; Linux 2.4.18-5 i686)

Description of problem:
SMP dual P3-733 on a MSI-6521 VIA SMP board and AMI MegaRAID 500 crashes
approximately monthly.  This is on a customer's site and usually they just press
reset, but this time I managed to get them to take a picture with some good
debug output.  /var/log/messages does not contain anything about the crash. 
This is the only output I could find.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. turn on system
2. wait a month or 2
3. crash
	

Actual Results:  crash freeze lock-up kaboom

Expected Results:  should run smoothly

Additional info:

Comment 1 Trevor Cordes 2002-08-09 21:58:23 UTC
Created attachment 69784
picture of screen when it crashed

Comment 2 Arjan van de Ven 2002-08-09 22:48:56 UTC
Any idea exactly which kernel is running?

Comment 3 Trevor Cordes 2002-08-09 22:51:30 UTC
Sorry:

# rpm -qa | grep kernel
kernel-2.4.9-34
kernel-smp-2.4.9-31
kernel-headers-2.4.9-34
kernel-smp-2.4.9-34
kernel-2.4.9-31


Comment 4 Trevor Cordes 2002-08-09 22:52:26 UTC
# uname -a
Linux www.blah.com 2.4.9-34smp #1 SMP Sat Jun 1 06:15:25 EDT 2002 i686 unknown


Comment 5 Trevor Cordes 2002-08-11 10:26:46 UTC
More info: most APIC / VIA SMP crashes that I've read about here seem to be
load-related.  Before the last crash I had a script of mine running that
basically did "uptime >> logfile" every 2 seconds.

Below is the log right before and up to the point it crashed.  As you can see,
the load was quite low.  In fact, during normal operation (it's a production web
server) the load never gets above 1.  Backups that run every few hours can push
it near 2, but the crashes never seem to occur then, at least not that I've
noticed, so I don't think this issue is load-related.

 11:09am  up 35 days, 13:12,  9 users,  load average: 0.30, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.30, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.28, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.28, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.25, 0.21, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.25, 0.21, 0.19
 11:09am  up 35 days, 13:13,  9 users,  load average: 0.23, 0.21, 0.18
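
For anyone wanting to collect the same kind of evidence, here is a minimal sketch of such a logger plus a helper that pulls the peak 1-minute load out of the resulting file. The function names, the optional sample count, and the log path in the usage note are all hypothetical; this is not the script that was actually running on the server, and it assumes the "load average: a, b, c" output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: append one uptime sample to LOGFILE every INTERVAL
# seconds. COUNT limits the number of samples; 0 (the default) means forever.
log_uptime() {
    logfile=$1
    interval=${2:-2}
    count=${3:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        uptime >> "$logfile"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Hypothetical helper: print the highest 1-minute load average found in a log
# written by log_uptime. Assumes lines contain "load average: a, b, c".
max_load() {
    awk -F'load average: ' '
        NF > 1 { split($2, a, ", "); if (a[1] + 0 > max) max = a[1] + 0 }
        END    { printf "%.2f\n", max }
    ' "$1"
}
```

Typical use would be something like `log_uptime /var/tmp/uptime.log 2 &` at boot, then `max_load /var/tmp/uptime.log` after a crash to see whether load was anywhere near unusual.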


Comment 6 Trevor Cordes 2002-08-29 11:32:24 UTC
It just crashed again.  This time with a different message:

Uhhuh. NMI received for unknown reason 21.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 31.

The machine was locked up hard and had to be physically reset.

There is a sister machine identical to this one that I am loading up with RH 7.3
and all patches, and will swap in as the live server shortly.  We'll see if this
helps isolate whether this is a h/w or s/w problem.

Comment 7 Trevor Cordes 2002-09-21 23:11:39 UTC
Installed the sister machine on Sep 8 with RH 7.3 on identical hardware.  So far
no crashes!  This may turn out to be a RH 7.1 kernel issue, or hardware gone
flaky after 1 year of flawless operation.

Will update this bug after another month or two of operation.  *fingers crossed*

Comment 8 Bugzilla owner 2004-09-30 15:39:50 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


