Bug 71204 - SMP system crash monthly
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686 Linux
Priority: medium  Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2002-08-09 17:57 EDT by Trevor Cordes
Modified: 2008-08-01 12:22 EDT
Last Closed: 2004-09-30 11:39:50 EDT

Attachments
picture of screen when it crashed (131.21 KB, image/jpeg)
2002-08-09 17:58 EDT, Trevor Cordes

Description Trevor Cordes 2002-08-09 17:57:01 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (X11; U; Linux 2.4.18-5 i686)

Description of problem:
SMP dual P3-733 on an MSI-6521 VIA SMP board with an AMI MegaRAID 500 crashes
approximately monthly.  This is on a customer's site and usually they just press
reset, but this time I managed to get them to take a picture with some good
debug output.  /var/log/messages does not contain anything about the crash. 
This is the only output I could find.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. turn on system
2. wait a month or 2
3. crash
	

Actual Results:  crash freeze lock-up kaboom

Expected Results:  should run smoothly

Additional info:
Comment 1 Trevor Cordes 2002-08-09 17:58:23 EDT
Created attachment 69784
picture of screen when it crashed
Comment 2 Arjan van de Ven 2002-08-09 18:48:56 EDT
Any idea which exact kernel is running?
Comment 3 Trevor Cordes 2002-08-09 18:51:30 EDT
Sorry:

#rpm -qa |grep kernel
kernel-2.4.9-34
kernel-smp-2.4.9-31
kernel-headers-2.4.9-34
kernel-smp-2.4.9-34
kernel-2.4.9-31
Comment 4 Trevor Cordes 2002-08-09 18:52:26 EDT
#uname  -a
Linux www.blah.com 2.4.9-34smp #1 SMP Sat Jun 1 06:15:25 EDT 2002 i686 unknown
Comment 5 Trevor Cordes 2002-08-11 06:26:46 EDT
More info: most APIC / VIA SMP crashes that I've read about here seem to be load
related.  Before the last crash I had a script of mine running that was
basically "uptime >> logfile" every 2 seconds.

Below is the log right before and up to the point it crashed.  As you can see,
the load was quite low.  In fact, during normal operation (it's a production web
server) the load never gets above 1.  Backups that are run every few hours can
get near 2, but the crashes never seem to occur then, at least not that I've
noticed, and so I don't think this issue is load related.

 11:09am  up 35 days, 13:12,  9 users,  load average: 0.30, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.30, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.28, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.28, 0.22, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.25, 0.21, 0.19
 11:09am  up 35 days, 13:12,  9 users,  load average: 0.25, 0.21, 0.19
 11:09am  up 35 days, 13:13,  9 users,  load average: 0.23, 0.21, 0.18
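
For anyone who wants to check a log like the one above mechanically rather than by eye, here is a minimal Python sketch that parses `uptime` lines and reports the peak 1-minute load. It assumes the exact line format shown in this comment (uptime given as days plus hours:minutes); the names `parse_uptime_line`, `peak_load` and `SAMPLE` are made up for illustration and are not part of the original logging script.

```python
import re

# Assumption: lines look like the ones logged in this comment, e.g.
# " 11:09am  up 35 days, 13:12,  9 users,  load average: 0.30, 0.22, 0.19"
UPTIME_RE = re.compile(
    r"up\s+(?P<days>\d+)\s+days?,\s+(?P<hours>\d+):(?P<minutes>\d+),"
    r".*load average:\s+(?P<l1>[\d.]+),\s+(?P<l5>[\d.]+),\s+(?P<l15>[\d.]+)"
)


def parse_uptime_line(line):
    """Return (uptime_in_minutes, (load1, load5, load15)), or None if no match."""
    m = UPTIME_RE.search(line)
    if m is None:
        return None
    minutes = (int(m["days"]) * 24 + int(m["hours"])) * 60 + int(m["minutes"])
    loads = (float(m["l1"]), float(m["l5"]), float(m["l15"]))
    return minutes, loads


def peak_load(lines):
    """Highest 1-minute load average among the lines that parse."""
    parsed = [p for p in map(parse_uptime_line, lines) if p is not None]
    return max(loads[0] for _, loads in parsed)


# First and last samples from the log above.
SAMPLE = [
    " 11:09am  up 35 days, 13:12,  9 users,  load average: 0.30, 0.22, 0.19",
    " 11:09am  up 35 days, 13:13,  9 users,  load average: 0.23, 0.21, 0.18",
]

if __name__ == "__main__":
    print("peak 1-minute load:", peak_load(SAMPLE))
```

Run against the full logfile, this would show whether the load ever spiked in the minutes before a crash. `peak_load` raises `ValueError` if no line parses, which is probably the right behaviour for an empty or unrelated file.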
Comment 6 Trevor Cordes 2002-08-29 07:32:24 EDT
It just crashed again.  This time with a different message:

Uhhuh. NMI received for unknown reason 21.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 31.

The machine was hard-locked and had to be physically reset.

There is a sister machine identical to this one that I am loading up with RH 7.3
and all patches, and I will be swapping it in as the live server shortly.  We'll
see if this helps any with isolating whether this is h/w or s/w.
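
The two NMI "reason" values quoted above can be picked apart bit by bit. The sketch below assumes, as I believe is the case for 2.4-era i386 kernels, that the number printed is the byte read from legacy system control port 0x61, shown in hexadecimal. The bit names follow the conventional PC/AT layout of that port and should be treated as an assumption for this particular VIA chipset; `PORT_61_BITS`, `decode_nmi_reason` and `has_reported_cause` are illustrative names, not kernel symbols.

```python
# Assumption: conventional PC/AT meaning of the bits in I/O port 0x61.
PORT_61_BITS = [
    (0x01, "timer 2 gate"),
    (0x02, "speaker data enable"),
    (0x04, "parity check disable/clear"),
    (0x08, "I/O channel check disable/clear"),
    (0x10, "refresh toggle"),
    (0x20, "timer 2 output"),
    (0x40, "I/O channel check (IOCHK) reported"),
    (0x80, "memory parity / system error reported"),
]


def decode_nmi_reason(reason):
    """Return the names of all bits set in an NMI reason byte."""
    return [name for mask, name in PORT_61_BITS if reason & mask]


def has_reported_cause(reason):
    """True if either error bit (0x40 or 0x80) is set in the reason byte."""
    return bool(reason & 0xC0)


if __name__ == "__main__":
    # The two values from the crash messages, read as hexadecimal.
    for text in ("21", "31"):
        reason = int(text, 16)
        print(text, decode_nmi_reason(reason), has_reported_cause(reason))
```

Under these assumptions neither 0x21 nor 0x31 has an error bit set, which fits the "unknown reason" wording: the bits that are set are only timer and refresh status, and the single-bit difference between the two values is the refresh toggle. So the reason byte by itself probably does not distinguish a hardware fault from a software one.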
Comment 7 Trevor Cordes 2002-09-21 19:11:39 EDT
Installed the sister machine on Sep 8 with RH 7.3 on identical hardware.  So far
no crashes!  This may turn out to be a RH 7.1 kernel issue, or hardware gone
flaky after 1 year of flawless operation.

Will update this bug after another month or two of operation.  *fingers crossed*
Comment 8 Bugzilla owner 2004-09-30 11:39:50 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
