From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (X11; U; Linux 2.4.18-5 i686)

Description of problem:
SMP dual P3-733 on a MSI-6521 VIA SMP board and AMI MegaRAID 500 crashes approximately monthly. This is on a customer's site and usually they just press reset, but this time I managed to get them to take a picture with some good debug output. /var/log/messages does not contain anything about the crash. This is the only output I could find.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. turn on system
2. wait a month or 2
3. crash

Actual Results: crash freeze lock-up kaboom

Expected Results: should run smoothly

Additional info:
Created attachment 69784
picture of screen when it crashed
Any idea which exact kernel is running?
Sorry:

# rpm -qa | grep kernel
kernel-2.4.9-34
kernel-smp-2.4.9-31
kernel-headers-2.4.9-34
kernel-smp-2.4.9-34
kernel-2.4.9-31

# uname -a
Linux www.blah.com 2.4.9-34smp #1 SMP Sat Jun 1 06:15:25 EDT 2002 i686 unknown
More info: most APIC / VIA SMP crashes that I've read about here seem to be load related. Before the last crash I had a script of mine running that was basically "uptime >> logfile" every 2 seconds. Below is the log right before and up to the point it crashed. As you can see, the load was quite low. In fact, during normal operation (it's a production web server) the load never gets above 1. Backups that are run every few hours can get near 2, but the crashes never seem to occur then, at least not that I've noticed, and so I don't think this issue is load related.

11:09am up 35 days, 13:12, 9 users, load average: 0.30, 0.22, 0.19
11:09am up 35 days, 13:12, 9 users, load average: 0.30, 0.22, 0.19
11:09am up 35 days, 13:12, 9 users, load average: 0.28, 0.22, 0.19
11:09am up 35 days, 13:12, 9 users, load average: 0.28, 0.22, 0.19
11:09am up 35 days, 13:12, 9 users, load average: 0.25, 0.21, 0.19
11:09am up 35 days, 13:12, 9 users, load average: 0.25, 0.21, 0.19
11:09am up 35 days, 13:13, 9 users, load average: 0.23, 0.21, 0.18
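To check the "load was quite low" reading without eyeballing the log, the samples can be parsed and the peak 1-minute load picked out. This is only a minimal sketch: the function names are hypothetical, the regular expression assumes the `uptime` output format shown in the log above, and the sample list is copied from that log rather than read from the real logfile.

```python
import re

# Matches the three load averages at the end of an `uptime` line.
# Assumes the "load average: a, b, c" format seen in the log above.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_loads(line):
    """Return (1min, 5min, 15min) load averages, or None if the line has none."""
    match = LOAD_RE.search(line)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def peak_one_minute_load(lines):
    """Return the highest 1-minute load average seen, or None for an empty log."""
    loads = [parsed[0] for parsed in map(parse_loads, lines) if parsed is not None]
    return max(loads) if loads else None


# The last samples logged before the crash, copied from the log above.
SAMPLES = [
    "11:09am up 35 days, 13:12, 9 users, load average: 0.30, 0.22, 0.19",
    "11:09am up 35 days, 13:12, 9 users, load average: 0.30, 0.22, 0.19",
    "11:09am up 35 days, 13:12, 9 users, load average: 0.28, 0.22, 0.19",
    "11:09am up 35 days, 13:12, 9 users, load average: 0.28, 0.22, 0.19",
    "11:09am up 35 days, 13:12, 9 users, load average: 0.25, 0.21, 0.19",
    "11:09am up 35 days, 13:12, 9 users, load average: 0.25, 0.21, 0.19",
    "11:09am up 35 days, 13:13, 9 users, load average: 0.23, 0.21, 0.18",
]

if __name__ == "__main__":
    print("peak 1-minute load before crash:", peak_one_minute_load(SAMPLES))
```

Run against the full logfile instead of `SAMPLES`, the same function would show whether the load was ever elevated in the minutes before a crash.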
It just crashed again. This time with a different message:

Uhhuh. NMI received for unknown reason 21.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 31.

The machine hard-locked and had to be physically reset. There is an identical sister machine that I am loading up with RH 7.3 and all patches, and I will swap it in as the live server shortly. We'll see if this helps isolate whether this is h/w or s/w.
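For anyone trying to read those reason codes: as far as I understand the 2.4 i386 NMI handler, the reason byte is read from I/O port 0x61 and printed in hex, and the "unknown reason" message appears when neither bit 7 nor bit 6 is set. The sketch below decodes a reason byte under that assumption. The bit names follow the conventional PC/AT layout of port 0x61 as I recall it and should be treated as assumptions, and the function names are hypothetical.

```python
# Bit names for port 0x61 follow the conventional PC/AT layout as recalled
# here; they are assumptions and should be checked against chipset docs.
PORT_61_BITS = {
    0: "timer 2 gate",
    1: "speaker data enable",
    2: "parity check disabled",
    3: "I/O channel check disabled",
    4: "refresh toggle",
    5: "timer 2 output",
    6: "I/O channel check (IOCHK) NMI",
    7: "memory parity / SERR NMI",
}

# Bits 6 and 7 are assumed to be the only NMI sources the handler recognises.
NMI_SOURCE_MASK = 0xC0


def decode_reason(reason):
    """Return the names of all bits set in the reason byte, lowest bit first."""
    return [PORT_61_BITS[bit] for bit in range(8) if reason & (1 << bit)]


def is_unknown(reason):
    """True when neither recognised NMI source bit is set."""
    return (reason & NMI_SOURCE_MASK) == 0


if __name__ == "__main__":
    # The kernel prints the reason in hex, so "21" and "31" are 0x21 and 0x31.
    for reason in (0x21, 0x31):
        print(hex(reason), decode_reason(reason), "unknown:", is_unknown(reason))
```

Under these assumptions the two codes differ only in bit 4, which toggles continuously, so neither message points at a parity or I/O check error. That does not by itself distinguish a hardware fault from a software one.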
Installed the sister machine on Sep 8 with RH 7.3 on identical hardware. So far no crashes! This may turn out to be a RH 7.1 kernel issue, or hardware gone flaky after 1 year of flawless operation. Will update this bug after another month or two of operation. *fingers crossed*
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/