Bug 71204
Summary: | SMP system crash monthly | |
---|---|---|---
Product: | [Retired] Red Hat Linux | Reporter: | Trevor Cordes <trevor>
Component: | kernel | Assignee: | Arjan van de Ven <arjanv>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock>
Severity: | high | Priority: | medium
Version: | 7.1 | Hardware: | i686
OS: | Linux | Doc Type: | Bug Fix
Last Closed: | 2004-09-30 15:39:50 UTC | |

Description
Trevor Cordes
2002-08-09 21:57:01 UTC
Created attachment 69784
picture of screen when it crashed
Any idea what exact kernel is running?

Sorry:

    # rpm -qa | grep kernel
    kernel-2.4.9-34
    kernel-smp-2.4.9-31
    kernel-headers-2.4.9-34
    kernel-smp-2.4.9-34
    kernel-2.4.9-31
    # uname -a
    Linux www.blah.com 2.4.9-34smp #1 SMP Sat Jun 1 06:15:25 EDT 2002 i686 unknown

More info: most APIC / VIA SMP crashes that I've read about here seem to be load related. Before the last crash I had a script of mine running that was basically "uptime >> logfile" every 2 seconds. Below is the log right before and up to the point it crashed. As you can see, the load was quite low. In fact, during normal operation (it's a production web server) the load never gets above 1. Backups that run every few hours can push it near 2, but the crashes never seem to occur then, at least not that I've noticed, so I don't think this issue is load related.

    11:09am up 35 days, 13:12, 9 users, load average: 0.30, 0.22, 0.19
    11:09am up 35 days, 13:12, 9 users, load average: 0.30, 0.22, 0.19
    11:09am up 35 days, 13:12, 9 users, load average: 0.28, 0.22, 0.19
    11:09am up 35 days, 13:12, 9 users, load average: 0.28, 0.22, 0.19
    11:09am up 35 days, 13:12, 9 users, load average: 0.25, 0.21, 0.19
    11:09am up 35 days, 13:12, 9 users, load average: 0.25, 0.21, 0.19
    11:09am up 35 days, 13:13, 9 users, load average: 0.23, 0.21, 0.18

It just crashed again, this time with a different message:

    Uhhuh. NMI received for unknown reason 21.
    Dazed and confused, but trying to continue
    Do you have a strange power saving mode enabled?
    Uhhuh. NMI received for unknown reason 31.

It was hard-crashed and had to be physically reset. There is a sister machine identical to this one that I am loading up with RH 7.3 and all patches, and I will swap it in as the live server shortly. We'll see if this helps isolate whether this is h/w or s/w.

Installed the sister machine on Sep 8 with RH 7.3 on identical hardware. So far no crashes! This may turn out to be a RH 7.1 kernel issue, or hardware gone flaky after 1 year of flawless operation. Will update this bug after another month or two of operation. *fingers crossed*

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
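
For anyone wanting to check a log like the "uptime >> logfile" one quoted in the report, here is a minimal sketch that pulls the load averages out of each line and reports the peak 1-minute value. The function names and the `SAMPLE_LOG` variable are hypothetical, not part of the reporter's script, and the regular expression assumes the `load average: a, b, c` layout shown in the quoted lines.

```python
import re

# Matches the three load averages at the end of an `uptime` line.
# Assumes the "load average: a, b, c" layout seen in the report.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_loads(line):
    """Return (1min, 5min, 15min) load averages from one line, or None."""
    match = LOAD_RE.search(line)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def peak_one_minute_load(lines):
    """Return the highest 1-minute load average seen, or None if no samples."""
    samples = [loads[0] for loads in map(parse_loads, lines) if loads is not None]
    return max(samples) if samples else None


# A few of the samples logged just before the crash, copied from the report.
SAMPLE_LOG = """\
11:09am up 35 days, 13:12, 9 users, load average: 0.30, 0.22, 0.19
11:09am up 35 days, 13:12, 9 users, load average: 0.28, 0.22, 0.19
11:09am up 35 days, 13:12, 9 users, load average: 0.25, 0.21, 0.19
11:09am up 35 days, 13:13, 9 users, load average: 0.23, 0.21, 0.18
""".splitlines()

if __name__ == "__main__":
    print(peak_one_minute_load(SAMPLE_LOG))
```

Run against the full logfile, a peak well below the CPU count would support the reporter's reading that the crash was not load related, though it cannot rule out a very short spike between 2-second samples.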