Bug 89141

Summary: Machine oopses, then hangs
Product: [Retired] Red Hat Linux
Reporter: Mario Lorenz <ml>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 7.2
Target Milestone: ---
Target Release: ---
Hardware: athlon
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-05-29 09:26:33 UTC
Attachments:
- Several kernel oopses from /var/log/messages (flags: none)

Description Mario Lorenz 2003-04-18 09:11:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
At random times (no pattern has been established yet), the kernel first
oopses, then services stop responding, until the machine finally panics and
has to be rebooted.

The box had similar problems two weeks ago. Suspecting a hardware problem
(we have 2.4.18-24.7 and -27.7 running on other boxen without problems),
I swapped the whole box for a spare. Yesterday the machine hung twice, and today it hung again.

The relevant oopses are attached, as is the boot log from the last boot for
hardware information.

I was running -24 on this box because I rely on /proc/cmdline (see #88047),
and there are no users on the box, so ptrace was not an issue for me.
As of today I have given -27 a try, although I do not expect it to help.

Please note that the lm_sensors modules were loaded in the latest dump: I
monitored temperature and fan RPM to rule out temperature-related
problems. All values were fine.

Also, the RAM is ECC, so a memory failure, although always possible, is unlikely.


Version-Release number of selected component (if applicable):
2.4.18-24.7.x

How reproducible:
Sometimes

Steps to Reproduce:
1. Reboot the box to get it working.
2. Wait.

Actual Results: Sometimes the kernel oopses, as detailed above, and the automatic
service monitor pages me in the middle of the night.

Expected Results: Flawless operation without oopses, and a good night of
uninterrupted sleep.


Additional info:

Comment 1 Mario Lorenz 2003-04-18 09:12:28 UTC
Created attachment 91185 [details]
Several kernel oopses from /var/log/messages

Comment 2 Mario Lorenz 2003-04-18 09:16:12 UTC
As a side note: I wanted to look at the 2.4.18-24.7.x source RPM again, only to find
that all the mirrors have already deleted it.
Is there a publicly accessible FTP site that has all the old updates?
(Google is futile: it finds all the mirrors, but the files are gone by now.)


Comment 3 Mario Lorenz 2003-05-29 09:26:33 UTC
Closing due to a proven hardware defect. The mainboard in question finally died because
of faulty electrolytic capacitors; the other one had RAM problems. I guess PC
hardware is currently in such a sorry state that failover for any
application needs to become standard...