Bug 158039

Summary: nfsd oopses on testing kernel update for FC3
Product: [Fedora] Fedora
Reporter: Alexandre Oliva <oliva>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3
CC: davej, wtogami
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-05-19 14:10:41 UTC
Attachments:
Description Flags
Oopses none

Description Alexandre Oliva 2005-05-18 01:10:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050512 Fedora/1.0.4-2 Firefox/1.0.4

Description of problem:
Got all of these oopses on the same box over the past few weeks, running various different kernels.  It might be faulty hardware, so take it with a grain of salt, but I don't have any other boxes with an identical hardware configuration to tell whether it's something specific to the set of modules involved, nor easy local access to run hardware tests.  There are two ext3 oopses and some nfsd oopses from the stable kernel as well.  Could this all be caused by filesystem corruption?  I'm thinking of bringing the system down for an fsck.

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.20_FC3

How reproducible:
Didn't try

Steps to Reproduce:
1. Boot up either the stable or the testing 2.6.11 FC3 kernel and let it run for days.

Actual Results:  Oopses I'll attach.

Expected Results:  No such oopses.

Additional info:

Comment 1 Alexandre Oliva 2005-05-18 01:11:26 UTC
Created attachment 114493 [details]
Oopses

Comment 2 Alexandre Oliva 2005-05-18 03:00:59 UTC
fsck didn't find any inconsistencies, but a local user reported a recent
suspicion of overheating, and the failures appear to be related to peak use.

Comment 3 Steve Dickson 2005-05-18 11:38:07 UTC
Oopses are never good for data integrity.
Why do you think this is faulty hardware?

Comment 4 Alexandre Oliva 2005-05-18 15:15:15 UTC
That was the suspicion of another sysadmin.  Apparently the box has never been
exactly rock solid, with some programs crashing every now and then, odd messages
on cron mail, and so on, but this had never (apparently) affected its ability to
serve out filesystems over nfs.  The box was recently taken off to a computer
repair facility at the uni, and they suspected the goop that attaches the cooler
to the processor might be at fault, and replaced it, but that had no effect
whatsoever.  If anything, crashes are now more frequent.

Besides, we have many other boxes running NFS servers with the very same
software, although not exactly the same hardware, so I found it unlikely that
things would crash so often for one box and not for others.  This one isn't even
the most heavily used server.  I figured, if such oopses should be hitting
others, you'd know about it, so I thought I'd file it, but don't waste too much
time on it until we can get better assurance that it's not caused by hardware
problems.  I downgraded to 2.6.10-1.670_FC3 yesterday, and now the box is
offline.  I can't tell whether it crashed or was taken to the repair facility
again.  Aah, the wonders of being a remote sysadmin :-)

Comment 5 Alexandre Oliva 2005-05-19 14:10:41 UTC
The box failed again, and was taken to the repair office again.  They ran a
memtest again, and found both memory modules to be defective.  I'll probably
have to go on site and verify the testing, but we're now pretty sure it's
hardware failure.  Sorry about the noise.

(s/1.670_FC3/1.770_FC3/ in the previous comment, BTW)