Bug 97330

Summary: hard kernel lockup/hang within minutes of boot
Product: [Retired] Red Hat Linux
Reporter: Danny Yee <bookreviewer>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium    
Version: 7.3   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:41:08 UTC

Description Danny Yee 2003-06-13 05:51:14 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030314

Description of problem:
After upgrading from kernel-smp-2.4.18-19.7.x to kernel-smp-2.4.20-18.7, my
server now hangs hard (no ping response) within about five minutes of booting.
Nothing at all appears in the log files, and the hung machine never responds to
SysRq key commands.

Previous kernels (all the way back to RH 7.1) weren't totally stable either, but
the mean time between lockups was closer to ten days than five minutes.  With
the earlier kernels, the events most likely to trigger a lockup were dumps to a
local SDLT tape drive or large file transfers over the network (e.g. netatalk
or ftp).

I'm hoping the more frequent crashes with the newer kernel are a guide to what
the problem is.  It's almost certainly hardware-related, but there's nothing
unusual about the hardware (SDS2 motherboard with EEPro ethernet, aic7899 SCSI
and I2O RAID) - and I've tried running without the RAID and with a 3Com ethernet
card instead, and the crashes still happened.  [Is there some way to stop the
kernel driving the PCI bus at max?]  Even weirder, I had the same problem on a
completely different machine, and it followed me to the current hardware over
rsync.  It's been happening to me ever since I first moved to kernel 2.4, i.e.
for over a year now :-(



Version-Release number of selected component (if applicable):
kernel-smp-2.4.20-18.7

How reproducible:
Sometimes

Steps to Reproduce:
1. Do an SDLT dump - about 50% likely to cause a lockup in the course of dumping
100 GB (DLT dumps never seemed to cause a problem).
2. Alternatively, try copying several gigabytes to a Mac via netatalk - again,
about 50% likely to cause a lockup, approaching 100% with time.
    

Additional info:

If you have any queries, I'm happy to provide more information.  I can provide
all the hair I've pulled out over this, too :-).

Comment 1 Bugzilla owner 2004-09-30 15:41:08 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/