Bug 97082 - (FS JFS)Filesystem simply stops responding
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Reported: 2003-06-09 22:24 EDT by Hrunting Johnson
Modified: 2007-04-18 12:54 EDT (History)
Doc Type: Bug Fix
Last Closed: 2004-09-30 11:41:05 EDT

Description Hrunting Johnson 2003-06-09 22:24:16 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

Description of problem:
After running for a while (sometimes a week, sometimes more, sometimes less),
a filesystem under heavy read/write I/O simply stops responding.

Facts:
both JFS and ext3 filesystems are affected
both are on 1.7 TB RAID 5 arrays
no core dumps
no entries in /var/log/messages (or anywhere else in /var/log/*)
/proc/mdstat shows all drives up
*any* operation on the affected filesystem puts the process into
"uninterruptible sleep" (state D)
other filesystems (which share all of the affected filesystem's drives) respond normally
drives are spread across two 3ware Escalade 7500 controllers
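The "uninterruptible sleep" symptom can be checked from userspace: such processes show state `D` in the third field of `/proc/<pid>/stat`. Below is a minimal sketch, assuming Python 3 (nothing in this report uses it); `parse_stat` and `uninterruptible` are hypothetical helper names.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces
    or ')', so split on the LAST ')' instead of on whitespace.
    """
    pid, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid), comm, state


def uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process whose state is 'D'."""
    stuck = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as 'meminfo' or 'self'
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the line was malformed
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)
```

Running `print(uninterruptible())` while the filesystem is hung should list the processes stuck on it. Pairing those PIDs with the Alt-SysRq-T trace requested in Comment 3 shows where each one is blocked.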

The only similar report I have found is this post to the lists:
https://listman.redhat.com/archives/ataraid-list/2002-August/msg00092.html

Has this been seen/verified by anyone else?  What's the next step to debug this
thing?

Version-Release number of selected component (if applicable):
2.4.20-18.9, 2.4.20-13.9

How reproducible:
Sometimes

Steps to Reproduce:
1. Create a RAID 5 array and make a filesystem on it
2. Mount it
3. Let the system run under heavy read/write I/O
4. Wait for the lockup
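Step 4 can be automated with a watchdog that times a small write to the array. Below is a minimal sketch, assuming Python 3. `probe` and the `.hang-probe` file name are hypothetical. A process in uninterruptible sleep cannot be killed, so the write runs in a daemon thread that is simply abandoned if it never returns.

```python
import os
import threading


def probe(path, timeout=30.0):
    """Return True if a small write+fsync under `path` finishes in `timeout` s.

    The I/O runs in a daemon thread; if it blocks forever (e.g. stuck in
    uninterruptible sleep), we stop waiting and report False.
    """
    done = threading.Event()

    def worker():
        try:
            with open(os.path.join(path, ".hang-probe"), "w") as f:
                f.write("x")
                f.flush()
                os.fsync(f.fileno())  # force the write down to the device
        except OSError:
            pass  # an error still means the filesystem answered
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)  # True if the worker finished in time
```

For example, calling `probe("/mnt/raid", timeout=60)` once a minute (the mount point is a placeholder) and logging the first `False` gives a timestamp for the lockup. A thread left stuck in the kernel may keep the watchdog itself from exiting cleanly, so run it as a separate, disposable process.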
    

Actual Results:  Filesystem stops responding.

Expected Results:  Filesystem continues responding.

Additional info:
Comment 1 Arjan van de Ven 2003-06-10 02:32:54 EDT
Please try the -18 kernel; it has the "stall fixes" that were discussed on the
LKML mailing list recently.

Also, for most setups the storage limit is 1 TB, although some specific
setups can go up to 2 TB; it seems you're lucky, I guess.
Comment 2 Hrunting Johnson 2003-06-10 08:14:18 EDT
See notes. This happened with both 2.4.20-13.9 and 2.4.20-18.9. Last night's
incident was with the 2.4.20-18.9-smp kernel on a JFS filesystem. If you're
talking about another -18 kernel, let me know.
Comment 3 Stephen Tweedie 2003-07-16 07:24:31 EDT
We really need to see the log output from before the hang, and a complete call
trace (Alt-SysRq-T) captured during the hang; a serial console is ideal for that.
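On kernels that provide `/proc/sysrq-trigger`, writing `t` there requests the same task dump as Alt-SysRq-T without needing the keyboard. This report does not establish whether the 2.4.20 kernels it mentions provide that file. Below is a minimal sketch, assuming root and Python 3; `enable_sysrq` and `dump_tasks` are hypothetical names.

```python
def enable_sysrq(path="/proc/sys/kernel/sysrq"):
    """Enable the magic SysRq key (writing "1" turns on all SysRq functions)."""
    with open(path, "w") as f:
        f.write("1")


def dump_tasks(trigger="/proc/sysrq-trigger"):
    """Request a state and stack dump of every task (SysRq 't').

    The output goes to the kernel log and console, not to this process,
    so capture it on a serial console or read it back with dmesg.
    """
    with open(trigger, "w") as f:
        f.write("t")
```

The trace lands in the kernel ring buffer, which may be too small to hold every task on a busy machine. That is one reason a serial console is the better capture path.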
Comment 4 Bugzilla owner 2004-09-30 11:41:05 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
