From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516
Description of problem:
After running for some time (about a week, sometimes more, sometimes less),
a filesystem undergoing heavy read/write I/O simply stops responding.
jfs and ext3
both 1.7TB RAID 5 arrays
no entries in /var/log/messages (/var/log/*)
/proc/mdstat shows all drives up
*any* operation on the affected filesystem results in the process going into
"uninterruptible sleep" state
other filesystems (which share all drives of affected filesystem) respond normally
drives spread across 2 3ware 7500 Escalade controllers
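
To confirm which processes are stuck in "uninterruptible sleep", something like the following could be run from an unaffected filesystem. This is only a sketch, not part of the original report: it assumes the usual /proc/<pid>/stat layout (pid, command name in parentheses, then a one-letter state, with "D" meaning uninterruptible sleep), and the function names are made up for illustration.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' instead of whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the file was unreadable/malformed
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

The script itself must live on, and be started from, a filesystem that is still responding; otherwise it will hang like everything else that touches the affected one.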
The only thing that comes close is this post to the lists:
Has this been seen/verified by anyone else? What's the next step to debug this?
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Make RAID5 filesystem and format it
2. mount it
3. let system run
4. wait for lockup
Actual Results: Filesystem stops responding
Expected Results: Filesystem continues responding.
Please try the -18 kernel; it has the "stall fixes" that were discussed on the
lkml mailing list recently.
Also, for most setups the limit on storage is 1TB, although for some specific
setups you can go up to 2TB; it seems you're lucky, I guess.
See notes. This happened with both the 2.4.20-13.9 and 2.4.20-18.9 kernels. Last
night's incident was with the 2.4.20-18.9-smp kernel on a JFS filesystem. If you're
talking about another -18 kernel, let me know.
We really need to see the log output from before the hang, and a complete call
trace (alt-sysrq-t) captured during the hang; a serial console is ideal for that.
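
If nobody is at the keyboard when the hang occurs, the same task dump can in principle be requested from a script. The following is a rough sketch and rests on several assumptions: that the running kernel provides /proc/sysrq-trigger (older kernels may not), that sysrq is enabled, and that the script runs as root. The function name is hypothetical. The dump itself still goes to the kernel log, so a serial console remains the reliable way to capture it.

```python
def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel for the task-state dump that alt-sysrq-t produces.

    Writing the single character 't' to the trigger file is the scripted
    equivalent of pressing alt-sysrq-t. Nothing is returned here; the trace
    appears in the kernel log / on the console.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("t")


if __name__ == "__main__":
    request_task_dump()
```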
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/