From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

Description of problem:
After running for quite a while (a week, sometimes more, sometimes less), a filesystem undergoing heavy read/write I/O simply stops responding.

Facts:
jfs and ext3 both
1.7TB RAID 5 arrays
no coredumps
no entries in /var/log/messages (/var/log/*)
/proc/mdstat shows all drives up
*any* operation on the affected filesystem results in the process going into "uninterruptible sleep" state
other filesystems (which share all drives of the affected filesystem) respond normally
drives spread across 2 3ware 7500 Escalade controllers

The only thing that comes close is this post to the lists:
https://listman.redhat.com/archives/ataraid-list/2002-August/msg00092.html

Has this been seen/verified by anyone else? What's the next step to debug this thing?

Version-Release number of selected component (if applicable):
2.4.20-18.9, 2.4.20-13.9

How reproducible:
Sometimes

Steps to Reproduce:
1. Make RAID5 filesystem and format it
2. mount it
3. let system run
4. wait for lockup

Actual Results:
Filesystem stops responding

Expected Results:
Filesystem continues responding.

Additional info:
Please try the -18 kernel; it has the "stall fixes" that were discussed on lkml recently. Also, for most setups the limit on storage is 1 TB, although for some specific setups you can go up to 2 TB; it seems you're lucky, I guess.
See notes. This happened with the 2.4.20-13.9 and 2.4.20-18.9. Last night's incident was with the 2.4.20-18.9-smp kernel on a JFS filesystem. If you're talking about another -18 kernel, let me know.
We really need to see the log output from before the hang, and a complete call trace (alt-sysrq-t) capture during the hang --- serial console is ideal for that.
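Alongside the call trace, it may help to record which processes are stuck in uninterruptible sleep at the time of the hang. The following is only a minimal sketch, assuming a Linux /proc filesystem and a Python interpreter on the affected machine; the function names `parse_stat` and `blocked_processes` are made up for illustration. It reads only /proc, so it should not itself touch the hung filesystem, and it does not replace the alt-sysrq-t output.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process whose state is 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was unparsable
        if state == "D":  # 'D' is uninterruptible sleep
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Running it periodically from a shell on an unaffected filesystem, with output redirected to a file there, would give a rough record of which processes blocked first.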
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/