97082 – (FS JFS)Filesystem simply stops responding


Summary: (FS JFS)Filesystem simply stops responding

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	9
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Stephen Tweedie
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-06-10 02:24 UTC by Hrunting Johnson
Modified:	2007-04-18 16:54 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:41:05 UTC
Embargoed:

Description Hrunting Johnson 2003-06-10 02:24:16 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

Description of problem:
After running for a while (sometimes a week or more, sometimes less), a
filesystem undergoing heavy read/write I/O simply stops responding.

Facts:
affected filesystems: JFS and ext3
both on 1.7TB RAID 5 arrays
no core dumps
no entries in /var/log/messages (or anywhere else under /var/log/*)
/proc/mdstat shows all drives up
*any* operation on the affected filesystem results in the process going into
"uninterruptible sleep" (D) state
other filesystems (which share all the drives of the affected filesystem) respond normally
drives are spread across two 3ware Escalade 7500 controllers

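The "uninterruptible sleep" symptom above can be confirmed from a shell with `ps -eo pid,stat,comm` (a `D` in the STAT column). As a minimal sketch of the same check, assuming Python 3.9 or later on the diagnosing machine and the `/proc/<pid>/stat` layout described in proc(5), the hypothetical helpers below list every process stuck in `D` state. Reading `/proc` does not touch the hung filesystem, so the scan itself should not block.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from the contents of one /proc/<pid>/stat file.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')', so split on the LAST ')' instead of on whitespace.
    """
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    # The first field after ") " is the one-letter state code (R, S, D, Z, ...).
    state = stat_line[close_idx + 2:].split()[0]
    return comm, state


def find_d_state(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process currently in 'D' state."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited between listdir() and open(), or odd line
        if state == "D":
            stuck.append((int(entry), comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(f"{pid}\t{comm}")
```

Splitting on the last `)` matters because a process name like `my (odd) cmd` would otherwise shift every following field.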
The only similar report I've found is this post to the ataraid-list mailing list:
https://listman.redhat.com/archives/ataraid-list/2002-August/msg00092.html

Has this been seen/verified by anyone else?  What's the next step to debug this
thing?

Version-Release number of selected component (if applicable):
2.4.20-18.9, 2.4.20-13.9

How reproducible:
Sometimes

Steps to Reproduce:
1. Create a RAID 5 array and format it
2. mount it
3. let the system run under heavy read/write I/O
4. wait for the lockup

Actual Results:  Filesystem stops responding.

Expected Results:  Filesystem continues responding.

Additional info:

Comment 1 Arjan van de Ven 2003-06-10 06:32:54 UTC

Please try the -18 kernel; it has the "stall fixes" that were discussed recently
on the LKML mailing list.

Also, for most setups the storage limit is 1TB, although some specific setups
can go up to 2TB; it seems you're lucky, I guess.
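The comment does not say where the 1TB and 2TB limits come from. Assuming (an assumption, not stated in the comment) that they reflect 512-byte sectors counted by a signed and an unsigned 32-bit integer respectively, the arithmetic works out as:

$$
2^{31} \times 512\ \text{B} = 2^{40}\ \text{B} = 1\ \text{TiB} \approx 1.10 \times 10^{12}\ \text{B},
\qquad
2^{32} \times 512\ \text{B} = 2^{41}\ \text{B} = 2\ \text{TiB} \approx 2.20 \times 10^{12}\ \text{B}.
$$

Under that assumption, a 1.7TB array sits above the first bound and below the second whether 1.7TB is read as $1.7 \times 10^{12}$ bytes or as 1.7 TiB (about $1.87 \times 10^{12}$ bytes), which would fit the remark that this setup is "lucky".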

Comment 2 Hrunting Johnson 2003-06-10 12:14:18 UTC

See the version notes in the report: this happened with both 2.4.20-13.9 and
2.4.20-18.9. Last night's incident was with the 2.4.20-18.9-smp kernel on a JFS
filesystem. If you're talking about a different -18 kernel, let me know.

Comment 3 Stephen Tweedie 2003-07-16 11:24:31 UTC

We really need to see the log output from before the hang, and a complete call
trace (Alt-SysRq-T) captured during the hang; a serial console is ideal for that.
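Where pressing Alt-SysRq-T on the console is inconvenient, the same task dump can be requested from a root shell by writing `t` to `/proc/sysrq-trigger`, after enabling SysRq through `/proc/sys/kernel/sysrq`. The sketch below assumes the running kernel exposes `/proc/sysrq-trigger` (not every 2.4 kernel does; the keyboard combination works regardless) and that it runs as root; the function name `dump_task_states` is hypothetical, and the paths are parameters only so the logic can be exercised against ordinary files.

```python
import os

SYSRQ_ENABLE = "/proc/sys/kernel/sysrq"
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def dump_task_states(trigger_path: str = SYSRQ_TRIGGER,
                     enable_path: str = SYSRQ_ENABLE) -> None:
    """Ask the kernel to print every task's state and call trace (SysRq 't').

    This is the programmatic equivalent of Alt-SysRq-T. The kernel writes the
    dump to its log buffer and console, so a serial console captures it even
    when local disks are wedged. Requires root on a real system.
    """
    if os.path.exists(enable_path):
        with open(enable_path, "w") as f:
            f.write("1\n")  # 1 = enable all SysRq functions
    with open(trigger_path, "w") as f:
        f.write("t")  # 't' = show task states and stack traces
```

To capture the resulting dump on a second machine, the kernel can be booted with a serial console alongside the local one, for example `console=tty0 console=ttyS0,115200` (the port and baud rate here are assumptions; match them to the actual cabling).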

Comment 4 Bugzilla owner 2004-09-30 15:41:05 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
