Bug 122252

Summary: ext3/quota deadlock condition consistently hangs systems
Product: Red Hat Enterprise Linux 3
Reporter: Marc Wallman <rhbugzillamarcw>
Component: kernel
Assignee: Stephen Tweedie <sct>
Status: CLOSED WONTFIX
QA Contact:
Severity: high
Docs Contact:
Priority: medium
Version: 3.0
CC: esandeen, lwoodman, petrides, riel, strovato, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
URL: http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.25
Doc Type: Bug Fix
Last Closed: 2007-10-19 19:26:56 UTC
Attachments:
Description: Backport of fix for quota/ext3 deadlock from kernel-2.4.25
Flags: none

Description Marc Wallman 2004-05-02 14:34:00 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040207 Firefox/0.8

Description of problem:
We have been having trouble with 3ES hosts locking up when running
with quotas enabled on an ext3 filesystem. The problem happens at
random times, under both heavy and light loads. We are unable to run
more than a few days, regardless of the load, without our systems
locking up.

The bug was identified and fixed in the mainline 2.4.25 kernel, but as
far as I can tell, the fix has not yet been backported to the 3ES
kernel. I have checked both the changelog for the 3ES kernel and the
source code for 2.4.21-9.0.3.EL.

The fix was submitted in v2.4.25-pre5 by jack:ucw.cz. See the 2.4.25
changelog linked in the URL field. Can someone backport this patch?

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Enable quotas on an ext3 filesystem.
2. Have disk activity on it (for us, uw-imapd is the kind
   of disk activity that generates the lockup)
3. Wait, probably not more than a few days.
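
For anyone without a mail server to hand, the disk activity in step 2 might be approximated with something like the sketch below. This is an assumption about what matters in the workload (many small synced appends plus occasional whole-file rewrites, loosely modeled on mbox-style mail storage), not a trace of what uw-imapd actually does, and it is not known to trigger the hang. The function name `churn_mailboxes`, the file names, and all parameters are hypothetical.

```python
import os
import tempfile


def churn_mailboxes(root, users=4, rounds=50, chunk=4096):
    """Append to and periodically rewrite per-user files under root.

    Hypothetical load generator: returns the number of write
    operations (appends plus rewrites) that were performed.
    """
    ops = 0
    payload = b"x" * chunk
    for r in range(rounds):
        for u in range(users):
            path = os.path.join(root, f"user{u}.mbox")
            # Append one "message" and force it to disk.
            with open(path, "ab") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())
            ops += 1
            # Every tenth round, rewrite the file through a temp file
            # and rename, so blocks are both allocated and freed.
            if r % 10 == 9:
                tmp = path + ".tmp"
                with open(tmp, "wb") as f:
                    f.write(payload)
                os.replace(tmp, path)
                ops += 1
    return ops


if __name__ == "__main__":
    # Demo run in a throwaway directory; point root at a directory on
    # the quota-enabled ext3 filesystem to exercise the real thing.
    with tempfile.TemporaryDirectory() as d:
        print(churn_mailboxes(d, users=2, rounds=20))
```

To be meaningful, it would need to run as a user with a quota set, against a directory on the affected filesystem, and for far longer than the demo values here.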
    

Actual Results:  Our hosts will consistently hang after a few days. We
are unable to keep them stable enough with quotas enabled to run them
as production servers.

Expected Results:  The host should not lock up.

Additional info:

Comment 1 Rik van Riel 2004-05-02 16:03:18 UTC
Reassigned to ext3 author.

Comment 2 Kevin Fenzi 2004-09-08 18:15:01 UTC
Is any progress being made to track this issue down?
It seems to have been around for quite a while, and it means you
basically can't use quotas in a production env. 
I see it here on a server, usually less than a day after enabling quotas. 

I have sysrq output when it's in the deadlock state. 
Anything else we can do to help solve this issue?


Comment 3 David Lehman 2004-10-28 23:00:15 UTC
Created attachment 105927
Backport of fix for quota/ext3 deadlock from kernel-2.4.25

Comment 5 Michael Simms 2004-11-04 01:39:15 UTC
Does this mean we'll see an official EL kernel with this fix sometime
soon?

Comment 7 Ernie Petrides 2004-11-04 20:09:59 UTC
No fix for this problem has yet been committed to a RHEL3 patch pool,
and specifically U4 is already closed (and in beta now).

Comment 11 fkass 2005-01-27 15:52:04 UTC
This really should be increased in priority!  We are seeing this same
problem and it is creating major issues for us.  Do we apply this
outdated patch onto 2.4.21-27.0.2.ELsmp?  Do we ignore RH kernels and
just put in 2.6.10 which is supposed to have fixed the problem?  Do we
step our filesystem back down to ext2?  I'd like to know how RH
suggests we fix the problem...

Comment 12 strovato 2005-12-01 18:06:26 UTC
Is this patch going to be added to the official Red Hat kernel at some point?  I
was bitten by this bug, but compiling a custom kernel with the attached patch has
fixed the problem.

Comment 14 strovato 2006-01-23 14:46:58 UTC
*** Bug 173135 has been marked as a duplicate of this bug. ***

Comment 15 strovato 2007-07-25 07:35:46 UTC
Please add this patch to the official Red Hat kernel.  Thank you.

Comment 16 RHEL Program Management 2007-10-19 19:26:56 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.