From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040207 Firefox/0.8

Description of problem:
We have been having trouble with 3ES hosts locking up when running with quotas enabled on an ext3 filesystem. The problem happens at random times, under both heavy and light loads. Regardless of the load, we are unable to run for more than a few days without our systems locking up.

The bug was identified and fixed in the mainline 2.4.25 kernel, but as far as I can tell, the fix has not yet been backported to the 3ES kernel. I have examined the changelog for the 3ES kernel and looked at the source code for 2.4.21-9.0.3.EL. The fix was submitted in v2.4.25-pre5 by jack:ucw.cz. See the URL to the 2.4.25 changelog in the URL field. Can someone backport this patch?

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Enable quotas on an ext3 filesystem.
2. Have disk activity on it (for us, uw-imapd is the kind of disk activity that generates the lockup).
3. Wait, probably not more than a few days.

Actual Results:
Our hosts consistently hang after a few days. We are unable to keep them stable enough with quotas enabled to run them as production servers.

Expected Results:
The host should not lock up.

Additional info:
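For anyone trying to work out which filesystems on a host match step 1 above, here is a minimal sketch that scans fstab/mtab-style text for ext3 mounts carrying the conventional `usrquota`, `grpquota`, or `quota` mount options. The function name, the sample devices, and the sample mount points are hypothetical and not taken from the report; the sketch also assumes quotas were enabled through those mount options, so it only flags candidates and does not detect the deadlock itself.

```python
def quota_ext3_mounts(mounts_text):
    """Return (mount_point, quota_options) for ext3 mounts that list quota options.

    mounts_text is expected in fstab/mtab layout:
    device  mount_point  fstype  options  dump  pass
    """
    quota_opts = {"usrquota", "grpquota", "quota"}
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and comments.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype != "ext3":
            continue
        enabled = sorted(quota_opts & set(options.split(",")))
        if enabled:
            hits.append((mount_point, enabled))
    return hits


# Hypothetical sample input, for illustration only.
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
/dev/sda3 /var/spool/mail ext3 rw,usrquota,grpquota 0 0
/dev/sdb1 /scratch ext2 rw,usrquota 0 0
"""

if __name__ == "__main__":
    for mount_point, opts in quota_ext3_mounts(SAMPLE):
        print(mount_point, ",".join(opts))
```

On a real host the text would typically come from reading `/etc/fstab` or `/etc/mtab` rather than from the sample string.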
Reassigned to ext3 author.
Is any progress being made to track this issue down? It seems to have been around for quite a while, and it means you basically can't use quotas in a production env. I see it here on a server, usually less than a day after enabling quotas. I have sysrq output when it's in the deadlock state. Anything else we can do to help solve this issue?
Created attachment 105927 [details] Backport of fix for quota/ext3 deadlock from kernel-2.4.25
Does this mean we'll see an official EL kernel with this fix sometime soon?
No fix for this problem has yet been committed to a RHEL3 patch pool, and specifically U4 is already closed (and in beta now).
This really should be increased in priority! We are seeing this same problem and it is creating major issues for us. Do we apply this outdated patch onto 2.4.21-27.0.2.ELsmp? Do we ignore RH kernels and just put in 2.6.10 which is supposed to have fixed the problem? Do we step our filesystem back down to ext2? I'd like to know how RH suggests we fix the problem...
Is this patch going to be added to the official Red Hat kernel at some point? I was bitten by this bug, but compiling a custom kernel with the attached patch has fixed the problem.
*** Bug 173135 has been marked as a duplicate of this bug. ***
Please add this patch to the official Red Hat kernel. Thank you.
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.