Bug 110528
Summary: | File system corruption RH9/ext3 | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Curtis Regentin <cregentin> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 9 | CC: | cregentin, pfrields, ppokorny, sct |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-09-30 15:41:43 UTC | Type: | --- |
Description
Curtis Regentin
2003-11-20 19:52:11 UTC
While investigating, I found some concerns.

1) While the tuning of ext3's error behavior is available, this may be a case where "goto error_return" is in order instead of knowingly smashing the file system. Philosophical issue, I guess.

2) The notes on the web site regarding 2.4.20-18 contain the following statement:

"A potential data corruption scenario has been identified. This scenario can occur under heavy, complex I/O loads. The scenario only occurs while performing memory mapped file I/O, where the file is simultaneously unlinked and the corresponding file blocks reallocated. Furthermore, the memory mapped writes must be to a partial page at the end of a file on an ext3 file system. As such, Red Hat considers this an unlikely scenario."

This statement is in the list of bugs fixed. Is it fixed, or is it only identified? If it's not fixed, what are the symptoms?

3) I checked out the patches on the latest kernel (2.4.20-20), which contain a bunch of new checks in the ext3 block allocation and freeing routine. This would lead me to believe that I'm not the only one seeing the problem. Also in this patch set (linux-2.4.20-selected-ac-bits.patch), on line 48466, is the following change:

    @@ -336,7 +335,6 @@ do_more:
                 wait_on_buffer (bh);
         }
         if (overflow) {
    -            block += count;
                 count = overflow;
                 goto do_more;
         }

Now, I don't know the fs code very well, but this appears to completely disable the freeing of block ranges spanning group boundaries, and results in freeing the same blocks at the end of the first group over and over until "count" runs out. It seems to me (in my ignorance) that freeing block ranges spanning group boundaries may be a bad thing indeed, but I would think it would indicate an error in the code calling the free routine, and should not be handled in the free routine by doing bizarre things. Again, I may be wholly ignorant in this.
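To make the concern about the removed line concrete, here is a minimal sketch of a range free that splits at group boundaries. It is a toy model based only on the hunk quoted above, not the kernel's ext3 code: `free_range`, `struct free_result`, `BLOCKS_PER_GROUP` and `TOTAL_BLOCKS` are hypothetical names, the first data block is assumed to be 0, and the model cannot show whether the rest of the patched function advances `block` somewhere outside the quoted context.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_GROUP 8   /* hypothetical tiny group size for the demo */
#define TOTAL_BLOCKS     16  /* two groups */

struct free_result {
    int freed;        /* blocks that went from in-use to free */
    int double_frees; /* blocks that were already free when cleared again */
};

/*
 * Simplified model of freeing `count` blocks starting at `block`,
 * one group at a time. in_use[i] is true while block i is allocated.
 * With advance_block == false the loop mimics the quoted hunk with
 * "block += count" removed.
 */
struct free_result free_range(bool in_use[TOTAL_BLOCKS],
                              int block, int count, bool advance_block)
{
    struct free_result res = {0, 0};
    int overflow;

    assert(block >= 0 && count > 0 && block + count <= TOTAL_BLOCKS);

    do {
        int bit = block % BLOCKS_PER_GROUP;

        /* Clamp this pass to the end of the current group. */
        overflow = 0;
        if (bit + count > BLOCKS_PER_GROUP) {
            overflow = bit + count - BLOCKS_PER_GROUP;
            count -= overflow;
        }

        for (int i = 0; i < count; i++) {
            if (in_use[block + i]) {
                in_use[block + i] = false;
                res.freed++;
            } else {
                res.double_frees++;
            }
        }

        if (overflow) {
            if (advance_block)
                block += count;  /* the line the patch removes */
            count = overflow;
        }
    } while (overflow);

    return res;
}
```

In this model, freeing four blocks starting at block 6 (so the range crosses from group 0 into group 1) releases blocks 6 through 9 when `block` is advanced. Without the advance, the second pass starts at block 6 again, clears blocks 6 and 7 a second time, and leaves blocks 8 and 9 allocated.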
If my assumptions are correct, it would seem that if freeing blocks spanning group boundaries is a problem (because metadata is on the boundary?), then this code would hide the problem, but cause some blocks that should be freed to never be freed.

So I'd like to know: Is this a known bug? What is it? Is it fixed? Is the code in 2.4.20-20 as scary as it looks?

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/