Bug 119033 - Random ext3 filesystem corruption under heavy disk activity load
Summary: Random ext3 filesystem corruption under heavy disk activity load
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-03-24 01:02 UTC by Benjamin Franz
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-09-19 18:44:38 UTC
Target Upstream Version:
Embargoed:



Description Benjamin Franz 2004-03-24 01:02:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1)
Gecko/20021003

Description of problem:
We originally installed a RHEL3 system on a dual processor Xeon
hyperthreaded P4 system. After about three weeks of uptime, it
developed ext3 filesystem corruption (random files would suddenly
appear as if their sizes were in the multi-terabyte range for
example). It repeatedly developed filesystem corruption even after
being fscked, so we replaced the server with a nearly identical
machine running RH9 with a single processor (a hyperthreaded P4 Xeon).
It _also_ developed ext3 filesystem corruption after about 3 weeks of
uptime. When I attempted to delete a corrupted file entry, the entire
server crashed and could not be recovered using fsck.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3/RH9 on a dual or single P4 Xeon system with
hyperthreading enabled, a 3ware SATA RAID 5 array, and 1 gigabyte of
RAM. Disable 'atime' for the partitions.
2. Install the qmail mail server.
3. Run under sustained mail traffic load (~40,000 messages per day) for
roughly 3 weeks.
4. Run nightly rsync backups of the entire server.
    

Actual Results:  Corruption in random places of the ext3 filesystem -
the corruption appears _anywhere_ in the filesystem, even in
directories where nothing has been modified.

Expected Results:  No filesystem corruption

Additional info:

Our RHEL3 server id is 1004130933. The second box is identical, except
it was running RH9 and only had one processor instead of two.
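
The symptom described above (files suddenly reporting sizes in the
multi-terabyte range) can be looked for with a read-only metadata scan.
The following is only a minimal sketch, assuming Python is available on
the host; the function name find_implausible_sizes and the 1 TiB default
threshold are hypothetical choices for illustration and are not part of
the original report.

```python
import os
import stat


def find_implausible_sizes(root, max_bytes=1 << 40):
    """Return (path, size) pairs for regular files under root whose
    reported size exceeds max_bytes (default 1 TiB, an assumed cutoff)."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads only inode metadata and does not follow symlinks.
                st = os.lstat(path)
            except OSError:
                # An entry that cannot be stat'ed is recorded with size None
                # so it can be inspected by hand.
                suspects.append((path, None))
                continue
            if stat.S_ISREG(st.st_mode) and st.st_size > max_bytes:
                suspects.append((path, st.st_size))
    return suspects
```

The scan only reads file metadata and never modifies or deletes an
entry, which matters here because deleting a corrupted entry is what
crashed the second server. It is not a substitute for fsck.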

Comment 1 Benjamin Franz 2004-03-25 02:16:21 UTC
I've been doing some Google digging, and discovered this may be a
3ware hardware issue. There is a thread at
http://forums.storagereview.net/index.php?showtopic=14162 that
indicates that 3ware 66 MHz products have a serious problem on Intel
750X chipset and some AMD boards - particularly if using a
manufacturer riser board.

3ware appears to be trying to keep a low profile on it, but there is a
technical brief on it at
https://www.3ware.com/kbadmin/attachments/TM900-0045-00%20Rev%20A_P.pdf

Comment 3 Doug Ledford 2006-09-19 18:44:38 UTC
As the second comment pointed out, this would appear to be a 3ware issue.  We
didn't get any other reports of ext3 corruption like this.  I'm closing this bug
out as NOTABUG since it appears it was a hardware issue.

