Bug 119033 - Random ext3 filesystem corruption under heavy disk activity load
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Doug Ledford
QA Contact: Brian Brock
Reported: 2004-03-23 20:02 EST by Benjamin Franz
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-09-19 14:44:38 EDT


Description Benjamin Franz 2004-03-23 20:02:30 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1)
Gecko/20021003

Description of problem:
We originally installed RHEL3 on a dual-processor, hyperthreaded P4
Xeon system. After about three weeks of uptime, it developed ext3
filesystem corruption (for example, random files would suddenly appear
to have sizes in the multi-terabyte range). The corruption kept
recurring even after the filesystem was fscked, so we replaced the
server with a nearly identical machine running RH9 with a single
processor (a hyperthreaded P4 Xeon). It _also_ developed ext3
filesystem corruption after about 3 weeks of uptime. When I attempted
to delete a corrupted file entry, the entire server crashed and could
not be recovered using fsck.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3/RH9 on a dual or single P4 Xeon system with
hyperthreading enabled, a 3ware SATA RAID5 array, and 1 gigabyte of
RAM. Disable 'atime' for the partitions.
2. Install the qmail mail server.
3. Run under sustained mail traffic load (~40,000 messages per day) for
roughly 3 weeks.
4. Run nightly rsync backups of the entire server.
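
For anyone trying to match this setup, the 'atime' part of step 1 can be
checked with a short script. This is only a sketch, not something from
the original report: the function name is made up for illustration, and
it assumes the usual /proc/mounts line layout (device, mount point,
filesystem type, comma-separated options).

```python
def ext3_mounts_missing_noatime(mounts_text):
    """Return mount points of ext3 filesystems whose options lack 'noatime'.

    mounts_text is expected to be in /proc/mounts format:
    device, mount point, filesystem type, options, dump, pass.
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type == "ext3" and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing
```

On a live system the input would typically come from
open("/proc/mounts").read(); an empty result means every mounted ext3
filesystem has noatime set.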
    

Actual Results:  Corruption in random places of the ext3 filesystem -
the corruption appears _anywhere_ in the filesystem, even in
directories where nothing has been modified.
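
One symptom described in this report, files whose sizes suddenly look
like they are in the multi-terabyte range, can be scanned for
mechanically. The following is a rough sketch under stated assumptions,
not a tool used in this report: the function names and the default
threshold (the total size of the filesystem holding the scanned
directory) are illustrative choices. Sparse files can legitimately
report an apparent size larger than the disk, so any hit is only a
candidate for closer inspection.

```python
import os
import shutil
import stat


def _on_device(path, dev):
    """True if path can be stat'ed and lives on device number dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def find_implausible_sizes(root, limit=None):
    """Yield (path, size) for regular files under root larger than limit.

    If limit is None, the total size of the filesystem containing root
    is used, since no ordinary file should claim more than that.
    """
    if limit is None:
        limit = shutil.disk_usage(root).total
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories on other filesystems (other mounts).
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size > limit:
                yield path, st.st_size
```

Calling list(find_implausible_sizes("/var")) would scan a single mount
point. Running something like this after each nightly backup might
surface damage before anyone tries to delete an affected entry.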

Expected Results:  No filesystem corruption

Additional info:

Our RHEL3 server id is 1004130933. The second box is identical, except
it was running RH9 and only had one processor instead of two.
Comment 1 Benjamin Franz 2004-03-24 21:16:21 EST
I've been doing some Google digging, and discovered this may be a
3ware hardware issue. There is a thread at
http://forums.storagereview.net/index.php?showtopic=14162 that
indicates that 3ware 66 MHz products have a serious problem on the
Intel 750X chipset and some AMD boards, particularly when used with a
manufacturer riser board.

3ware appears to be trying to keep a low profile on it, but there is a
technical brief on it at
https://www.3ware.com/kbadmin/attachments/TM900-0045-00%20Rev%20A_P.pdf
Comment 3 Doug Ledford 2006-09-19 14:44:38 EDT
As the second comment pointed out, this would appear to be a 3ware issue.  We
didn't get any other reports of ext3 corruption like this.  I'm closing this bug
out as NOTABUG since it appears it was a hardware issue.
