From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
We originally installed RHEL3 on a dual-processor, hyperthreaded P4 Xeon system. After about three weeks of uptime, it developed ext3 filesystem corruption (for example, random files would suddenly report sizes in the multi-terabyte range). The corruption kept coming back even after the filesystem was fscked, so we replaced the server with a nearly identical machine running RH9 with a single processor (a hyperthreaded P4 Xeon). It _also_ developed ext3 filesystem corruption after about three weeks of uptime. When I attempted to delete a corrupted file entry, the entire server crashed, and it could not be recovered using fsck.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3/RH9 on a dual- or single-processor P4 Xeon system with hyperthreading enabled, a 3ware SATA RAID 5 array, and 1 GB of RAM. Disable 'atime' for the partitions.
2. Install the qmail mail server.
3. Run under sustained mail traffic (~40,000 messages per day) for roughly three weeks.
4. Run nightly rsync backups of the entire server.

Actual Results:
Corruption in random places in the ext3 filesystem. The corruption appears _anywhere_ in the filesystem, even in directories where nothing has been modified.

Expected Results:
No filesystem corruption.

Additional info:
Our RHEL3 server ID is 1004130933. The second box is identical, except that it ran RH9 and had one processor instead of two.
I've been doing some Google digging and discovered that this may be a 3ware hardware issue. A thread at http://forums.storagereview.net/index.php?showtopic=14162 indicates that 3ware 66 MHz products have a serious problem on the Intel 750X chipset and some AMD boards, particularly when a manufacturer riser board is used. 3ware appears to be trying to keep a low profile on it, but there is a technical brief on it at https://www.3ware.com/kbadmin/attachments/TM900-0045-00%20Rev%20A_P.pdf
As the second comment pointed out, this appears to be a 3ware issue. We have not received any other reports of ext3 corruption like this. I'm closing this bug as NOTABUG, since it appears to have been a hardware issue.