Bug 116784

Summary: ext3 partition on IDE disk gets corrupted
Product: [Retired] Red Hat Linux Reporter: Pierre Demartines <pierred>
Component: kernel Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 9 CC: riel
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-09-30 15:41:50 UTC Type: ---
Attachments:
Description Flags
output of dmesg none

Description Pierre Demartines 2004-02-25 05:05:04 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
Summary
-------
I suspect the kernel is corrupting the ext3 partition when the disk
is used intensively (during cvs tagging).


How reproducible
-----------------
3 disks corrupted in 2 months

Chronology
----------
December 26:
my RH7.2 system crashes; most of the 100GB disk can be recovered.

I take the opportunity to upgrade to RH9 and a brand new Hitachi
ATA/IDE 200GB disk.


February 14:
major slowdown (the system is almost entirely unresponsive and
X Window doesn't refresh anymore), followed by a crash, while one of
my users does a cvs tag ...

After reboot, the disk shows substantial damage.

One more reboot, and this time the disk cannot be recovered at all.

I buy another disk (Western Digital WD2000), reinstall RH9 and restore
the data I had backed up (not all of it, alas, since I don't have a
way to back up 200GB).


February 24:
Now, this new disk is starting to report errors as well:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=147238261, high=8, low=13020533, sector=142611536
end_request: I/O error, dev 03:03 (hda), sector 142611536

etc...
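
For readers decoding the messages above, here is a minimal sketch of how
the status and error bytes map to the names the driver prints. The bit
tables are an assumption based on the conventional ATA register layout
and the names used in Linux 2.4 IDE messages, not taken from this
report; decode() and lba_from_parts() are hypothetical helper names. The
last helper checks that the reported LBAsect is consistent with the
high/low values when they are treated as two 24-bit halves.

```python
# Assumed ATA status register bits, named as in Linux 2.4 IDE messages.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Assumed ATA error register bits (valid when the status Error bit is set).
ERROR_BITS = {
    0x80: "BadSector",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


def lba_from_parts(high, low):
    """Combine the high and low 24-bit halves into one sector number."""
    return (high << 24) | low


if __name__ == "__main__":
    print("status=0x51:", decode(0x51, STATUS_BITS))
    print("error=0x40:", decode(0x40, ERROR_BITS))
    print("LBAsect:", lba_from_parts(8, 13020533))
```

Under these assumptions, the error byte 0x40 means the drive itself
reported that it could not read the sector, even after its own error
correction.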

Since this is the 3rd disk that ends up reporting the same type of
error, each time while cvs is trying to update its repository, I am
starting to suspect a software bug rather than a hardware one.


Version-Release number of selected component (if applicable):
kernel-2.4.20-28.9

How reproducible:
Sometimes

Steps to Reproduce:
1. cvs tag -rbuildxxxxxx  product  (most of the time works fine)
2. dma_intr problems appear in /var/log/messages
3. partition corrupted, sometimes boot won't even mount the partition
anymore. Linux rescue won't mount it either.


Additional info:

FWIW, the project under cvs is 144MB (snapshot), while the cvs
repository for that module is 809MB.  Of course, since cvs is so
clever, any tagging requires rewriting of all the files (just to add
the tag at the beginning, so in our case that's 0.8GB every time).

Comment 1 Pierre Demartines 2004-02-25 05:06:50 UTC
Created attachment 98030 [details]
output of dmesg

Comment 2 Dave Jones 2004-02-25 15:59:34 UTC
those errors do look very much like hardware failures, unfortunately.

your IDE controller isn't exactly uncommon either, so this would be
more widespread if there were a bug there.

I suggest you check cabling etc, and make sure you have a strong
enough power supply.


Comment 3 Bugzilla owner 2004-09-30 15:41:50 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/