116784 – ext3 partition on IDE disk gets corrupted

Summary: ext3 partition on IDE disk gets corrupted

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	9
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-02-25 05:05 UTC by Pierre Demartines
Modified:	2007-04-18 17:03 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:41:50 UTC
Embargoed:

Attachments
output of dmesg (11.69 KB, text/plain) 2004-02-25 05:06 UTC, Pierre Demartines

Description Pierre Demartines 2004-02-25 05:05:04 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
Summary
-------
I suspect the kernel is corrupting the ext3 partition when the disk
is used intensively (during cvs tagging).


How reproducible
-----------------
3 disks corrupted in 2 months

Chronology
----------
December 26:
My RH7.2 system crashes; most of the 100GB disk can be recovered.

I take the opportunity to upgrade to RH9 and a brand new Hitachi
ATA/IDE 200GB disk.


February 14:
Major slowdown (the system is almost entirely unresponsive and
X no longer refreshes), followed by a crash, while one of
my users does a cvs tag ...

After reboot, the disk shows substantial damage.

One more reboot, and this time the disk cannot be recovered at all.

I buy another disk (Western Digital WD2000), re-install RH9 and restore
the data I had backed up (not all of it, alas, since I don't have a
way to back up 200GB).


February 24:
Now, this new disk is starting to report errors as well:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=147238261, high=8, low=13020533, sector=142611536
end_request: I/O error, dev 03:03 (hda), sector 142611536

etc...
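
For reference, the register values in these log lines can be decoded bit by bit. The following is a minimal sketch in Python, assuming the standard ATA status and error register layout and the flag names that the 2.4 IDE driver prints. The tables and the helper names `decode` and `combine_lba` are illustrative and not part of any kernel interface. It also shows that the `high` and `low` fields in the log recombine into the reported `LBAsect` value.

```python
# Flag names as printed by the Linux 2.4 IDE driver (assumed mapping,
# based on the standard ATA status and error register bit layout).
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Only the error bits the driver names are listed here.
ERROR_BITS = {
    0x80: "BadSector/BadCRC",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def combine_lba(high, low):
    """Rebuild the LBA from the high part and the low 24 bits shown in the log."""
    return (high << 24) | low


print(decode(0x51, STATUS_BITS))   # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))    # ['UncorrectableError']
print(combine_lba(8, 13020533))    # 147238261, matching LBAsect in the log
```

An error value of 0x40 has only the uncorrectable-data bit set, which means the drive itself reported that it could not read the sector.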

Since this is the 3rd disk that ends up reporting the same type of
error, each time while cvs is trying to update its repository, I am
starting to suspect a software bug rather than a hardware one.


Version-Release number of selected component (if applicable):
kernel-2.4.20-28.9

How reproducible:
Sometimes

Steps to Reproduce:
1. cvs tag -rbuildxxxxxx  product  (most of the time works fine)
2. dma_intr problems appear in /var/log/messages
3. partition corrupted, sometimes boot won't even mount the partition
anymore. Linux rescue won't mount it either.
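
To check how often step 2 occurs, the log can be scanned for the two message types quoted earlier. This is a rough sketch in Python, assuming the messages have exactly the format shown in this report. The regular expressions and the `summarize` helper are hypothetical and would need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this report (assumed format).
DMA_ERR = re.compile(r"(\w+): dma_intr: error=(0x[0-9a-fA-F]+)")
END_REQ = re.compile(r"end_request: I/O error, dev \S+ \((\w+)\), sector (\d+)")


def summarize(lines):
    """Count dma_intr errors per drive and collect the failing sectors."""
    errors = Counter()
    sectors = {}
    for line in lines:
        m = DMA_ERR.search(line)
        if m:
            errors[m.group(1)] += 1
            continue
        m = END_REQ.search(line)
        if m:
            sectors.setdefault(m.group(1), set()).add(int(m.group(2)))
    return errors, sectors


# Sample input copied from the report; in practice the lines would come
# from the system log file.
SAMPLE = [
    "hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=147238261,",
    "end_request: I/O error, dev 03:03 (hda), sector 142611536",
]

errors, sectors = summarize(SAMPLE)
print(errors)
print(sectors)
```

Passing an open file object for /var/log/messages to `summarize` instead of `SAMPLE` should work the same way, since the function only iterates over lines. If the same few sectors fail repeatedly, that is consistent with a damaged area on the disk. Errors spread across many sectors would fit a cabling, controller or power problem better, though neither pattern is conclusive.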


Additional info:

FWIW, the project under cvs is 144MB (snapshot), while the cvs
repository for that module is 809MB.  Of course, since cvs is so
clever, any tagging requires rewriting of all the files (just to add
the tag at the beginning, so in our case that's 0.8GB every time).

Comment 1 Pierre Demartines 2004-02-25 05:06:50 UTC

Created attachment 98030
output of dmesg

Comment 2 Dave Jones 2004-02-25 15:59:34 UTC

those errors do look very much like hardware failures, unfortunately.

your ide controller isn't exactly uncommon either, so this would be
more widespread if there was a bug there.

I suggest you check cabling etc, and make sure you have a strong
enough power supply.

Comment 3 Bugzilla owner 2004-09-30 15:41:50 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
