Bug 220711

Summary:	ext3 file system corruption
Product:	[Fedora] Fedora	Reporter:	Pawel Salek <pawsa>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED NOTABUG	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6	CC:	esandeen, umar, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2007-01-27 15:15:31 UTC	Type:	---

Description Pawel Salek 2006-12-24 00:26:55 UTC

Description of problem:
An ext3 file system mounted on /dev/mapper/VolGroup00-LogVol00 gets corrupted after
a few hours of intensive use. There is no direct indication that it is hardware-related.

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2868.fc6

How reproducible:
This happened three times over the last three days. The system detected corruption on
reboot twice and remounted the file system read-only once. I probably need to
collect more statistics on this.


Steps to Reproduce:
1. Install kernel-2.6.18-1.2868.fc6.
2. Run a few compilations...
  
I am now running kernel-2.6.18-1.2849.fc6 to rule out the hardware. I am not sure
whether it is relevant, but when the HAL init.d service starts up, the
following messages are generated:
hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hdb: drive_cmd: error=0x04 { DriveStatusError }
This drive is not part of the corrupted volume group; the corrupted volume group
is on hda. The disk configuration is as follows:
VP_IDE: IDE controller at PCI slot 0000:00:11.1
ACPI: Unable to derive IRQ for device 0000:00:11.1
ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8233a (rev 00) IDE UDMA133 controller on pci0000:00:11.1
    ide0: BM-DMA at 0x9800-0x9807, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0x9808-0x980f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: IC35L080AVVA07-0, ATA DISK drive
hdb: WDC WD2000JB-00GVA0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: PIONEER DVD-RW DVR-106D, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 160836480 sectors (82348 MB) w/1863KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2
hdb: max request size: 512KiB
hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
hdb: cache flushes supported
 hdb: hdb1 hdb2 hdb3

I will try to post updates here.

Comment 1 Sammy 2006-12-26 19:01:41 UTC

I have a similar setup:
===================================================================
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      63197036   9496128  50438856  16% /
/dev/sda3               101105     11311     84573  12% /boot
tmpfs                  1990808         0   1990808   0% /dev/shm
===================================================================

on an x86_64 system running kernel-2.6.18-1.2868.fc6, which I rebuilt
after selecting only the em64t processor option and turning on the kernel debug
options.

I have already recompiled many RPM packages with this kernel and have not
encountered any ext3 corruption. Could it be your disk acting up?
Sammy

Comment 2 Pawel Salek 2006-12-26 23:28:07 UTC

Well, I cannot say for sure that it is the hardware, but SMART reports no errors. It
may just as well be my five-year-old motherboard, or some interaction between the PATA
hard disk and the DVD burner sitting on the same cable...

Comment 3 Eric Sandeen 2007-01-27 05:52:17 UTC

I'd probably chalk it up to hardware, but it might be interesting to at least
post some details on how your fs is corrupted, i.e. what fsck found.

If SMART thinks things are OK, you might check whether the drive manufacturer has
its own test boot disk; sometimes that's more useful. I'd also scour the logs
for any messages about hda if you are certain that the volume group for the corrupted
fs contains only hda.

Seeing { DriveReady SeekComplete Error } on any drive in your system doesn't
give me a good feeling about the reliability of the hardware in this box though :)

-Eric

Comment 4 Pawel Salek 2007-01-27 15:15:31 UTC

I would close this bug for now, but I do not know what the proper resolution code
should be. I have not been able to reproduce the disk corruption since. smartd
reports nothing, and forced disk checks reveal no new problems. The DriveReady
errors are still there, but they occur only *once* per session, when the HAL service
is being started. The current kernel additionally prints the following:
ide: failed opcode was: 0xb0
just after the error line.
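The status and error bytes in these messages can be decoded bit by bit. Below is a minimal Python sketch, assuming the standard ATA status/error register bit layout and the label names that appear in the IDE driver output quoted above; `decode_status` and `decode_error` are hypothetical helpers, not part of any kernel or HAL tool. Under that layout, status=0x51 means DRDY, DSC and ERR are set, and error=0x04 is the abort (ABRT) bit, i.e. the drive rejected the command rather than failing a media access. Opcode 0xb0 is the ATA SMART command.

```python
"""Decode ATA status/error register bytes into the labels the IDE driver
prints, e.g. "status=0x51 { DriveReady SeekComplete Error }".
Assumes the standard ATA register bit layout."""

BUSY = 0x80  # BSY: while set, the other status bits are not valid

# Status register bits -> label used in the kernel messages.
STATUS_BITS = [
    (0x40, "DriveReady"),      # DRDY
    (0x20, "DeviceFault"),     # DF
    (0x10, "SeekComplete"),    # DSC
    (0x08, "DataRequest"),     # DRQ
    (0x04, "CorrectedError"),  # CORR
    (0x02, "Index"),           # IDX
    (0x01, "Error"),           # ERR: details are in the error register
]

ABRT = 0x04  # error register: command aborted
ICRC = 0x80  # error register: interface CRC / bad block

# Remaining error register bits (ABRT and ICRC are handled separately).
ERROR_BITS = [
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(stat: int) -> list[str]:
    """Return the labels for the bits set in an ATA status byte."""
    if stat & BUSY:
        return ["Busy"]
    return [label for bit, label in STATUS_BITS if stat & bit]


def decode_error(err: int) -> list[str]:
    """Return the labels for the bits set in an ATA error byte."""
    labels = []
    if err & ABRT:
        labels.append("DriveStatusError")
    if err & ICRC:
        # The same bit is reported as BadCRC when ABRT is also set.
        labels.append("BadCRC" if err & ABRT else "BadSector")
    labels.extend(label for bit, label in ERROR_BITS if err & bit)
    return labels


if __name__ == "__main__":
    print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
    print(decode_error(0x04))   # ['DriveStatusError']
```

Read this way, the hdb messages describe a single command that the drive refused, which is consistent with them appearing only once per session.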

Comment 5 Jarod Wilson 2007-01-30 16:27:24 UTC

I actually seem to recall seeing these one-time hdX spews on boot with certain
drives, as well as someone saying it was just that the drive didn't support a
certain feature and the message wasn't actually anything to worry about (aside
from looking scary).

Aha, there's the bug I was after: bug 214502, particularly the 4th comment from
Dave Jones.

Comment 6 Pawel Salek 2007-01-31 08:51:08 UTC

Yes, my remaining problem looks pretty much like the one described in bug
214502. It is very likely that the disk corruption was unrelated.