Description of problem:
The ext3 file system mounted on /dev/mapper/VolGroup00-LogVol00 gets corrupted after a few hours of intensive usage. There is no direct indication that it is hardware related.

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2868.fc6

How reproducible:
This happened three times over the last three days. The system detected corruption on reboot twice, and remounted the file system read-only once. I probably need to collect more statistics on this.

Steps to Reproduce:
1. Install kernel-2.6.18-1.2868.fc6
2. Run a few compilations...

I am now running kernel-2.6.18-1.2849.fc6 to rule out the hardware.

I wonder whether it is relevant or not, but when the HAL init.d service starts up, the following message is generated:

hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hdb: drive_cmd: error=0x04 { DriveStatusError }

This drive is not part of the corrupted vgroup. The corrupted vgroup is on hda. The disk configuration is as follows:

VP_IDE: IDE controller at PCI slot 0000:00:11.1
ACPI: Unable to derive IRQ for device 0000:00:11.1
ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8233a (rev 00) IDE UDMA133 controller on pci0000:00:11.1
ide0: BM-DMA at 0x9800-0x9807, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0x9808-0x980f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: IC35L080AVVA07-0, ATA DISK drive
hdb: WDC WD2000JB-00GVA0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: PIONEER DVD-RW DVR-106D, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 160836480 sectors (82348 MB) w/1863KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2
hdb: max request size: 512KiB
hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
hdb: cache flushes supported
hdb: hdb1 hdb2 hdb3

I will try to post updates here.
I have the same/similar setup:

===================================================================
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      63197036   9496128  50438856  16% /
/dev/sda3               101105     11311     84573  12% /boot
tmpfs                  1990808         0   1990808   0% /dev/shm
===================================================================

This is an x86_64 system running kernel-2.6.18-1.2868.fc6, which I rebuilt, only selecting the em64t processor option and turning on the kernel debug options. I have already recompiled many rpm packages with this kernel and did not encounter any ext3 corruption. Could it be your disk acting up?

Sammy
Well, I cannot tell for sure whether it is the hardware, but SMART reports no errors. It may just as well be my 5-year-old motherboard or some interaction between the PATA hard disk and the DVD burner sitting on the same cable...
I'd probably chalk it up to hardware, but it might be interesting to at least post some details on how your fs is corrupted, i.e. what fsck found. If SMART thinks things are ok, you might consider seeing if the drive manufacturer has their own test bootdisk; sometimes that's more useful. I'd also scour the logs for any messages about hda if you are certain that the vgroup for the corrupted fs only contains hda. Seeing { DriveReady SeekComplete Error } on any drive in your system doesn't give me a good feeling about the reliability of the hardware in this box though :)

-Eric
I would close this bug for now, but I do not know what the proper resolution code should be. I have not been able to reproduce the disk corruption since. SMARTD signals nothing, and forced disk checks reveal no new problems. The DriveReady errors are still there, but they occur only *once* per session, when the HAL service is being started. The current kernel additionally prints this information just after the error line:

ide: failed opcode was: 0xb0
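For anyone reading the numbers in those messages, here is a rough sketch of how the status and error bytes and the failed opcode can be decoded. The function name `decode` and the table names are made up for illustration. The status labels follow what the kernel prints in the braces, the error table is deliberately partial (only the one bit seen here), and the only opcode listed is 0xb0, which is the ATA SMART command.

```python
# Illustrative sketch only: names below are hypothetical, not kernel code.

# ATA status register bits, using the labels the kernel prints in braces.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Partial error register table: only the bit seen in this report.
# Bit 0x04 is the ATA "ABRT" (command aborted) bit, shown as DriveStatusError.
ERROR_BITS = {
    0x04: "DriveStatusError",
}

# Partial opcode table: 0xb0 is the ATA SMART command.
ATA_OPCODES = {
    0xB0: "SMART",
}


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`,
    highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


if __name__ == "__main__":
    print("status=0x51:", decode(0x51, STATUS_BITS))
    print("error=0x04: ", decode(0x04, ERROR_BITS))
    print("opcode 0xb0:", ATA_OPCODES.get(0xB0, "unknown"))
```

Read this way, the messages say that hdb was ready and idle but aborted a SMART command, which is a different thing from a failed read or write.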
I actually seem to recall seeing these one-time hdX spews on boot with certain drives, as well as someone saying it was just that the drive didn't support a certain feature and the message wasn't actually anything to worry about (aside from looking scary). Aha, there's the bug I was after: bug 214502, particularly the 4th comment from Dave Jones.
Yes, my remaining problem looks pretty much like the one described in bug 214502. It is very likely the disk corruption was unrelated.