Bug 142248
Summary: | Possible ext3 filesystem corruption with 667 or 681 kernel | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Philippe Rigault <prigault> |
Component: | kernel | Assignee: | Alan Cox <alan> |
Status: | CLOSED CANTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 3 | CC: | alan, davej, jesus.salvo, sct, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2005-10-03 00:41:33 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Philippe Rigault
2004-12-08 15:32:44 UTC
fsck can only check filesystem metadata. It has no idea what file contents should look like, so any corruption that has only hit internal file data blocks will not be visible to fsck.

"rpm -V" has access to file checksums, so it is able to check the correctness of file contents. Use "rpm -Va" to check all packages (though be aware that things like config files are expected to have changed since install time).

Can you paste the output of 'lspci' please? Also 'hdparm -i /dev/hda' and any IDE-related messages from 'dmesg'? It might also be worth looking through /var/log/messages to see if there's any other nasty-looking IDE messages above the ones you pasted.

I forgot to mention that hda is a CD-RW and hdc is the hard disk.

> can you paste the output of 'lspci' please ?

00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 03)
00:01.0 PCI bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE Host-to-AGP Bridge (rev 03)
00:1d.0 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 82)
00:1f.0 ISA bridge: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) IDE Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 02)
00:1f.6 Modem: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M7 LW [Radeon Mobility 7500]
02:01.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
02:02.0 Network controller: Broadcom Corporation BCM4309 802.11a/b/g (rev 02)
02:04.0 CardBus bridge: Texas Instruments PCI4510 PC card Cardbus Controller (rev 02)
02:04.1 FireWire (IEEE 1394): Texas Instruments PCI4510 IEEE-1394 Controller

> hdparm -i /dev/hda

/dev/hda:
 Model=HL-DT-STCD-RW/DVD-ROM GCC-4240N, FwRev=E112, SerialNo=
 Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=0kB, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 *udma2
 AdvancedPM=no
 Drive conforms to: device does not report version:
 * signifies the current active mode

While I am at it, here is hdc:

/dev/hdc:
 Model=IC25N060ATMR04-0, FwRev=MO3OAD0A, SerialNo=MRG357K3HPJBYH
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7884kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=117210240
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 3a:
 * signifies the current active mode

> and any ide related messages from 'dmesg' ?
Indeed, I don't like the 'Wait for ready failed before probe' ones:

PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11
ICH4: chipset revision 2
ICH4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: HL-DT-STCD-RW/DVD-ROM GCC-4240N, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: IC25N060ATMR04-0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
Probing IDE interface ide2...
ide2: Wait for ready failed before probe !
Probing IDE interface ide3...
ide3: Wait for ready failed before probe !
Probing IDE interface ide4...
ide4: Wait for ready failed before probe !
Probing IDE interface ide5...
ide5: Wait for ready failed before probe !
hdc: max request size: 1024KiB
hdc: 117210240 sectors (60011 MB) w/7884KiB Cache, CHS=16383/255/63, UDMA(100)
hdc: cache flushes supported
hdc: hdc1 hdc2 hdc3 hdc4 < hdc5 hdc6 hdc7 hdc8 hdc9 hdc10 >
hda: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide

> Might also be worth looking through /var/log/messages to see if there's any other
> nasty looking IDE messages above the ones you pasted.

I am sending to you privately the complete /var/log/messages (17k compressed).
The first thing I noticed is that the first time the machine booted after the install (667 kernel), there were no messages like "Wait for ready failed before probe !":

Dec 7 15:02:27 mybox kernel: RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
Dec 7 15:02:27 mybox kernel: Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
Dec 7 15:02:27 mybox kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Dec 7 15:02:27 mybox kernel: ICH4: IDE controller at PCI slot 0000:00:1f.1
Dec 7 15:02:27 mybox kernel: PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
Dec 7 15:02:27 mybox kernel: ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11
Dec 7 15:02:27 mybox kernel: ICH4: chipset revision 2
Dec 7 15:02:27 mybox kernel: ICH4: not 100%% native mode: will probe irqs later
Dec 7 15:02:27 mybox kernel: ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
Dec 7 15:02:27 mybox kernel: ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
Dec 7 15:02:27 mybox kernel: hda: HL-DT-STCD-RW/DVD-ROM GCC-4240N, ATAPI CD/DVD-ROM drive
Dec 7 15:02:27 mybox kernel: Using cfq io scheduler
Dec 7 15:02:27 mybox kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Dec 7 15:02:27 mybox kernel: hdc: IC25N060ATMR04-0, ATA DISK drive
Dec 7 15:02:27 mybox kernel: ide1 at 0x170-0x177,0x376 on irq 15
Dec 7 15:02:27 mybox kernel: hdc: max request size: 1024KiB
Dec 7 15:02:27 mybox kernel: hdc: 117210240 sectors (60011 MB) w/7884KiB Cache, CHS=16383/255/63, UDMA(100)
Dec 7 15:02:27 mybox kernel: hdc: hdc1 hdc2 hdc3 hdc4 < hdc5 hdc6 hdc7 hdc8 hdc9 hdc10 >
Dec 7 15:02:27 mybox kernel: hda: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Dec 7 15:02:27 mybox kernel: Uniform CD-ROM driver Revision: 3.20
Dec 7 15:02:27 mybox kernel: ide-floppy driver 0.99.newide

The probe messages are just escaped irrelevant debug. Ignore those.
The rest seems in itself like the machine had problems with drives - is this a laptop that is getting suspended/restored ?

The two sets of traces are:

DMA fails (still running)
We whack the hard disk to reset it
The disk stays busy
We try PIO
[end of log]

The other one is quite similar - the IDE CD decided it was still busy after our timeout. There was no sense data available and we tried to switch down to PIO.

> The rest seems in itself like the machine had problems with drives - is this a
> laptop that is getting suspended/restored ?

Not since the install of FC3. BUT _prior_ to the FC3 install, it did run FC2 with a custom kernel (vanilla 2.6.9 patched with swsusp2, which worked brilliantly btw) and has been suspended/restored a few times. The partition that had the corruption problem was formatted during the FC3 install though.

I had this problem as well, with the last 2.6.9 FC3 kernel before the 2.6.10 kernel was released, since I do recall updating the kernel last week (and the last 2.6.9 kernel was 724). Hardware: Dell PowerEdge 750. No LVM, no RAID used (neither hardware nor software). I recall seeing "ext3 journal aborted" several times on the console. Since it's all hosed anyway, I am reinstalling FC3; this time I'll be updating to the latest 2.6.10 kernel.

I have similar issues with a fresh FC3 install on all new hardware. Drive is IDE UDMA 100, Seagate. Machine is an AMD desktop. I do not have details on specifics, but the above messages are similar. I see ext3 journal aborted in a steady stream. I also seem to have had similar problems with an AMD 64-bit system using a serial ATA drive with a Promise controller and a VIA chipset. I updated to the latest FC3 updates and it seems to have fixed the 64-bit system. This is a major bad bug, though I have no clue where it is. As a side note, for those of us used to older distros: be sure to update HAL when you update the kernel.
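
Several of the reports above come down to spotting particular kernel messages in /var/log/messages (the probe warnings, or the "ext3 journal aborted" lines seen on the console). A minimal sketch for pulling those lines out of a saved log follows, assuming the syslog layout shown earlier in this thread ("Dec 7 15:02:27 mybox kernel: ..."). The function name kernel_messages and the search strings are made up for illustration, and the exact wording of the kernel's ext3 messages is not given in this thread, so the substring to search for is left to the caller.

```python
import re

# Layout assumed from the log excerpts in this thread:
#   "Dec 7 15:02:27 mybox kernel: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+(?P<msg>.*)$"
)


def kernel_messages(lines, needle):
    """Return (timestamp, message) pairs for kernel lines containing needle.

    `needle` is a plain substring chosen by the caller (hypothetical helper,
    not part of any existing tool).
    """
    hits = []
    for line in lines:
        m = SYSLOG_RE.match(line.rstrip("\n"))
        if m and needle in m.group("msg"):
            hits.append((m.group("stamp"), m.group("msg")))
    return hits
```

To use it on a real log, something like kernel_messages(open("/var/log/messages"), "hdc:") would list every kernel line about the hard disk together with its timestamp, which makes it easier to compare one boot against another.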
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'.

Thank you.

This bug has been automatically closed as part of a mass update. It had been in NEEDINFO state since July 2005. If this bug still exists in current errata kernels, please reopen this bug. There are a large number of inactive bugs in the database, and this is the only way to purge them.

Thank you.
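
For anyone checking whether file contents are still being damaged after a kernel update, the "rpm -Va" advice from the start of this bug can be followed up with a small filter. This is only a sketch under stated assumptions: it assumes the usual verify output of a flag string, an optional one-letter file marker ("c" for config files) and a path, where a "5" in the flag string means the file digest differs. The function name suspicious_files is invented here, and lines reported as "missing" are deliberately skipped, since the concern in this bug is corrupted contents.

```python
import re

# Assumed shape of one `rpm -V` output line:
#   <flag string of 8 or more characters> [one-letter marker] <absolute path>
# e.g. "S.5....T c /etc/some.conf" -- "5" means the digest differs,
# and the marker "c" means the file is a config file.
VERIFY_RE = re.compile(
    r"^(?P<flags>[A-Za-z0-9.?]{8,})\s+(?:(?P<kind>[cdglr])\s+)?(?P<path>/.*)$"
)


def suspicious_files(lines):
    """Return paths whose digest differs, leaving out config files.

    Config files are expected to change after install, so they are not
    treated as evidence of corruption. Lines that do not match the assumed
    format (including "missing ..." lines) are ignored.
    """
    hits = []
    for line in lines:
        m = VERIFY_RE.match(line.rstrip("\n"))
        if m is None:
            continue
        if m.group("kind") == "c":
            continue
        if "5" in m.group("flags"):
            hits.append(m.group("path"))
    return hits
```

Saving the output with "rpm -Va > verify.txt" and passing open("verify.txt") to suspicious_files() would give a short list of non-config files whose contents no longer match the package, which is the kind of damage fsck cannot see.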