From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; SunOS 5.6 sun4u)
This may be an issue with the ext2 filesystem itself rather than e2fsprogs, but this is the nearest appropriate component I could find. I have been having files go missing for a couple of days (which I initially blamed on tmpwatch in bug 27145). However today some ext2fs errors appeared on the console. When I rebooted the system I had to run fsck manually which reported a lot of errors. Have I been unlucky to hit a bad block, or are there underlying problems with the filesystem software? (I don't remember any problems before I upgraded to Fisher).
Reproducible: Didn't try
Here are some extracts from /var/adm/messages:
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=8388820, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: directory #54593 contains a hole at offset 0
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=10485788, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: directory #54593 contains a hole at offset 4096
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=6291560, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: directory #54593 contains a hole at offset 8192
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=2, want=538050772, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: bad entry in directory #54593: rec_len %% 4 != 0 - offset=0, inode=33188, rec_len=831, name_len=0
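For anyone triaging similar logs: in every "attempt to access beyond end of device" entry above, the `want` value is far larger than `limit`, which suggests the block numbers being read were already garbage rather than slightly off the end of the partition. A minimal sketch for pulling such entries out of a messages file follows. The function name `out_of_range_requests` is hypothetical, and the pattern assumes the exact `rw=..., want=..., limit=...` layout seen in this report.

```python
import re

# Matches the kernel line that follows "attempt to access beyond end of device",
# e.g. "03:03: rw=0, want=8388820, limit=901152" (layout assumed from this report).
REQUEST_RE = re.compile(r"rw=(\d+), want=(\d+), limit=(\d+)")


def out_of_range_requests(lines):
    """Return (rw, want, limit) tuples for requests where want exceeds limit."""
    hits = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        rw, want, limit = (int(value) for value in match.groups())
        if want > limit:
            hits.append((rw, want, limit))
    return hits


if __name__ == "__main__":
    sample = [
        "Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device",
        "Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=8388820, limit=901152",
        "Feb 14 09:23:33 itspc116 kernel: 03:03: rw=2, want=538050772, limit=901152",
    ]
    for rw, want, limit in out_of_range_requests(sample):
        # The ratio gives a rough idea of how far outside the device the request was.
        print(f"rw={rw} want={want} limit={limit} ratio={want / limit:.1f}")
```

A `want` that is only marginally above `limit` would point more towards a partition table or size mismatch; values hundreds of times larger, as here, look more like corrupted metadata.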
We (Red Hat) should really try to resolve this before next release.
I have found further disk corruption, having previously successfully run fsck -f (single user) without errors. fsck -nf now tells me things like
Inode 55105 has illegal block(s).
Illegal block #0 (4041469680) in inode 55105. IGNORED.
and
Error while iterating over blocks in inode 55105: Illegal triply indirect block found
Also I have spotted errors such as the following occurring in the /var/log/messages file, which didn't occur when I was running RH6.2.
Feb 16 09:21:05 itspc116 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Feb 16 09:21:06 itspc116 kernel: hda: drive not ready for command
What IDE controller, mainboard and disks are you using? What is the exact version of the kernel that is running on this system? This seems to be a kernel problem with resulting disk corruption. I'll reassign this to the kernel rpm, but will keep watching for further info about it.
I am not an expert on hardware so I hope this makes sense.
Motherboard: ATX (pentium 166) "RM Advanced/ML Pentium Systemboard"
IDE controller: PIIX3 "82371SB PCI ISA/IDE Xcelerator"
Hard Disk: ST32132A
Kernel: 2.4.0-0.99.11
Can you please try newer kernels from ftp://ftp.redhat.com/pub/rawhide/i386/RedHat/RPMS/ or from ftp://ftp.redhat.com/pub/redhat/beta/wolverine/i386/RedHat/RPMS/ to check whether this is already fixed there?
I have upgraded the kernel to that in wolverine (2.4.1-0.1.9). The "drive not ready" messages are still there (if they are in fact related to the disk corruption), e.g.
Feb 22 11:36:25 itspc116 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Feb 22 11:36:25 itspc116 kernel: hda: drive not ready for command
but it may be several days before the disk corruption reappears, if the upgrade hasn't fixed it.
I have some more evidence that suggests the problem is still there. I upgraded the XFree packages to wolverine, and afterwards fsck reports some block bitmap differences, even when the file system is mounted read-only, e.g.
Pass 5: Checking group summary information
Block bitmap differences: -186349 -186350 -186351 -186352 -186353 -186354 -186355 -187286 -187287 -187288 -187289 -187290 -187291 -187292 -195118 -195119 -195120 -195121 -195122 -195123 -195124 -195203 -195204 -195205 -195206 -195207 -195208 -195209
I had some more file corruption yesterday. I logged the fsck session that cleaned it up (run from a second partition), in case this information would be useful.
Can you try 2.4.1-0.1.14 from rawhide? If that doesn't fix it, the next rawhide we put out will have a "nodma" option that will make it easier to debug this.
I am on 2.4.1-0.1.14 now. I haven't seen any file corruption yet (though the system has hung, requiring a reset), but entries in /var/log/messages look suspicious, for example:
Mar 1 14:01:19 itspc116 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 1 14:01:19 itspc116 kernel: hda: drive not ready for command
Mar 1 14:01:19 itspc116 kernel: attempt to access beyond end of device
Mar 1 14:01:19 itspc116 kernel: 03:03: rw=0, want=790435384, limit=901152
I have had some more file/directory corruption with the 2.4.2-0.1.19 kernel. I have a log of the fsck session afterwards (logged to a separate partition) and messages in /var/log/messages if any of this is useful.
I have had more corruption with 2.4.2-0.1.28 (while I was upgrading to 2.4.2-0.1.49). Again I have more details if you want them.
{ DriveReady SeekComplete DataRequest } is usually an indication that your cables are outside of the allowed limits. This usually shows up only when using the higher DMA modes, which our kernel has been using for a while now. If you don't want to change cables, you can always boot with "ide=nodma" on the kernel command line (e.g. at the lilo prompt). Please test this and reopen the bug if this doesn't help.
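As a side note on reading these messages: the names in braces are just the set bits of the status byte. A minimal sketch of the decoding follows, assuming the standard ATA status register bit layout. The names DriveReady, SeekComplete and DataRequest are taken from the log lines in this report; the remaining names and the function name `decode_status` are my own labels for illustration and may not match what the driver prints.

```python
# Bit masks of the ATA status register (standard layout, assumed here).
# Only DriveReady, SeekComplete and DataRequest appear in the logs of this
# report; the other names are illustrative labels.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of the bits set in an 8-bit status value, high bit first."""
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    # 0x58 = 0x40 + 0x10 + 0x08
    print(decode_status(0x58))
```

Decoding 0x58 this way yields the same three names as the kernel message, so the drive is reporting ready and requesting a data transfer at a point where the driver did not expect it to.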
I encountered this problem, and the ide=nodma boot option avoids it, but I'm doubtful of a cable problem since this system is a laptop. Error indications in syslog were:
May 15 10:14:07 cjohnsonPC kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 10:14:07 cjohnsonPC kernel: hda: drive not ready for command
The HW/SW information in syslog was:
May 15 10:11:52 cjohnsonPC kernel: Uniform Multi-Platform E-IDE driver Revision: 6.31
May 15 10:11:52 cjohnsonPC kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
May 15 10:11:52 cjohnsonPC kernel: PIIX4: IDE controller on PCI bus 00 dev 39
May 15 10:11:52 cjohnsonPC kernel: ide0: BM-DMA at 0x38a0-0x38a7, BIOS settings: hda:DMA, hdb:DMA
May 15 10:11:52 cjohnsonPC kernel: ide1: BM-DMA at 0x38a8-0x38af, BIOS settings: hdc:pio, hdd:pio
May 15 10:11:52 cjohnsonPC kernel: hda: IBM-DARA-212000, ATA DISK drive
May 15 10:11:52 cjohnsonPC kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
May 15 10:11:52 cjohnsonPC kernel: hda: 23579136 sectors (12073 MB) w/418KiB Cache, CHS=1559/240/63, UDMA(33)
May 15 10:11:52 cjohnsonPC kernel: hda: hda1 hda2 hda3 < hda5 hda6 hda7 >
I am running kernel-2.4.2-2 i686 straight out of RH 7.1. The laptop is a Compaq Armada M700. If someone at Red Hat or Compaq would pursue this issue I would gladly provide any needed info.