Bug 27614 - File system errors

Summary: File system errors

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.1
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Stephen Tweedie
QA Contact:	Aaron Brown
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-02-14 14:18 UTC by Michael Young
Modified:	2007-04-18 16:31 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2001-04-06 12:14:49 UTC
Embargoed:

Description Michael Young 2001-02-14 14:18:46 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; SunOS 5.6 sun4u)

This may be an issue with the ext2 filesystem itself rather than e2fsprogs,
but this is the nearest appropriate component I could find.
I have been having files go missing for a couple of days (which I initially
blamed on tmpwatch in bug 27145). However, today some ext2fs errors appeared
on the console. When I rebooted the system I had to run fsck manually, which
reported a lot of errors. Have I been unlucky to hit a bad block, or are
there underlying problems with the filesystem software? (I don't remember
any problems before I upgraded to Fisher.)

Reproducible: Didn't try

Here are some extracts from /var/adm/messages
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=8388820, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: directory #54593 contains a hole at offset 0
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=10485788, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: directory #54593 contains a hole at offset 4096
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=6291560, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: directory #54593 contains a hole at offset 8192
Feb 14 09:23:33 itspc116 kernel: attempt to access beyond end of device
Feb 14 09:23:33 itspc116 kernel: 03:03: rw=2, want=538050772, limit=901152
Feb 14 09:23:33 itspc116 kernel: EXT2-fs error (device ide0(3,3)): ext2_readdir: bad entry in directory #54593: rec_len %% 4 != 0 - offset=0, inode=33188, rec_len=831, name_len=0
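
To see how far out of range the failing requests are, the "want=... limit=..." lines above can be pulled apart mechanically. The following is only a rough sketch: the regular expression and the function name parse_beyond_end are illustrative assumptions, and the two sample lines are copied from the extract above.

```python
import re

# Matches the "rw=..., want=..., limit=..." part of the kernel message.
# The pattern is an assumption based only on the lines quoted in this report.
BEYOND_RE = re.compile(r"rw=(\d+), want=(\d+), limit=(\d+)")

def parse_beyond_end(lines):
    """Return (rw, want, limit) integer tuples for each matching log line."""
    results = []
    for line in lines:
        match = BEYOND_RE.search(line)
        if match:
            results.append(tuple(int(group) for group in match.groups()))
    return results

# Two lines copied from the log extract in the description.
sample = [
    "Feb 14 09:23:33 itspc116 kernel: 03:03: rw=0, want=8388820, limit=901152",
    "Feb 14 09:23:33 itspc116 kernel: 03:03: rw=2, want=538050772, limit=901152",
]

for rw, want, limit in parse_beyond_end(sample):
    # Every request here asks for a position well past the device limit.
    print(f"rw={rw} want={want} limit={limit} beyond_limit={want > limit}")
```

In every quoted line the requested position is several times larger than the limit, which fits directory data having been overwritten with garbage rather than a single bad block.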

Comment 1 Glen Foster 2001-02-16 01:12:01 UTC

We (Red Hat) should really try to resolve this before next release.

Comment 2 Michael Young 2001-02-16 15:56:49 UTC

I have found further disk corruption, having previously successfully run fsck -f
(single user) without errors. fsck -nf now tells me things like
Inode 55105 has illegal block(s).
Illegal block #0 (4041469680) in inode 55105.  IGNORED.
and
Error while iterating over blocks in inode 55105: Illegal triply indirect block found

Also, I have spotted errors such as the following occurring in the
/var/log/messages file, which didn't occur when I was running RH6.2.

Feb 16 09:21:05 itspc116 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Feb 16 09:21:06 itspc116 kernel: hda: drive not ready for command
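
The status=0x58 value in these messages is a bit field, and the names in braces are its set bits. The sketch below decodes it using the standard ATA status register layout. The three names that appear in the log are taken from it; the names for the other bits are assumed labels following the same convention, and decode_status is a hypothetical helper.

```python
# ATA status register bits, most significant first.
# DriveReady, SeekComplete and DataRequest come from the log above;
# the remaining names are assumed labels in the same style.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]

print(decode_status(0x58))
# ['DriveReady', 'SeekComplete', 'DataRequest']
```

Note that 0x58 = 0x40 + 0x10 + 0x08, so the Error bit (0x01) is not set: the drive is still requesting a data transfer at a point where the driver expected it to be ready for a new command.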

Comment 3 Florian La Roche 2001-02-21 14:27:44 UTC

What IDE controller, mainboard and disks are you using? What is the exact
version of the kernel that is running on this system?

This seems to be a kernel problem with resulting disk corruption. I'll reassign
this to the kernel rpm, but will keep watching for further info about it.

Comment 4 Michael Young 2001-02-21 14:55:45 UTC

I am not an expert on hardware, so I hope this makes sense:
Motherboard: ATX (pentium 166) "RM Advanced/ML Pentium Systemboard"
IDE controller: PIIX3 "82371SB PCI ISA/IDE Xcelerator"
Hard Disk: ST32132A
Kernel: 2.4.0-0.99.11

Comment 5 Florian La Roche 2001-02-21 16:29:45 UTC

Can you please try newer kernels from
ftp://ftp.redhat.com/pub/rawhide/i386/RedHat/RPMS/
or from ftp://ftp.redhat.com/pub/redhat/beta/wolverine/i386/RedHat/RPMS/
to check whether this is already fixed in them?

Comment 6 Michael Young 2001-02-22 12:24:01 UTC

I have upgraded the kernel to that in wolverine (2.4.1-0.1.9). The "drive not
ready" messages are still there (if they are in fact related to the disk
corruption), eg.
Feb 22 11:36:25 itspc116 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Feb 22 11:36:25 itspc116 kernel: hda: drive not ready for command
but it may be several days before the disk corruption reappears, if the upgrade
hasn't fixed it.

Comment 7 Michael Young 2001-02-23 11:02:43 UTC

I have some more evidence that suggests the problem is still there. I upgraded
the XFree packages to wolverine, and afterwards fsck reports some block bitmap
differences, even when the file system is mounted read-only, eg.
Pass 5: Checking group summary information
Block bitmap differences:  -186349 -186350 -186351 -186352 -186353 -186354
-186355 -187286 -187287 -187288 -187289 -187290 -187291 -187292 -195118 -195119
-195120 -195121 -195122 -195123 -195124 -195203 -195204 -195205 -195206 -195207
-195208 -195209
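
The block list above is easier to read once consecutive block numbers are collapsed into ranges. The following is a small sketch of that grouping: group_ranges is a hypothetical helper, and the leading minus signs (e2fsck's own notation) are simply stripped.

```python
def group_ranges(blocks):
    """Collapse block numbers into sorted (first, last) runs of consecutive values."""
    ranges = []
    for number in sorted(abs(block) for block in blocks):
        if ranges and number == ranges[-1][1] + 1:
            ranges[-1][1] = number           # extend the current run
        else:
            ranges.append([number, number])  # start a new run
    return [tuple(run) for run in ranges]

# Block list copied from the fsck output above.
text = """
-186349 -186350 -186351 -186352 -186353 -186354
-186355 -187286 -187287 -187288 -187289 -187290 -187291 -187292 -195118 -195119
-195120 -195121 -195122 -195123 -195124 -195203 -195204 -195205 -195206 -195207
-195208 -195209
"""
blocks = [int(token) for token in text.split()]

for first, last in group_ranges(blocks):
    print(f"{first}-{last} ({last - first + 1} blocks)")
```

For this output the 28 listed blocks fall into four runs of seven consecutive blocks each: 186349-186355, 187286-187292, 195118-195124 and 195203-195209.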

Comment 8 Michael Young 2001-02-28 15:40:57 UTC

I had some more file corruption yesterday. I logged the fsck session that
cleaned it up (to a second partition), in case this information would be useful.

Comment 9 Michael K. Johnson 2001-03-01 03:55:33 UTC

Can you try 2.4.1-0.1.14 from rawhide?

If that doesn't fix it, the next rawhide we put out will have a
"nodma" option that will make it easier to debug this.

Comment 10 Michael Young 2001-03-02 17:44:27 UTC

I am on 2.4.1-0.1.14 now. I haven't seen any file corruption yet (though the
system has hung, requiring a reset), but entries in /var/log/messages look
suspicious, for example:

Mar  1 14:01:19 itspc116 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar  1 14:01:19 itspc116 kernel: hda: drive not ready for command
Mar  1 14:01:19 itspc116 kernel: attempt to access beyond end of device
Mar  1 14:01:19 itspc116 kernel: 03:03: rw=0, want=790435384, limit=901152

Comment 11 Michael Young 2001-03-08 12:27:42 UTC

I have had some more file/directory corruption with the 2.4.2-0.1.19 kernel. I
have a log of the fsck session afterwards (logged to a separate partition) and
messages in /var/log/messages if any of this is useful.

Comment 12 Michael Young 2001-04-06 12:14:46 UTC

I have had more corruption with 2.4.2-0.1.28 (while I was upgrading to
2.4.2-0.1.49). Again I have more details if you want them.

Comment 13 Arjan van de Ven 2001-04-07 20:19:37 UTC

{ DriveReady SeekComplete DataRequest } is usually an indication that
your cables are outside of the allowed limits. This usually shows up only
when using the higher DMA modes, which our kernel has been doing for a while now.
If you don't want to change cables, you can always boot with "ide=nodma" on the
kernel command line (e.g. at the LILO prompt).

Please test this and reopen the bug if this doesn't help.
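
When testing this, it helps to confirm that the option actually reached the kernel. The sketch below checks a kernel command line string for "ide=nodma". The helper has_boot_option and the example command line are hypothetical; on a running Linux system the real string can be read from /proc/cmdline.

```python
def has_boot_option(cmdline, option):
    """Return True if option appears as a whole whitespace-separated token."""
    return option in cmdline.split()

# Hypothetical command line, for illustration only.
example = "auto BOOT_IMAGE=linux ro root=303 ide=nodma"

print(has_boot_option(example, "ide=nodma"))                   # True
print(has_boot_option("auto BOOT_IMAGE=linux ro", "ide=nodma"))  # False
```

Matching whole tokens rather than substrings avoids a false positive from a longer option that merely contains the same text.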

Comment 14 Christopher Johnson 2001-05-17 13:25:53 UTC

I encountered this problem, and the ide=nodma boot option avoids it, but I'm
doubtful of a cable problem since this system is a laptop.

Error indications in syslog were:
May 15 10:14:07 cjohnsonPC kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
May 15 10:14:07 cjohnsonPC kernel: hda: drive not ready for command

The HW/SW information in syslog was:
May 15 10:11:52 cjohnsonPC kernel: Uniform Multi-Platform E-IDE driver Revision: 6.31
May 15 10:11:52 cjohnsonPC kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
May 15 10:11:52 cjohnsonPC kernel: PIIX4: IDE controller on PCI bus 00 dev 39
May 15 10:11:52 cjohnsonPC kernel:     ide0: BM-DMA at 0x38a0-0x38a7, BIOS settings: hda:DMA, hdb:DMA
May 15 10:11:52 cjohnsonPC kernel:     ide1: BM-DMA at 0x38a8-0x38af, BIOS settings: hdc:pio, hdd:pio
May 15 10:11:52 cjohnsonPC kernel: hda: IBM-DARA-212000, ATA DISK drive
May 15 10:11:52 cjohnsonPC kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
May 15 10:11:52 cjohnsonPC kernel: hda: 23579136 sectors (12073 MB) w/418KiB Cache, CHS=1559/240/63, UDMA(33)
May 15 10:11:52 cjohnsonPC kernel:  hda: hda1 hda2 hda3 < hda5 hda6 hda7 >

I am running kernel-2.4.2-2 i686 straight out of RH 7.1.
The laptop is a Compaq Armada M700.

If someone at Red Hat or Compaq would pursue this issue I would gladly provide
any needed info.
