Bug 51255 - Filesystem (inode) corruption on ext3fs/raid0
Status: CLOSED WORKSFORME
Product: Red Hat Public Beta
Classification: Retired
Component: kernel
roswell
i686 Linux
medium Severity medium
Assigned To: Stephen Tweedie
Brock Organ
Reported: 2001-08-08 15:44 EDT by Alexandre Oliva
Modified: 2007-04-18 12:35 EDT
Doc Type: Bug Fix
Last Closed: 2001-08-24 09:06:44 EDT

Description Alexandre Oliva 2001-08-08 15:44:48 EDT
I have been running a plain roswell installation for a couple of days on a
Dell Inspiron 8000 with a PIII 1.0GHz, 512MB of RAM and an nVidia
GeForce2Go video card.  Installation was almost flawless; the one problem
is that XFree4.1 is known not to support this video card properly.  I
converted a 52GB ext2 filesystem to ext3 (tune2fs -j).  The filesystem was
initially created on a RAID0 device using most of the internal 48GB disk
and all of the modular-bay 20GB disk, using the Red Hat Linux 7.1 installer
(this is what I had originally installed on this machine).

After a few days of relatively heavy use (GCC rebuilding included), the
following message appeared in /var/log/messages:

Aug  5 17:54:55 feijoada kernel: EXT3-fs error (device md(9,0)):
ext3_readdir: bad entry in directory #292: directory entry across blocks -
offset=0, inode=3343644091, rec_len=40264, name_len=181
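
[Editor's sketch, not part of the original report.]  The message above
comes from the sanity checks an ext2/ext3 directory reader applies to each
on-disk entry.  The Python below is a rough model of such checks, not the
kernel's code: the check order, the messages other than the one in the log,
the 4096-byte block size and the inode count are all assumptions made for
illustration.  It only shows why the logged values trip the "across blocks"
test.

```python
def dir_rec_len(name_len: int) -> int:
    """Smallest valid record: 8-byte header plus the name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, inode, rec_len, name_len, block_size, inodes_count):
    """Return a description of the first failed check, or None if all pass."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        # The entry claims to extend past the end of its directory block.
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None


# Values taken from the log line; block_size and inodes_count are hypothetical.
verdict = check_dir_entry(
    offset=0,
    inode=3343644091,
    rec_len=40264,
    name_len=181,
    block_size=4096,         # assumption: not stated in the report
    inodes_count=6_000_000,  # assumption: not stated in the report
)
print(verdict)
```

A rec_len of 40264 is larger than any block size ext3 commonly uses, so the
verdict does not depend on the assumed 4096.  In this model the inode number
would also fail the bounds check, but the size check is reached first.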

Actually, I only noticed this when rsyncing a sub-directory containing what
used to be a file in inode 292: it said the file in inode 290 was not a
regular file and was being skipped.  Indeed, inodes 290, 291 and 292 had
been corrupted.  292 had become a directory, and inodes 290 and 291 had
remained as files, but quite unusual ones, containing junk.  They used to
be very small (<1024 bytes) files before the corruption.

I removed one of them, but couldn't remove the other, nor the directory. 
It's worth noting that these files hadn't been touched for ages (except
that the laptop has been with me for 2 weeks, so they were copied over
correctly, but they had been consistent before, as my backups have shown).

I rebooted the machine, and it started `fsck'ing the filesystem.  Later I
found the following messages in /var/log/messages:

Aug  6 02:32:31 feijoada kernel: EXT3-fs error (device md(9,0)):
ext3_readdir: bad entry in directory #292: directory entry across blocks -
offset=0, inode=3343644091, rec_len=40264, name_len=181
Aug  6 02:34:15 feijoada kernel: attempt to access beyond end of device
Aug  6 02:34:15 feijoada kernel: 09:00: rw=0, want=469778452, limit=52893760
Aug  6 02:34:15 feijoada kernel: attempt to access beyond end of device
Aug  6 02:34:15 feijoada kernel: 09:00: rw=0, want=604217876, limit=52893760
[this goes on and on forever; apparently, for as long as fsck was running]
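
[Editor's sketch, not part of the original report.]  To get a feel for how
far out of range these requests are, the lines can be parsed and compared.
The snippet assumes that "want" and "limit" are both counted in 1 KiB units
and that the "09:00" prefix is the device's major:minor pair in hexadecimal;
neither is stated in the report, so treat both as assumptions.

```python
import re

LOG = """\
Aug  6 02:34:15 feijoada kernel: 09:00: rw=0, want=469778452, limit=52893760
Aug  6 02:34:15 feijoada kernel: 09:00: rw=0, want=604217876, limit=52893760
"""

# major:minor (assumed hex), then the three numeric fields from the message.
PATTERN = re.compile(
    r"([0-9a-f]{2}):([0-9a-f]{2}): rw=(\d+), want=(\d+), limit=(\d+)"
)


def parse(log):
    """Extract one record per 'beyond end of device' detail line."""
    records = []
    for m in PATTERN.finditer(log):
        major, minor, rw, want, limit = m.groups()
        records.append({
            "major": int(major, 16),
            "minor": int(minor, 16),
            "rw": int(rw),
            "want": int(want),
            "limit": int(limit),
        })
    return records


records = parse(LOG)
for r in records:
    ratio = r["want"] / r["limit"]
    size_gib = r["limit"] / (1024 * 1024)  # assumes 1 KiB units
    print(f"md({r['major']},{r['minor']}): device ~{size_gib:.1f} GiB, "
          f"request at {ratio:.1f}x the device size")
```

Under those assumptions the requested blocks lie many times past the end of
the device, which is what one would expect if fsck was following block
pointers read from inodes that contained junk.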

fsck removed inode 291 as junk, and ``recovered'' the directory in inode
292, after copying a dup block shared with another file I hadn't touched
since the initial installation (originally, 7.1).  lsattr showed the
directory was unmodifiable, unremovable and append-only; I used chattr to
remove those attributes and managed to delete the directory.  The other
file containing the dup block was a zip file that verified correctly after
recovery, so it wasn't damaged.

find -nouser hasn't found any other weird inodes, either right after fsck
or at any time since.

One notable point: I'm not using the driver supplied by nVidia for XFree4,
but rather the vesa driver that comes with XFree4.1, which works flawlessly,
though somewhat slower and at a lower resolution (1280x1024x16 instead of
1600x1200x16).  So the nVidia driver is not to blame.

Another notable point: I haven't been using suspend, to memory or to disk,
since the upgrade to 7.2beta (just because I didn't have a need for that),
so apm is not to blame either.

Yet another notable point: I've been using tmpfs for /tmp, which is my
TMPDIR, but I'm not 100% sure I already had it enabled when the problem
showed up.  Anyway, it's been enabled ever since, and no further problems
appear to have occurred.

Unfortunately, this is one of those Heisenbugs that are hard to duplicate
and harder to fix.  I hope this bug report gets some ``me too''s that may
help narrow down the causes of the problem.
Comment 1 Glen Foster 2001-08-09 16:03:23 EDT
We (Red Hat) really need to fix this defect before next release.
Comment 2 Stephen Tweedie 2001-08-24 09:06:39 EDT
Has there been any more misbehaviour from this machine?  Given the lack of
information here, all we can assume right now is that "something on disk or in
memory became corrupt somehow".  If there's nothing else to go on, then we just
need to close this as WORKSFORME and keep the bugzilla report on file in case
anything like it ever shows up in the future --- for now it's as likely to be
hardware as software from the information available.
Comment 3 Alexandre Oliva 2001-08-24 09:39:52 EDT
No further corruptions.  I left tmpfs working for a while, then disabled it,
then tried the nVidia driver for a moment, then upgraded to 7.1.94's kernel,
then to a patched version thereof, and everything has been fine.  I think it is
reasonable to assume the bug was fixed.
