Bug 66308 - Filesystem corruption under ext3
Summary: Filesystem corruption under ext3
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: athlon
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Aaron Brown
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-06-07 14:03 UTC by John
Modified: 2007-04-18 16:43 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-06-10 07:15:59 UTC
Embargoed:



Description John 2002-06-07 14:03:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.0 (X11; Linux i686; U;) Gecko/20020516

Description of problem:
Recently my computer overheated and crashed.  (It's a dual processor Athlon MP
1600 in an unairconditioned room. :)  Not real bright of me.)

I restarted it, and it then _crashed during journal recovery._  I doubt the
crash was due to Linux... I suspect the machine had cooled insufficiently and
crashed.

I let it cool for a long time, and then restarted again.  It did a journal
recovery (I believe) and then moved on without any interaction from me.

Later, though, some filesystem corruption was found.  Running "ls" in one of the
home directories returned:
  ls: kpulse10.f: Input/output error
  ls: x9pt003.dat: Input/output error
  ls: x9pr003.dat: Input/output error
  ls: x9ps003.dat: Input/output error
  ls: x9pp003.dat: Input/output error
      fftw_f77.i  kpuls10.dat  kpulse10.out  test.dat  test.f  x9pk003.dat

Looking back at /var/log/messages, there are six instances of fsck clearing orphaned
inodes on /home:
Jun  5 22:52:56 crossbow fsck: /home: Clearing orphaned inode 2305690 (uid=502,
gid=502, mode=0100664, size=0) 
Jun  5 22:52:56 crossbow fsck: /home: Clearing orphaned inode 2305695 (uid=502,
gid=502, mode=0100664, size=0) 
Jun  5 22:53:39 crossbow kernel: 0x378: FIFO is 16 bytes
Jun  5 22:52:56 crossbow fsck: /home: Clearing orphaned inode 2305694 (uid=502,
gid=502, mode=0100664, size=0) 
Jun  5 22:53:40 crossbow kernel: 0x378: writeIntrThreshold is 9
Jun  5 22:52:57 crossbow fsck: /home: Clearing orphaned inode 2305696 (uid=502,
gid=502, mode=0100664, size=53248) 
Jun  5 22:53:40 crossbow kernel: 0x378: readIntrThreshold is 9
Jun  5 22:52:57 crossbow fsck: /home: Clearing orphaned inode 2305687 (uid=502,
gid=502, mode=0100664, size=0) 
Jun  5 22:52:57 crossbow fsck: /home: Clearing orphaned inode 2305686 (uid=502,
gid=502, mode=0100775, size=378017) 
Jun  5 22:52:57 crossbow fsck: /home: Clearing orphaned inode 1635643 (uid=500,
gid=500, mode=0100700, size=3184) 
Jun  5 22:52:57 crossbow fsck: /home: clean, 89262/4480448 files,
2354637/8960253 blocks 
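
For anyone triaging a similar log, the orphaned-inode messages above can be pulled out of the interleaved kernel output with a short script. This is only a sketch: the regular expression is written against the message format quoted in this report, the function name orphaned_inodes is made up for illustration, and other e2fsprogs versions may word the message differently.

```python
import re

# Pattern written against the fsck lines quoted in this report (an assumption;
# other e2fsprogs versions may format the message differently).
ORPHAN_RE = re.compile(
    r"fsck: (?P<fs>\S+): Clearing orphaned inode (?P<inode>\d+) "
    r"\(uid=(?P<uid>\d+), gid=(?P<gid>\d+), mode=(?P<mode>[0-7]+), size=(?P<size>\d+)\)"
)


def orphaned_inodes(log_text):
    """Return one dict per 'Clearing orphaned inode' message in log_text."""
    # Collapse all whitespace runs so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        {
            "fs": m["fs"],
            "inode": int(m["inode"]),
            "uid": int(m["uid"]),
            "size": int(m["size"]),
        }
        for m in ORPHAN_RE.finditer(flat)
    ]


# Two of the messages from this report, with an unrelated kernel line between them.
SAMPLE = """\
Jun  5 22:52:56 crossbow fsck: /home: Clearing orphaned inode 2305690 (uid=502,
gid=502, mode=0100664, size=0)
Jun  5 22:53:39 crossbow kernel: 0x378: FIFO is 16 bytes
Jun  5 22:52:57 crossbow fsck: /home: Clearing orphaned inode 2305686 (uid=502,
gid=502, mode=0100775, size=378017)
"""

if __name__ == "__main__":
    for entry in orphaned_inodes(SAMPLE):
        print(entry)
```

Filtering the result on a non-zero size shows which cleared inodes actually held data at the time of the crash.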

(I can't seem to find the startup which crashed while recovering the journal in
the log.   Weird.  I'll have to make another search for it.)

I e2fscked the /home partition, and that seems to have fixed the problem.

But, I thought ext3 was supposed to fix this sort of thing.  It looks like the
journaling system does not gracefully recover if the system crashes during
journal recovery.  (I could be wrong...  that's just what it looks like to me.)

Hopefully this is useful!  Let me know if you need any more information!

John

Version-Release number of selected component (if applicable):


How reproducible:
Didn't try


Additional info:

Comment 1 Stephen Tweedie 2002-06-10 09:48:22 UTC
"I/O error" usually means that the disk has a bad sector and the filesystem
cannot read data from it.

Journaling filesystems protect you from the effects of a crash *assuming the
data is still intact on disk*.  Ext3 does survive a crash during recovery
perfectly well.  However, if you get I/O errors and bad sectors on disk, then
there's nothing that the journaling filesystem can do to correct that.  e2fsck
might fix it simply by removing the unreadable files completely.
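
To see which entries in a directory are affected before running e2fsck, something like the following could be used. It is a minimal sketch, not part of any ext3 tooling: the helper names unreadable_entries and io_errors are hypothetical, and it assumes the "Input/output error" from ls corresponds to lstat() failing with EIO on those names.

```python
import errno
import os
import sys


def unreadable_entries(directory):
    """Return (name, errno) pairs for entries whose metadata cannot be read."""
    failures = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat reads the inode without following symlinks.
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            failures.append((name, exc.errno))
    return failures


def io_errors(directory):
    """Return only the names that failed with EIO (Input/output error)."""
    return [name for name, code in unreadable_entries(directory) if code == errno.EIO]


if __name__ == "__main__":
    # Directory to check is taken from the command line; defaults to the current one.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in io_errors(target):
        print(f"{name}: Input/output error")
```

A name that shows up here points at an unreadable or corrupt inode, which is the case where journal replay cannot help and a full fsck is needed.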

I/O errors also occasionally occur because there is corrupt data on disk even if
that data is still readable.  Journaling relies on the data that the filesystem
sent to disk being written correctly.  If the hardware is overheating then it is
quite possible for the data to get corrupted, and in that case again the
filesystem is powerless to protect you: there's no point in writing a journal
carefully to disk if the memory, controller, cpu or disk drive is flipping bits
in the journal on its way.  Again, a full fsck may be required to sort out the
mess afterwards.

