Bug 597667 - ext4: journal corruption
Summary: ext4: journal corruption
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-30 02:30 UTC by Dan Williams
Modified: 2010-06-04 12:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-04 12:58:21 UTC
Type: ---
Embargoed:



Description Dan Williams 2010-05-30 02:30:21 UTC
Started my F13 laptop this morning to find my root partition unmountable.  AFAIK I shut it down correctly last night using System -> Shut Down... and picking the Shut Down button.

smartctl reports no errors, and Windows boots fine so I don't think it's a hardware problem.

A Live CD complains that:

May 29 16:28:37 localhost kernel: JBD: no valid journal superblock found
May 29 16:28:37 localhost kernel: EXT4-fs (sda5): error loading journal
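
For context on the "no valid journal superblock found" message: a JBD journal's first block begins with a small big-endian header carrying a magic number and a block type. The following is a minimal sketch of that kind of check, not the kernel's actual code path; the constants are the ones I believe jbd.h defines (JFS_MAGIC_NUMBER, JFS_SUPERBLOCK_V1/V2), and the function name is made up for illustration.

```python
import struct

# Values believed to match the kernel's jbd.h; verify against your tree.
JBD_MAGIC = 0xC03B3998
JBD_SUPERBLOCK_V1 = 3
JBD_SUPERBLOCK_V2 = 4


def looks_like_journal_superblock(first_block: bytes) -> bool:
    """Return True if the block starts with a plausible JBD superblock header.

    Hypothetical helper: the real kernel check validates more fields.
    """
    if len(first_block) < 12:
        return False
    # Journal header: three big-endian 32-bit fields
    # (magic, block type, sequence number).
    magic, blocktype, _sequence = struct.unpack(">III", first_block[:12])
    return magic == JBD_MAGIC and blocktype in (JBD_SUPERBLOCK_V1, JBD_SUPERBLOCK_V2)


if __name__ == "__main__":
    zeroed = bytes(4096)  # a journal block that has been overwritten with zeros
    valid = struct.pack(">III", JBD_MAGIC, JBD_SUPERBLOCK_V2, 0) + bytes(4084)
    print(looks_like_journal_superblock(zeroed))
    print(looks_like_journal_superblock(valid))
```

A zero-filled first journal block fails this test, which would be enough to make the journal unloadable.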

I went out and bought an external USB drive, dd-ed the borked /dev/sda5 (my root partition) to it, and I'm happy to provide the journal or other portions of it that you might need for diagnosis; let me know what dd commands you'd need to get what you want off of it.

fsck.ext4 recovery follows...

[root@localhost ~]# fsck.ext4 -v /dev/sda5
e2fsck 1.41.10 (10-Feb-2009)
Superblock has an invalid journal (inode 8).
Clear<y>? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

Superblock has_journal flag is clear, but a journal inode is present.
Clear<y>? yes

F11-Preview-x86_ was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Journal inode is not in use, but contains data.  Clear<y>? yes

Pass 2: Checking directory structure   
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(360448--376831)
Fix<y>? yes
 
Free blocks count wrong for group #11 (5172, counted=21556).
Fix<y>? yes
         
Free blocks count wrong (9960550, counted=9976934).
Fix<y>? yes
       
Recreate journal<y>? yes

Creating journal (32768 blocks):  Done.

*** journal has been re-created - filesystem is now ext3 again ***

F11-Preview-x86_: ***** FILE SYSTEM WAS MODIFIED *****

 1216944 inodes used (18.05%)
    3931 non-contiguous files (0.3%)
     583 non-contiguous directories (0.0%)
         # of inodes with ind/dind/tind blocks: 0/0/0
         Extent depth histogram: 1196523/579/4
17005979 blocks used (63.10%)
       0 bad blocks
       6 large files

 1074638 regular files
  121817 directories
       9 character device files
       0 block device files
       0 fifos
  118280 links
   20131 symbolic links (19478 fast symbolic links)
     340 sockets
--------
 1335215 files

Comment 1 Eric Sandeen 2010-06-01 18:00:13 UTC
Can you look at the dd image of the pre-fsck filesystem, and use debugfs to run

debugfs> stat <8>

to see what that journal inode looks like?

An e2image of the original fs might be useful too, then I can poke at it myself.

Thanks,
-Eric

Comment 2 Dan Williams 2010-06-01 22:24:38 UTC
debugfs 1.41.10 (10-Feb-2009)
debugfs:  stat <8>
Inode: 8   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 67108864
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 131072
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x49f21d55 -- Fri Apr 24 13:13:09 2009
atime: 0x00000000 -- Wed Dec 31 16:00:00 1969
mtime: 0x49f21d55 -- Fri Apr 24 13:13:09 2009
Size of extra inode fields: 0
EXTENTS:
(0-16383): 360448-376831


/etc/fstab is mounting this volume like so:

UUID=f89dd491-d9d6-43ba-a553-ab1c938c5ba0 /  ext4 defaults  1 1
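
The numbers in the debugfs output above line up with the e2fsck output in the description: the blocks e2fsck freed are exactly the journal inode's single extent. A small worked check, using only figures from this report; the 4096-byte block size and 32768 blocks per group are assumptions (the block size is implied by Size / extent length, and 32768 is the usual default for 4k blocks).

```python
BLOCK_SIZE = 4096          # assumed; implied by Size / extent length below
BLOCKS_PER_GROUP = 32768   # assumed default for 4k blocks

journal_size_bytes = 67108864                 # "Size:" from debugfs stat <8>
journal_sectors = 131072                      # "Blockcount:", in 512-byte units
extent_first, extent_last = 360448, 376831    # "EXTENTS:" physical block range

# Free-block counts reported by e2fsck in the description.
group11_free_before, group11_free_after = 5172, 21556
total_free_before, total_free_after = 9960550, 9976934

extent_blocks = extent_last - extent_first + 1
journal_byte_offset = extent_first * BLOCK_SIZE

print("journal blocks:", extent_blocks)
print("journal bytes from extent:", extent_blocks * BLOCK_SIZE)
print("journal bytes from sector count:", journal_sectors * 512)
print("group 11 free-count delta:", group11_free_after - group11_free_before)
print("total free-count delta:", total_free_after - total_free_before)
print("block group of first journal block:", extent_first // BLOCKS_PER_GROUP)
print("byte offset of journal in the partition:", journal_byte_offset)
```

Under those assumptions, every delta equals the journal's length in blocks, and the journal starts at the first block of group 11, matching the "Block bitmap differences" and "group #11" lines from e2fsck.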

Comment 3 Dan Williams 2010-06-01 22:32:26 UTC
Trying to do an e2image of the FS resulted in:

[dcbw@localhost ~]$ sudo e2image  /dev/sdb1 - | bzip2 -9 > sdb1.e2i.bz2
e2image 1.41.10 (10-Feb-2009)
lseek while writing header: Illegal seek
[dcbw@localhost ~]$ dmesg | tail
scsi 4:0:0:0: Direct-Access     Seagate  Portable         0130 PQ: 0 ANSI: 4
sd 4:0:0:0: Attached scsi generic sg2 type 0
sd 4:0:0:0: [sdb] 625142448 512-byte logical blocks: (320 GB/298 GiB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 2f 08 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] Attached SCSI disk


Note that the partition I dd-ed my original fs onto is 320GB; the original fs itself is only 110GB.  I just did:

[root@localhost ~]# dd if=/dev/sda5 of=/dev/sdb1 bs=4096
26950144+0 records in
26950144+0 records out
110387789824 bytes (110 GB) copied, 4140.14 s, 26.7 MB/s

more or less, so /dev/sdb1 is obviously larger than the original FS.

Not sure if that's something that e2image is complaining about.

Comment 4 Eric Sandeen 2010-06-02 00:51:35 UTC
Ok, thanks.  For some reason I thought it was corrupt, but no ....

> Journal inode is not in use, but contains data.  Clear<y>? yes

The problem is that the inode bitmap thought the journal inode was free.  That's ... odd.

I assume that:

debugfs> testi <8>

shows it not in use?

I wonder if it's possible that something overwrote the first part of your partition.

Maybe you can:

dd if=/dev/$WHATEVER bs=4k count=256 of=first-one-meg

bzip2 that, and attach it?  I can see what the first part of the block device looks like, maybe we'll recognize something that stomped on it.

However, I guess in the total scope of the repair, not much looked damaged.  So maybe that theory is off base ...

-Eric
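
One quick thing to look for in a first-one-meg dump like the one requested above is whether the ext2/3/4 superblock magic is still where it should be, and which 4k blocks are entirely zero. A rough sketch, assuming the standard on-disk layout (primary superblock at byte 1024, 16-bit little-endian magic 0xEF53 at offset 56 within it); the helper names are made up for illustration.

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1 KiB into the partition
EXT_MAGIC_OFFSET = 56          # offset of s_magic within the superblock
EXT_MAGIC = 0xEF53


def has_ext_magic(dump: bytes) -> bool:
    """Return True if the dump carries the ext2/3/4 magic at the usual place."""
    pos = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    if len(dump) < pos + 2:
        return False
    (magic,) = struct.unpack_from("<H", dump, pos)
    return magic == EXT_MAGIC


def all_zero_blocks(dump: bytes, block_size: int = 4096) -> list:
    """Return the indexes of complete blocks in the dump that contain only zeros."""
    zero = bytes(block_size)
    return [
        i
        for i in range(len(dump) // block_size)
        if dump[i * block_size:(i + 1) * block_size] == zero
    ]


if __name__ == "__main__":
    # "first-one-meg" is the output file name from the dd command above.
    with open("first-one-meg", "rb") as f:
        dump = f.read()
    print("ext magic present:", has_ext_magic(dump))
    print("all-zero 4k blocks:", all_zero_blocks(dump))
```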

Comment 5 Eric Sandeen 2010-06-02 00:55:20 UTC
For e2image you'll need the -r option when writing to stdout, e.g. to pipe it through bzip2:

> e2image -r /dev/hda1 - | bzip2 > hda1.e2i.bz2

etc...
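
The "lseek while writing header: Illegal seek" failure in comment 3 is consistent with e2image trying to seek on its output, which a pipe cannot do; "Illegal seek" is the message for errno ESPIPE. A minimal sketch reproducing that errno on a pipe (Python is used purely for illustration, and the function name is made up; e2image itself is C):

```python
import errno
import os


def seek_fails_on_pipe() -> bool:
    """Return True if lseek() on a pipe fails with ESPIPE ("Illegal seek")."""
    read_fd, write_fd = os.pipe()
    try:
        os.lseek(write_fd, 0, os.SEEK_SET)
    except OSError as exc:
        return exc.errno == errno.ESPIPE
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return False


if __name__ == "__main__":
    print("lseek on a pipe fails with ESPIPE:", seek_fails_on_pipe())
    print("message for ESPIPE:", os.strerror(errno.ESPIPE))
```

This has nothing to do with /dev/sdb1 being larger than the original filesystem; only the output side of the pipeline is involved.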

Comment 6 Dan Williams 2010-06-02 08:36:01 UTC
(In reply to comment #4)
> Ok, thanks.  For some reason I thought it was corrupt, but no ....
> 
> > Journal inode is not in use, but contains data.  Clear<y>? yes
> 
> The problem is that the inode bitmap thought the journal inode was free. 
> That's ... odd.
> 
> I assume that:
> 
> debugfs> testi <8>
> 
> shows it not in use?

[dcbw@localhost ~]$ sudo debugfs /dev/sdb1
debugfs 1.41.10 (10-Feb-2009)
debugfs:  testi <8>
Inode 8 is marked in use
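
For reference, "marked in use" here means the corresponding bit is set in the inode bitmap. A minimal sketch of that lookup, assuming the inode lives in block group 0 and that the bitmap uses the usual ext2 convention (inode numbers start at 1, least-significant bit first within each byte); the function name is made up, and debugfs itself goes through libext2fs rather than anything like this.

```python
def inode_marked_in_use(group0_bitmap: bytes, ino: int) -> bool:
    """Return True if inode `ino` has its bit set in group 0's inode bitmap.

    Assumes `ino` belongs to block group 0 (true for the journal inode, 8).
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    index = ino - 1  # bit 0 of the bitmap describes inode 1
    return bool(group0_bitmap[index // 8] & (1 << (index % 8)))


if __name__ == "__main__":
    # Illustrative bitmap only: inodes 1-11 in use, everything after that free.
    bitmap = bytes([0xFF, 0x07, 0x00, 0x00])
    print(inode_marked_in_use(bitmap, 8))
    print(inode_marked_in_use(bitmap, 12))
```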

Comment 9 Eric Sandeen 2010-06-04 12:58:21 UTC
I looked at this a bit yesterday, and it seems that nothing other than the journal was affected; somehow the first part of the journal appears to have been overwritten with 0s.  The 2 blocks preceding the journal weren't listed as belonging to any existing file, so there was no clue there.  Since we have no reproducer, and this has never been seen before, I don't think we can resolve this one with the information we have now.

So, sadly closing CANTFIX.  If you see this again, though, please speak up!

Thanks,
-Eric
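
For anyone who hits something similar, a rough way to measure how much of the start of the journal has been zeroed in a dd image is to count leading all-zero blocks in the journal's extent. This is only a sketch: the start block 360448 and length 16384 come from the debugfs output in comment 2, the 4096-byte block size is an assumption, and the function name is made up for illustration.

```python
import io


def count_leading_zero_blocks(image, first_block, num_blocks, block_size=4096):
    """Count consecutive all-zero blocks starting at `first_block`.

    `image` is any seekable binary file object, e.g. open("/dev/sdb1", "rb").
    Counting stops at the first block that is not entirely zero, or at EOF.
    """
    zero = bytes(block_size)
    image.seek(first_block * block_size)
    count = 0
    for _ in range(num_blocks):
        if image.read(block_size) != zero:  # non-zero data or a short read
            break
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory stand-in for an image: 2 data blocks, 3 zero blocks, 1 data block.
    fake = io.BytesIO(b"\x01" * 32 + bytes(48) + b"\x02" * 16)
    print(count_leading_zero_blocks(fake, first_block=2, num_blocks=4, block_size=16))
    # Against the real copy it would be something like:
    #   with open("/dev/sdb1", "rb") as image:
    #       print(count_leading_zero_blocks(image, 360448, 16384))
```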

