Bug 597667 - ext4: journal corruption
Summary: ext4: journal corruption
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel   
Version: 13
Hardware: All Linux
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-05-30 02:30 UTC by Dan Williams
Modified: 2010-06-04 12:58 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2010-06-04 12:58:21 UTC

Description Dan Williams 2010-05-30 02:30:21 UTC
Started my F13 laptop this morning to find my root partition unmountable.  AFAIK I shut it down correctly last night using System -> Shut Down... and picking the Shut Down button.

smartctl reports no errors, and Windows boots fine so I don't think it's a hardware problem.

A Live CD complains that:

May 29 16:28:37 localhost kernel: JBD: no valid journal superblock found
May 29 16:28:37 localhost kernel: EXT4-fs (sda5): error loading journal

I went out and bought an external USB drive, dd-ed the borked /dev/sda5 (my root partition) to it, and I'm happy to provide the journal or other portions of it that you might need for diagnosis; let me know what dd commands you'd need to get what you want off of it.

fsck.ext4 recovery follows...

[root@localhost ~]# fsck.ext4 -v /dev/sda5
e2fsck 1.41.10 (10-Feb-2009)
Superblock has an invalid journal (inode 8).
Clear<y>? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

Superblock has_journal flag is clear, but a journal inode is present.
Clear<y>? yes

F11-Preview-x86_ was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Journal inode is not in use, but contains data.  Clear<y>? yes

Pass 2: Checking directory structure   
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(360448--376831)
Fix<y>? yes
Free blocks count wrong for group #11 (5172, counted=21556).
Fix<y>? yes
Free blocks count wrong (9960550, counted=9976934).
Fix<y>? yes
Recreate journal<y>? yes

Creating journal (32768 blocks):  Done.

*** journal has been re-created - filesystem is now ext3 again ***

F11-Preview-x86_: ***** FILE SYSTEM WAS MODIFIED *****

 1216944 inodes used (18.05%)
    3931 non-contiguous files (0.3%)
     583 non-contiguous directories (0.0%)
         # of inodes with ind/dind/tind blocks: 0/0/0
         Extent depth histogram: 1196523/579/4
17005979 blocks used (63.10%)
       0 bad blocks
       6 large files

 1074638 regular files
  121817 directories
       9 character device files
       0 block device files
       0 fifos
  118280 links
   20131 symbolic links (19478 fast symbolic links)
     340 sockets
 1335215 files

Comment 1 Eric Sandeen 2010-06-01 18:00:13 UTC
Can you open the dd image of the pre-fsck filesystem in debugfs and run

debugfs> stat <8>

to see what that journal inode looks like?

An e2image of the original fs might be useful too, then I can poke at it myself.


Comment 2 Dan Williams 2010-06-01 22:24:38 UTC
debugfs 1.41.10 (10-Feb-2009)
debugfs:  stat <8>
Inode: 8   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 67108864
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 131072
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x49f21d55 -- Fri Apr 24 13:13:09 2009
atime: 0x00000000 -- Wed Dec 31 16:00:00 1969
mtime: 0x49f21d55 -- Fri Apr 24 13:13:09 2009
Size of extra inode fields: 0
(0-16383): 360448-376831

/etc/fstab is mounting this volume like so:

UUID=f89dd491-d9d6-43ba-a553-ab1c938c5ba0 /  ext4 defaults  1 1
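
For anyone who wants to pull just the journal out of a saved image, the extent reported by stat <8> above (logical blocks 0-16383 at physical blocks 360448-376831) is enough to build the dd command. What follows is a minimal sketch, not a verified recovery procedure. It assumes a 4096-byte block size (67108864 bytes / 16384 blocks) and that the journal really is a single contiguous extent. The function names and the journal.img output name are made up for illustration. The value c03b3998 is the JBD superblock magic, stored big-endian in the first four bytes of the journal.

```shell
#!/bin/sh
# Copy physical blocks FIRST..LAST (inclusive) of IMAGE into OUT.
# Usage: extract_blocks IMAGE OUT FIRST LAST [BLOCK_SIZE]
# (hypothetical helper; block size defaults to 4096, an assumption)
extract_blocks() {
    image=$1 out=$2 first=$3 last=$4 bs=${5:-4096}
    count=$((last - first + 1))
    dd if="$image" of="$out" bs="$bs" skip="$first" count="$count"
}

# Succeed if FILE begins with the JBD superblock magic 0xc03b3998.
journal_magic_ok() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    [ "$magic" = "c03b3998" ]
}

# With the numbers from this report (output file name is hypothetical):
#   extract_blocks /dev/sdb1 journal.img 360448 376831
#   journal_magic_ok journal.img || echo "journal superblock magic missing"
```

A missing magic here would be consistent with the "no valid journal superblock found" message, but it does not say what overwrote the block.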

Comment 3 Dan Williams 2010-06-01 22:32:26 UTC
Trying to do an e2image of the FS resulted in:

[dcbw@localhost ~]$ sudo e2image  /dev/sdb1 - | bzip2 -9 > sdb1.e2i.bz2
e2image 1.41.10 (10-Feb-2009)
lseek while writing header: Illegal seek
[dcbw@localhost ~]$ dmesg | tail
scsi 4:0:0:0: Direct-Access     Seagate  Portable         0130 PQ: 0 ANSI: 4
sd 4:0:0:0: Attached scsi generic sg2 type 0
sd 4:0:0:0: [sdb] 625142448 512-byte logical blocks: (320 GB/298 GiB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 2f 08 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] Attached SCSI disk

Note that the partition I dd-ed my original fs onto is 320GB; the original fs itself is only 110GB.  I just did:

[root@localhost ~]# dd if=/dev/sda5 of=/dev/sdb1 bs=4096
26950144+0 records in
26950144+0 records out
110387789824 bytes (110 GB) copied, 4140.14 s, 26.7 MB/s

more or less, so /dev/sdb1 is obviously larger than the original FS.

Not sure if that's something that e2image is complaining about.
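
The sizes in the dd summary above can be sanity-checked with plain shell arithmetic: records times block size should equal the byte count dd printed. A small sketch; bytes_copied is a made-up helper name, and it assumes a shell with 64-bit integer arithmetic.

```shell
#!/bin/sh
# Multiply a dd record count by the block size to get the bytes copied.
# Usage: bytes_copied RECORDS BLOCK_SIZE   (hypothetical helper)
bytes_copied() {
    echo $(( $1 * $2 ))
}

# With the figures from the dd run above:
#   bytes_copied 26950144 4096    # prints 110387789824
```

That matches the 110387789824 bytes dd reported, so the copy covers the whole source partition and the rest of /dev/sdb1 is just unused space past the end of the filesystem.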

Comment 4 Eric Sandeen 2010-06-02 00:51:35 UTC
Ok, thanks.  For some reason I thought it was corrupt, but no ....

> Journal inode is not in use, but contains data.  Clear<y>? yes

The problem is that the inode bitmap thought the journal inode was free.  That's ... odd.

I assume that:

debugfs> testi <8>

shows it not in use?

I wonder if it's possible that something overwrote the first part of your partition.

Maybe you can:

dd if=/dev/$WHATEVER bs=4k count=256 of=first-one-meg

bzip2 that, and attach it?  I can see what the first part of the block device looks like, maybe we'll recognize something that stomped on it.

However, I guess in the total scope of the repair, not much looked damaged.  So maybe that theory is off base ...
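
One quick check on a first-one-meg dump like the one requested above is whether the primary ext superblock magic is still where it should be. This is a sketch under stated assumptions: the primary superblock starts at byte offset 1024, and its 16-bit magic 0xEF53 sits 56 bytes into it, stored little-endian (so the two bytes read 53 ef). has_ext_magic is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if FILE has the ext2/3/4 magic (bytes 53 ef) at offset 1080,
# i.e. 1024 (superblock start) + 56 (magic field).
# Usage: has_ext_magic FILE   (hypothetical helper)
has_ext_magic() {
    magic=$(dd if="$1" bs=1 skip=1080 count=2 2>/dev/null \
            | od -An -tx1 | tr -d '[:space:]')
    [ "$magic" = "53ef" ]
}

# Example:
#   has_ext_magic first-one-meg && echo "primary superblock magic present"
```

Since fsck was able to work from the superblock in this report, the magic would be expected to be present; the check only rules out the crudest kind of overwrite.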


Comment 5 Eric Sandeen 2010-06-02 00:55:20 UTC
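For e2image you'll need the -r (raw image) option if you want to write to stdout and pipe it through bzip2; the normal image format has to seek in its output file, which is why you got the lseek error: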

> e2image -r /dev/hda1 - | bzip2 > hda1.e2i.bz2


Comment 6 Dan Williams 2010-06-02 08:36:01 UTC
(In reply to comment #4)
> Ok, thanks.  For some reason I thought it was corrupt, but no ....
> > Journal inode is not in use, but contains data.  Clear<y>? yes
> The problem is that the inode bitmap thought the journal inode was free. 
> That's ... odd.
> I assume that:
> debugfs> testi <8>
> shows it not in use?

[dcbw@localhost ~]$ sudo debugfs /dev/sdb1
debugfs 1.41.10 (10-Feb-2009)
debugfs:  testi <8>
Inode 8 is marked in use

Comment 9 Eric Sandeen 2010-06-04 12:58:21 UTC
I looked at this a bit yesterday, and it seems that nothing other than the journal was affected; somehow the first part of the journal appears to have been overwritten with zeros.  The 2 blocks preceding the journal weren't listed as belonging to any existing file, so there was no clue there.  Since we have no reproducer, and this has never been seen before, I don't know that we can resolve this one with the information we have now.

So, sadly closing CANTFIX.  If you see this again, though, please speak up!
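
If this does show up again, the zero-fill described in this comment can be checked on a saved image by testing whether a block range contains anything other than NUL bytes. A minimal sketch; region_is_zero is a made-up helper name, the 4096-byte default block size is an assumption carried over from the journal size reported earlier, and the block number in the usage line is the journal start from stat <8>.

```shell
#!/bin/sh
# Succeed if COUNT blocks of FILE starting at FIRST_BLOCK are all zero bytes.
# Usage: region_is_zero FILE FIRST_BLOCK COUNT [BLOCK_SIZE]
# Caveat: a range entirely past the end of FILE reads as empty and
# therefore also counts as "all zero".
region_is_zero() {
    bs=${4:-4096}
    nonzero=$(( $(dd if="$1" bs="$bs" skip="$2" count="$3" 2>/dev/null \
                  | tr -d '\000' | wc -c) ))
    [ "$nonzero" -eq 0 ]
}

# Example: is the first journal block of the saved image zeroed?
#   region_is_zero /dev/sdb1 360448 1 && echo "journal block 0 is all zeros"
```

Walking the count upward from 1 would show roughly how far the zeroed region extends into the journal.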

