Bug 585282 - ext2_check_page: bad entry in directory / ext2_readdir: bad page after large copy
Summary: ext2_check_page: bad entry in directory / ext2_readdir: bad page after large copy
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
: ---
Assignee: Eric Sandeen
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-04-23 15:27 UTC by Doug Kelly
Modified: 2011-05-05 19:35 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-05 19:35:42 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Doug Kelly 2010-04-23 15:27:29 UTC
Description of problem:
It appears that ext4 is showing signs of a corrupt filesystem after a rather large series of file copy operations.  I'm currently moving roughly 5–6 TB of data, consisting of an extremely large number of small files (~1 KB each), to an ext4 filesystem using rsync.  After about 3 TB had been copied, the filesystem began showing unusual errors, as shown by my syslog output below.  This seemed notable because the filesystem was checked (with the latest e4fsprogs from 5.5) before the copy operations began, and some errors that had existed due to drive failures were repaired.

The RAID card being used in this instance is a 3ware 9650SE-8LPML.  Currently, the RAID itself is not showing any errors, and it is using the built-in kernel modules in RHEL along with the latest firmware from LSI.

I understand that ext4 is a technology preview, which is why I'm curious whether we can investigate if there is some unknown bug in ext4.

Version-Release number of selected component (if applicable):
Linux r42sgao 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Unknown.  I don't have a test environment capable of reproducing this.

Additional info:

Syslog output:
Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22324902: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22365414: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22382248: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:16 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22415192: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22430806: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22324902: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22324902
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22365414: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22365414
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22382248: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22382248
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22415192: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22415192
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22430806: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22430806

Comment 1 Ric Wheeler 2010-04-23 15:41:30 UTC
Hi Doug,

Is this a freshly made ext4 file system, or an older one that you converted?

Happy to help here, but please keep in mind that ext4 in 5.5 is a tech preview, so we do not recommend using it for production workloads.

Thanks!

Comment 2 Eric Sandeen 2010-04-23 15:47:03 UTC
> Syslog output:
> Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page:
> bad entry in directory #22324902: rec_len is s
> maller than minimal - offset=4096, inode=0, rec_len=0, name_len=0

ext2 errors?  I thought this was ext4?  Can you double check that what you think is mounted is really mounted?

Also, how big is the fs?  If it's somehow ext2, and if the fs is > 8T, we may have problems there.

Thanks,
-Eric
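Eric's question — what is actually mounted — can be answered from /proc/mounts, which reflects the kernel's own view (unlike /etc/mtab, which can be stale).  A small sketch; the `fstype_of` helper name is made up for illustration:

```shell
# Report the filesystem type the kernel actually mounted at a given
# mount point, by parsing /proc/mounts (field 2 = mount point,
# field 3 = fs type).  A different mounts file can be passed as the
# second argument for testing.
fstype_of() {
    mp=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mp" '$2 == mp { print $3 }' "$mounts_file"
}

# On the reporter's system, something like `fstype_of /data` would
# have printed "ext2" rather than the expected "ext4".
```

The point of reading /proc/mounts rather than the output of a bare `mount` is that the latter traditionally reports /etc/mtab, which need not match what the kernel actually did.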

Comment 3 Doug Kelly 2010-04-23 15:52:36 UTC
This is a freshly-made ext4 fs, since mke2fs will bail out, stating the filesystem is too large.  Otherwise, I probably would have gone with ext3....  I understand and accept the risk that comes with it being a tech preview.  I could have created it as several independent volumes again (mirroring the existing setup), but preferred to have one large volume.

Also, you're totally right.  It is an ext4 volume, but it apparently let me mount it as ext2(?), and I didn't catch it.  The FS is roughly 11TB (as reported by df output).  I was thinking I was crazy seeing ext2 errors, but apparently not; I really had managed to do something weird.

Comment 4 Doug Kelly 2010-04-23 15:58:59 UTC
Mounting as ext4 works (I briefly verified that the directory structure exists), and I'm running an e4fsck now to clean up after my stupidity.

It does seem that the ext2/ext3 mount helpers should do some testing to ensure the fs size is not >8TB, preventing this "shoot yourself in the foot" operation.
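The guard Doug suggests could be sketched as a pre-mount check: take the "Block count" and "Block size" values that `dumpe2fs -h` reports, and refuse the ext2 mount if the filesystem exceeds 8 TiB.  A rough sketch; the helper name and the idea of wiring it into the mount path are illustrative, not how RHEL's mount helpers actually work:

```shell
# Sketch of the suggested pre-mount guard: given a block count and
# block size (as reported by `dumpe2fs -h`), decide whether the
# filesystem is larger than the 8 TiB that ext2 can safely address.
EXT2_LIMIT_BYTES=$((8 * 1024 * 1024 * 1024 * 1024))   # 8 TiB

too_big_for_ext2() {
    blocks=$1
    bsize=$2
    # Compare in units of blocks to avoid overflow on the multiplication.
    [ "$blocks" -gt $((EXT2_LIMIT_BYTES / bsize)) ]
}

# The reporter's filesystem: 2929671159 blocks x 4096 bytes, ~10.9 TiB.
if too_big_for_ext2 2929671159 4096; then
    echo "refusing ext2 mount: filesystem exceeds 8 TiB"
fi
```

With the numbers from the dumpe2fs output below, this check fires, which is exactly the situation the kernel ran into here.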

Comment 5 Eric Sandeen 2010-04-23 16:01:17 UTC
How did you invoke mkfs?  Just want to make sure there's no crazy pitfall we're not aware of.  :)

(hm, maybe we -should- actively prevent ext2 from mounting anything over 8T... ext3 should be ok, though, I worked hard to fix that way back when)

Unless you have critical bits on this fs I'd strongly suggest re-mkfs'ing it with mkfs.ext4.

Thanks,
-Eric

Comment 6 Doug Kelly 2010-04-23 16:11:23 UTC
From my bash history:
time mkfs.ext4 -L data /dev/sdk1

Theoretically, any data that's currently on the drive could be recovered by copying from my source drives again, but I'd rather not run down that road unless e4fsck absolutely bombs out on me.  Throwing two days of work away isn't exactly what I had in mind (though, I suppose ensuring data integrity would make it worth it).  Definitely considering just giving up instead of spending 3+ hours trying to run fsck.

Thanks for your help!

--Doug

Comment 7 Eric Sandeen 2010-04-23 16:16:55 UTC
Hm, well that is really odd; you did create it properly.  Why on earth did ext2 mount it?  And mkfs.ext4 should have written features that ext2 -cannot- mount.

I'm somewhere between confused and concerned.  :)

Anything else weird in between mkfs & now?  Something had to have changed...

The reason I had suggested re-mkfs was because if it -had- been mkfs'd as ext2, you are missing goodies that mkfs.ext4 would have laid out for you at mkfs time...

-Eric

Comment 8 Eric Sandeen 2010-04-23 16:19:19 UTC
Can you maybe attach dumpe2fs -h of the filesystem?

Thanks,
-Eric

Comment 9 Doug Kelly 2010-04-23 16:31:20 UTC
Only other command I ran was to turn off automatic fsck (horrible, yes, but we've had problems in the past with fsck at boot; this is a non-system-critical volume, and it'd be nice to fsck it in the background, as some of the BSDs do).
tune4fs -c 0 -i 0 /dev/sdk1

Here's dumpe2fs:
Filesystem volume name:   data
Last mounted on:          <not available>
Filesystem UUID:          4bc05c9f-8823-4eaa-a22e-ffab80e64a79
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              732422144
Block count:              2929671159
Reserved block count:     146483557
Free blocks:              1990246966
Free inodes:              686980963
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      325
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Mar 11 12:19:15 2010
Last mount time:          Fri Apr 23 10:56:08 2010
Last write time:          Fri Apr 23 10:56:17 2010
Mount count:              3
Maximum mount count:      -1
Last checked:             Wed Apr 21 16:42:36 2010
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Default directory hash:   half_md4
Directory Hash Seed:      63a05020-ad4b-4085-a024-36b310e426c3

For comparison, dumpe4fs, too:
dumpe4fs 1.41.9 (22-Aug-2009)
Filesystem volume name:   data
Last mounted on:          <not available>
Filesystem UUID:          4bc05c9f-8823-4eaa-a22e-ffab80e64a79
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              732422144
Block count:              2929671159
Reserved block count:     146483557
Free blocks:              1990246966
Free inodes:              686980963
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      325
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Mar 11 12:19:15 2010
Last mount time:          Fri Apr 23 10:56:08 2010
Last write time:          Fri Apr 23 10:56:17 2010
Mount count:              3
Maximum mount count:      -1
Last checked:             Wed Apr 21 16:42:36 2010
Check interval:           0 (<none>)
Lifetime writes:          12 kB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      63a05020-ad4b-4085-a024-36b310e426c3

(Looks mostly similar, just has a few extras.)

Grepping through my system logs, I see this happening earlier as well (before the copy started).  I do recall having issues with the RAID controller the drives were attached to, but it has since been fixed, and I don't see how it would cause something like this...

Comment 10 Eric Sandeen 2010-04-23 16:50:49 UTC
(In reply to comment #9)
> Only other command I ran was to turn off automatic fsck (horrible, yes, but
> we've had problems in the past with fsck at boot--namely, this is a non-system
> critical volume, and it'd be nice to fsck it in the background, as some of the
> BSDs do).

I have no reason to criticize you for that move.  :)  I'd personally always prefer to do fsck under admin control.

> tune4fs -c 0 -i 0 /dev/sdk1
> 
> Here's dumpe2fs:
> Filesystem volume name:   data
> Last mounted on:          <not available>
> Filesystem UUID:          4bc05c9f-8823-4eaa-a22e-ffab80e64a79
> Filesystem magic number:  0xEF53
> Filesystem revision #:    1 (dynamic)
> Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file

hm so that's missing all of the interesting ext4 features.

...

> Inode size:               256

However, an inode size of 256 indicates that it was created with mkfs.ext4; mkfs.ext[23] in rhel5 defaults to 128 byte inodes.

...

> For comparison, dumpe4fs, too:

(ah right; if it had had ext4 features dumpe2fs wouldn't have touched it)

...

> Grepping through my system logs, I do see this as happening earlier (before the
> copy started).  I do recall having issues with the RAID controller the drives
> were attached to, but it has since been fixed, and I don't see how it would
> cause something like this...    

well, this is worrisome.

I'll try creating an ext4 fs under rhel5 with the same geometry as above, and then run tune4fs to see if somehow that lost the extra features... most plausible theory I have so far.

Thanks,
-Eric
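The telling detail in comment 10 is the feature list: a filesystem made by mkfs.ext4 should carry ext4-only flags such as extent, huge_file, uninit_bg, and flex_bg, and their absence is what let the ext2 code mount it.  A quick sketch of testing a `dumpe2fs -h` feature line for them (the function name is made up for illustration):

```shell
# Check whether a "Filesystem features" line, as printed by
# `dumpe2fs -h`, contains any ext4-only feature flags.  If none are
# present, ext2/ext3 code will happily mount the filesystem.
has_ext4_features() {
    echo "$1" | grep -qE 'extent|huge_file|uninit_bg|flex_bg'
}

# The feature list from the reporter's dumpe2fs output:
features="ext_attr resize_inode dir_index filetype sparse_super large_file"
if has_ext4_features "$features"; then
    echo "looks like ext4"
else
    echo "no ext4-only features present"
fi
```

For this filesystem the check reports no ext4-only features, consistent with Eric's observation that all the interesting ext4 bits are missing.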

Comment 11 Eric Sandeen 2010-04-23 16:54:38 UTC
Can you confirm the version of your e4fsprogs? (rpm -q e4fsprogs)

Thanks,
-Eric

Comment 12 Eric Sandeen 2010-04-23 16:59:36 UTC
I guess the other thing I'd ask is the exact sequence of events here.

mkfs, tune2fs, mount, copy, corruption, fsck?

Or were there other mounts/copies/tune2fs's/unmounts in between?

Thanks,
-Eric

Comment 13 Doug Kelly 2010-04-23 17:33:57 UTC
(In reply to comment #12)
> I guess the other thing I'd ask is the exact sequence of events here.
> 
> mkfs, tune2fs, mount, copy, corruption, fsck?
> 
> Or were there other mounts/copies/tune2fs's/unmounts in between?
> 
> Thanks,
> -Eric    

I have an idea.... there were other mounts/copies in between (and crashes), and I may have accidentally run e2fsck out of habit (instead of e4fsck).  It wasn't run with -y (and I don't think I told it to make any changes), though, and it didn't finish.  This was several weeks ago, so my recollection may be fuzzy.

e4fsprogs is e4fsprogs-1.41.9-3.el5, though it has been updated since mkfs/tune4fs were run (with the version from 5.4, probably around early to mid March).

Comment 14 Eric Sandeen 2010-04-23 17:39:08 UTC
(In reply to comment #13)

> I have an idea.... there were other mounts/copies in between (and crashes), and

crashed how?  fs related?

> I may have accidentally run e2fsck out of habit (instead of e4fsck).  It wasn't

well, e2fsck should have refused to touch it due to the incompatible features ...

> run with -y (and I don't think I told it to make any changes), though, and it
> didn't finish.  This was several weeks ago, so my recollection may be fuzzy.

ok.

> e4fsprogs is e4fsprogs-1.41.9-3.el5, though it has been updated since
> mkfs/tune4fs were run (with the version from 5.4, probably around early to mid
> March).    

hmm ok... will re-start my mkfs for testing w/ an older version :)

Thanks,
-Eric

Comment 15 Doug Kelly 2010-04-23 17:52:58 UTC
Crashes were actually RAID controller issues--imagine the entire volume just disappearing out from under the system.  More or less, the controller was faulting, causing it to reset and the RAID to be marked as inoperable.  The system remained running, with no oopses or other messages in syslog other than the 3ware kernel module telling me about the controller's issues, and if I reset the drives marked as faulted, everything would chug along again for a few minutes.  I've checked my history a few times just to make sure nothing else strange happened (that I have a record of).

Other than that, I don't have any recollection of it behaving strangely.  I believe I remember after first mounting it that the filesystem reported ext4, but I could be wrong.  Still, no idea why I didn't get any of the ext4 magic features...

Thank you,
--Doug

Comment 16 Doug Kelly 2010-04-23 18:18:57 UTC
Okay, now I just looked some more... what's perhaps really amazing is the fact that there's not even a journal!  This in itself makes me ready to just start over with this filesystem, forgetting about time spent already.

Looking at versions of e4fsprogs in RHN, I would bet that the version I was using before was e4fsprogs-1.41.5-3.el5.x86_64.

I guess my question to you is, how should I proceed?  Start over from scratch, try to add the missing features as if I were migrating the filesystem, or hold off until we get a little further into this?  I'd like to get you as much data as possible, but I also need to keep the machine (and data) available...

Thanks,
--Doug

Comment 17 Eric Sandeen 2010-04-23 18:32:59 UTC
I guess I would start over from scratch.  The raid crashes make me very nervous, and I don't know how much damage was done... were any other fscks done prior to this?  I'm getting less worried about gathering data, since I am leaning towards blaming this on severe corruption from your raid card ...

-Eric

Comment 18 Doug Kelly 2010-04-23 18:50:17 UTC
Okay, I've got the filesystem building from scratch again.  I was feeling nervous, too...

There was one complete fsck on Wednesday that found (and fixed) errors...

I will agree, the RAID card is likely to blame here in some respect.  In any case, back to the original report: a check to make sure ext2 doesn't try to mount huge filesystems it will fail on would be nice, but other than that, this issue seems pretty much resolved.

Comment 19 Eric Sandeen 2010-04-23 18:54:56 UTC
Ok, thanks for all the info you provided!  I'll do a sanity check mkfs & tune2fs to be sure we're not clobbering anything that way.

(if you have the logs of the first fsck that might shed some light)

I guess I'll keep this bug open for now, as a reminder that >8T really is unsafe on ext2, and we should probably prevent it.

thanks,
-Eric

Comment 20 Doug Kelly 2010-04-23 19:33:53 UTC
Thanks again for your help troubleshooting this one!  I've gone ahead and recreated the RAID container, the disk label, partition, and filesystem.  I verified that dumpe4fs recognizes all the cool new options.

I don't have any older fsck logs, nor do I know exactly when this issue manifested.

Anyway, thanks again!

--Doug

Comment 21 Eric Sandeen 2011-03-15 15:55:19 UTC
Are these problems still occurring or should this bug be closed at this point?  It seems like we weren't able to gather enough information to identify the problem.

Thanks,
-Eric

Comment 22 Eric Sandeen 2011-05-05 19:35:42 UTC
Closing based on no answer to the needinfo.  Reopen as necessary...

