Description of problem:
It appears that ext4 is showing signs of a corrupted filesystem after a rather large series of file copy operations. I'm in the process of moving roughly 5-6 TB of data, made up of an extremely large number of small files (~1 KB each), to an ext4 filesystem using rsync. After about 3 TB had been copied, the filesystem began logging unusual errors; see the syslog output below. This seemed notable because the filesystem was checked (with the latest e4fsprogs from 5.5) before the copy started, and some errors that had existed due to drive failures were repaired. The RAID card in this system is a 3ware 9650SE-8LPML. The RAID itself is currently not reporting any errors, and it is driven by the built-in kernel modules in RHEL along with the latest firmware from LSI. I understand that ext4 is a technology preview, which is why I'd like to investigate whether there is some unknown bug in ext4.

Version-Release number of selected component (if applicable):
Linux r42sgao 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Unknown. I don't have a test environment capable of reproducing this.
Additional info:

Syslog output:

Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22324902: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22365414: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22382248: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:16 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22415192: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 22 23:12:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22430806: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22324902: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22324902
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22365414: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22365414
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22382248: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22382248
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22415192: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22415192
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page: bad entry in directory #22430806: rec_len is smaller than minimal - offset=4096, inode=0, rec_len=0, name_len=0
Apr 23 08:10:17 r42sgao kernel: EXT2-fs error (device sdk1): ext2_readdir: bad page in #22430806
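The ext2_check_page errors above come from the kernel's directory-entry sanity checks. As a hedged sketch (the helper name is illustrative; this mirrors, rather than copies, the logic in fs/ext2/dir.c, whose messages match the ones in the log), the checks look roughly like:

```shell
# Sketch of ext2's directory-entry validation. A directory entry needs an
# 8-byte fixed header plus the name, padded to a 4-byte boundary, so the
# smallest legal entry (1-char name) is 12 bytes. rec_len=0 fails immediately.
check_entry() {
  local rec_len=$1 name_len=$2
  local need=$(( (8 + name_len + 3) / 4 * 4 ))   # bytes this entry requires
  local min=$(( (8 + 1 + 3) / 4 * 4 ))           # smallest legal entry: 12
  if [ "$rec_len" -lt "$min" ]; then
    echo "bad entry: rec_len is smaller than minimal"; return 1
  elif [ $((rec_len % 4)) -ne 0 ]; then
    echo "bad entry: unaligned directory entry"; return 1
  elif [ "$rec_len" -lt "$need" ]; then
    echo "bad entry: rec_len is too small for name_len"; return 1
  fi
  echo ok
}

check_entry 0 0 || true   # the case from the log -> rec_len is smaller than minimal
check_entry 12 1          # a minimal valid entry -> ok
```

An entry with rec_len=0 would make readdir loop in place, which is why the check fires before the "bad page" message.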
Hi Doug, Is this a freshly made ext4 file system, or an older one that you converted? Happy to help here, but please keep in mind that ext4 in 5.5 is a tech preview, so we do not recommend using it for production workloads. Thanks!
> Syslog output:
> Apr 22 23:12:15 r42sgao kernel: EXT2-fs error (device sdk1): ext2_check_page:
> bad entry in directory #22324902: rec_len is smaller than minimal -
> offset=4096, inode=0, rec_len=0, name_len=0

ext2 errors? I thought this was ext4? Can you double-check that what you think is mounted is really mounted? Also, how big is the fs? If it's somehow ext2, and if the fs is > 8T, we may have problems there.

Thanks,
-Eric
This is a freshly-made ext4 fs; mke2fs bails out on a volume this size, stating that the filesystem is too large, otherwise I probably would have gone with ext3. I understand and accept the risk that comes with it being a tech preview. I could have created it as several independent volumes again (mirroring the existing setup), but I preferred to have one large volume. Also, you're totally right: it is an ext4 volume, but it apparently let me mount it as ext2(?), and I totally didn't catch that. The fs is roughly 11 TB (as reported by df). I thought I was crazy seeing ext2 errors, but apparently not; I really had managed to do something weird.
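One quick way to catch this kind of mismatch is to compare the fs type the kernel actually used against the superblock. As a hedged sketch (the mount point /data and the sample /proc/mounts line are hypothetical, standing in for reading /proc/mounts on the live system):

```shell
# Field 3 of a /proc/mounts entry is the fs type the kernel mounted with;
# comparing it to the superblock (dumpe2fs -h) exposes an ext2 mount of an
# ext4 volume like the one in this report.
fstype_of() {
  echo "$1" | awk '{print $3}'
}

sample="/dev/sdk1 /data ext2 rw 0 0"   # hypothetical /proc/mounts line
fstype_of "$sample"                    # prints: ext2
```

On the live system this would be paired with something like dumpe2fs -h /dev/sdk1 to see what the superblock claims.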
Mounting it as ext4 works (I briefly verified that the directory structure exists), and I'm running e4fsck now to clean up after my mistake. It does seem that the ext2/ext3 mount helpers should check that the fs size is not >8 TB, preventing this "shoot yourself in the foot" operation.
How did you invoke mkfs? Just want to make sure there's no crazy pitfall we're not aware of. :) (hm, maybe we -should- actively prevent ext2 from mounting anything over 8T... ext3 should be ok, though, I worked hard to fix that way back when) Unless you have critical bits on this fs I'd strongly suggest re-mkfs'ing it with mkfs.ext4. Thanks, -Eric
From my bash history:

time mkfs.ext4 -L data /dev/sdk1

Theoretically, any data currently on the drive could be recovered by copying from my source drives again, but I'd rather not go down that road unless e4fsck absolutely bombs out on me. Throwing two days of work away isn't exactly what I had in mind (though I suppose ensuring data integrity would make it worth it). I'm definitely considering just giving up on the recovery instead of spending 3+ hours running fsck. Thanks for your help! --Doug
Hm, well that is really odd; you did create it properly. Why on earth did ext2 mount it? mkfs.ext4 should have written features that ext2 -cannot- mount. I'm somewhere between confused and concerned. :) Anything else weird happen between mkfs and now? Something had to have changed... The reason I had suggested re-mkfs'ing was that if it -had- been mkfs'd as ext2, you'd be missing the goodies that mkfs.ext4 would have laid out for you at mkfs time... -Eric
Can you attach the output of dumpe2fs -h for the filesystem? Thanks, -Eric
Only other command I ran was to turn off automatic fsck (horrible, yes, but we've had problems in the past with fsck at boot--namely, this is a non-system-critical volume, and it'd be nice to fsck it in the background, as some of the BSDs do):

tune4fs -c 0 -i 0 /dev/sdk1

Here's dumpe2fs:

Filesystem volume name:   data
Last mounted on:          <not available>
Filesystem UUID:          4bc05c9f-8823-4eaa-a22e-ffab80e64a79
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              732422144
Block count:              2929671159
Reserved block count:     146483557
Free blocks:              1990246966
Free inodes:              686980963
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      325
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Mar 11 12:19:15 2010
Last mount time:          Fri Apr 23 10:56:08 2010
Last write time:          Fri Apr 23 10:56:17 2010
Mount count:              3
Maximum mount count:      -1
Last checked:             Wed Apr 21 16:42:36 2010
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Default directory hash:   half_md4
Directory Hash Seed:      63a05020-ad4b-4085-a024-36b310e426c3

For comparison, dumpe4fs, too:

dumpe4fs 1.41.9 (22-Aug-2009)
Filesystem volume name:   data
Last mounted on:          <not available>
Filesystem UUID:          4bc05c9f-8823-4eaa-a22e-ffab80e64a79
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              732422144
Block count:              2929671159
Reserved block count:     146483557
Free blocks:              1990246966
Free inodes:              686980963
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      325
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Mar 11 12:19:15 2010
Last mount time:          Fri Apr 23 10:56:08 2010
Last write time:          Fri Apr 23 10:56:17 2010
Mount count:              3
Maximum mount count:      -1
Last checked:             Wed Apr 21 16:42:36 2010
Check interval:           0 (<none>)
Lifetime writes:          12 kB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      63a05020-ad4b-4085-a024-36b310e426c3

(Looks mostly similar, just with a few extras.) Grepping through my system logs, I see these errors occurring earlier as well (before the copy started). I do recall having issues with the RAID controller the drives were attached to, but it has since been fixed, and I don't see how it would cause something like this...
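For what it's worth, the feature line above can be compared mechanically against the flags an RHEL 5-era mkfs.ext4 would normally write. A hedged sketch (the list of expected flags is an assumption based on e4fsprogs 1.41 defaults):

```shell
# Check the superblock's feature list (copied from the dumpe2fs output above)
# for the ext4-specific flags an mkfs.ext4 run is expected to have written.
# Every one of them, including the journal, turns up missing here.
features="ext_attr resize_inode dir_index filetype sparse_super large_file"
for f in has_journal extent huge_file flex_bg uninit_bg dir_nlink extra_isize; do
  case " $features " in
    *" $f "*) echo "$f: present" ;;
    *)        echo "$f: missing" ;;
  esac
done
```

Against a healthy ext4 superblock the same loop would report every flag as present.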
(In reply to comment #9)
> Only other command I ran was to turn off automatic fsck (horrible, yes, but
> we've had problems in the past with fsck at boot--namely, this is a non-system
> critical volume, and it'd be nice to fsck it in the background, as some of the
> BSDs do).

I have no reason to criticize you for that move. :) I'd personally always prefer to do fsck under admin control.

> tune4fs -c 0 -i 0 /dev/sdk1
>
> Here's dumpe2fs:
> Filesystem volume name:   data
> Last mounted on:          <not available>
> Filesystem UUID:          4bc05c9f-8823-4eaa-a22e-ffab80e64a79
> Filesystem magic number:  0xEF53
> Filesystem revision #:    1 (dynamic)
> Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file

Hm, so that's missing all of the interesting ext4 features.

...

> Inode size:               256

However, an inode size of 256 indicates that it was created with mkfs.ext4; mkfs.ext[23] in RHEL 5 defaults to 128-byte inodes.

...

> For comparison, dumpe4fs, too:

(Ah, right; if it had had ext4 features, dumpe2fs wouldn't have touched it.)

...

> Grepping through my system logs, I do see this as happening earlier (before the
> copy started). I do recall having issues with the RAID controller the drives
> were attached to, but it has since been fixed, and I don't see how it would
> cause something like this...

Well, this is worrisome. I'll try creating an ext4 fs under RHEL 5 with the same geometry as above, then run tune4fs to see if that somehow lost the extra features... that's the most plausible theory I have so far.

Thanks,
-Eric
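The mkfs-plus-tune4fs experiment described above can be reproduced without a spare array by working against a small image file. A hedged sketch (the image path and size are arbitrary; RHEL 5's e4fsprogs names these tools mkfs.ext4, tune4fs, and dumpe4fs, while this sketch uses the modern e2fsprogs names tune2fs and dumpe2fs):

```shell
# Build a throwaway ext4 image, run the same tune2fs step as in comment #9,
# then confirm the ext4 feature flags are still in the superblock.
PATH="$PATH:/sbin:/usr/sbin"          # mkfs/tune2fs often live in sbin
img=/tmp/ext4-feature-test.img
dd if=/dev/zero of="$img" bs=1M count=64 2>/dev/null
mkfs.ext4 -F -q -L data "$img"
tune2fs -c 0 -i 0 "$img" >/dev/null
dumpe2fs -h "$img" 2>/dev/null | grep '^Filesystem features'
# a healthy ext4 fs still lists extent (and, depending on version, huge_file,
# flex_bg, ...) after tune2fs; the superblock in this report listed none
rm -f "$img"
```

If the features survive here (as expected), tune4fs is exonerated and the feature loss happened some other way.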
Can you confirm the version of your e4fsprogs? (rpm -q e4fsprogs) Thanks, -Eric
I guess the other thing I'd ask is the exact sequence of events here. mkfs, tune2fs, mount, copy, corruption, fsck? Or were there other mounts/copies/tune2fs's/unmounts in between? Thanks, -Eric
(In reply to comment #12)
> I guess the other thing I'd ask is the exact sequence of events here.
>
> mkfs, tune2fs, mount, copy, corruption, fsck?
>
> Or were there other mounts/copies/tune2fs's/unmounts in between?

I have an idea... there were other mounts/copies in between (and crashes), and I may have accidentally run e2fsck out of habit (instead of e4fsck). It wasn't run with -y (and I don't think I told it to make any changes), and it didn't finish. This was several weeks ago, though, so my recollection may be fuzzy. e4fsprogs is e4fsprogs-1.41.9-3.el5, though it has been updated since mkfs/tune4fs were run (those were run with the version from 5.4, probably around early to mid March).
(In reply to comment #13)
> I have an idea.... there were other mounts/copies in between (and crashes), and

Crashed how? fs-related?

> I may have accidentally run e2fsck out of habit (instead of e4fsck).

Well, e2fsck should have refused to touch it due to the incompatible features...

> It wasn't run with -y (and I don't think I told it to make any changes), though,
> and it didn't finish. This was several weeks ago, so my recollection may be fuzzy.

OK.

> e4fsprogs is e4fsprogs-1.41.9-3.el5, though it has been updated since
> mkfs/tune4fs were run (with the version from 5.4, probably around early to mid
> March).

Hmm, OK... I'll restart my mkfs testing with an older version. :)

Thanks,
-Eric
Crashes were actually RAID controller issues--imagine the entire volume just disappearing out from under the system. More or less, the controller was faulting, causing it to reset and the RAID to be marked as inoperable. The system remained running, with no oopses or other messages in syslog other than the 3ware kernel module telling me about the controller's issues; if I reset the drives marked as faulted, everything would chug along again for a few minutes. I've checked my history a few times just to make sure nothing else strange happened (that I have a record of). Other than that, I don't have any recollection of it behaving strangely. I believe I remember that after first mounting it, the filesystem reported ext4, but I could be wrong. Still, no idea why I didn't get any of the ext4 magic features... Thank you, --Doug
Okay, now I've looked some more... what's perhaps really amazing is that there's not even a journal! That in itself makes me ready to just start over with this filesystem and write off the time already spent. Looking at the versions of e4fsprogs in RHN, I'd bet the version I was using before was e4fsprogs-1.41.5-3.el5.x86_64. I guess my question to you is: how should I proceed? Start over from scratch, try to add the missing features as if I were migrating the filesystem, or hold off until we get a little further into this? I'd like to get you as much data as possible, but I also need to keep the machine (and data) available... Thanks, --Doug
I guess I would start over from scratch. The raid crashes make me very nervous, and I don't know how much damage was done... were any other fscks done prior to this? I'm getting less worried about gathering data, since I am leaning towards blaming this on severe corruption from your raid card ... -Eric
Okay, I've got the filesystem building from scratch again; I was feeling nervous, too... There was one complete fsck on Wednesday that found (and fixed) errors. I'll agree, the RAID card is likely to blame here in some respect. In any case, back to the original report: a check to keep ext2 from mounting huge filesystems it's bound to fail on would be nice, but other than that, this issue seems pretty much resolved.
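The size check proposed here is straightforward given the superblock fields already posted. A hedged sketch (the 8 TiB threshold reflects the limit discussed in this thread; the numbers come from the dumpe2fs output above):

```shell
# Derive the filesystem size from the superblock and flag anything at or
# above ext2's ~8 TiB safe limit. 2929671159 blocks x 4096 bytes is about
# 10.9 TiB, i.e. the "roughly 11 TB" df reported for this volume.
block_count=2929671159
block_size=4096
size_bytes=$((block_count * block_size))
size_tib=$((size_bytes / (1024 * 1024 * 1024 * 1024)))
echo "filesystem size: ~${size_tib} TiB"
if [ "$size_tib" -ge 8 ]; then
  echo "refusing: too large to mount safely as ext2"
fi
```

A mount helper doing this check before handing the device to the ext2 driver would have stopped the accidental mount that started this report.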
Ok, thanks for all the info you provided! I'll do a sanity check mkfs & tune2fs to be sure we're not clobbering anything that way. (if you have the logs of the first fsck that might shed some light) I guess I'll keep this bug open for now, as a reminder that >8T really is unsafe on ext2, and we should probably prevent it. thanks, -Eric
Thanks again for your help troubleshooting this one! I've gone ahead and recreated the RAID container, the disk label, partition, and filesystem. I verified that dumpe4fs recognizes all the cool new options. I don't have any older fsck logs, nor do I know exactly when this issue manifested. Anyway, thanks again! --Doug
Are these problems still occurring or should this bug be closed at this point? It seems like we weren't able to gather enough information to identify the problem. Thanks, -Eric
Closing based on no answer to the needinfo. Reopen as necessary...