Bug 873201
Summary: | Error : EXT4-fs error (device sda8): ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) != EXT_FIRST_INDEX (0) (depth 1)! leads to DB2 instance to crash | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Elvir Kuric <ekuric> | ||||||
Component: | e2fsprogs | Assignee: | Eric Sandeen <esandeen> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Filesystem QE <fs-qe> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 6.3 | CC: | atomlin, cww, eguan, esandeen, fs-maint, jcastillo, jfenal, jharriga, jsvarova, kdudka, lmcilroy, mgandhi, plambri, rwheeler, sauchter, sct, spanjikk, vincent | ||||||
Target Milestone: | rc | Keywords: | ZStream | ||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | e2fsprogs-1.42.12-17.el6 | Doc Type: | Bug Fix | ||||||
Doc Text: |
The resize2fs utility did not properly handle resizing of an ext4 file system to a smaller size. As a consequence, files containing many extents could become corrupted if they were moved during the resize process. With this update, resize2fs now maintains a consistent extent tree when moving files containing many extents, and such files no longer become corrupted in this scenario.
|
Story Points: | --- | ||||||
Clone Of: | |||||||||
: | 1014062 1022175 (view as bug list) | Environment: | |||||||
Last Closed: | 2013-11-21 22:42:03 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 961026, 994246, 1014062, 1022175, 1023351, 1033201 | ||||||||
Attachments: |
|
Description
Elvir Kuric
2012-11-05 10:38:42 UTC
The filesystem component has nothing to do with the file system implementation in the kernel. I am switching the component back to kernel... I've actually got an image from upstream which reproduces a similar problem, and I'm looking at it now, so I'll take this one. -Eric Can you attach the output from: # debugfs -R "stat <8126473> -R "dump_extents <8126473>" /dev/sda8 please? Making an e2image -r of the filesystem would also capture its complete state, and may be useful for later analysis, if possible. It will be a large (but sparse) file which should compress down fairly well. Thanks, -Eric slight typo. s/b: # debugfs -R "stat <8126473>" -R "dump_extents <8126473>" /dev/sda8 As a workaround, copying that file and then renaming it over the old one will probably "fix" the existing corruption. I've sent an e2fsck patch upstream to detect & fix what I think the resulting problem is. Still not sure how we hit this in the first place, though. I've sent an e2fsck patch upstream to fix this. Still haven't found root cause. There are now 3 e2fsprogs patches upstream to fix this. I can verify the fix w/ the image you've provided. I've been racking my brain trying to find out how we can hit this problem at runtime, and so far, no luck, unless we are hole-punching - but their version of the kernel does not support hole punching. I think there is one more way where it could happen, and that is via e2fsck clearing corrupted blocks and not updating the extent tree properly. It may be too far in the past, but is there any chance they did an e2fsck prior to this problem occurring, and it found some problems? That could possibly explain it... If they've restored from backup, they are probably ok for now, but we can get e2fsprogs updated to handle this corruption in the future. FWIW, if you were trying to look at the image under RHEL5, you need to use "debuge4fs" which is a cloned & updated version of debugfs just for ext4 on RHEL5. Sorry for the late update, -eric Here are the extents: Level Entries Logical Physical Length Flags 0/ 2 1/ 1 0 - 73727 33794 73728 1/ 2 1/ 29 0 - 3246 9250 3247 2/ 2 1/ 3 0 - 0 32538624 - 32538624 1 2/ 2 2/ 3 1 - 15 9255 - 9269 15 Uninit 2/ 2 3/ 3 16 - 3246 9270 - 12500 3231 1/ 2 2/ 29 3247 - 6311 9252 3065 2/ 2 1/ 1 3247 - 6311 12501 - 15565 3065 1/ 2 3/ 29 6312 - 13535 25637 7224 2/ 2 1/ 2 6312 - 8191 15566 - 17445 1880 2/ 2 2/ 2 8192 - 13535 34816 - 40159 5344 1/ 2 4/ 29 13536 - 13567 33795 32 2/ 2 1/ 1 13536 - 13567 40160 - 40191 32 1/ 2 5/ 29 13568 - 14883 33793 1316 2/ 2 1/ 1 13568 - 14883 40192 - 41507 1316 1/ 2 6/ 29 14884 - 21679 33798 6796 2/ 2 1/ 1 14884 - 21679 41508 - 48303 6796 <snip> 1/ 2 25/ 29 57524 - 59011 15538183 1488 2/ 2 1/ 2 57524 - 59011 15556788 - 15558275 1488 2/ 2 2/ 2 59012 - 65535 15558276 - 15564799 6524 Uninit 1/ 2 26/ 29 59012 - 60671 15538184 1660 2/ 2 1/ 2 59012 - 60671 25638 - 27297 1660 2/ 2 2/ 2 60672 - 60689 27298 - 27315 18 Uninit 1/ 2 27/ 29 60672 - 61023 15538185 352 2/ 2 1/ 19 60690 - 60690 27316 - 27316 1 Uninit 2/ 2 2/ 19 60691 - 60703 27317 - 27329 13 Uninit 2/ 2 3/ 19 60704 - 60767 27330 - 27393 64 <snip> And here we see the problem; the level 27/29 indirect block claims to cover from 60672 onward; however the first extent under it starts at 60690, and the prior extent goes from 60672 to 60689; an overlap. Interesting that we have uninit exents sprinkled in here. With the new patches in e2fsprogs, e2fsck finds this corruption: Pass 1: Checking inodes, blocks, and sizes Interior extent node level 1 of inode 8126473: Logical start 60672 does not match logical start 60690 at next level. Fix? no Interior extent node level 1 of inode 8126473: Logical start 63157 does not match logical start 63159 at next level. Fix? no Created attachment 667406 [details]
dump_extents output
Full extent listing. There seem to be other problems in the extent tree as well.
Thanks for the image. Something is going wrong at runtime here, I think, and we don't yet know what. We're going to need to find a testcase for this, most likely. By the time the error is printed, the corruption has already occurred, I believe. Ok, I see that there are 2 attachments of e2images from the customer, one of them is 03. Sorry about that. Were both filesystems resized in the same manner, prior to the error? The 2 attached images from Stuart - were they after an e2fsck? Neither current RHEL6 e2fsck nor upstream e2fsck finds any issues with them. Also: There are 3 cases attached to this bug. Are they all related to DB2? Was the filesystem resized in all 3 cases? Just looking for the common thread here. I'll check with customer about oradata01 and oradata03, but I think Stuart was quite explicit on getting the images before fsck was run. About the three cases, Elvir's (original one) and Milan's (latest one attached) are related to DB2, and is the process db2sysc the one that detects the problem. Stuart's is related to oracle, or at least that is what the error tells us (comm oracle): kernel: EXT4-fs error (device dm-116): ext4_ext_search_left: inode #58904: (comm oracle) ix (2081024) != EXT_FIRST_INDEX (0) (depth 1)! In all three cases (four filesystems affected as far as I know, one in Elvir's and Milan's cases, two in Stuart's case) the filesystem was resized. I'm going to use Milan's e2image to try to reproduce this. My idea is to restore the image, and then reduce the LV exactly as how they did. If the problem is in the e2fstools, then I should be able to reproduce. If I cannot, then I think we can assume that the problem is outside the filesystem and in DB2 and Oracle. Thank you for the summary. I'm sorry I'm not keeping it straight. If resize is to blame, a pre-resize image would be most helpful, but we may still be able to tickle something even with what we've got. Confirmed that all images from the last two cases were created post-fsck. Elvir's is the only image that was obtained the right way, so I'll try to use his for the reproducer. Ok, I'm pretty confident blaming resize2fs, and I think I have a reproducer, at least to get the image to a state where it shows some of the corruptions on Elvir's image (which, for the record, were: Pass 1: Checking inodes, blocks, and sizes Inode 8126473, end of extent exceeds allowed value (logical block 59012, physical block 15558276, len 6524) Clear? no Inode 8126473, end of extent exceeds allowed value (logical block 60672, physical block 27298, len 18) Clear? no Interior extent node level 1 of inode 8126473: Logical start 60672 does not match logical start 60690 at next level. Fix? no Inode 8126473, end of extent exceeds allowed value (logical block 62992, physical block 29618, len 167) Clear? no Interior extent node level 1 of inode 8126473: Logical start 63157 does not match logical start 63159 at next level. Fix? no Inode 8126473, i_blocks is 642256, should be 588584. Fix? no ) Here's my script: #!/bin/bash # Note: sdb1 is 2.6G umount /dev/sdb1 &>/dev/null mkfs.ext4 /dev/sdb1 &>/dev/null || exit mount /dev/sdb1 /mnt/test 2.5G # Mostly-fill the fs, to push fsx IOs towards the end of the fs fallocate -l 1g /mnt/test/fill1 fallocate -l 1g /mnt/test/fill2 SEED=$RANDOM echo "Seed is $SEED" # fsx is slightly hacked from xfstests to remove any truncates; we always grow ltp/fsx -S $SEED -p 1000 -N 10000 -l 301985792 -y -W -H -t 4096 -w 4096 /mnt/test/testfile # Make room to shrink; fragmented fsx file will have to be moved down rm -f /mnt/test/fill1 rm -f /mnt/test/fill2 umount /dev/sdb1 # 2 fscks; first no-op in case it's corrupt pre-resize, 2nd to satisfy resize2fs e2fsck -fn /dev/sdb1 || exit e2fsck -fy /dev/sdb1 || exit # Get a pre-resize image so we can analyze what went wrong if anything e2image -r /dev/sdb1 image-backup.e2i # Shrink it; this will move the complex file written by fsx resize2fs /dev/sdb1 1G e2fsck -fn /dev/sdb1 || exit And I do get corruption: Pass 1: Checking inodes, blocks, and sizes Inode 14, end of extent exceeds allowed value (logical block 7817, physical block 32346, len 18) Clear? no Interior extent node level 1 of inode 14: Logical start 7818 does not match logical start 7835 at next level. Fix? no Inode 14, end of extent exceeds allowed value (logical block 19709, physical block 40763, len 13) Clear? no Interior extent node level 1 of inode 14: Logical start 19710 does not match logical start 19722 at next level. Fix? no Inode 14, end of extent exceeds allowed value (logical block 22990, physical block 32744, len 3) Clear? no Interior extent node level 1 of inode 14: Logical start 22991 does not match logical start 22993 at next level. Fix? no Inode 14, end of extent exceeds allowed value (logical block 49594, physical block 28622, len 6) Clear? no Interior extent node level 1 of inode 14: Logical start 49595 does not match logical start 49600 at next level. Fix? no Inode 14, end of extent exceeds allowed value (logical block 51924, physical block 47719, len 11) Clear? no Interior extent node level 1 of inode 14: Logical start 51925 does not match logical start 51935 at next level. Fix? no Inode 14, i_blocks is 233040, should be 232632. Fix? no Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information So this is a good result; it certainly points to resize2fs having issues with moving complicated extent-based files. Bug persists upstream (kernel + e2fsprogs), and doesn't seem to be related to uninitialized extents, FWIW. It's always a 1-liner. diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c index 65bb099..8f8d294 100644 --- a/lib/ext2fs/extent.c +++ b/lib/ext2fs/extent.c @@ -1430,6 +1430,9 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle, retval = ext2fs_extent_replace(handle, 0, &extent); if (retval) goto done; + retval = ext2fs_extent_fix_parents(handle); + if (retval) + goto done; } else { __u32 orig_length; So what's happening here is that we started with an extent tree like this (from debuge2fs's "ex" command): Level Entries Logical Physical Length Flags ... 2/ 2 60/ 63 13096 - 13117 650024 - 650045 22 2/ 2 61/ 63 13134 - 13142 650062 - 650070 9 2/ 2 62/ 63 13193 - 13194 650121 - 650122 2 2/ 2 63/ 63 13227 - 13227 650155 - 650155 1 A) 1/ 2 4/ 14 13228 - 17108 655367 3881 B) 2/ 2 1/117 13228 - 13251 650156 - 650179 24 C) 2/ 2 2/117 13275 - 13287 650203 - 650215 13 2/ 2 3/117 13348 - 13353 650276 - 650281 6 ... the 2/ 2 extents are leaves, 1/ 2 are interior nodes. All of these extents are beyond the resize point, so must be moved. Single-block extent A) gets moved to a lower block, and then the code starts moving the next logical blocks one at a time - those in extent C) The userspace extent code tries to merge, so when it finds that logical 13228 can be merged with logical 13227 into a single extent, it does. And so on, all through extent C), up to block 13250 (why not 13251? Dunno). SO we end up with this when all the blocks are moved post-resize: Level Entries Logical Physical Length Flags ... 2/ 2 120/122 13193 - 13193 33220 - 33220 1 2/ 2 121/122 13194 - 13194 33221 - 33221 1 2/ 2 122/122 13227 - 13250 33222 - 33245 24 D) 1/ 2 5/ 19 13228 - 17108 34676 3881 E) 2/ 2 1/222 13251 - 13251 33246 - 33246 1 F) 2/ 2 2/222 13275 - 13286 33247 - 33258 12 All those adjacent blocks got moved into extent D), which is nice - but the next interior node E) was never updated to reflect its new start point - it says the leaf extents beneath it start at 13228, when in fact they start at 13251. (why F) is left with a single block, I'm not sure - that's another minor buglet, but at least it's not corruption). So as we move blocks one by one out of original extent C) above, we need to keep updating C)'s parent node B) for a proper starting point. fix_parents() does this. Once the tree is corrupted like this, I think more corruption can ensue post-resize, because we traverse the tree by interior nodes, relying on their start block to know where we are in the tree. If it gets off, we'll end up inserting blocks into the wrong part of the tree, etc, as we saw on the original customer image. Created attachment 784719 [details] Example fs image Here's an example image which checks clean, but when shrunk to 1G with: # bunzip2 testimage.qcow.bz2 # qemu-img convert -f raw testimage.e2i # resize2fs testimage.e2i 1G will show similar corruption: # e2fsck -fn testimage.e2i Resetting component to e2fsprogs & requesting blocker for RHEL6.5, this is a filesystem corruption case that would be very good to get fixed asap - probably zstream material too. Eric, Do we have a specific version of fsck which detects and fixes this corruption? Yep, * Thu Jun 06 2013 Eric Sandeen <sandeen> 1.41.12-15 - Enhance e2fsck detection of invalid extent trees (#922847) and it went to z-stream as well: e2fsprogs-1.41.12-14.el6_4.2 However, 2 things: Unfortunately that version sees blocks past EOF as corruption; this is not overly problematic, because a) it's a fairly rare situation, and b) blocks past EOF are only preallocated, and don't contain data anyway. Also, e2fsck will fix the fs insofar as to make it consistent again, but if we get to the point where we have overlapping extents due to this bug, fsck is unlikely to correctly recover all file data. It will make a decision about which blocks to keep and which to throw away, but that may not be correct. If the data is critical, it may be better to go to backups or even try some hand recovery, for better chances of success... Fixed in e2fsprogs-1.42.12-17.el6 Reproduced with e2fsprogs-1.41.12-14.el6_4.2, this is the only e2fsprogs version I can find in brew that could detect the corruption and have no fix for this bug. [root@hp-dl388g8-03 bz873201-e2fsprogs]# rpm -q e2fsprogs e2fsprogs-1.41.12-14.el6_4.2.x86_64 # Setup 2.6G loop device dd if=/dev/zero of=test.img bs=1M count=2662 # Hack ltp/fsx.c from xfstests to remove any truncates [root@hp-dl388g8-03 xfstests]# git diff diff --git a/ltp/fsx.c b/ltp/fsx.c index 2f1e3e8..b2d1cd4 100644 --- a/ltp/fsx.c +++ b/ltp/fsx.c @@ -1076,7 +1076,8 @@ test(void) case OP_TRUNCATE: if (!style) size = random() % maxfilelen; - dotruncate(size); + if (size > file_size) + dotruncate(size); break; case OP_FALLOCATE: # Create mount point mkdir mnt # Modify the test script from comment 42 to accept two arguments ./bz873201.sh /dev/loop0 mnt # e2fsck shows corruption Seed is 25696 Seed set to 25696 mapped writes DISABLED skipping zero size read fallocating to largest ever: 0x33bad98 fallocating to largest ever: 0x91ceebb truncating to largest ever: 0x11b51000 fallocating to largest ever: 0x11bd39f9 fallocating to largest ever: 0x11d16d4e fallocating to largest ever: 0x11fce207 2000 write 0xbd54000 thru 0xbd6190a (0xd90b bytes) fallocating to largest ever: 0x11fe8b21 3000 write 0xffcc000 thru 0xffd274e (0x674f bytes) 4000 falloc from 0x2912419 to 0x2919503 (0x70ea bytes) skipping zero length fallocate 5000 mapread 0xd245642 thru 0xd246799 (0x1158 bytes) 6000 write 0x10d69000 thru 0x10d6f04e (0x604f bytes) 7000 write 0x7fd9000 thru 0x7fe5d9a (0xcd9b bytes) 8000 falloc from 0x9d4e7a to 0x9e2cc0 (0xde46 bytes) 9000 write 0xf284000 thru 0xf292bd1 (0xebd2 bytes) 10000 write 0x104e2000 thru 0x104e38e6 (0x18e7 bytes) All operations completed A-OK! e2fsck 1.41.12 (17-May-2010) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks e2fsck 1.41.12 (17-May-2010) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks e2image 1.41.12 (17-May-2010) resize2fs 1.41.12 (17-May-2010) Resizing the filesystem on /dev/loop0 to 262144 (4k) blocks. The filesystem on /dev/loop0 is now 262144 blocks long. e2fsck 1.41.12 (17-May-2010) Pass 1: Checking inodes, blocks, and sizes Inode 14, end of extent exceeds allowed value (logical block 209, physical block 40989, len 18) Clear? no Interior extent node level 1 of inode 14: Logical start 210 does not match logical start 227 at next level. Fix? no Inode 14, end of extent exceeds allowed value (logical block 13491, physical block 34401, len 11) Clear? no Interior extent node level 1 of inode 14: Logical start 13492 does not match logical start 13502 at next level. Fix? no Inode 14, end of extent exceeds allowed value (logical block 33046, physical block 36229, len 13) Clear? no Interior extent node level 1 of inode 14: Logical start 33047 does not match logical start 33059 at next level. Fix? no Inode 14, end of extent exceeds allowed value (logical block 63511, physical block 49921, len 3) Clear? no Interior extent node level 1 of inode 14: Logical start 63512 does not match logical start 63514 at next level. Fix? no Inode 14, i_blocks is 235496, should be 235136. Fix? no Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: -(34401--34411) -(36229--36241) -(40989--41006) -(49921--49923) Fix? no /dev/loop0: ********** WARNING: Filesystem still has errors ********** /dev/loop0: 14/65024 files (14.3% non-contiguous), 50752/262144 blocks Verified with e2fsprogs-1.41.12-18.el6 [root@hp-dl388g8-03 bz873201-e2fsprogs]# rpm -q e2fsprogs e2fsprogs-1.41.12-18.el6.x86_64 [root@hp-dl388g8-03 bz873201-e2fsprogs]# ./bz873201.sh /dev/loop0 mnt Seed is 743 Seed set to 743 mapped writes DISABLED skipping zero size read fallocating to largest ever: 0x33bad98 fallocating to largest ever: 0x91ceebb truncating to largest ever: 0x11b51000 fallocating to largest ever: 0x11bd39f9 fallocating to largest ever: 0x11d16d4e fallocating to largest ever: 0x11fce207 2000 write 0xbd54000 thru 0xbd6190a (0xd90b bytes) fallocating to largest ever: 0x11fe8b21 3000 write 0xffcc000 thru 0xffd274e (0x674f bytes) 4000 falloc from 0x2912419 to 0x2919503 (0x70ea bytes) skipping zero length fallocate 5000 mapread 0xd245642 thru 0xd246799 (0x1158 bytes) 6000 write 0x10d69000 thru 0x10d6f04e (0x604f bytes) 7000 write 0x7fd9000 thru 0x7fe5d9a (0xcd9b bytes) 8000 falloc from 0x9d4e7a to 0x9e2cc0 (0xde46 bytes) 9000 write 0xf284000 thru 0xf292bd1 (0xebd2 bytes) 10000 write 0x104e2000 thru 0x104e38e6 (0x18e7 bytes) All operations completed A-OK! e2fsck 1.41.12 (17-May-2010) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks e2fsck 1.41.12 (17-May-2010) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks e2image 1.41.12 (17-May-2010) resize2fs 1.41.12 (17-May-2010) Resizing the filesystem on /dev/loop0 to 262144 (4k) blocks. The filesystem on /dev/loop0 is now 262144 blocks long. e2fsck 1.41.12 (17-May-2010) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /dev/loop0: 14/65024 files (14.3% non-contiguous), 50752/262144 blocks Set to VERIFIED *** Bug 1022175 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1689.html The patch here is to e2fsprogs to recover from the corruption. Is there any kernel side patch to prevent the corruption in the first place? Yes, I believe it was the patch that fixes parents on an extent split, let me find the bug... Wait, fix_parents was userspace. ;) Hang on, still looking/remembering. Sorry, I had my bugs confused. IIRC this corruption was _caused_ by e2fsprogs, in this case by an offline shrink. See comment #44. The resolution for this bug fixes the root cause. This fix: - Enhance e2fsck detection of invalid extent trees (#922847) enhances e2fsck to find & fix the result of this bug (overlapping extents). -Eric Thanks Eric. I have a customer reporting fs corruption with the same messages in this BZ but they have not done a shrink (a few grows but nothing recent). Still gathering info on the case but it looks like something kernel side has caused it. |