Bug 873201 - Error : EXT4-fs error (device sda8): ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) != EXT_FIRST_INDEX (0) (depth 1)! leads to DB2 instance to crash
Summary: Error : EXT4-fs error (device sda8): ext4_ext_search_left: inode #8126473: (c...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: e2fsprogs
Version: 6.3
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Filesystem QE
URL:
Whiteboard:
Duplicates: 1022175 (view as bug list)
Depends On:
Blocks: 961026 994246 1014062 1022175 1023351 1033201
 
Reported: 2012-11-05 10:38 UTC by Elvir Kuric
Modified: 2018-12-03 17:55 UTC
CC List: 18 users

Fixed In Version: e2fsprogs-1.42.12-17.el6
Doc Type: Bug Fix
Doc Text:
The resize2fs utility did not properly handle resizing of an ext4 file system to a smaller size. As a consequence, files containing many extents could become corrupted if they were moved during the resize process. With this update, resize2fs now maintains a consistent extent tree when moving files containing many extents, and such files no longer become corrupted in this scenario.
Clone Of:
Clones: 1014062 1022175 (view as bug list)
Environment:
Last Closed: 2013-11-21 22:42:03 UTC
Target Upstream Version:
Embargoed:


Attachments
dump_extents output (18.10 KB, text/plain), 2012-12-21 20:32 UTC, Eric Sandeen
Example fs image (776.43 KB, application/x-bzip2), 2013-08-09 05:16 UTC, Eric Sandeen


Links
Red Hat Knowledge Base (Solution) 173543
Red Hat Product Errata RHBA-2013:1689: SHIPPED_LIVE, e2fsprogs bug fix update, 2013-11-20 21:52:23 UTC

Description Elvir Kuric 2012-11-05 10:38:42 UTC
Description of problem:

When the DB2 instance is started on RHEL 6.3, it crashes and does not work as expected.


Version-Release number of selected component (if applicable):

RHEL 6.3 
Kernel : 2.6.32-131.21.1.el6.x86_64 #1 SMP Fri Nov 11 11:50:54 EST 2011 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:

Install DB2 on RHEL 6.3 and try to start it.


  
Actual results:

The DB2 instance fails to start, with the error message shown below appearing in /var/log/messages.



Additional info:

In this case the issue is visible on one file system, mounted as shown below (I have the full /etc/fstab; only the relevant part is shown here):


/dev/sda8 on /tsm/db type ext4 (rw) [tsmdb]

/dev/sda8: LABEL="tsmdb" UUID="eb82e64d-7f22-4627-a630-316b8be85e90" TYPE="ext4" 

/dev/sda8            216277748  90920132 114371288  45% /tsm/db

e2fsck was run on this file system and the file system mounts fine; e2fsck does not report any file system errors:

# e2fsck -f /dev/sda8
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
tsmdb: 48/13737984 files (2.1% non-contiguous), 23592236/54931640 blocks
[root@slx900 mte]# mount -a


The error message in the logs is:

Nov  1 15:19:44 server kernel: EXT4-fs error (device sda8): ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) != EXT_FIRST_INDEX (0) (depth 1)!
Nov  1 15:19:44 server kernel: EXT4-fs error (device sda8): ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) != EXT_FIRST_INDEX (0) (depth 1)!


Error messages in the logs after the error appeared and a reboot:

Nov  1 12:00:31 server MR_MONITOR[2678]: <MRMON044> Controller ID: 0  Time
established since power on: Time 2012-11-01,12:00:31 3860 Seconds
Nov  1 13:00:31 server MR_MONITOR[2678]: <MRMON044> Controller ID: 0  Time
established since power on: Time 2012-11-01,13:00:31 7460 Seconds
Nov  1 13:01:02 server kernel: process `grep' is using deprecated sysctl
(syscall) net.ipv6.neigh.default.retrans_time; Use
net.ipv6.neigh.default.retrans_t
ime_ms instead.
Nov  1 13:59:45 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)
!
Nov  1 13:59:45 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)
!
Nov  1 14:00:31 server MR_MONITOR[2678]: <MRMON044> Controller ID: 0  Time
established since power on: Time 2012-11-01,14:00:31 11060 Seconds
Nov  1 14:18:24 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)!
Nov  1 14:18:25 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)!
Nov  1 14:28:43 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)!
Nov  1 14:28:44 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)!
Nov  1 15:00:31 server MR_MONITOR[2678]: <MRMON044> Controller ID: 0  Time
established since power on: Time 2012-11-01,15:00:31 14660 Seconds
Nov  1 15:04:43 server kernel: EXT4-fs (sda8): mounted filesystem with ordered
data mode
Nov  1 15:19:44 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)!
Nov  1 15:19:44 server kernel: EXT4-fs error (device sda8):
ext4_ext_search_left: inode #8126473: (comm db2sysc) ix (60672) !=
EXT_FIRST_INDEX (0) (depth 1)!
Nov  1 15:25:00 server init: tty (/dev/tty2) main process (2702) killed by
TERM signal
Nov  1 15:25:00 server init: tty (/dev/tty3) main process (2704) killed by
TERM signal
Nov  1 15:25:00 server init: tty (/dev/tty4) main process (2706) killed by
TERM signal
Nov  1 15:25:00 server init: tty (/dev/tty5) main process (2708) killed by
TERM signal
Nov  1 15:25:00 server init: tty (/dev/tty6) main process (2711) killed by
TERM signal
Nov  1 15:25:00 server avahi-daemon[2029]: Got SIGTERM, quitting.
Nov  1 15:25:00 server avahi-daemon[2029]: Leaving mDNS multicast group on
interface eth0.IPv4 with address 10.8.20.217.
Nov  1 15:25:00 server avahi-daemon[2029]: Leaving mDNS multicast group on
interface usb0.IPv4 with address 169.254.95.120.
Nov  1 15:25:03 server abrtd: UnRegistered Analyzer plugin CCpp
Nov  1 15:25:03 server abrtd: UnRegistered Analyzer plugin Kerneloops
Nov  1 15:25:03 server abrtd: UnRegistered Action plugin KerneloopsScanner
Nov  1 15:25:03 server abrtd: UnRegistered Reporter plugin Logger
Nov  1 15:25:03 server abrtd: UnRegistered Reporter plugin Mailx
Nov  1 15:25:03 server abrtd: UnRegistered Analyzer plugin Python
Nov  1 15:25:03 server abrtd: UnRegistered Reporter plugin RHTSupport
Nov  1 15:25:03 server abrtd: UnRegistered Action plugin SOSreport
Nov  1 15:25:03 server abrtd: UnRegistered Database plugin SQLite3
Nov  1 15:25:03 server abrtd: Got signal 15, exiting
Nov  1 15:25:06 server acpid: exiting
Nov  1 15:25:07 server ntpd[2196]: ntpd exiting on signal 15
Nov  1 15:25:07 server lin_taped[2008]: lin_taped terminated.
Nov  1 15:25:09 server rpcbind: rpcbind terminating on signal. Restart with
"rpcbind -w"
Nov  1 15:25:09 server init: Disconnected from system bus
Nov  1 15:25:09 server auditd[3061]: The audit daemon is exiting.


Nov  1 15:25:09 server kernel: type=1305 audit(1351779909.381:54268):
audit_pid=0 old=3061 auid=4294967295 ses=4294967295
subj=system_u:system_r:auditd_t:s0 res=1

Comment 4 Kamil Dudka 2012-11-05 12:02:57 UTC
The "filesystem" component (the package that owns the basic directory layout) has nothing to do with the file system implementation in the kernel.  I am switching the component back to kernel...

Comment 5 Eric Sandeen 2012-11-13 16:01:56 UTC
I've actually got an image from upstream which reproduces a similar problem, and I'm looking at it now, so I'll take this one.

-Eric

Comment 6 Eric Sandeen 2012-11-13 16:08:27 UTC
Can you attach the output from:

# debugfs -R "stat <8126473> -R "dump_extents <8126473>" /dev/sda8

please?

Making an e2image -r of the filesystem would also capture its complete state, and may be useful for later analysis, if possible.  It will be a large (but sparse) file which should compress down fairly well.
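
For example, something along these lines (device name as above; the output filename is just an example, and piping through bzip2 keeps the sparse image manageable):

# e2image -r /dev/sda8 - | bzip2 > sda8.e2i.bz2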

Thanks,
-Eric

Comment 7 Eric Sandeen 2012-11-15 22:02:49 UTC
Slight typo; should be:

# debugfs -R "stat <8126473>" -R "dump_extents <8126473>" /dev/sda8

Comment 8 Eric Sandeen 2012-11-15 22:04:28 UTC
As a workaround, copying that file and then renaming it over the old one will probably "fix" the existing corruption.
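
A rough sketch of that copy-and-rename workaround (the path and file name are illustrative; the affected file is whichever one maps to inode 8126473, and DB2 should be stopped first):

# find /tsm/db -xdev -inum 8126473                            # locate the file for that inode
# cp -p /tsm/db/<affected-file> /tsm/db/<affected-file>.new   # copy it, preserving attributes
# mv /tsm/db/<affected-file>.new /tsm/db/<affected-file>      # rename the copy over the original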

I've sent an e2fsck patch upstream to detect & fix what I think the resulting problem is.

Still not sure how we hit this in the first place, though.

Comment 11 Eric Sandeen 2012-11-29 19:30:51 UTC
I've sent an e2fsck patch upstream to fix this.  Still haven't found root cause.

Comment 14 Eric Sandeen 2012-12-21 04:09:00 UTC
There are now 3 e2fsprogs patches upstream to fix this.  I can verify the fix w/ the image you've provided.

I've been racking my brain trying to find out how we can hit this problem at runtime, and so far, no luck, unless we are hole-punching - but their version of the kernel does not support hole punching.

I think there is one more way where it could happen, and that is via e2fsck clearing corrupted blocks and not updating the extent tree properly.

It may be too far in the past, but is there any chance they did an e2fsck prior to this problem occurring, and it found some problems?  That could possibly explain it...

If they've restored from backup, they are probably ok for now, but we can get e2fsprogs updated to handle this corruption in the future.

FWIW, if you were trying to look at the image under RHEL5, you need to use "debuge4fs" which is a cloned & updated version of debugfs just for ext4 on RHEL5.
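
For instance, assuming debuge4fs takes the same -R syntax as debugfs (the image filename here is illustrative):

# debuge4fs -R "dump_extents <8126473>" sda8.e2i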

Sorry for the late update,
-eric

Comment 15 Eric Sandeen 2012-12-21 20:02:30 UTC
Here are the extents:

Level Entries       Logical            Physical Length Flags
 0/ 2   1/  1     0 - 73727    33794             73728
 1/ 2   1/ 29     0 -  3246     9250              3247
 2/ 2   1/  3     0 -     0 32538624 - 32538624      1 
 2/ 2   2/  3     1 -    15     9255 -     9269     15 Uninit
 2/ 2   3/  3    16 -  3246     9270 -    12500   3231 
 1/ 2   2/ 29  3247 -  6311     9252              3065
 2/ 2   1/  1  3247 -  6311    12501 -    15565   3065 
 1/ 2   3/ 29  6312 - 13535    25637              7224
 2/ 2   1/  2  6312 -  8191    15566 -    17445   1880 
 2/ 2   2/  2  8192 - 13535    34816 -    40159   5344 
 1/ 2   4/ 29 13536 - 13567    33795                32
 2/ 2   1/  1 13536 - 13567    40160 -    40191     32 
 1/ 2   5/ 29 13568 - 14883    33793              1316
 2/ 2   1/  1 13568 - 14883    40192 -    41507   1316 
 1/ 2   6/ 29 14884 - 21679    33798              6796
 2/ 2   1/  1 14884 - 21679    41508 -    48303   6796 
<snip>
 1/ 2  25/ 29 57524 - 59011 15538183              1488
 2/ 2   1/  2 57524 - 59011 15556788 - 15558275   1488 
 2/ 2   2/  2 59012 - 65535 15558276 - 15564799   6524 Uninit

 1/ 2  26/ 29 59012 - 60671 15538184              1660
 2/ 2   1/  2 59012 - 60671    25638 -    27297   1660 
 2/ 2   2/  2 60672 - 60689    27298 -    27315     18 Uninit

 1/ 2  27/ 29 60672 - 61023 15538185               352
 2/ 2   1/ 19 60690 - 60690    27316 -    27316      1 Uninit
 2/ 2   2/ 19 60691 - 60703    27317 -    27329     13 Uninit
 2/ 2   3/ 19 60704 - 60767    27330 -    27393     64 

<snip>

And here we see the problem: the level-1 index entry 27/29 claims to cover from 60672 onward, but the first extent under it starts at 60690, and the prior extent runs from 60672 to 60689: an overlap.

Interesting that we have uninit extents sprinkled in here.

Comment 16 Eric Sandeen 2012-12-21 20:27:03 UTC
With the new patches in e2fsprogs, e2fsck finds this corruption:

Pass 1: Checking inodes, blocks, and sizes
Interior extent node level 1 of inode 8126473:
Logical start 60672 does not match logical start 60690 at next level.  Fix? no

Interior extent node level 1 of inode 8126473:
Logical start 63157 does not match logical start 63159 at next level.  Fix? no

Comment 17 Eric Sandeen 2012-12-21 20:32:31 UTC
Created attachment 667406 [details]
dump_extents output

Full extent listing.  There seem to be other problems in the extent tree as well.

Comment 20 Eric Sandeen 2013-05-01 15:14:57 UTC
Thanks for the image.  Something is going wrong at runtime here, I think, and we don't yet know what.  We're going to need to find a testcase for this, most likely.

By the time the error is printed, the corruption has already occurred, I believe.

Comment 34 Eric Sandeen 2013-08-05 19:17:35 UTC
Ok, I see that there are 2 attachments of e2images from the customer, one of them is 03.  Sorry about that.  Were both filesystems resized in the same manner, prior to the error?

Comment 38 Eric Sandeen 2013-08-06 18:39:14 UTC
The 2 attached images from Stuart - were they after an e2fsck?  Neither current RHEL6 e2fsck nor upstream e2fsck finds any issues with them.

Also:  There are 3 cases attached to this bug.  Are they all related to DB2?  Was the filesystem resized in all 3 cases?  Just looking for the common thread here.

Comment 39 Jose Castillo 2013-08-07 13:45:48 UTC
I'll check with customer about oradata01 and oradata03, but I think Stuart was quite explicit on getting the images before fsck was run. 

About the three cases: Elvir's (the original one) and Milan's (the latest one attached) are related to DB2, and db2sysc is the process that detects the problem. Stuart's is related to Oracle, or at least that is what the error tells us (comm oracle):

 kernel: EXT4-fs error (device dm-116): ext4_ext_search_left: inode #58904: (comm oracle) ix (2081024) != EXT_FIRST_INDEX (0) (depth 1)!

In all three cases (four filesystems affected as far as I know, one in Elvir's and Milan's cases, two in Stuart's case) the filesystem was resized. 

I'm going to use Milan's e2image to try to reproduce this. My idea is to restore the image and then reduce the LV exactly as they did. If the problem is in the e2fsprogs tools, then I should be able to reproduce it. If I cannot, then I think we can assume that the problem is outside the filesystem, in DB2 and Oracle.
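
A sketch of that kind of offline test, which can also be run directly against a copy of the raw e2image file (filenames and the target size are placeholders; an LV reduce with --resizefs ends up calling resize2fs in the same way):

cp --sparse=always milan.e2i test.e2i     # work on a copy of the restored image
e2fsck -f test.e2i                        # confirm it checks clean first
resize2fs test.e2i <smaller-size>         # shrink as the customer did
e2fsck -fn test.e2i                       # look for the reported corruption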

Comment 40 Eric Sandeen 2013-08-07 15:59:31 UTC
Thank you for the summary.  I'm sorry I'm not keeping it straight.

If resize is to blame, a pre-resize image would be most helpful, but we may still be able to tickle something even with what we've got.

Comment 41 Jose Castillo 2013-08-07 16:39:41 UTC
Confirmed that all images from the last two cases were created post-fsck. Elvir's is the only image that was obtained the right way, so I'll try to use his for the reproducer.

Comment 42 Eric Sandeen 2013-08-09 01:24:46 UTC
Ok, I'm pretty confident blaming resize2fs, and I think I have a reproducer, at least one that gets the image to a state where it shows some of the corruptions seen on Elvir's image (which, for the record, were:

Pass 1: Checking inodes, blocks, and sizes
Inode 8126473, end of extent exceeds allowed value
	(logical block 59012, physical block 15558276, len 6524)
Clear? no

Inode 8126473, end of extent exceeds allowed value
	(logical block 60672, physical block 27298, len 18)
Clear? no

Interior extent node level 1 of inode 8126473:
Logical start 60672 does not match logical start 60690 at next level.  Fix? no

Inode 8126473, end of extent exceeds allowed value
	(logical block 62992, physical block 29618, len 167)
Clear? no

Interior extent node level 1 of inode 8126473:
Logical start 63157 does not match logical start 63159 at next level.  Fix? no

Inode 8126473, i_blocks is 642256, should be 588584.  Fix? no

)

Here's my script:


#!/bin/bash

# Note: sdb1 is 2.6G

umount /dev/sdb1 &>/dev/null
mkfs.ext4 /dev/sdb1 &>/dev/null || exit

mount /dev/sdb1 /mnt/test
# Mostly-fill the fs, to push fsx IOs towards the end of the fs
fallocate -l 1g /mnt/test/fill1
fallocate -l 1g /mnt/test/fill2

SEED=$RANDOM
echo "Seed is $SEED"

# fsx is slightly hacked from xfstests to remove any truncates; we always grow
ltp/fsx -S $SEED -p 1000 -N 10000 -l 301985792 -y -W -H -t 4096 -w 4096 /mnt/test/testfile
# Make room to shrink; fragmented fsx file will have to be moved down
rm -f /mnt/test/fill1
rm -f /mnt/test/fill2
umount /dev/sdb1
# 2 fscks; first no-op in case it's corrupt pre-resize, 2nd to satisfy resize2fs
e2fsck -fn /dev/sdb1 || exit 
e2fsck -fy /dev/sdb1 || exit

# Get a pre-resize image so we can analyze what went wrong if anything
e2image -r /dev/sdb1 image-backup.e2i
# Shrink it; this will move the complex file written by fsx
resize2fs /dev/sdb1 1G
e2fsck -fn /dev/sdb1 || exit


And I do get corruption:

Pass 1: Checking inodes, blocks, and sizes
Inode 14, end of extent exceeds allowed value
	(logical block 7817, physical block 32346, len 18)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 7818 does not match logical start 7835 at next level.  Fix? no

Inode 14, end of extent exceeds allowed value
	(logical block 19709, physical block 40763, len 13)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 19710 does not match logical start 19722 at next level.  Fix? no

Inode 14, end of extent exceeds allowed value
	(logical block 22990, physical block 32744, len 3)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 22991 does not match logical start 22993 at next level.  Fix? no

Inode 14, end of extent exceeds allowed value
	(logical block 49594, physical block 28622, len 6)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 49595 does not match logical start 49600 at next level.  Fix? no

Inode 14, end of extent exceeds allowed value
	(logical block 51924, physical block 47719, len 11)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 51925 does not match logical start 51935 at next level.  Fix? no

Inode 14, i_blocks is 233040, should be 232632.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information


So this is a good result; it certainly points to resize2fs having issues with moving complicated extent-based files.

Comment 43 Eric Sandeen 2013-08-09 01:47:48 UTC
Bug persists upstream (kernel + e2fsprogs), and doesn't seem to be related to uninitialized extents, FWIW.

Comment 44 Eric Sandeen 2013-08-09 04:06:20 UTC
It's always a 1-liner.

diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 65bb099..8f8d294 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -1430,6 +1430,9 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
 		retval = ext2fs_extent_replace(handle, 0, &extent);
 		if (retval)
 			goto done;
+		retval = ext2fs_extent_fix_parents(handle);
+		if (retval)
+			goto done;
 	} else {
 		__u32	orig_length;
 

So what's happening here is that we started with an extent tree like this (from debuge2fs's "ex" command):

Level Entries       Logical            Physical Length Flags
...
 2/ 2  60/ 63 13096 - 13117 650024 - 650045     22
 2/ 2  61/ 63 13134 - 13142 650062 - 650070      9
 2/ 2  62/ 63 13193 - 13194 650121 - 650122      2
 2/ 2  63/ 63 13227 - 13227 650155 - 650155      1 A)
 1/ 2   4/ 14 13228 - 17108 655367            3881 B)
 2/ 2   1/117 13228 - 13251 650156 - 650179     24 C)
 2/ 2   2/117 13275 - 13287 650203 - 650215     13
 2/ 2   3/117 13348 - 13353 650276 - 650281      6
... 

The 2/ 2 entries are leaf extents; the 1/ 2 entries are interior index nodes.

All of these extents are beyond the resize point, so must be moved.

Single-block extent A) gets moved to a lower block, and then the code starts moving the next logical blocks one at a time - those in extent C).

The userspace extent code tries to merge, so when it finds that logical 13228 can be merged with logical 13227 into a single extent, it does.  And so on, all through extent C), up to block 13250 (why not 13251?  Dunno).  So we end up with this when all the blocks are moved post-resize:

Level Entries       Logical            Physical Length Flags
...
 2/ 2 120/122 13193 - 13193  33220 -  33220      1
 2/ 2 121/122 13194 - 13194  33221 -  33221      1
 2/ 2 122/122 13227 - 13250  33222 -  33245     24 D)
 1/ 2   5/ 19 13228 - 17108  34676            3881 E)
 2/ 2   1/222 13251 - 13251  33246 -  33246      1 F)
 2/ 2   2/222 13275 - 13286  33247 -  33258     12

All those adjacent blocks got moved into extent D), which is nice - but the next interior node E) was never updated to reflect its new start point - it says the leaf extents beneath it start at 13228, when in fact they start at 13251.  (why F) is left with a single block, I'm not sure - that's another minor buglet, but at least it's not corruption).

So as we move blocks one by one out of original extent C) above, we need to keep updating C)'s parent node B) for a proper starting point.  fix_parents() does this.

Once the tree is corrupted like this, I think more corruption can ensue post-resize, because we traverse the tree by interior nodes, relying on their start block to know where we are in the tree.  If it gets off, we'll end up inserting blocks into the wrong part of the tree, etc, as we saw on the original customer image.
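
For anyone trying to watch this happen with the reproducer from comment 42, the extent tree of the fsx test file can be dumped before and after the shrink (the paths, device, and inode number are taken from the runs above and may differ):

filefrag -v /mnt/test/testfile              # extent layout while the fs is mounted
debugfs -R "dump_extents <14>" /dev/sdb1    # extent tree offline, or against the saved pre-resize image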

Comment 45 Eric Sandeen 2013-08-09 05:16:55 UTC
Created attachment 784719 [details]
Example fs image

Here's an example image which checks clean, but when shrunk to 1G with:

# bunzip2 testimage.qcow.bz2
# qemu-img convert -O raw testimage.qcow testimage.e2i
# resize2fs testimage.e2i 1G

will show similar corruption:

# e2fsck -fn testimage.e2i

Comment 46 Eric Sandeen 2013-08-09 05:20:15 UTC
Resetting the component to e2fsprogs and requesting blocker status for RHEL 6.5; this is a filesystem corruption case that would be very good to get fixed ASAP - probably z-stream material too.

Comment 48 Stuart Auchterlonie 2013-08-09 09:50:14 UTC
Eric,

Do we have a specific version of fsck which detects and fixes this
corruption?

Comment 49 Eric Sandeen 2013-08-09 15:02:14 UTC
Yep, 

* Thu Jun 06 2013 Eric Sandeen <sandeen> 1.41.12-15
- Enhance e2fsck detection of invalid extent trees (#922847)


and it went to z-stream as well:

e2fsprogs-1.41.12-14.el6_4.2

However, 2 things:

Unfortunately that version sees blocks past EOF as corruption; this is not overly problematic, because a) it's a fairly rare situation, and b) blocks past EOF are only preallocated, and don't contain data anyway.

Also, e2fsck will fix the fs insofar as making it consistent again, but if we get to the point where we have overlapping extents due to this bug, fsck is unlikely to correctly recover all file data.  It will make a decision about which blocks to keep and which to throw away, but that may not be correct.

If the data is critical, it may be better to go to backups or even try some hand recovery, for better chances of success...
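
In practice, checking whether a system already has the detection, and running a read-only check first, looks something like this (the device name is a placeholder):

rpm -q e2fsprogs        # want 1.41.12-15, or the 1.41.12-14.el6_4.2 z-stream build, or later
e2fsck -fn /dev/sdXN    # read-only pass; review what it reports before letting fsck repair anything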

Comment 55 Eric Sandeen 2013-08-12 22:45:41 UTC
Fixed in e2fsprogs-1.42.12-17.el6

Comment 58 Eryu Guan 2013-09-09 09:14:15 UTC
Reproduced with e2fsprogs-1.41.12-14.el6_4.2; this is the only e2fsprogs version I could find in Brew that can detect the corruption but does not yet contain the fix for this bug.

[root@hp-dl388g8-03 bz873201-e2fsprogs]# rpm -q e2fsprogs
e2fsprogs-1.41.12-14.el6_4.2.x86_64

# Setup 2.6G loop device
dd if=/dev/zero of=test.img bs=1M count=2662
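# (Assumed step, not shown in the original transcript: attach the image file to the
#  loop device used in the rest of this comment)
losetup /dev/loop0 test.img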
# Hack ltp/fsx.c from xfstests to remove any truncates
[root@hp-dl388g8-03 xfstests]# git diff
diff --git a/ltp/fsx.c b/ltp/fsx.c
index 2f1e3e8..b2d1cd4 100644
--- a/ltp/fsx.c
+++ b/ltp/fsx.c
@@ -1076,7 +1076,8 @@ test(void)
        case OP_TRUNCATE:
                if (!style)
                        size = random() % maxfilelen;
-               dotruncate(size);
+               if (size > file_size)
+                       dotruncate(size);
                break;
 
        case OP_FALLOCATE:

# Create mount point
mkdir mnt
# Modify the test script from comment 42 to accept two arguments
./bz873201.sh /dev/loop0 mnt

# e2fsck shows corruption
Seed is 25696
Seed set to 25696
mapped writes DISABLED
skipping zero size read
fallocating to largest ever: 0x33bad98
fallocating to largest ever: 0x91ceebb
truncating to largest ever: 0x11b51000
fallocating to largest ever: 0x11bd39f9
fallocating to largest ever: 0x11d16d4e
fallocating to largest ever: 0x11fce207
2000 write      0xbd54000 thru  0xbd6190a       (0xd90b bytes)
fallocating to largest ever: 0x11fe8b21
3000 write      0xffcc000 thru  0xffd274e       (0x674f bytes)
4000 falloc     from 0x2912419 to 0x2919503 (0x70ea bytes)
skipping zero length fallocate
5000 mapread    0xd245642 thru  0xd246799       (0x1158 bytes)
6000 write      0x10d69000 thru 0x10d6f04e      (0x604f bytes)
7000 write      0x7fd9000 thru  0x7fe5d9a       (0xcd9b bytes)
8000 falloc     from 0x9d4e7a to 0x9e2cc0 (0xde46 bytes)
9000 write      0xf284000 thru  0xf292bd1       (0xebd2 bytes)
10000 write     0x104e2000 thru 0x104e38e6      (0x18e7 bytes)
All operations completed A-OK!
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks
e2image 1.41.12 (17-May-2010)
resize2fs 1.41.12 (17-May-2010)
Resizing the filesystem on /dev/loop0 to 262144 (4k) blocks.
The filesystem on /dev/loop0 is now 262144 blocks long.

e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Inode 14, end of extent exceeds allowed value
        (logical block 209, physical block 40989, len 18)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 210 does not match logical start 227 at next level.  Fix? no

Inode 14, end of extent exceeds allowed value
        (logical block 13491, physical block 34401, len 11)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 13492 does not match logical start 13502 at next level.  Fix? no

Inode 14, end of extent exceeds allowed value
        (logical block 33046, physical block 36229, len 13)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 33047 does not match logical start 33059 at next level.  Fix? no

Inode 14, end of extent exceeds allowed value
        (logical block 63511, physical block 49921, len 3)
Clear? no

Interior extent node level 1 of inode 14:
Logical start 63512 does not match logical start 63514 at next level.  Fix? no

Inode 14, i_blocks is 235496, should be 235136.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(34401--34411) -(36229--36241) -(40989--41006) -(49921--49923)
Fix? no


/dev/loop0: ********** WARNING: Filesystem still has errors **********

/dev/loop0: 14/65024 files (14.3% non-contiguous), 50752/262144 blocks


Verified with e2fsprogs-1.41.12-18.el6

[root@hp-dl388g8-03 bz873201-e2fsprogs]# rpm -q e2fsprogs
e2fsprogs-1.41.12-18.el6.x86_64
[root@hp-dl388g8-03 bz873201-e2fsprogs]# ./bz873201.sh /dev/loop0 mnt
Seed is 743
Seed set to 743
mapped writes DISABLED
skipping zero size read
fallocating to largest ever: 0x33bad98
fallocating to largest ever: 0x91ceebb
truncating to largest ever: 0x11b51000
fallocating to largest ever: 0x11bd39f9
fallocating to largest ever: 0x11d16d4e
fallocating to largest ever: 0x11fce207
2000 write      0xbd54000 thru  0xbd6190a       (0xd90b bytes)
fallocating to largest ever: 0x11fe8b21
3000 write      0xffcc000 thru  0xffd274e       (0x674f bytes)
4000 falloc     from 0x2912419 to 0x2919503 (0x70ea bytes)
skipping zero length fallocate
5000 mapread    0xd245642 thru  0xd246799       (0x1158 bytes)
6000 write      0x10d69000 thru 0x10d6f04e      (0x604f bytes)
7000 write      0x7fd9000 thru  0x7fe5d9a       (0xcd9b bytes)
8000 falloc     from 0x9d4e7a to 0x9e2cc0 (0xde46 bytes)
9000 write      0xf284000 thru  0xf292bd1       (0xebd2 bytes)
10000 write     0x104e2000 thru 0x104e38e6      (0x18e7 bytes)
All operations completed A-OK!
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop0: 14/170688 files (7.1% non-contiguous), 57538/681472 blocks
e2image 1.41.12 (17-May-2010)
resize2fs 1.41.12 (17-May-2010)
Resizing the filesystem on /dev/loop0 to 262144 (4k) blocks.
The filesystem on /dev/loop0 is now 262144 blocks long.

e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop0: 14/65024 files (14.3% non-contiguous), 50752/262144 blocks

Set to VERIFIED

Comment 62 Lukáš Czerner 2013-10-25 11:46:44 UTC
*** Bug 1022175 has been marked as a duplicate of this bug. ***

Comment 63 errata-xmlrpc 2013-11-21 22:42:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1689.html

Comment 64 Lachlan McIlroy 2014-01-14 00:45:23 UTC
The patch here is to e2fsprogs to recover from the corruption.  Is there any kernel side patch to prevent the corruption in the first place?

Comment 65 Eric Sandeen 2014-01-14 00:47:58 UTC
Yes, I believe it was the patch that fixes parents on an extent split, let me find the bug...

Comment 66 Eric Sandeen 2014-01-14 00:50:13 UTC
Wait, fix_parents was userspace.  ;)  Hang on, still looking/remembering.

Comment 67 Eric Sandeen 2014-01-14 00:56:05 UTC
Sorry, I had my bugs confused.  IIRC this corruption was _caused_ by e2fsprogs, in this case by an offline shrink.  See comment #44.  The resolution for this bug fixes the root cause.

This fix:

- Enhance e2fsck detection of invalid extent trees (#922847)

enhances e2fsck to find & fix the result of this bug (overlapping extents).

-Eric

Comment 68 Lachlan McIlroy 2014-01-14 01:09:14 UTC
Thanks Eric.

I have a customer reporting fs corruption with the same messages as in this BZ, but they have not done a shrink (a few grows, but nothing recent).  Still gathering info on the case, but it looks like something kernel-side has caused it.

