Description of problem:

Customer reports that fsck segfaults every time it checks a particular volume.

# fsck.ext3 /dev/VolGroup99/LogVol00
e2fsck 1.39 (29-May-2006)
sasroot has been mounted 34 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Segmentation fault

Version-Release number of selected component (if applicable):

e2fsprogs-1.39-23.el5.x86_64

How reproducible:

Every time the customer runs fsck on the above volume.

Additional info:

I asked the customer for an e2image of that volume and ran fsck on it, but it succeeded without a problem:

e2fsck 1.39 (29-May-2006)
sasroot has been mounted 34 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
sasroot: 298241/10485760 files (0.8% non-contiguous), 2629160/20971520 blocks

I asked them to run fsck under gdb to get a stack trace:

(gdb) run /dev/VolGroup99/LogVol00
Starting program: /sbin/fsck.ext3 /dev/VolGroup99/LogVol00
e2fsck 1.39 (29-May-2006)
sasroot has been mounted 34 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure

Program received signal SIGSEGV, Segmentation fault.
0x000000000041c647 in get_icount_el (icount=0xbe48030, ino=1671252, create=1) at icount.c:257
257             if (ino == icount->list[mid].ino) {
(gdb) bt
#0  0x000000000041c647 in get_icount_el (icount=0xbe48030, ino=1671252, create=1) at icount.c:257
#1  0x000000000041cd75 in ext2fs_icount_increment (icount=0xbe48030, ino=1671252, ret=0x7fff41c59bce) at icount.c:378
#2  0x000000000040ada4 in check_dir_block (fs=0xbb41e30, db=0x2af49ba21e68, priv_data=0x7fff41c59c50) at pass2.c:1014
#3  0x000000000041a673 in ext2fs_dblist_iterate (dblist=0xbb5dd30, func=0x40a720 <check_dir_block>, priv_data=0x7fff41c59c50) at dblist.c:235
#4  0x000000000040a2d7 in e2fsck_pass2 (ctx=0xbb41b30) at pass2.c:149
#5  0x00000000004030a6 in e2fsck_run (ctx=0xbb41b30) at e2fsck.c:203
#6  0x00000000004023af in main (argc=<value optimized out>, argv=<value optimized out>) at unix.c:1148

It looks like 'mid' has a bad value and we've run off the end of the list array. The faulting line corresponds to this code in lib/ext2fs/icount.c:get_icount_el():

        while (low <= high) {
#if 0
                mid = (low+high)/2;
#else
                if (low == high)
                        mid = low;
                else {
                        /* Interpolate for efficiency */
                        lowval = icount->list[low].ino;
                        highval = icount->list[high].ino;

                        if (ino < lowval)
                                range = 0;
                        else if (ino > highval)
                                range = 1;
                        else
                                range = ((float) (ino - lowval)) / (highval - lowval);
                        mid = low + ((int) (range * (high-low)));
                }
#endif
                if (ino == icount->list[mid].ino) {
                        icount->cursor = mid+1;
                        return &icount->list[mid];
                }
                if (ino < icount->list[mid].ino)
                        high = mid-1;
                else
                        low = mid+1;
        }

So it looks like we might have a floating point error when calculating mid.
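For illustration only, here is a minimal sketch of one way to keep the interpolated index inside the search window. It is not the attached patch and not the libext2fs code: it swaps the float arithmetic for 64-bit integer arithmetic, and the names struct el and find_el are hypothetical stand-ins for the icount list element and get_icount_el(). It assumes the list is sorted by ino with no duplicates, and it only does lookups (no create path).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the element type of icount->list. */
struct el {
    uint32_t ino;
    uint16_t count;
};

/*
 * Interpolation search over a list sorted by ino (no duplicates).
 * Returns the index of the matching element, or -1 if it is absent.
 * Lookup only: the create/insert path of the real code is not modelled.
 */
long find_el(const struct el *list, long count, uint32_t ino)
{
    long low = 0, high = count - 1, mid;

    assert(count == 0 || list != NULL);

    while (low <= high) {
        uint32_t lowval = list[low].ino;
        uint32_t highval = list[high].ino;

        /* ino lies outside the window, so it cannot be present. */
        if (ino < lowval || ino > highval)
            return -1;

        if (highval == lowval) {
            /* Avoid dividing by zero when the window is one value. */
            mid = low;
        } else {
            /*
             * Integer interpolation. Because ino - lowval is at most
             * highval - lowval, offset is always in [0, high - low],
             * so mid cannot leave the [low, high] window.
             */
            uint64_t offset = (uint64_t)(ino - lowval) *
                              (uint64_t)(high - low) /
                              (uint64_t)(highval - lowval);
            mid = low + (long)offset;
        }

        if (ino == list[mid].ino)
            return mid;
        if (ino < list[mid].ino)
            high = mid - 1;
        else
            low = mid + 1;
    }
    return -1;
}
```

The point of the sketch is that every value of mid used to index the list is provably within [low, high]; a float-based version would need an explicit bounds check or clamp to give the same guarantee.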
Created attachment 426741
Patch that should fix the problem

This patch was sourced from:
http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commitdiff;h=641b66bc7ee0a880b0eb0125dff5f8ed8dd5a160
Created attachment 427842
Patch to prevent floating point precision errors

This version of the patch fixes two more cases of the same bug. The customer tested the first patch, and it allowed e2fsck to run a bit further before hitting the same bug in another piece of code:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000413f0f in get_refcount_el (refcount=0x18f4d730, blk=12354051, create=0) at ea_refcount.c:202
202             if (blk == refcount->list[mid].ea_blk) {

This patch fixes that case and also a third case found by inspection.
Lachlan, thanks for the patch and the upstream submission.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Built & tagged in e2fsprogs-1.39-27.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1080.html
*** Bug 1005192 has been marked as a duplicate of this bug. ***