Bug 997982

Summary: fsck crash with corrupted file system
Product: Red Hat Enterprise Linux 7 Reporter: Hubert Kario <hkario>
Component: e2fsprogs Assignee: Lukáš Czerner <lczerner>
Status: CLOSED CURRENTRELEASE QA Contact: Eryu Guan <eguan>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.0 CC: eguan, esandeen, josef, kzak, lczerner, oliver, rwheeler, tthakur
Target Milestone: rc Keywords: Regression, Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: e2fsprogs-1.42.9-4.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 997972 Environment:
Last Closed: 2014-06-13 09:19:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 997972    
Bug Blocks:    
Attachments:
Description Flags
Corrupted file system none
Patch to fix the real problem none

Description Hubert Kario 2013-08-16 16:49:56 UTC
Created attachment 787363 [details]
Corrupted file system

RHEL-7 crash:

e2fsprogs-1.42.8-2.el7.x86_64

fsck from util-linux 2.22.2
/dev/loop0: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/loop0
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x61
fsck.ext4[0x426f20]
/lib64/libc.so.6[0x35e5035cd0]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0x1c)[0x35e60237cc]
fsck.ext4(fatal_error+0x42)[0x41dc92]
fsck.ext4(e2fsck_run_ext3_journal+0x2d3)[0x41d1d3]
fsck.ext4(main+0x71e)[0x409a3e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x35e5021a05]
fsck.ext4[0x40bc6d]

+++ This bug was initially created as a clone of Bug #997972 +++

Description of problem:
When checking a corrupted file system (attached) fsck crashes

Version-Release number of selected component (if applicable):
e2fsprogs-1.42.8-1.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gunzip disk.img.00000000000000000006.wrk_7.gz
2. losetup /dev/loop0 disk.img.00000000000000000006.wrk_7
3. fsck -f -p /dev/loop0

Actual results:
fsck from util-linux 2.23.1
/dev/loop0: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/loop0
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x61
fsck.ext4[0x426771]
/lib64/libc.so.6(+0x37300)[0x7f20dc272300]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0xd)[0x359b02316d]
fsck.ext4(fatal_error+0x42)[0x41db52]
fsck.ext4(e2fsck_run_ext3_journal+0x243)[0x41d0c3]
fsck.ext4(main+0x6ef)[0x409b7f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f20dc25cfa5]
fsck.ext4[0x40be75]

Expected results:
no crash

Additional info:
File system created by truncating writes to disk to 256 bytes

--- Additional comment from Hubert Kario on 2013-08-16 12:20:25 EDT ---

The bug is also present on Fedora 18:

e2fsprogs-1.42.5-1.fc18.x86_64

fsck from util-linux 2.22.2
/dev/loop0: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/loop0
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x61
fsck.ext4[0x426f20]
/lib64/libc.so.6[0x35e5035cd0]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0x1c)[0x35e60237cc]
fsck.ext4(fatal_error+0x42)[0x41dc92]
fsck.ext4(e2fsck_run_ext3_journal+0x2d3)[0x41d1d3]
fsck.ext4(main+0x71e)[0x409a3e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x35e5021a05]
fsck.ext4[0x40bc6d]

--- Additional comment from Eric Sandeen on 2013-08-16 12:34:36 EDT ---

persists upstream too

--- Additional comment from Eric Sandeen on 2013-08-16 12:41:29 EDT ---

Program received signal SIGSEGV, Segmentation fault.
ext2fs_mmp_stop (fs=0x67c3b0) at mmp.c:374
374		if (!(fs->super->s_feature_incompat & EXT4_FEATURE_INCOMPAT_MMP) ||
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb) bt
#0  ext2fs_mmp_stop (fs=0x67c3b0) at mmp.c:374
#1  0x00000000004249ce in fatal_error (ctx=0x67c000, msg=<value optimized out>) at util.c:59
#2  0x0000000000423183 in e2fsck_run_ext3_journal (ctx=0x67c000) at journal.c:973
#3  0x0000000000410574 in main (argc=<value optimized out>, argv=<value optimized out>) at unix.c:1500
(gdb) p fs->super
$1 = (struct ext2_super_block *) 0x0

Comment 2 Eric Sandeen 2013-08-16 17:01:33 UTC
Is this from fuzz testing?

Comment 4 Eric Sandeen 2014-02-18 17:59:31 UTC
commit 7ff040f30f0ff3bf5e2c832da3cb577e00a52d60
Author: Eric Sandeen <sandeen>
Date:   Mon Sep 9 10:33:20 2013 -0400

    e2fsck: don't try to stop mmp if there is no superblock set up
    
    Under some failure cases, we can get to fatal_error()
    without even having a superblock set up.  In that case,
    ext2fs_mmp_stop() will segfault when it tries to dereference
    fs->super.
    
    Check for the existence of a superblock before we go
    down the ext2fs_mmp_stop() path to avoid this problem.
    
    Reported-by: Hubert Kario <hkario>
    Addresses-Red-Hat-Bugzilla: #997972
    Signed-off-by: Eric Sandeen <sandeen>
    Signed-off-by: "Theodore Ts'o" <tytso>
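
The check described in the commit message above can be sketched roughly as follows. This is only an illustration of the guard pattern, not the actual patch: the toy_* types, the toy_* functions and the TOY_FEATURE_INCOMPAT_MMP value are hypothetical stand-ins for the real libext2fs structures and for fatal_error() / ext2fs_mmp_stop().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real libext2fs types. */
struct toy_super { unsigned int s_feature_incompat; };
struct toy_fs { struct toy_super *super; };

/* Illustrative flag value only, not taken from the real headers. */
#define TOY_FEATURE_INCOMPAT_MMP 0x0100u

/*
 * Models the unguarded callee: it dereferences fs->super right away,
 * so the caller must guarantee that a superblock is set up.
 * Returns 1 if MMP teardown would run, 0 if MMP is not enabled.
 */
static int toy_mmp_stop(struct toy_fs *fs)
{
    assert(fs != NULL && fs->super != NULL); /* caller's obligation */
    if (!(fs->super->s_feature_incompat & TOY_FEATURE_INCOMPAT_MMP))
        return 0;
    return 1;
}

/*
 * Models the guarded error path: skip the MMP teardown entirely
 * when there is no filesystem or no superblock yet.
 */
static int toy_fatal_error_cleanup(struct toy_fs *fs)
{
    if (fs == NULL || fs->super == NULL)
        return 0; /* nothing to stop, and nothing safe to dereference */
    return toy_mmp_stop(fs);
}
```

With this guard, a call such as toy_fatal_error_cleanup(&fs) on a structure whose super field is NULL returns 0 instead of dereferencing the NULL pointer. Note that it does not help when the fs pointer itself is stale, which is the case discussed later in this bug.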

Comment 5 Eric Sandeen 2014-02-18 18:06:07 UTC
Ho hum, I guess it doesn't fix it after all.

Comment 6 Eric Sandeen 2014-02-18 18:07:45 UTC
Actually, as of that commit, it does not crash.  Something after that seems to have broken it again.  Fantastico!

Comment 7 Eric Sandeen 2014-02-18 19:52:00 UTC
No, wait ;)  For me it does work in 1.42.9 as well as git upstream.  Hubert, can you re-test w/ latest e2fsprogs-v1.42.9 in RHEL7?

Thanks,
-Eric

Comment 8 Hubert Kario 2014-02-19 10:28:53 UTC
I can confirm that e2fsprogs-1.42.9-3.el7.x86_64 doesn't crash with this fs image.

Comment 9 Lukáš Czerner 2014-02-19 10:57:04 UTC
I disagree, it is still reproducible for me.

# cp disk.img.00000000000000000006.back disk.img.00000000000000000006
# fsck.ext4 disk.img.00000000000000000006

# fsck.ext4 disk.img.00000000000000000006
e2fsck 1.42.9 (28-Dec-2013)
disk.img.00000000000000000006: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open disk.img.00000000000000000006
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x7f9000000005
fsck.ext4[0x4275c1]
/lib64/libc.so.6(+0x35a00)[0x7f9081598a00]
fsck.ext4(fatal_error+0x50)[0x41e410]
fsck.ext4(e2fsck_run_ext3_journal+0x2d3)[0x41d973]
fsck.ext4(main+0x6ef)[0x409b8f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9081584af5]
fsck.ext4[0x40bec9]


yum info e2fsprogs
Loaded plugins: auto-update-debuginfo, product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name        : e2fsprogs
Arch        : x86_64
Version     : 1.42.9
Release     : 3.el7
Size        : 2.4 M
Repo        : installed
From repo   : beaker-Server
Summary     : Utilities for managing ext2, ext3, and ext4 filesystems
URL         : http://e2fsprogs.sourceforge.net/
License     : GPLv2
Description : The e2fsprogs package contains a number of utilities for creating,
            : checking, modifying, and correcting any inconsistencies in second,
            : third and fourth extended (ext2/ext3/ext4) filesystems. E2fsprogs
            : contains e2fsck (used to repair filesystem inconsistencies after an
            : unclean shutdown), mke2fs (used to initialize a partition to contain
            : an empty ext2 filesystem), debugfs (used to examine the internal
            : structure of a filesystem, to manually repair a corrupted
            : filesystem, or to create test cases for e2fsck), tune2fs (used to
            : modify filesystem parameters), and most of the other core ext2fs
            : filesystem utilities.
            : 
            : You should install the e2fsprogs package if you need to manage the
            : performance of an ext2, ext3, or ext4 filesystem.


Moreover, this is not really fixed upstream. It is just a coincidence that the crash does not show up upstream; the real problem is still present in both RHEL7 and upstream. The real problem is that while ext2fs_free() actually frees the ext2_filsys structure, the caller still holds the stale pointer (ctx->fs). Accessing ctx->fs->io->magic through it may then result in a NULL pointer dereference, because the io structure has been freed properly and its pointer has been set to NULL.

In e2fsprogs we use ext2fs_free() in some places but not in others. Since it does not set the caller's fs pointer to NULL, some call sites set it to NULL manually. This is in contrast with ext2fs_free_mem(), which expects a pointer to the pointer and sets it to NULL for you. This probably confused people, so the best fix would be for ext2fs_free() to take a pointer to a pointer as well and set the pointer to NULL for us.

I am testing the fix right now; it should fix the issue for RHEL7 as well as upstream (even though we cannot reproduce this particular case there). However, I am not sure whether we want to get it into RHEL7, since we would have to push it before it is pulled upstream.

Thanks!
-Lukas
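
The pointer-to-pointer idea proposed above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the attached patch: the toy_* structures and functions are hypothetical and only mimic the relationship between ctx->fs, fs->io and a free routine that clears the caller's pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the e2fsck context and filesystem handle. */
struct toy_io { int magic; };
struct toy_fs { struct toy_io *io; };
struct toy_ctx { struct toy_fs *fs; };

/* Allocate a toy filesystem handle with its io channel; NULL on failure. */
static struct toy_fs *toy_open(void)
{
    struct toy_fs *fs = calloc(1, sizeof(*fs));
    if (fs == NULL)
        return NULL;
    fs->io = calloc(1, sizeof(*fs->io));
    if (fs->io == NULL) {
        free(fs);
        return NULL;
    }
    return fs;
}

/*
 * Free the handle through the address of the caller's pointer and
 * reset that pointer to NULL, so the caller cannot keep a stale copy
 * in the variable it passed in.
 */
static void toy_free(struct toy_fs **fs_ptr)
{
    struct toy_fs *fs;

    assert(fs_ptr != NULL);
    fs = *fs_ptr;
    if (fs == NULL)
        return; /* already freed: a second call is harmless */
    free(fs->io);
    fs->io = NULL;
    free(fs);
    *fs_ptr = NULL;
}
```

A caller would write toy_free(&ctx.fs); afterwards ctx.fs is NULL, so an error path can test it before touching ctx.fs->io. Copies of the pointer stored elsewhere are not covered by this pattern and would still need to be cleared by hand.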

Comment 10 Hubert Kario 2014-02-19 11:20:36 UTC
Interesting, in my case the output looks like this:

[root@rhel7-64 tmp]# gunzip disk.img.00000000000000000006.gz 
[root@rhel7-64 tmp]# cp disk.img.00000000000000000006{,.new}
[root@rhel7-64 tmp]# e2fsck -f disk.img.00000000000000000006.new 
e2fsck 1.42.9 (28-Dec-2013)
disk.img.00000000000000000006.new: recovering journal
e2fsck: Bad magic number in super-block while trying to re-open disk.img.00000000000000000006.new

disk.img.00000000000000000006.new: ********** WARNING: Filesystem still has errors **********

[root@rhel7-64 tmp]# echo $?
12

Comment 11 Lukáš Czerner 2014-02-19 11:24:30 UTC
Created attachment 865039 [details]
Patch to fix the real problem

Here is a patch to fix the problem mentioned in the comment above. It seems to work fairly well so I'll test it some more and send it to the list. Then we can think about porting it back to RHEL7 if there is still time to do so.

Comment 12 Lukáš Czerner 2014-02-19 11:28:43 UTC
Hubert,

You might just be lucky enough that one of the pointers was overwritten by 0. When it comes to referencing freed memory, you can never know what you will find there.

It's 100% reliably reproducible for me.

-Lukas

Comment 13 Hubert Kario 2014-02-19 11:46:37 UTC
Ahh yes, under valgrind I can get it to crash:

# valgrind --free-fill=c0 e2fsck -f -p disk.img.00000000000000000006.new 
==12702== Memcheck, a memory error detector
==12702== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==12702== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==12702== Command: e2fsck -f -p disk.img.00000000000000000006.new
==12702== 
==12702== Warning: noted but unhandled ioctl 0x127c with no size/direction hints
==12702==    This could cause spurious value errors to appear.
==12702==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
disk.img.00000000000000000006.new: recovering journal
==12702== Warning: noted but unhandled ioctl 0x127c with no size/direction hints
==12702==    This could cause spurious value errors to appear.
==12702==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
e2fsck: Bad magic number in super-block while trying to re-open disk.img.00000000000000000006.new
==12702== Invalid read of size 8
==12702==    at 0x41E3F3: fatal_error (in /usr/sbin/e2fsck)
==12702==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==12702==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==12702==  Address 0x5ea5298 is 8 bytes inside a block of size 296 free'd
==12702==    at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12702==    by 0x41D797: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==12702==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==12702== 
==12702== Invalid read of size 8
==12702==    at 0x41E3FA: fatal_error (in /usr/sbin/e2fsck)
==12702==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==12702==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==12702==  Address 0x5ea52b0 is 32 bytes inside a block of size 296 free'd
==12702==    at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12702==    by 0x41D797: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==12702==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==12702== 
==12702== Invalid read of size 8
==12702==    at 0x4E56906: ext2fs_mmp_stop (in /usr/lib64/libext2fs.so.2.4)
==12702==    by 0x41E408: fatal_error (in /usr/sbin/e2fsck)
==12702==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==12702==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==12702==  Address 0x5ea52b0 is 32 bytes inside a block of size 296 free'd
==12702==    at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12702==    by 0x41D797: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==12702==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==12702== 
==12702== Invalid read of size 1
==12702==    at 0x4E5690D: ext2fs_mmp_stop (in /usr/lib64/libext2fs.so.2.4)
==12702==    by 0x41E408: fatal_error (in /usr/sbin/e2fsck)
==12702==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==12702==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==12702==  Address 0xc0c0c0c0c0c0c121 is not stack'd, malloc'd or (recently) free'd
==12702== 
Signal (11) SIGSEGV si_code=SI_KERNEL fault addr=(nil)
e2fsck[0x4275c1]
/lib64/libc.so.6(+0x35a00)[0x58f8a00]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0xd)[0x4e5690d]
e2fsck(fatal_error+0x49)[0x41e409]
e2fsck(e2fsck_run_ext3_journal+0x2d3)[0x41d973]
e2fsck(main+0x6ef)[0x409b8f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x58e4af5]
e2fsck[0x40bec9]
==12702== 
==12702== HEAP SUMMARY:
==12702==     in use at exit: 2,628 bytes in 53 blocks
==12702==   total heap usage: 295 allocs, 242 frees, 166,211 bytes allocated
==12702== 
==12702== LEAK SUMMARY:
==12702==    definitely lost: 0 bytes in 0 blocks
==12702==    indirectly lost: 0 bytes in 0 blocks
==12702==      possibly lost: 0 bytes in 0 blocks
==12702==    still reachable: 2,628 bytes in 53 blocks
==12702==         suppressed: 0 bytes in 0 blocks
==12702== Rerun with --leak-check=full to see details of leaked memory
==12702== 
==12702== For counts of detected and suppressed errors, rerun with: -v
==12702== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 3 from 3)

Comment 15 Eric Sandeen 2014-02-19 16:44:51 UTC
Ok, thanks Lukas, and sorry for the premature closing - just trying to get some bugs behind us.  ;)  Thanks for looking into this.

Comment 16 Ric Wheeler 2014-02-25 17:36:28 UTC
Sounds like you have a patch for this and we need to get blocker status set?

Thanks!

Comment 19 Eryu Guan 2014-03-12 10:48:15 UTC
I cannot reproduce the crash either if I run e2fsck test.img directly, but following comment 13 I can hit the crash with e2fsprogs-1.42.9-3.el7:

[root@hp-dl388eg8-01 ~]# valgrind --free-fill=c0 e2fsck test.img
==1801== Memcheck, a memory error detector
==1801== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1801== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==1801== Command: e2fsck test.img
==1801==
e2fsck 1.42.9 (28-Dec-2013)
==1801== Warning: noted but unhandled ioctl 0x127c with no size/direction hints
==1801==    This could cause spurious value errors to appear.
==1801==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
test.img: recovering journal
==1801== Warning: noted but unhandled ioctl 0x127c with no size/direction hints
==1801==    This could cause spurious value errors to appear.
==1801==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
e2fsck: Bad magic number in super-block while trying to re-open test.img
==1801== Invalid read of size 8
==1801==    at 0x41E3F3: fatal_error (in /usr/sbin/e2fsck)
==1801==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==1801==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==1801==  Address 0x5ea5d48 is 8 bytes inside a block of size 296 free'd
==1801==    at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1801==    by 0x41D797: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==1801==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==1801==
==1801== Invalid read of size 8
==1801==    at 0x41E3FA: fatal_error (in /usr/sbin/e2fsck)
==1801==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==1801==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==1801==  Address 0x5ea5d60 is 32 bytes inside a block of size 296 free'd
==1801==    at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1801==    by 0x41D797: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==1801==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==1801== 
==1801== Invalid read of size 8
==1801==    at 0x4E56906: ext2fs_mmp_stop (in /usr/lib64/libext2fs.so.2.4)
==1801==    by 0x41E408: fatal_error (in /usr/sbin/e2fsck)
==1801==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==1801==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==1801==  Address 0x5ea5d60 is 32 bytes inside a block of size 296 free'd
==1801==    at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1801==    by 0x41D797: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==1801==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==1801==
==1801== Invalid read of size 1
==1801==    at 0x4E5690D: ext2fs_mmp_stop (in /usr/lib64/libext2fs.so.2.4)
==1801==    by 0x41E408: fatal_error (in /usr/sbin/e2fsck)
==1801==    by 0x41D972: e2fsck_run_ext3_journal (in /usr/sbin/e2fsck)
==1801==    by 0x409B8E: main (in /usr/sbin/e2fsck)
==1801==  Address 0xc0c0c0c0c0c0c121 is not stack'd, malloc'd or (recently) free'd
==1801==
Signal (11) SIGSEGV si_code=SI_KERNEL fault addr=(nil)
e2fsck[0x4275c1]
/lib64/libc.so.6(+0x35a00)[0x58f8a00]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0xd)[0x4e5690d]
e2fsck(fatal_error+0x49)[0x41e409]
e2fsck(e2fsck_run_ext3_journal+0x2d3)[0x41d973]
e2fsck(main+0x6ef)[0x409b8f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x58e4af5]
e2fsck[0x40bec9]
==1801==
==1801== HEAP SUMMARY:   
==1801==     in use at exit: 3,430 bytes in 77 blocks
==1801==   total heap usage: 333 allocs, 256 frees, 167,418 bytes allocated
==1801==
==1801== LEAK SUMMARY:   
==1801==    definitely lost: 0 bytes in 0 blocks
==1801==    indirectly lost: 0 bytes in 0 blocks
==1801==      possibly lost: 0 bytes in 0 blocks
==1801==    still reachable: 3,430 bytes in 77 blocks
==1801==         suppressed: 0 bytes in 0 blocks
==1801== Rerun with --leak-check=full to see details of leaked memory
==1801==
==1801== For counts of detected and suppressed errors, rerun with: -v
==1801== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 3 from 3)

With e2fsprogs-1.42.9-4.el7 I cannot hit the crash.

Set to VERIFIED.

Comment 20 Ludek Smid 2014-06-13 09:19:30 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.