Bug 997972 - fsck crash with corrupted file system
Summary: fsck crash with corrupted file system
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 997982
Reported: 2013-08-16 16:14 UTC by Hubert Kario
Modified: 2014-02-18 20:02 UTC

Fixed In Version: e2fsprogs-1.42.9
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 997982
Environment:
Last Closed: 2014-02-18 20:02:27 UTC
Type: Bug
Embargoed:


Attachments
Corrupted file system (105.02 KB, application/gzip)
2013-08-16 16:14 UTC, Hubert Kario

Description Hubert Kario 2013-08-16 16:14:14 UTC
Created attachment 787358
Corrupted file system

Description of problem:
When checking a corrupted file system (attached), fsck crashes.

Version-Release number of selected component (if applicable):
e2fsprogs-1.42.8-1.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gunzip disk.img.00000000000000000006.wrk_7.gz
2. losetup /dev/loop0 disk.img.00000000000000000006.wrk_7
3. fsck -f -p /dev/loop0

Actual results:
fsck from util-linux 2.23.1
/dev/loop0: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/loop0
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x61
fsck.ext4[0x426771]
/lib64/libc.so.6(+0x37300)[0x7f20dc272300]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0xd)[0x359b02316d]
fsck.ext4(fatal_error+0x42)[0x41db52]
fsck.ext4(e2fsck_run_ext3_journal+0x243)[0x41d0c3]
fsck.ext4(main+0x6ef)[0x409b7f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f20dc25cfa5]
fsck.ext4[0x40be75]

Expected results:
no crash

Additional info:
The file system was created by truncating every write to the disk to 256 bytes.

Comment 1 Hubert Kario 2013-08-16 16:20:25 UTC
The bug is also present on Fedora 18:

e2fsprogs-1.42.5-1.fc18.x86_64

fsck from util-linux 2.22.2
/dev/loop0: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/loop0
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x61
fsck.ext4[0x426f20]
/lib64/libc.so.6[0x35e5035cd0]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0x1c)[0x35e60237cc]
fsck.ext4(fatal_error+0x42)[0x41dc92]
fsck.ext4(e2fsck_run_ext3_journal+0x2d3)[0x41d1d3]
fsck.ext4(main+0x71e)[0x409a3e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x35e5021a05]
fsck.ext4[0x40bc6d]

Comment 2 Eric Sandeen 2013-08-16 16:34:36 UTC
persists upstream too

Comment 3 Eric Sandeen 2013-08-16 16:41:29 UTC
Program received signal SIGSEGV, Segmentation fault.
ext2fs_mmp_stop (fs=0x67c3b0) at mmp.c:374
374		if (!(fs->super->s_feature_incompat & EXT4_FEATURE_INCOMPAT_MMP) ||
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb) bt
#0  ext2fs_mmp_stop (fs=0x67c3b0) at mmp.c:374
#1  0x00000000004249ce in fatal_error (ctx=0x67c000, msg=<value optimized out>) at util.c:59
#2  0x0000000000423183 in e2fsck_run_ext3_journal (ctx=0x67c000) at journal.c:973
#3  0x0000000000410574 in main (argc=<value optimized out>, argv=<value optimized out>) at unix.c:1500
(gdb) p fs->super
$1 = (struct ext2_super_block *) 0x0

Comment 4 Eric Sandeen 2013-08-16 17:01:08 UTC
It's trying to stop the multiple mount protection crud, but there's no super set up (because of the bad magic number failure).

This avoids it, at least:

diff --git a/e2fsck/util.c b/e2fsck/util.c
index 9eaf557..18005f4 100644
--- a/e2fsck/util.c
+++ b/e2fsck/util.c
@@ -55,7 +55,7 @@ void fatal_error(e2fsck_t ctx, const char *msg)
 		fprintf (stderr, "e2fsck: %s\n", msg);
 	if (!fs)
 		goto out;
-	if (fs->io) {
+	if (fs->io && fs->super) {
 		ext2fs_mmp_stop(ctx->fs);
 		if (ctx->fs->io->magic == EXT2_ET_MAGIC_IO_CHANNEL)
 			io_channel_flush(ctx->fs->io);


but then you just get:

# e2fsck/e2fsck.static -fy test.img 
e2fsck 1.43-WIP (20-Jun-2013)
test.img: recovering journal
e2fsck/e2fsck.static: Bad magic number in super-block while trying to re-open test.img

test.img: ********** WARNING: Filesystem still has errors **********

and looking for a backup superblock doesn't work; replaying the journal seems to wipe them all out.

What the heck happened to this filesystem? :)  (mounting -o norecovery,ro yields a filesystem with no files in it)

Are you fuzz-testing here?

Comment 5 Eric Sandeen 2013-08-16 17:12:26 UTC
patch sent upstream:

http://marc.info/?l=linux-ext4&m=137667276009490&w=2

Comment 6 Hubert Kario 2013-08-16 17:17:15 UTC
(In reply to Eric Sandeen from comment #4)
> 
> What the heck happened to this filesystem? :)  (mounting -o norecovery,ro
> yields a filesystem with no files in it)
> 
> Are you fuzz-testing here?

More or less. I'm working on a file system checker that records all the writes that go to the file system and then replays them one by one (or sector by sector), possibly with errors. In the end I want to have a tool that can simulate any imaginable HDD (or SSD) failure mode.

In this specific case, it was truncating all writes that go to the file system to 256 bytes, so "a bit" of information was lost.

So the answer to "What the heck happened to this filesystem?" would be:
It fell into a Bl**Tec blender :)

Comment 7 Eric Sandeen 2013-08-16 17:19:00 UTC
Oof.

Comment 8 Fedora End Of Life 2013-09-16 16:34:46 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 20 development cycle.
Changing version to '20'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora20

Comment 9 Eric Sandeen 2014-02-18 20:02:27 UTC
Fixed in e2fsprogs-1.42.9

