997972 – fsck crash with corrupted file system

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	e2fsprogs
Sub Component:
Version:	20
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Eric Sandeen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	997982

Reported:	2013-08-16 16:14 UTC by Alicja Kario
Modified:	2014-02-18 20:02 UTC
CC List:	4 users
Fixed In Version:	e2fsprogs-1.42.9
Clone Of:
Clones:	997982
Environment:
Last Closed:	2014-02-18 20:02:27 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
Corrupted file system (105.02 KB, application/gzip) 2013-08-16 16:14 UTC, Alicja Kario

Description Alicja Kario 2013-08-16 16:14:14 UTC

Created attachment 787358
Corrupted file system

Description of problem:
When checking a corrupted file system (attached), fsck crashes with a segmentation fault.

Version-Release number of selected component (if applicable):
e2fsprogs-1.42.8-1.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gunzip disk.img.00000000000000000006.wrk_7.gz
2. losetup /dev/loop0 disk.img.00000000000000000006.wrk_7
3. fsck -f -p /dev/loop0

Actual results:
fsck from util-linux 2.23.1
/dev/loop0: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/loop0
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x61
fsck.ext4[0x426771]
/lib64/libc.so.6(+0x37300)[0x7f20dc272300]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0xd)[0x359b02316d]
fsck.ext4(fatal_error+0x42)[0x41db52]
fsck.ext4(e2fsck_run_ext3_journal+0x243)[0x41d0c3]
fsck.ext4(main+0x6ef)[0x409b7f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f20dc25cfa5]
fsck.ext4[0x40be75]

Expected results:
no crash

Additional info:
The file system image was created by truncating every write to the disk to 256 bytes.

Comment 1 Alicja Kario 2013-08-16 16:20:25 UTC

The bug is also present on Fedora 18:

e2fsprogs-1.42.5-1.fc18.x86_64

fsck from util-linux 2.22.2
/dev/loop0: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/loop0
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x61
fsck.ext4[0x426f20]
/lib64/libc.so.6[0x35e5035cd0]
/lib64/libext2fs.so.2(ext2fs_mmp_stop+0x1c)[0x35e60237cc]
fsck.ext4(fatal_error+0x42)[0x41dc92]
fsck.ext4(e2fsck_run_ext3_journal+0x2d3)[0x41d1d3]
fsck.ext4(main+0x71e)[0x409a3e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x35e5021a05]
fsck.ext4[0x40bc6d]

Comment 2 Eric Sandeen 2013-08-16 16:34:36 UTC

Persists upstream too.

Comment 3 Eric Sandeen 2013-08-16 16:41:29 UTC

Program received signal SIGSEGV, Segmentation fault.
ext2fs_mmp_stop (fs=0x67c3b0) at mmp.c:374
374		if (!(fs->super->s_feature_incompat & EXT4_FEATURE_INCOMPAT_MMP) ||
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb) bt
#0  ext2fs_mmp_stop (fs=0x67c3b0) at mmp.c:374
#1  0x00000000004249ce in fatal_error (ctx=0x67c000, msg=<value optimized out>) at util.c:59
#2  0x0000000000423183 in e2fsck_run_ext3_journal (ctx=0x67c000) at journal.c:973
#3  0x0000000000410574 in main (argc=<value optimized out>, argv=<value optimized out>) at unix.c:1500
(gdb) p fs->super
$1 = (struct ext2_super_block *) 0x0

Comment 4 Eric Sandeen 2013-08-16 17:01:08 UTC

It's trying to stop the multiple mount protection (MMP) crud, but there's no superblock set up (fs->super is NULL because of the bad magic number failure).

This avoids it, at least:

diff --git a/e2fsck/util.c b/e2fsck/util.c
index 9eaf557..18005f4 100644
--- a/e2fsck/util.c
+++ b/e2fsck/util.c
@@ -55,7 +55,7 @@ void fatal_error(e2fsck_t ctx, const char *msg)
 		fprintf (stderr, "e2fsck: %s\n", msg);
 	if (!fs)
 		goto out;
-	if (fs->io) {
+	if (fs->io && fs->super) {
 		ext2fs_mmp_stop(ctx->fs);
 		if (ctx->fs->io->magic == EXT2_ET_MAGIC_IO_CHANNEL)
 			io_channel_flush(ctx->fs->io);


but then you just get:

# e2fsck/e2fsck.static -fy test.img 
e2fsck 1.43-WIP (20-Jun-2013)
test.img: recovering journal
e2fsck/e2fsck.static: Bad magic number in super-block while trying to re-open test.img

test.img: ********** WARNING: Filesystem still has errors **********

and looking for a backup superblock doesn't work; replaying the journal seems to wipe them all out.

What the heck happened to this filesystem? :)  (mounting -o norecovery,ro yields a filesystem with no files in it)

Are you fuzz-testing here?
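
The guard in the patch above can be illustrated in isolation. The following is a minimal sketch, not the e2fsprogs code: the `sketch_*` types, functions, and the flag constant are hypothetical stand-ins that model only the fields involved (`io` and `super`). It shows a callee that dereferences `super` unconditionally, as mmp.c:374 does in the backtrace, and a caller that reaches it only when both pointers are set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libext2fs structures; only the
 * fields relevant to the crash are modelled. */
struct sketch_super {
    uint32_t s_feature_incompat;
};

struct sketch_fs {
    void *io;                   /* non-NULL once an I/O channel exists */
    struct sketch_super *super; /* NULL when the superblock failed to load */
};

/* Illustrative flag value for this sketch only. */
#define SKETCH_FEATURE_INCOMPAT_MMP 0x0100u

/* Models the callee: it reads fs->super without checking it, so a
 * NULL super here corresponds to the reported SIGSEGV. */
static int sketch_mmp_enabled(const struct sketch_fs *fs)
{
    assert(fs != NULL && fs->super != NULL); /* precondition the crash violated */
    return (fs->super->s_feature_incompat & SKETCH_FEATURE_INCOMPAT_MMP) != 0;
}

/* Models the patched caller: the callee is reached only when both
 * the I/O channel and the superblock are present.
 * Returns 1 if MMP teardown would run, 0 if it is skipped. */
static int sketch_fatal_cleanup(const struct sketch_fs *fs)
{
    if (fs == NULL)
        return 0;
    if (fs->io != NULL && fs->super != NULL)
        return sketch_mmp_enabled(fs);
    return 0;
}
```

With `super` left NULL, `sketch_fatal_cleanup` returns 0 without touching the superblock, which mirrors how the patched `fatal_error` skips the MMP teardown instead of crashing.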

Comment 5 Eric Sandeen 2013-08-16 17:12:26 UTC

patch sent upstream:

http://marc.info/?l=linux-ext4&m=137667276009490&w=2

Comment 6 Alicja Kario 2013-08-16 17:17:15 UTC

(In reply to Eric Sandeen from comment #4)
> 
> What the heck happened to this filesystem? :)  (mounting -o norecovery,ro
> yields a filesystem with no files in it)
> 
> Are you fuzz-testing here?

More or less. I'm working on a file system checker that records all the writes that go to the file system and then replays them one by one (or sector by sector), possibly with errors. In the end I want to have a tool that can simulate any imaginable HDD (or SSD) failure mode.

In this specific case, it was truncating all writes that go to the file system to 256 bytes, so "a bit" of information was lost.

So the answer to "What the heck happened to this filesystem?" would be:
It fell into a Bl**Tec blender :)
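
The truncation described above can be sketched roughly as follows. This is only an illustration of the idea, not the reporter's tool, whose implementation is not shown in this report: the `sketch_write` record and `sketch_replay_truncated` function are hypothetical, and the sketch assumes recorded writes are replayed into an in-memory image buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* From the report: every write was cut down to 256 bytes. */
#define SKETCH_TRUNCATE_AT 256

/* Hypothetical record of one captured write. */
struct sketch_write {
    size_t offset;             /* byte offset into the image */
    size_t len;                /* length of the original write */
    const unsigned char *data; /* payload of the original write */
};

/* Replays one recorded write into the image, copying at most
 * SKETCH_TRUNCATE_AT bytes of its payload. Returns the number of
 * bytes actually written, or 0 if the (truncated) write would fall
 * outside the image. */
static size_t sketch_replay_truncated(unsigned char *image, size_t image_len,
                                      const struct sketch_write *w)
{
    assert(image != NULL && w != NULL && w->data != NULL);

    size_t n = w->len < SKETCH_TRUNCATE_AT ? w->len : SKETCH_TRUNCATE_AT;

    /* Bounds check written to avoid overflow in offset + n. */
    if (w->offset > image_len || n > image_len - w->offset)
        return 0;

    memcpy(image + w->offset, w->data, n);
    return n;
}
```

Replaying a 512-byte write through this function leaves the second 256 bytes of the target range untouched, which is the kind of partial update that produced the attached image.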

Comment 7 Eric Sandeen 2013-08-16 17:19:00 UTC

Oof.

Comment 8 Fedora End Of Life 2013-09-16 16:34:46 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 20 development cycle.
Changing version to '20'.

More information and the reason for this action are here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora20

Comment 9 Eric Sandeen 2014-02-18 20:02:27 UTC

Fixed in e2fsprogs-1.42.9
