496982 – resize2fs is corrupting ext4 filesystems during livecd creation

Summary: resize2fs is corrupting ext4 filesystems during livecd creation

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	e2fsprogs
Sub Component:
Version:	11
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Eric Sandeen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	516996

Reported:	2009-04-21 22:11 UTC by Eric Sandeen
Modified:	2013-01-10 05:10 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-10-06 14:32:36 UTC
Type:	---
Embargoed:
Dependent Products:

Description Eric Sandeen 2009-04-21 22:11:06 UTC

During the livecd-creator process, the resizes of the ext4 filesystem are corrupting it.  I added post-resize fscks and got:

e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Inode 170 has an invalid extent
	(logical block 6144, invalid physical block 753664, len 1)
Clear? yes

Inode 170, i_blocks is 53560, should be 53552.  Fix? yes

Inode 35435, i_blocks is 448, should be 424.  Fix? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(539069--539071)
Fix? yes

Free blocks count wrong for group #16 (0, counted=3).
Fix? yes

Free blocks count wrong (0, counted=3).
Fix? yes


I have patches from Ted that fix this; we will certainly want them for F11.

Jeremy, I'd highly suggest adding post-resize fscks (or at least a final, pre-munging-into-iso fsck) to the livecd process.
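
A minimal sketch of what such a post-resize check could look like, in Python. This is not the livecd-creator code; the function name `resize_and_verify` and the injectable `run` parameter are hypothetical and exist only for illustration. It relies on the documented e2fsck options -f (force a check) and -n (open read-only, answer "no" to all prompts) and on e2fsck exiting with status 0 when it finds no errors.

```python
import subprocess

# Per e2fsck(8), the exit status is a bit mask and 0 means "no errors".
FSCK_CLEAN = 0


def resize_and_verify(image, blocks, run=subprocess.run):
    """Resize `image` to `blocks` blocks, then check it without modifying it.

    Returns True only if resize2fs succeeded and e2fsck reported no errors.
    `run` is a hypothetical injection point so the logic can be exercised
    without touching a real filesystem image.
    """
    resize = run(["resize2fs", image, str(blocks)])

    # Check even when the resize failed: per this bug, a failed resize can
    # leave the image corrupted, so the result should not be trusted blindly.
    # -f forces a full check; -n keeps the check read-only.
    check = run(["e2fsck", "-f", "-n", image])

    return resize.returncode == 0 and check.returncode == FSCK_CLEAN
```

A caller would treat a False result as "discard this image and rebuild" rather than continuing on to ISO creation.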

Comment 1 Eric Sandeen 2009-04-23 17:07:05 UTC

In rawhide & F11 now (and tagged for F11)

Comment 2 Jonathan Underwood 2009-05-12 00:20:09 UTC

Do these patches (or their equivalent) also need adding to livecd-iso-to-disk? The reason I ask is that I wonder whether this is the problem causing BZ #499149.

Comment 3 Jeremy Katz 2009-05-12 01:45:41 UTC

(In reply to comment #2)
> Do these patches (or their equivalent) also need adding to livecd-iso-to-disk.
> The reason I ask is I wonder if this is the problem that is causing BZ #499149

livecd-iso-to-disk doesn't do any resizing, just a copy.  So there shouldn't be a need for an explicit fsck there.

Comment 4 Eric Sandeen 2009-05-12 03:47:23 UTC

OK, I have a reproducer now.

http://people.redhat.com/esandeen/livecd-creator-imagefile.bz2

# resize2fs <that image> 578639

fails to resize ... and corrupts it :(

This happens during the binary chop of searching for a minimum size.

Jeremy, remind me again why you're not using the -M option you wanted so badly? :)

-Eric
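
For readers unfamiliar with the "binary chop" mentioned above, here is a rough Python sketch of the general technique. It is an illustration only, not the actual livecd-creator implementation: `find_min_size` and the `fits` callback are hypothetical names, and the sketch assumes that `fits` is monotonic (every size at or above some threshold works, every size below it fails) and that `fits(hi)` is true.

```python
def find_min_size(lo, hi, fits):
    """Return the smallest size in [lo, hi] for which fits(size) is True.

    `fits` stands in for "try to resize to this many blocks and report
    whether it worked". It is assumed to be monotonic, with fits(hi) True.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid        # mid works, so the minimum is mid or smaller
        else:
            lo = mid + 1    # mid is too small, so the minimum is larger
    return lo
```

The point relevant to this bug is that every probe where `fits` returns False corresponds to a resize attempt that failed partway, which is exactly the case reported here as corrupting the image.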

Comment 5 Eric Sandeen 2009-05-12 03:54:53 UTC

So, this looks really random.  Most likely it is partly due to pdflush timing during the original image population, and partly due to get_random_bytes() in the inode allocation code (!)

resize2fs really should not find itself at a point of no return once it hits ENOSPC, but auditing that is probably going to be really tricky.

I know this is listed as a blocker, but in reality, if you hit it and run the tool again, it will probably pass just fine (and really will be fine).  Given how many times you need a good livecd before F11 ships ... once, right?  I wonder if we can live with "oops, got a bad one, try again".

Comment 6 Jonathan Underwood 2009-05-12 11:15:22 UTC

But doesn't this also mean that a user who runs resize2fs on his installed machine for whatever reason faces risk of data loss?

Comment 7 Eric Sandeen 2009-05-12 13:29:44 UTC

If the user tries to shrink it to something extremely small then yes.  resize2fs -M should provide a safe way to do this, though...
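
As a hedged illustration of the -M approach, the Python sketch below lets resize2fs pick the minimum size itself instead of probing candidate sizes from outside. The wrapper name `shrink_to_minimum` and its `run` parameter are hypothetical; only the `resize2fs -M <image>` invocation comes from the tool itself.

```python
import subprocess


def shrink_to_minimum(image, run=subprocess.run):
    """Ask resize2fs to shrink `image` as far as it can, using -M.

    Returns True if resize2fs exited with status 0. `run` is a hypothetical
    injection point so the command construction can be tested in isolation.
    """
    cmd = ["resize2fs", "-M", image]
    return run(cmd).returncode == 0
```

Because no externally chosen target size is involved, this avoids asking for a size smaller than the filesystem can hold, which is the failing case described in this comment.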

Comment 8 Eric Sandeen 2009-05-12 13:36:28 UTC

(I don't mean it shouldn't be fixed, of course, I just wonder about blocker status)

Comment 9 Chris Shoemaker 2009-05-15 21:06:12 UTC

Is this the cause of #499452 ?

Comment 10 Eric Sandeen 2009-05-15 21:15:34 UTC

Gr, I confused these two bugs a bit I think.

This one was for a pretty repeatable corruption which I put patches in for, and it was mostly fixed.

But then bug #499452 is for a much harder to hit, more random corruption during the resize phase of livecd creation...

After about comment #4 in this bug they should have been on bug #499452 ... sigh.

Comment 11 Bug Zapper 2009-06-09 14:21:37 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Eric Sandeen 2009-10-06 14:32:36 UTC

MODIFIED->CLOSED magic doesn't work for rawhide I guess.  :)
