Bug 499452

Summary: error found with fsck after resizing during live image creation
Product: [Fedora] Fedora Reporter: Jeremy Katz <katzj>
Component: e2fsprogsAssignee: Eric Sandeen <esandeen>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: atigro, bruno, c.shoemaker, dcantrell, dpierce, drfudgeboy, esandeen, kwizart, kzak, oliver, sebastian, traxtopel
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 505339 (view as bug list) Environment:
Last Closed: 2009-05-20 21:41:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 446451, 505339    
Attachments:
Description Flags
test patch
none
Updated
none
resize2fs fixes
none
livecd-creator output with --fstype=ext3
none
livecd-creator output with --fstype=ext4 none

Description Jeremy Katz 2009-05-06 17:59:08 UTC
Building a live image with today's rawhide failed the post-resize fsck on one of two live image builds.

[katzj@erebor livecd]$ rpm -q e2fsprogs
e2fsprogs-0:1.41.4-8.fc11.x86_64
[katzj@erebor livecd]$ uname -a
Linux erebor.bos.redhat.com 2.6.29.2-126.fc11.x86_64 #1 SMP Mon May 4 04:46:15 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


fsck output:
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
rawhide-KDE-i686: 95945/196608 files (0.1% non-contiguous), 537603/786432 blocks
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Inode 55906, i_blocks is 30328, should be 30304.  Fix? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(530429--530431)
Fix? yes

Free blocks count wrong for group #16 (0, counted=3).
Fix? yes

Free blocks count wrong (0, counted=3).
Fix? yes


rawhide-KDE-i686: ***** FILE SYSTEM WAS MODIFIED *****
rawhide-KDE-i686: 95945/139264 files (0.9% non-contiguous), 534138/534141 blocks

Comment 1 Eric Sandeen 2009-05-06 21:59:14 UTC
Jeremy, if you can reliably reproduce this, it'd be superdupereduper great if you could exit out of the livecd creation just before the resize that breaks it, and create an e2iamge of the filesystem that's about to be resized .... then we'll have the testcase we need ...

if it's not reliable for you I guess I'll just build livecds for a while 'til I hit it ...? :)

-Eric

Comment 2 Eric Sandeen 2009-05-06 22:29:49 UTC
I just did a rawhide livecd on x86_64 and it all worked, though my mirror may be a day behind ...

Comment 3 Sebastian Dziallas 2009-05-11 16:25:02 UTC
I guess I've something reproducible here, doing builds for OLPC: Every build I tried failed here on a Rawhide system. Note that e2fsprogs seems to be up2date...

[sebastian@localhost ~]$ rpm -q e2fsprogs
e2fsprogs-1.41.4-8.fc11.i586

Well, the issue occurs using the kickstart files (specifically, you'll need to build against the olpc-desktop.ks, which includes the olpc-base.ks) in this GIT repo:

http://dev.laptop.org/git/projects/fedora-xo/

In the base kickstart, ext4 usage has not been defined. Downgrading to livecd-tools-023-1.fc11.i586 again solves this temporarily.

Sorry, I'm a bit short of time right now - preparing for travelling - so I suppose I won't be able to help with an e2image.

--Sebastian

Comment 4 Eric Sandeen 2009-05-11 16:35:57 UTC
Sebastian, thanks.  So those kickstart files, unmodified, do it for you?  (if you made any modifications can you attach?)  I'll give it a shot.

-Eric

Comment 5 Eric Sandeen 2009-05-11 17:25:38 UTC
Testing those kickstart images as-is, now, thanks.

-Eric

Comment 6 Eric Sandeen 2009-05-11 17:36:43 UTC
.... and, they worked just fine :(

Comment 7 Eric Sandeen 2009-05-11 18:46:50 UTC
I've sent out a general plea for folks to run with this patch:

--- /usr/lib/python2.6/site-packages/imgcreate/fs.py.orig	2009-05-06 13:09:02.000000000 -0500
+++ /usr/lib/python2.6/site-packages/imgcreate/fs.py	2009-05-11 13:33:25.767280330 -0500
@@ -51,6 +51,8 @@
 
 def resize2fs(fs, size):
     dev_null = os.open("/dev/null", os.O_WRONLY)
+    logging.debug("Creating pre-resize image of %s" % fs)
+    subprocess.call(["/sbin/e2image", "-r", fs, "/tmp/livecd-creator-imagefile"])
     try:
         return subprocess.call(["/sbin/resize2fs", fs, "%sK" % (size / 1024,)],
                                stdout = dev_null, stderr = dev_null)

so that if fsck finds post-resize errors, we'll have a pre-resize image to reproduce with ...

Thanks,
-Eric

Comment 8 Eric Sandeen 2009-05-11 18:48:14 UTC
Created attachment 343490 [details]
test patch

Ugh, should know better than to paste a patch.

Comment 9 Eric Sandeen 2009-05-11 19:07:03 UTC
Created attachment 343492 [details]
Updated

Comment 10 Traxtopel 2009-05-11 19:51:59 UTC
Eric,
on my machine creating a custom image using ext4, works fine.
Using ext3 I see the issue. fsck runs creates a /tmp/livecd-creator-imagefile file and seems to go in to a loop?

umount: /var/tmp/imgcreate-1EOGRM/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
ioctl: LOOP_CLR_FD: Device or resource busy
umount: /var/tmp/imgcreate-1EOGRM/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
e2fsck 1.41.4 (27-Jan-2009)
OC-Fedora-11-Net: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 32912
Connect to /lost+found? yes

WARNING: PROGRAMMING BUG IN E2FSCK!
	OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM.
inode_link_info[32912] is 2, inode.i_links_count is 1.  They should be the same!
Inode 32912 ref count is 1, should be 1.  Fix? yes

Pass 5: Checking group summary information
Block bitmap differences:  +140162
Fix? yes

Free blocks count wrong for group #18 (15114, counted=15383).
Fix? yes

Free blocks count wrong (336848, counted=337117).
Fix? yes

Inode bitmap differences:  +32912
Fix? yes

Free inodes count wrong for group #18 (3056, counted=3059).
Fix? yes

Free inodes count wrong (122026, counted=122029).
Fix? yes


OC-Fedora-11-Net: ***** FILE SYSTEM WAS MODIFIED *****
OC-Fedora-11-Net: 74579/196608 files (0.4% non-contiguous), 449315/786432 blocks
e2image 1.41.4 (27-Jan-2009)
e2image 1.41.4 (27-Jan-2009)
e2image 1.41.4 (27-Jan-2009)
e2image 1.41.4 (27-Jan-2009)
e2image 1.41.4 (27-Jan-2009)
e2image 1.41.4 (27-Jan-2009)
e2image 1.41.4 (27-Jan-2009)

Comment 11 Traxtopel 2009-05-12 07:19:50 UTC
So on my system, I added some kill scripts to postinstall.
cupsd for some reason was still running, which did not let the filesystem unmount. Hence the fsck failed.
In the past this worked, irrespective if the fs unmounted or not, also with ext4 it works.

Comment 12 Jeremy Katz 2009-05-15 19:49:04 UTC
*** Bug 501069 has been marked as a duplicate of this bug. ***

Comment 13 Eric Sandeen 2009-05-15 21:19:24 UTC
Wow, it's been quite a week.  I was commenting on the wrong bug.   here we go...

-------  Comment #4 From  Eric Sandeen (esandeen)  2009-05-11 23:47:23 EDT

Ok I have a reproducer now.

http://people.redhat.com/esandeen/livecd-creator-imagefile.bz2

# resize2fs <that image> 578639

fails to resize ... and corrupts it :(

This happens during the binary chop of searching for a minimum size.

Jeremy, remind me again why you're not using the -M option you wanted so badly?
:)

-Eric  

------- Comment #5 From Eric Sandeen (esandeen) 2009-05-11 23:54:53 EDT

So, this looks really random.  Partly due to pdflush timing on the original
image population, most likely, and get_random_bytes() in the inode allocation
code (!)

resize2fs really should not be finding itself in a point of no return once it
hits ENOSPC, but auditing that is probably going to be really tricky.

I know this is listed as a blocker, but in reality, if you hit it, and you run
the tool again, it will probably pass just fine (and really will be fine). 
Given how many times you need a good livecd before F11 ships ... once, right? 
I wonder if we can live with "oops, got a bad one, try again"  

------- Comment #6 From Jonathan Underwood (jonathan.underwood) 2009-05-12 07:15:22 EDT

But doesn't this also mean that a user who runs resize2fs on his installed
machine for whatever reason faces risk of data loss?  

------- Comment #7 From Eric Sandeen (esandeen) 2009-05-12 09:29:44 EDT

If the user tries to shrink it to something extremely small then yes. 
resize2fs -M should provide a safe way to do this, though...  

------- Comment #8 From Eric Sandeen (esandeen) 2009-05-12 09:36:28 EDT

(I don't mean it shouldn't be fixed, of course, I just wonder about blocker
status)

Comment 14 Jesse Keating 2009-05-18 20:42:08 UTC
I agree with Eric, we'd take a fix, but not a blocker.

Comment 15 Eric Sandeen 2009-05-18 20:45:30 UTC
Lest anyone think I'm a total slacker, I am looking for a fix as we speak ;)

Comment 16 Eric Sandeen 2009-05-18 21:39:05 UTC
And I think I have a fix, sent to ext4 list today; ideally I'd like a little review but I think I can sneak this one in.

Comment 17 Jeremy Katz 2009-05-19 18:11:26 UTC
*** Bug 501436 has been marked as a duplicate of this bug. ***

Comment 18 Bruno Wolff III 2009-05-19 18:26:56 UTC
So for custom livedvd's can I just use part --fstype=ext3 to avoid this for the next couple of days?

Comment 19 Eric Sandeen 2009-05-19 18:41:02 UTC
I assume that'll work but in practice I almost never hit the bug on ext4 either.  If the livecd compose works, you're fine.  If it fails, run it again, and you'll probably succeed.  :)  I'm testing fixes for this right now, though.

Comment 20 Bruno Wolff III 2009-05-19 19:11:46 UTC
I am hitting the problem pretty consistantly (I think my last half dozen or so tries over two different machines with different arches have all failed), so I'll use ext3 for the DVDs I want to have for this week.

Comment 21 Eric Sandeen 2009-05-19 19:27:19 UTC
Bruno, would you like to test a couple e2fsprogs patches that should make it work on ext4? :)

Comment 22 Eric Sandeen 2009-05-19 20:09:15 UTC
Created attachment 344676 [details]
resize2fs fixes

This patch doesn't fix the underlying bug - I have a patch for that but I feel it is a bit too invasive for this late date in the cycle.

Instead, it fixes the min-blocks calculation, and then enforces the minimum resize value to that.  Before, we got into trouble with ENOSPC and improper error handling when we tried to resize to something too small.  Failing below the self-calculated minimum should avoid that situation, which I'll fix in rawhide.

Anyway, try livecd building using e2fsprogs w/this patch applied, it should suffice.

Thanks,
-Eric

Comment 23 Bruno Wolff III 2009-05-19 21:20:00 UTC
Yes, I'll try it tonight.
The --fstype=ext3 didn't work for me either and I am going to want some custom DVDs on Thursday, so I'd like to help things work. It'll be a few hours before I get to try it.

Comment 24 Nicolas Chauvet (kwizart) 2009-05-19 22:41:18 UTC
The new push of updates from today (where e2fsprogs is 1.41.4-9.fc11), fixed the exit on fsck resize I've experienced previously.

Just a note to say that In my case, it was not random at all. This failure occurred "every single time" I've tried to build a Live iso earlier today (either I was using ext3/ext4). The last time I've succeed to build a rawhide Live iso was around 14 May. (it was last time I've tried, before today).

Comment 25 Eric Sandeen 2009-05-20 00:18:03 UTC
Nicholas, that's interesting because 1.41.4-9.fc11 fixed a very narrow corruption case, nothing at all related to resize ... :)

Interesting that it's so much easier for you to hit too.  I had to run in a loop 'til I could hit it, and it took a while.  It'd be great if you can try the patch I attached as well.

If it's easier for people, this:

http://people.redhat.com/sandeen/ext4/e2fsprogs-1.41.4-9.fc11.ers1.src.rpm

has the patch in it already, you can just rebuild it.

Comment 26 Nicolas Chauvet (kwizart) 2009-05-20 00:29:31 UTC
(In reply to comment #25)
> Nicholas, that's interesting because 1.41.4-9.fc11 fixed a very narrow
> corruption case, nothing at all related to resize ... :)
Actually, shoe on IRC suggested that was caused by pykickstart.
(which was updated also).

> Interesting that it's so much easier for you to hit too.  I had to run in a
> loop 'til I could hit it, and it took a while.  It'd be great if you can try
> the patch I attached as well.
> 
If it's easier for people, this:
> http://people.redhat.com/sandeen/ext4/e2fsprogs-1.41.4-9.fc11.ers1.src.rpm
^^ 404 
I will try later.

Comment 27 Eric Sandeen 2009-05-20 00:38:48 UTC
Sigh, after 2 years (or is it 3?) I'm clearly still not quite used to how RH made me change my name ;)

try:

http://people.redhat.com/esandeen/ext4/e2fsprogs-1.41.4-9.fc11.ers1.src.rpm

Comment 28 Traxtopel 2009-05-20 05:02:56 UTC
eric, the updated e2fsprogs is allowing me to build again with ext3.

Comment 29 Eric Sandeen 2009-05-20 16:37:54 UTC
re: comment #28, hrm, I would not have expected that.  Nothing I changed should have really affected ext3, I thought.  Your error was w.r.t. the loop device staying mounted, as in comment #10 right?

Has anyone been able to test?  I'd also like to know that the livecd is coming out to an ok size ...

Thanks,
-Eric

Comment 30 Bruno Wolff III 2009-05-20 16:49:06 UTC
Created attachment 344844 [details]
livecd-creator output with --fstype=ext3

I looked more carefully and the ext3 test did have fsck report errors, but the iso was produced. For the ext4 case, fsck also reported an error but the iso creation failed.

Comment 31 Bruno Wolff III 2009-05-20 16:49:51 UTC
Created attachment 344845 [details]
livecd-creator output with --fstype=ext4

Comment 32 Eric Sandeen 2009-05-20 16:53:32 UTC
Bruno, you seem to be running into a different problem:

umount: /home/livecd/tmp/imgcreate-JvImNZ/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
ioctl: LOOP_CLR_FD: Device or resource busy
umount: /home/livecd/tmp/imgcreate-JvImNZ/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
e2fsck 1.41.4 (27-Jan-2009)
Fedora-11-Live-g: recovering journal
Clearing orphaned inode 352265 (uid=0, gid=0, mode=0100644, size=1318912)
Clearing orphaned inode 352263 (uid=0, gid=0, mode=0100644, size=24576)
Clearing orphaned inode 352264 (uid=0, gid=0, mode=0100644, size=229376)
Clearing orphaned inode 352266 (uid=0, gid=0, mode=0100644, size=753664)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

Fedora-11-Live-g: ***** FILE SYSTEM WAS MODIFIED *****
...
umount: /home/livecd/tmp/imgcreate-JvImNZ/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30974) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30974) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery


The image wasn't unmounted, it seems, or at least not cleanly, so e2fsck replayed the journal & cleared orphan inodes.  Not really related to the corruption this bug was opened for, but indicative of some other problem I think.

-Eric

Comment 33 Jeremy Katz 2009-05-20 17:07:21 UTC
(In reply to comment #30)
> Created an attachment (id=344844) [details]
> livecd-creator output with --fstype=ext3
> 
> I looked more carefully and the ext3 test did have fsck report errors, but the
> iso was produced. For the ext4 case, fsck also reported an error but the iso
> creation failed.  

Bruno -- you're hitting bug 501606 now I believe.

Comment 34 Bruno Wolff III 2009-05-20 17:19:37 UTC
I found that one earlier today and am watching it to. It was hard for me to tell which problems were caused by which bugs. Having 3 different updates break different parts of livecd-tools in a short period of time was confusing. (pykickstart also broke things.)

Comment 35 Bruno Wolff III 2009-05-20 19:27:26 UTC
I retested with the downgraded version of yum and things worked OK. So the test version of e2fsprogs is working OK.

Comment 36 Eric Sandeen 2009-05-20 21:41:56 UTC
Built & tagged, e2fsprogs-1.41.4-10.f11.  Bowed 3 times to the release engineering folks, should make F11 :)

-Eric

Comment 37 James Heather 2009-06-11 12:31:07 UTC
I'm afraid I'm still hitting this, on a fully up to date F11 system, including e2fsprogs-1.41.4-10.fc11. It's happening with both ext3 and ext4. The log file produced reads:

Extending sparse file /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img to 12884901888
Losetup add /dev/loop0 mapping to /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img
Formating ext3 filesystem on /dev/loop0
Tuning filesystem on /dev/loop0
Extending sparse file /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img to 12884901888
Mounting /dev/loop0 at /var/tmp/imgcreate-e04Z0e/install_root
No such package *debuginfo to remove
Unmounting directory /var/tmp/imgcreate-e04Z0e/install_root
Losetup remove /dev/loop0
Checking filesystem /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img
Checking filesystem /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img
Error creating Live CD : fsck returned an error!

I realise this doesn't contain all the info you might need. I ran livecd-creator with -v -d --logfile=blah, in an attempt to capture everything, but that's all I got.

By the way, while the resize was running, I typed 'mount' in a separate window, and the filesystem wasn't mounted.

Comment 38 Eric Sandeen 2009-06-11 14:40:31 UTC
argh, ok.  Thanks for the report.
 
I'll send a patch to save an e2image before each fsck so that we can examine it.

Jeremy, we should probably be logging fsck output at least, so we have some notion of what goes wrong if it does.

Actually I'm going to clone this bug to a new one, it's likely a different resize problem, and this one is closed.

-Eric