Bug 499452
Summary: | error found with fsck after resizing during live image creation | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jeremy Katz <katzj> | ||||||||||||
Component: | e2fsprogs | Assignee: | Eric Sandeen <esandeen> | ||||||||||||
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||||||
Severity: | medium | Docs Contact: | |||||||||||||
Priority: | low | ||||||||||||||
Version: | rawhide | CC: | atigro, bruno, c.shoemaker, dcantrell, dpierce, drfudgeboy, esandeen, kwizart, kzak, oliver, sebastian, traxtopel | ||||||||||||
Target Milestone: | --- | ||||||||||||||
Target Release: | --- | ||||||||||||||
Hardware: | All | ||||||||||||||
OS: | Linux | ||||||||||||||
Whiteboard: | |||||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||
Clone Of: | |||||||||||||||
: | 505339 (view as bug list) | Environment: | |||||||||||||
Last Closed: | 2009-05-20 21:41:56 UTC | Type: | --- | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Bug Depends On: | |||||||||||||||
Bug Blocks: | 446451, 505339 | ||||||||||||||
Attachments: |
|
Description
Jeremy Katz
2009-05-06 17:59:08 UTC
Jeremy, if you can reliably reproduce this, it'd be superdupereduper great if you could exit out of the livecd creation just before the resize that breaks it, and create an e2iamge of the filesystem that's about to be resized .... then we'll have the testcase we need ... if it's not reliable for you I guess I'll just build livecds for a while 'til I hit it ...? :) -Eric I just did a rawhide livecd on x86_64 and it all worked, though my mirror may be a day behind ... I guess I've something reproducible here, doing builds for OLPC: Every build I tried failed here on a Rawhide system. Note that e2fsprogs seems to be up2date... [sebastian@localhost ~]$ rpm -q e2fsprogs e2fsprogs-1.41.4-8.fc11.i586 Well, the issue occurs using the kickstart files (specifically, you'll need to build against the olpc-desktop.ks, which includes the olpc-base.ks) in this GIT repo: http://dev.laptop.org/git/projects/fedora-xo/ In the base kickstart, ext4 usage has not been defined. Downgrading to livecd-tools-023-1.fc11.i586 again solves this temporarily. Sorry, I'm a bit short of time right now - preparing for travelling - so I suppose I won't be able to help with an e2image. --Sebastian Sebastian, thanks. So those kickstart files, unmodified, do it for you? (if you made any modifications can you attach?) I'll give it a shot. -Eric Testing those kickstart images as-is, now, thanks. -Eric .... and, they worked just fine :( I've sent out a general plea for folks to run with this patch: --- /usr/lib/python2.6/site-packages/imgcreate/fs.py.orig 2009-05-06 13:09:02.000000000 -0500 +++ /usr/lib/python2.6/site-packages/imgcreate/fs.py 2009-05-11 13:33:25.767280330 -0500 @@ -51,6 +51,8 @@ def resize2fs(fs, size): dev_null = os.open("/dev/null", os.O_WRONLY) + logging.debug("Creating pre-resize image of %s" % fs) + subprocess.call(["/sbin/e2image", "-r", fs, "/tmp/livecd-creator-imagefile"]) try: return subprocess.call(["/sbin/resize2fs", fs, "%sK" % (size / 1024,)], stdout = dev_null, stderr = dev_null) so that if fsck finds post-resize errors, we'll have a pre-resize image to reproduce with ... Thanks, -Eric Created attachment 343490 [details]
test patch
Ugh, should know better than to paste a patch.
Created attachment 343492 [details]
Updated
Eric, on my machine creating a custom image using ext4, works fine. Using ext3 I see the issue. fsck runs creates a /tmp/livecd-creator-imagefile file and seems to go in to a loop? umount: /var/tmp/imgcreate-1EOGRM/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) ioctl: LOOP_CLR_FD: Device or resource busy umount: /var/tmp/imgcreate-1EOGRM/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) e2fsck 1.41.4 (27-Jan-2009) OC-Fedora-11-Net: recovering journal Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached inode 32912 Connect to /lost+found? yes WARNING: PROGRAMMING BUG IN E2FSCK! OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM. inode_link_info[32912] is 2, inode.i_links_count is 1. They should be the same! Inode 32912 ref count is 1, should be 1. Fix? yes Pass 5: Checking group summary information Block bitmap differences: +140162 Fix? yes Free blocks count wrong for group #18 (15114, counted=15383). Fix? yes Free blocks count wrong (336848, counted=337117). Fix? yes Inode bitmap differences: +32912 Fix? yes Free inodes count wrong for group #18 (3056, counted=3059). Fix? yes Free inodes count wrong (122026, counted=122029). Fix? yes OC-Fedora-11-Net: ***** FILE SYSTEM WAS MODIFIED ***** OC-Fedora-11-Net: 74579/196608 files (0.4% non-contiguous), 449315/786432 blocks e2image 1.41.4 (27-Jan-2009) e2image 1.41.4 (27-Jan-2009) e2image 1.41.4 (27-Jan-2009) e2image 1.41.4 (27-Jan-2009) e2image 1.41.4 (27-Jan-2009) e2image 1.41.4 (27-Jan-2009) e2image 1.41.4 (27-Jan-2009) So on my system, I added some kill scripts to postinstall. cupsd for some reason was still running, which did not let the filesystem unmount. Hence the fsck failed. In the past this worked, irrespective if the fs unmounted or not, also with ext4 it works. *** Bug 501069 has been marked as a duplicate of this bug. *** Wow, it's been quite a week. I was commenting on the wrong bug. here we go... ------- Comment #4 From Eric Sandeen (esandeen) 2009-05-11 23:47:23 EDT Ok I have a reproducer now. http://people.redhat.com/esandeen/livecd-creator-imagefile.bz2 # resize2fs <that image> 578639 fails to resize ... and corrupts it :( This happens during the binary chop of searching for a minimum size. Jeremy, remind me again why you're not using the -M option you wanted so badly? :) -Eric ------- Comment #5 From Eric Sandeen (esandeen) 2009-05-11 23:54:53 EDT So, this looks really random. Partly due to pdflush timing on the original image population, most likely, and get_random_bytes() in the inode allocation code (!) resize2fs really should not be finding itself in a point of no return once it hits ENOSPC, but auditing that is probably going to be really tricky. I know this is listed as a blocker, but in reality, if you hit it, and you run the tool again, it will probably pass just fine (and really will be fine). Given how many times you need a good livecd before F11 ships ... once, right? I wonder if we can live with "oops, got a bad one, try again" ------- Comment #6 From Jonathan Underwood (jonathan.underwood) 2009-05-12 07:15:22 EDT But doesn't this also mean that a user who runs resize2fs on his installed machine for whatever reason faces risk of data loss? ------- Comment #7 From Eric Sandeen (esandeen) 2009-05-12 09:29:44 EDT If the user tries to shrink it to something extremely small then yes. resize2fs -M should provide a safe way to do this, though... ------- Comment #8 From Eric Sandeen (esandeen) 2009-05-12 09:36:28 EDT (I don't mean it shouldn't be fixed, of course, I just wonder about blocker status) I agree with Eric, we'd take a fix, but not a blocker. Lest anyone think I'm a total slacker, I am looking for a fix as we speak ;) And I think I have a fix, sent to ext4 list today; ideally I'd like a little review but I think I can sneak this one in. *** Bug 501436 has been marked as a duplicate of this bug. *** So for custom livedvd's can I just use part --fstype=ext3 to avoid this for the next couple of days? I assume that'll work but in practice I almost never hit the bug on ext4 either. If the livecd compose works, you're fine. If it fails, run it again, and you'll probably succeed. :) I'm testing fixes for this right now, though. I am hitting the problem pretty consistantly (I think my last half dozen or so tries over two different machines with different arches have all failed), so I'll use ext3 for the DVDs I want to have for this week. Bruno, would you like to test a couple e2fsprogs patches that should make it work on ext4? :) Created attachment 344676 [details]
resize2fs fixes
This patch doesn't fix the underlying bug - I have a patch for that but I feel it is a bit too invasive for this late date in the cycle.
Instead, it fixes the min-blocks calculation, and then enforces the minimum resize value to that. Before, we got into trouble with ENOSPC and improper error handling when we tried to resize to something too small. Failing below the self-calculated minimum should avoid that situation, which I'll fix in rawhide.
Anyway, try livecd building using e2fsprogs w/this patch applied, it should suffice.
Thanks,
-Eric
Yes, I'll try it tonight. The --fstype=ext3 didn't work for me either and I am going to want some custom DVDs on Thursday, so I'd like to help things work. It'll be a few hours before I get to try it. The new push of updates from today (where e2fsprogs is 1.41.4-9.fc11), fixed the exit on fsck resize I've experienced previously. Just a note to say that In my case, it was not random at all. This failure occurred "every single time" I've tried to build a Live iso earlier today (either I was using ext3/ext4). The last time I've succeed to build a rawhide Live iso was around 14 May. (it was last time I've tried, before today). Nicholas, that's interesting because 1.41.4-9.fc11 fixed a very narrow corruption case, nothing at all related to resize ... :) Interesting that it's so much easier for you to hit too. I had to run in a loop 'til I could hit it, and it took a while. It'd be great if you can try the patch I attached as well. If it's easier for people, this: http://people.redhat.com/sandeen/ext4/e2fsprogs-1.41.4-9.fc11.ers1.src.rpm has the patch in it already, you can just rebuild it. (In reply to comment #25) > Nicholas, that's interesting because 1.41.4-9.fc11 fixed a very narrow > corruption case, nothing at all related to resize ... :) Actually, shoe on IRC suggested that was caused by pykickstart. (which was updated also). > Interesting that it's so much easier for you to hit too. I had to run in a > loop 'til I could hit it, and it took a while. It'd be great if you can try > the patch I attached as well. > If it's easier for people, this: > http://people.redhat.com/sandeen/ext4/e2fsprogs-1.41.4-9.fc11.ers1.src.rpm ^^ 404 I will try later. Sigh, after 2 years (or is it 3?) I'm clearly still not quite used to how RH made me change my name ;) try: http://people.redhat.com/esandeen/ext4/e2fsprogs-1.41.4-9.fc11.ers1.src.rpm eric, the updated e2fsprogs is allowing me to build again with ext3. re: comment #28, hrm, I would not have expected that. Nothing I changed should have really affected ext3, I thought. Your error was w.r.t. the loop device staying mounted, as in comment #10 right? Has anyone been able to test? I'd also like to know that the livecd is coming out to an ok size ... Thanks, -Eric Created attachment 344844 [details]
livecd-creator output with --fstype=ext3
I looked more carefully and the ext3 test did have fsck report errors, but the iso was produced. For the ext4 case, fsck also reported an error but the iso creation failed.
Created attachment 344845 [details]
livecd-creator output with --fstype=ext4
Bruno, you seem to be running into a different problem: umount: /home/livecd/tmp/imgcreate-JvImNZ/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) ioctl: LOOP_CLR_FD: Device or resource busy umount: /home/livecd/tmp/imgcreate-JvImNZ/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) e2fsck 1.41.4 (27-Jan-2009) Fedora-11-Live-g: recovering journal Clearing orphaned inode 352265 (uid=0, gid=0, mode=0100644, size=1318912) Clearing orphaned inode 352263 (uid=0, gid=0, mode=0100644, size=24576) Clearing orphaned inode 352264 (uid=0, gid=0, mode=0100644, size=229376) Clearing orphaned inode 352266 (uid=0, gid=0, mode=0100644, size=753664) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Fedora-11-Live-g: ***** FILE SYSTEM WAS MODIFIED ***** ... umount: /home/livecd/tmp/imgcreate-JvImNZ/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) rpmdb: PANIC: fatal region error detected; run recovery error: db4 error(-30974) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery rpmdb: PANIC: fatal region error detected; run recovery error: db4 error(-30974) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery The image wasn't unmounted, it seems, or at least not cleanly, so e2fsck replayed the journal & cleared orphan inodes. Not really related to the corruption this bug was opened for, but indicative of some other problem I think. -Eric (In reply to comment #30) > Created an attachment (id=344844) [details] > livecd-creator output with --fstype=ext3 > > I looked more carefully and the ext3 test did have fsck report errors, but the > iso was produced. For the ext4 case, fsck also reported an error but the iso > creation failed. Bruno -- you're hitting bug 501606 now I believe. I found that one earlier today and am watching it to. It was hard for me to tell which problems were caused by which bugs. Having 3 different updates break different parts of livecd-tools in a short period of time was confusing. (pykickstart also broke things.) I retested with the downgraded version of yum and things worked OK. So the test version of e2fsprogs is working OK. Built & tagged, e2fsprogs-1.41.4-10.f11. Bowed 3 times to the release engineering folks, should make F11 :) -Eric I'm afraid I'm still hitting this, on a fully up to date F11 system, including e2fsprogs-1.41.4-10.fc11. It's happening with both ext3 and ext4. The log file produced reads: Extending sparse file /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img to 12884901888 Losetup add /dev/loop0 mapping to /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img Formating ext3 filesystem on /dev/loop0 Tuning filesystem on /dev/loop0 Extending sparse file /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img to 12884901888 Mounting /dev/loop0 at /var/tmp/imgcreate-e04Z0e/install_root No such package *debuginfo to remove Unmounting directory /var/tmp/imgcreate-e04Z0e/install_root Losetup remove /dev/loop0 Checking filesystem /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img Checking filesystem /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img Error creating Live CD : fsck returned an error! I realise this doesn't contain all the info you might need. I ran livecd-creator with -v -d --logfile=blah, in an attempt to capture everything, but that's all I got. By the way, while the resize was running, I typed 'mount' in a separate window, and the filesystem wasn't mounted. argh, ok. Thanks for the report. I'll send a patch to save an e2image before each fsck so that we can examine it. Jeremy, we should probably be logging fsck output at least, so we have some notion of what goes wrong if it does. Actually I'm going to clone this bug to a new one, it's likely a different resize problem, and this one is closed. -Eric |