Bug 505339 - error still found with fsck after resizing during live image creation
Summary: error still found with fsck after resizing during live image creation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: 11
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 499452
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-06-11 14:48 UTC by Eric Sandeen
Modified: 2009-08-08 19:25 UTC (History)
11 users (show)

Fixed In Version: 1.41.4-12.fc11
Doc Type: Bug Fix
Doc Text:
Clone Of: 499452
Environment:
Last Closed: 2009-07-23 19:12:00 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
debugging patch (1008 bytes, patch)
2009-06-11 15:28 UTC, Eric Sandeen
no flags Details | Diff
Gzipped filesystem image from failed livecd creation (92 bytes, text/plain)
2009-06-11 17:40 UTC, James Heather
no flags Details

Description Eric Sandeen 2009-06-11 14:48:19 UTC
+++ This bug was initially created as a clone of Bug #499452 +++

<snip>

--- Additional comment from j.heather.uk on 2009-06-11 08:31:07 EDT ---

I'm afraid I'm still hitting this, on a fully up to date F11 system, including e2fsprogs-1.41.4-10.fc11. It's happening with both ext3 and ext4. The log file produced reads:

Extending sparse file /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img to 12884901888
Losetup add /dev/loop0 mapping to /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img
Formating ext3 filesystem on /dev/loop0
Tuning filesystem on /dev/loop0
Extending sparse file /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img to 12884901888
Mounting /dev/loop0 at /var/tmp/imgcreate-e04Z0e/install_root
No such package *debuginfo to remove
Unmounting directory /var/tmp/imgcreate-e04Z0e/install_root
Losetup remove /dev/loop0
Checking filesystem /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img
Checking filesystem /var/tmp/imgcreate-e04Z0e/tmp-atrv1b/ext3fs.img
Error creating Live CD : fsck returned an error!

I realise this doesn't contain all the info you might need. I ran livecd-creator with -v -d --logfile=blah, in an attempt to capture everything, but that's all I got.

By the way, while the resize was running, I typed 'mount' in a separate window, and the filesystem wasn't mounted.

--- Additional comment from esandeen on 2009-06-11 10:40:31 EDT ---

argh, ok.  Thanks for the report.
 
I'll send a patch to save an e2image before each fsck so that we can examine it.

Jeremy, we should probably be logging fsck output at least, so we have some notion of what goes wrong if it does.

Actually I'm going to clone this bug to a new one, it's likely a different resize problem, and this one is closed.

-Eric

Comment 1 Eric Sandeen 2009-06-11 15:28:13 UTC
Created attachment 347427 [details]
debugging patch

Here's a patch to fs.py in python-imgcreate (/usr/lib/python2.6/site-packages/imgcreate/fs.py) which should run e2fsck after each resize iteration, and create a metadata image before each resize.

So, if fsck fails, you'll have a metadata image created in /tmp/livecd-creator-imagefile and you can zip that up and get it to me for analysis.

Please also let me know exactly which version of e2fsprogs you're testing...

Thanks,
-Eric

Comment 2 James Heather 2009-06-11 15:35:51 UTC
Ta. I'll give that a go. In the meantime, I have discovered something that might help, and might explain why no-one else was still having trouble.

What I was creating was a customized live iso, essentially just the standard one with some more packages chucked in. I have now found out that it only fails when building mine; it's just built the default one with no problems.

The only thing I can think of is that I am pushing my luck with the partition size. I'm going for 12GB--maybe that's what's doing it. I'm now trying to rebuild my customized one with a 6GB partition to see if that gets anywhere.

You're welcome to the kickstart file if that would help; but we're only 15 mins away from finding out whether the partition size is the killer.

Comment 3 James Heather 2009-06-11 15:41:02 UTC
Yes, it's worked, and it's now compressing the filesystem.

So it looks like it's fine up to a certain size, somewhere between 6GB and 12GB, and above that point it all goes wrong. 8GB is the obvious guess for the cut-off.

Do I still need to apply the patch, or does that give you all you need to go on? I should imagine that building the standard image with --size 12288 will give the same error.

I'll see if I can find out exactly where the limit is.

Comment 4 Eric Sandeen 2009-06-11 15:47:57 UTC
Most of these have been tickling corner cases in resize2fs so I expect that it is size-sensitive.  But no reason to do a binary search on sizes, if you apply that patch and provide the pre-resize image that it creates, I should be able to get it sorted out.

Hm one thing I'm missing from the patch is something to print out the full resize2fs cmdline (unless it's already logged) - can you check that and add it yourself, or do we need a new patch from me?

Thanks,
-Eric

Comment 5 James Heather 2009-06-11 15:58:01 UTC
OK, patch applied. When it's finished doing the test I'd already started on (size=8100), I'll go back to 12GB and watch it die again.

I'll pull out the resize cmdline and add it here shortly.

Comment 6 James Heather 2009-06-11 16:51:21 UTC
All done.

 1205 pts/0    R+     0:00 /sbin/e2fsck -f -y /var/tmp/imgcreate-f7MgVM/tmp-6PTtXK/ext3fs.img

and later

 1212 pts/0    D+     0:03 /sbin/resize2fs /var/tmp/imgcreate-f7MgVM/tmp-6PTtXK/ext3fs.img 6291456K

Anyway, I now have the 12GB image left over from the patch. I am somewhat reluctant to attach 12GB to this bug report...

Comment 7 Eric Sandeen 2009-06-11 17:02:33 UTC
If you bzip2 it it should get pretty small; maybe you can put it on some url if it's still too big?

so it was resizing to 6291456K that caused the corruption?

IOW if I take that image, do resize2fs <image> 6291456K; e2fsck <image> I should see it, right?  (just not sure how you gathered it; but as long as you saw the fsck error, the image in /tmp was the pre-resize image, and the only question is what subsequent resize command caused it to get corrupted).

Thanks,
-Eric

Comment 8 James Heather 2009-06-11 17:15:18 UTC
OK, I've gzipped it, and now I'll attach it. It's 32MB after compressing.

What I got at the end of the attempt to create the live cd was as below. It seems to be complaining that something's modified the filesystem during the check, which seems unlikely considering that it's not mounted.

What follows was not captured on the same run as generated the image I'm about to attach, but I think there's enough here to give you the general idea.





  Installing: gnome-bluetooth-libs         ################### [1398/1399] 
  Installing: kdegames                     ################### [1399/1399]Unmounting directory /var/tmp/imgcreate-aCu2lb/install_root
Losetup remove /dev/loop0
Checking filesystem /var/tmp/imgcreate-aCu2lb/tmp-T6GZiV/ext3fs.img
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
JAH-live-0.2-i68: 189358/786432 files (0.5% non-contiguous), 1402878/3145728 blocks
Checking filesystem /var/tmp/imgcreate-aCu2lb/tmp-T6GZiV/ext3fs.img
e2fsck 1.41.4 (27-Jan-2009)
Backing up journal inode block information.

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

JAH-live-0.2-i68: ***** FILE SYSTEM WAS MODIFIED *****
JAH-live-0.2-i68: 189358/344064 files (0.6% non-contiguous), 1373584/1373584 blocks
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return unicode(self.message)
Error creating Live CD : fsck returned an error!

Comment 9 Eric Sandeen 2009-06-11 17:25:09 UTC
Hm this may be it:  "Backing up journal inode block information."

What I really need is the exact resize command that corrupted it.  I'll look at your image but I may need to give you another patch to print it out explicitly.

I don't know if bz will take a 32M attachment, if not maybe you can host it somewhere...

Comment 10 James Heather 2009-06-11 17:40:23 UTC
Created attachment 347444 [details]
Gzipped filesystem image from failed livecd creation

Comment 11 James Heather 2009-06-11 17:43:35 UTC
Well, I've just run resize2fs <image> 6291456K on a copy of the image, and then done an e2fsck, and it produced no errors!

I thought that ps ax would give me the exact command line. Maybe not--maybe it involves doing something else to the image that corrupts it.

Thanks for all the help.

Comment 12 Eric Sandeen 2009-06-11 18:13:29 UTC
Do you still have the logs from the run that produced the problem?  Perhaps you can attach those as well.

You could modify your patch to do something like:

+    logging.debug("Creating pre-resize image of %s at /tmp/livecd-creator-imagefile" % fs)
+    subprocess.call(["/sbin/e2image", "-r", fs, "/tmp/livecd-creator-imagefile"])
+    logging.debug("Attempting resize to %sK" % (size / 1024,))
+    logging.debug("Attempting resize to %sK" % (size / 1024,))
     try:
         return subprocess.call(["/sbin/resize2fs", fs, "%sK" % (size / 1024,)],
                                stdout = dev_null, stderr = dev_null)

to give the exact size of the last resize command.

Comment 13 James Heather 2009-06-11 19:38:33 UTC
Ah--interesting. So does it iteratively resize it, then? I'd assumed it worked out the minimum size it could get away with, and then just did one resize operation.

I'm afraid I don't have the logs from that run, but I'll add in those lines and also keep the logs for another go at it.

Comment 14 Eric Sandeen 2009-06-11 19:45:26 UTC
yep, it does a binary search for some reason.

to save bugzila storage space ... ;)

next time when it fails, copy that image, try a resize to the last reported size, then run fsck against that, see if you get an error.  If so, provide the original and the resize2fs commandline, and off we go ....

-eric

Comment 15 James Heather 2009-06-11 20:38:30 UTC
It just did one resize, to 6291456K. Unfortunately, the copy of the pre-resize image resizes just fine!

I've edited the copy line so that it copies the whole file. Obviously that's not going to compress quite so well... but it will at least give me a file that gets broken if I resize it (I hope).

I've added the line

  subprocess.call(["/bin/cp", "", fs, "/tmp/livecd-creator-imagefile-big"])

Do tell me if that's rubbish... I am a coder but I don't speak python at all.

Comment 16 Eric Sandeen 2009-06-11 20:42:01 UTC
I too only code python by example .... I'm not sure you need the null argument in there but I doubt it hurts.

-Eric

Comment 17 James Heather 2009-06-11 21:39:08 UTC
Well, I'm now completely stuck. I have a copy of the filesystem image that gets resized to 6291456K and then won't fsck; but when I resize it to 6291456K, I can then fsck it just fine.

[root@derek tmp]# resize2fs livecd-creator-imagefile-big 6291456K
resize2fs 1.41.4 (27-Jan-2009)
Resizing the filesystem on livecd-creator-imagefile-big to 1572864 (4k) blocks.
The filesystem on livecd-creator-imagefile-big is now 1572864 blocks long.

[root@derek tmp]# e2fsck livecd-creator-imagefile-big
e2fsck 1.41.4 (27-Jan-2009)
Backing up journal inode block information.

JAH-live-0.2-i68: clean, 180195/393216 files, 1359477/1572864 blocks
[root@derek tmp]# 

I now have not the faintest idea what is going on.

Comment 18 Eric Sandeen 2009-06-11 21:44:55 UTC
(In reply to comment #17)
> Well, I'm now completely stuck. I have a copy of the filesystem image that gets
> resized to 6291456K and then won't fsck; but when I resize it to 6291456K, I
> can then fsck it just fine.
> 
> [root@derek tmp]# resize2fs livecd-creator-imagefile-big 6291456K
> resize2fs 1.41.4 (27-Jan-2009)
> Resizing the filesystem on livecd-creator-imagefile-big to 1572864 (4k) blocks.
> The filesystem on livecd-creator-imagefile-big is now 1572864 blocks long.
> 
> [root@derek tmp]# e2fsck livecd-creator-imagefile-big
> e2fsck 1.41.4 (27-Jan-2009)
> Backing up journal inode block information.
> 
> JAH-live-0.2-i68: clean, 180195/393216 files, 1359477/1572864 blocks

well, e2fsck is run with "-f -y" in the scripts

I bet if you do that on the pristine unrepaired image you gathered, and check the result ("echo $?" if my bash is any good) you'll see it.

I'll pull down the image attached to the bug if you think it's the right one, and have a look.

Thanks for keeping on top of this one :)

-Eric

Comment 19 Eric Sandeen 2009-06-11 22:10:12 UTC
Aurgh! ok, so I grabbed your image....

[root@inode scratch]# e2fsck -f -y livecd-creator-imagefile
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
JAH-live-0.2-i68: 189689/786432 files (0.2% non-contiguous), 1394833/3145728 blocks
[root@inode scratch]# echo $?
0

so it is clean.

[root@inode scratch]# resize2fs  livecd-creator-imagefile 6291456K
resize2fs 1.41.6 (30-May-2009)
Resizing the filesystem on livecd-creator-imagefile to 1572864 (4k) blocks.
The filesystem on livecd-creator-imagefile is now 1572864 blocks long.

[root@inode scratch]# echo $?
0

so far so good

# e2fsck -f -n livecd-creator-imagefile | more
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Inode 50 has a extra size (31335) which is invalid
Fix? no

Inode 117287 has a extra size (31335) which is invalid
Fix? no

Inode 117360 has a extra size (31335) which is invalid
Fix? no

Inode 117364 has a extra size (31335) which is invalid
Fix? no

Inode 117365 has a extra size (31335) which is invalid
Fix? no

...

now not so good.

Ok, I think I have what I need, thanks!

-Eric

Comment 20 Eric Sandeen 2009-06-11 22:43:44 UTC
Hmm and on a different run:

[root@inode scratch]# e2fsck -f -y livecd-creator-imagefile.hack
e2fsck 1.41.4 (27-Jan-2009)
Backing up journal inode block information.

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

JAH-live-0.2-i68: ***** FILE SYSTEM WAS MODIFIED *****
JAH-live-0.2-i68: 189689/393216 files (0.5% non-contiguous), 1368898/1572864 blocks
[root@inode scratch]# echo $?
1

so that first "Backing up" is what set it to a non-0 return I guess.

Comment 21 James Heather 2009-06-12 08:07:21 UTC
D'oh! Can't believe I didn't put in the '-f'. I suppose I naively assume most of the time that if there's been any corruption then the filesystem won't be clean, but that's a daft assumption if you've just resized it.

Btw the extra "" in my cp line did make a difference... It meant I had three parameters to cp, which meant it insisted on taking the third one as a directory, which didn't exist...

Comment 22 Thomas J. Baker 2009-06-12 20:26:22 UTC
I'm experiencing this too on much smaller (less than 100MB) images. I do see the same journal inode message described in #20.

Comment 23 Simon Everts 2009-06-15 19:47:43 UTC
I'm hitting this too. Updated e2fsprogs to 1.41.6 but doesn't help.
Also both ext3 and ext4

When omitting the e2fsck the resulting image has a size of 296M and seems to run fine on qemu.

using a kickstart file for livecd-creator with only the groups @base and @core from the fedora 11 repository

note: the kickstart file /usr/share/doc/livecd-tools-024/livecd-fedora-minimal.ks seems to run with no problem

Comment 24 Mads Kiilerich 2009-06-17 16:54:14 UTC
FWIW, I just added a misplaced reproducible test case for this on bug 506105#c2 ...

Comment 25 Eric Sandeen 2009-06-17 19:48:40 UTC
Ok, thanks to the testcases I have this one sorted.  Simple in the end, but then aren't they all.  :)

Fairly recently the journal creation was changed so that it's made in the middle of the filesystem, so if you shrink the fs by half or so the journal gets moved.

However, the backup journal blocks in the superblock aren't updated, and that's what fsck is finding.  I've sent a rough patch to the list, pending resolution of a couple questions of the best way to do this I'll have it fixed soon.

Thanks guys!

-Eric

Comment 26 Eric Sandeen 2009-06-17 21:06:53 UTC
Spoke too soon, the original testcase in this bug still fails.  Yay, resize2fs!

Comment 27 Eric Sandeen 2009-06-17 23:18:06 UTC
Arrright, got it now.  Fix in rawhide soon, Ted may just release a new e2fsprogs, if folks can wait 'til then.

So James, your bug was related to the superblock journal backup, and the other bug I hit was only present in e2fsprogs-1.41.6, so you hadn't seen that bug yet, but it's fixed now anyway ;)

-Eric

Comment 28 James Heather 2009-06-18 08:39:31 UTC
Brilliant--thanks for sorting that! And well done for squashing the extra bug before it got widely distributed.

Will we see the fixed version in updates-testing very soon?

Comment 29 Eric Sandeen 2009-06-18 13:26:37 UTC
I hope to get at least the F11 fix pushed today

-Eric

Comment 30 Fedora Update System 2009-06-18 14:03:32 UTC
e2fsprogs-1.41.4-11.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/e2fsprogs-1.41.4-11.fc11

Comment 31 Eric Sandeen 2009-06-18 14:21:38 UTC
Rawhide is building with these and related fixes as well.

Comment 32 Mads Kiilerich 2009-06-18 14:35:40 UTC
I confirm that it seems to work for me on F11. Thanks!

But I still see another issue I found while hunting this one - see bug 506740

Comment 33 Thomas J. Baker 2009-06-18 14:41:36 UTC
I installed the updates and was able to build a live cd that wasn't working before. Looks good from here.

Comment 34 James Heather 2009-06-18 16:36:03 UTC
Works for me too!

Comment 35 Fedora Update System 2009-06-19 13:42:09 UTC
e2fsprogs-1.41.4-11.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update e2fsprogs'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-6756

Comment 36 Fedora Update System 2009-06-24 17:48:53 UTC
e2fsprogs-1.41.4-12.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/e2fsprogs-1.41.4-12.fc11

Comment 37 Fedora Update System 2009-06-27 02:53:33 UTC
e2fsprogs-1.41.4-6.fc10 has been pushed to the Fedora 10 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update e2fsprogs'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-7005

Comment 38 Fedora Update System 2009-06-30 15:38:02 UTC
e2fsprogs-1.41.4-12.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/e2fsprogs-1.41.4-12.fc11

Comment 39 Fedora Update System 2009-07-02 05:53:45 UTC
e2fsprogs-1.41.4-12.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update e2fsprogs'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7278

Comment 40 Jim McEvoy 2009-07-10 19:38:23 UTC
(In reply to comment #39)
> e2fsprogs-1.41.4-12.fc11 has been pushed to the Fedora 11 testing repository. 
> If problems still persist, please make note of it in this bug report.
>  If you want to test the update, you can install it with 
>  su -c 'yum --enablerepo=updates-testing update e2fsprogs'.  You can provide
> feedback for this update here:
> http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7278  

Installed the x64 version of e2fsprogs-1.41.4-12.fc11 and it fixed the problem I had building LiveCD iso's...  Thanks for the bug fix

Comment 41 Fedora Update System 2009-07-23 19:11:42 UTC
e2fsprogs-1.41.4-6.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 42 Fedora Update System 2009-08-08 19:25:23 UTC
e2fsprogs-1.41.4-12.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.