Bug 619020 - dumpe2fs says Journal superblock magic number invalid! Ext4 filesystem from compose host's livecds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: squashfs-tools
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Bruno Wolff III
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 615443 (view as bug list)
Depends On:
Blocks: F14Alpha, F14AlphaBlocker 615443
 
Reported: 2010-07-28 11:46 UTC by Jasper O'neal Hartline
Modified: 2013-01-09 01:35 UTC

Fixed In Version: squashfs-tools-4.0-4.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-08-05 06:49:55 UTC
Type: ---
Embargoed:


Attachments
livecd creation & test session (10.49 KB, text/plain)
2010-07-28 22:54 UTC, Eric Sandeen

Description Jasper O'neal Hartline 2010-07-28 11:46:47 UTC
Description of problem:
dumpe2fs says Journal superblock magic number invalid! Ext4 filesystem from compose host's livecds


Version-Release number of selected component (if applicable):
e2fsprogs-1.41.12-5.fc14.i686.rpm

How reproducible:
Spin a live disc on the RAWHIDE compose host

Steps to Reproduce:
1.
2.
3.
  
Actual results:
ext3fs.img: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)

ext3fs.img is produced where dumpe2fs says this about the filesystem:
[root@localhost tmp]# dumpe2fs ext3fs.img 
dumpe2fs 1.41.10 (10-Feb-2009)
Filesystem volume name:   _desktop-i386-20
Last mounted on:          /var/tmp/imgcreate-c2iaYH/install_root
Filesystem UUID:          b58bc2a5-c70b-4ee6-8c72-4dd8367fd941
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              196608
Block count:              786432
Reserved block count:     7863
Free blocks:              252857
Free inodes:              116471
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      191
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Tue Jul 27 21:32:01 2010
Last mount time:          Tue Jul 27 21:32:06 2010
Last write time:          Tue Jul 27 21:52:04 2010
Mount count:              0
Maximum mount count:      -1
Last checked:             Tue Jul 27 21:52:04 2010
Check interval:           0 (<none>)
Lifetime writes:          597 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      059ef793-ed97-4322-ae50-e24387831f55
Journal backup:           inode blocks
Journal superblock magic number invalid!
[root@localhost tmp]#

Expected results:


Additional info:

Comment 1 Eric Sandeen 2010-07-28 14:53:54 UTC
Any chance this is a big-endian arch?  Hm no i686.

Can you do:

# debugfs ext3fs.img
debugfs: stat <8>

(attach or paste that output)

debugfs: dump <8> /some/path/to/file

which will dump out the journal file, and either attach that file if not too big, or hexdump -C the first part of it and attach the first few lines?  Curious to see what's there.
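
For reference, the check behind the "Journal superblock magic number invalid!" message can be approximated by hand on the dumped file. This is only a minimal sketch: it assumes the journal was dumped with `dump <8>` to some path of your choosing, and `journal_magic_ok` is an illustrative name, not part of e2fsprogs. The jbd2 header magic is 0xC03B3998, stored big-endian in the first four bytes of the journal.

```python
import struct

# jbd2 journal header magic, stored big-endian at the start of the journal.
JBD2_MAGIC = 0xC03B3998


def journal_magic_ok(journal_path):
    """Return True if the dumped journal file starts with the jbd2 magic.

    journal_path is a hypothetical path to the output of debugfs "dump <8>".
    """
    with open(journal_path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return False
    (magic,) = struct.unpack(">I", header)
    return magic == JBD2_MAGIC
```

A journal whose first bytes are all zero fails this check, which is consistent with what dumpe2fs reports.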

Comment 2 Eric Sandeen 2010-07-28 14:54:49 UTC
I'm a little rusty on livecd generation. If you can either point me to docs or tell me what commands to run to reproduce, that'd be helpful too.

Thanks,
-Eric

Comment 3 Kevin Fenzi 2010-07-28 15:46:39 UTC
I can provide info on the nightly compose machine... 

from the last image made there: 


debugfs:  stat <8>
Inode: 8   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Size: 67108864
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 131072
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
 atime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
 mtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
crtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
Size of extra inode fields: 28
EXTENTS:
(0-16383): 360448-376831

The file is indeed pretty large, but looks like nulls. 
hexdump -C shows: 

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
04000000
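
The `*` in the hexdump means every following line repeats the previous one, up to offset 0x04000000 (67108864 bytes, the journal size shown by stat). A quick way to confirm "all nulls" without eyeballing hexdump output is sketched below; `is_all_zero` is a made-up helper name and the chunk size is arbitrary.

```python
def is_all_zero(path, chunk_size=1 << 20):
    """Return True if every byte in the file is 0x00.

    An empty file is reported as all zero. The file is read in chunks so
    that a multi-gigabyte image does not have to fit in memory.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            # bytes.count(0) counts zero bytes; any shortfall means real data.
            if chunk.count(0) != len(chunk):
                return False
```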

Comment 4 François Cami 2010-07-28 15:53:57 UTC
Eric, to create a livecd:
$ yum -y install livecd-tools
$ git clone git://git.fedorahosted.org/spin-kickstarts.git
# livecd-creator -c fedora-livecd-desktop.ks -f desktop-20100728

Comment 5 satellitgo 2010-07-28 16:10:45 UTC
soas-i386-20100727.16.iso burned to CD:
Boots and does initial grub then: 

mount: wrong fs type, bad option,bad superblock on /dev/mapper/live-rw sleeping forever can't mount root filesystem = bug 619020 [8] 

It looks like bug 615443 got fixed as CD does boot to grub

Comment 6 Eric Sandeen 2010-07-28 16:14:34 UTC
OK thanks guys, will look into it.  Why is it that every new release of e2fsprogs breaks livecd-tools (or is it vice-versa?)  ;)

Comment 7 Kevin Fenzi 2010-07-28 16:21:58 UTC
I did try both: 

e2fsprogs-1.41.12-3.fc14.x86_64.rpm
and
e2fsprogs-1.41.12-4.fc14.x86_64.rpm

instead of the current 
e2fsprogs-1.41.12-5.fc14.x86_64.rpm

And got the same results as far as I could tell. ;( 

This problem may have started on 2010-07-03 or 2010-07-17... it's hard to pinpoint. 
:(

Comment 8 Eric Sandeen 2010-07-28 16:40:54 UTC
Getting livecd build failures in the repo it seems:

Retrieving http://download.fedora.devel.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/repodata/repomd.xml ...OK
Retrieving http://download.fedora.devel.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/repodata/9cb7284f28f18da5200736822748af32795899f71aa54faa7eeb0232471c7087-primary.sqlite.bz2 ...OK
Retrieving http://download.fedora.devel.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/repodata/f55403032212d1990822018c3401d1480b2e8c466ae74f31c8ec3aa2351983de-comps-rawhide.xml.gz ...OK
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return unicode(self.message)
Error creating Live CD : Failed to build transaction : gnote-0.7.2-1.fc14.x86_64 requires libboost_system-mt.so.1.41.0()(64bit)
totem-2.30.2-2.fc14.x86_64 requires libpython2.6.so.1.0()(64bit)
gnome-dvb-daemon-0.1.20-1.fc14.x86_64 requires python(abi) = 2.6
notify-python-0.1.1-8.fc12.x86_64 requires python(abi) = 2.6
gnote-0.7.2-1.fc14.x86_64 requires libboost_filesystem-mt.so.1.41.0()(64bit)


now what?

Comment 9 François Cami 2010-07-28 16:51:34 UTC
I'm going to try and see if I can reproduce on the F13 kickstarts, I'll let you know.

Comment 10 Kevin Fenzi 2010-07-28 16:58:31 UTC
Yeah, rawhide is broken now since boost just landed. ;(

Comment 11 Jasper O'neal Hartline 2010-07-28 18:57:51 UTC
The machine it is being run on is x86_64 which is what Kevin mentioned.
The target machine the ISO is built for is i686.

I will attach this information you requested.

Comment 12 Jasper O'neal Hartline 2010-07-28 19:07:45 UTC
Here is the file and data:
[root@localhost tmp]# debugfs ext3fs.img
debugfs 1.41.10 (10-Feb-2009)
debugfs:  stat <8>
Inode: 8   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Size: 67108864
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 131072
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
 atime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
 mtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
crtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
Size of extra inode fields: 28
EXTENTS:
(0-16383): 360448-376831
debugfs:  dump <8> /tmp/debugfs-dump-8.txt
debugfs:  quit
[root@localhost tmp]# ls -l /tmp/debugfs-dump-8.txt 
-rw-r--r-- 1 root root 67108864 Jul 28 11:52 /tmp/debugfs-dump-8.txt
[root@localhost tmp]#

debugfs dump of <8>:
http://autopsy.liveprojects.info/external/extra/debugfs-dump-8.txt

Comment 13 Eric Sandeen 2010-07-28 20:46:14 UTC
Jasper, ok, same thing - totally zeroed log (perhaps you should have zipped it, I bet it'd compress nicely ;)

I added dumpe2fs to the e2fsck() function in the python-imgcreate code:

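import logging
import subprocess

def e2fsck(fs):
    logging.debug("Checking filesystem %s" % fs)
    # Force a full check (-f) and answer yes to every repair prompt (-y).
    rc = subprocess.call(["/sbin/e2fsck", "-f", "-y", fs])
    if rc != 0:
        return rc
    # Added for debugging: print the superblock header after the check.
    rc = subprocess.call(["/sbin/dumpe2fs", "-h", fs])
    return rc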

and ran against f13 since rawhide is busted.  I don't see the problem here, yet, but then I'm having a hard time updating my rawhide system, as well.  I do have the e2fsprogs version in question installed, though.

Comment 14 Eric Sandeen 2010-07-28 22:54:09 UTC
Created attachment 435140 [details]
livecd creation & test session

From the attachment you can see that the process did fsck & dumpe2fs w/o error, but the image inside squashfs is corrupt; I'm inclined to blame squashfs, esp. since I could not hit this when composing under a rhel6 kernel.  I'll do some further investigation.

Comment 15 Jasper O'neal Hartline 2010-07-28 23:21:39 UTC
Interesting find. I was looking at the possibility of it being the SquashFS at the beginning, but staring at two screens of hexdump on the machine with a good and a bad image, I didn't know much more to make of it.

Thanks for looking into it further.

Comment 16 Jasper O'neal Hartline 2010-07-28 23:34:09 UTC
Here is the Koji package changelog listing for reference purposes.
http://koji.fedoraproject.org/koji/buildinfo?buildID=186681

Comment 17 Eric Sandeen 2010-07-29 18:35:39 UTC
I'm now inclined to blame squashfs userspace rather than the kernel, but testing it...

Comment 18 Eric Sandeen 2010-07-29 19:29:18 UTC
Grr scratch that, older squashfs-tools works fine too.

And yet new kernel / new squashfs-tools / new e2fsprogs works just fine on -my- test box...

Comment 19 Jasper O'neal Hartline 2010-07-29 22:24:16 UTC
I was also unable to reproduce the issue. I know bugs sometimes shouldn't be filed if the filer cannot reproduce them, but so far I have only seen it reliably reproduced on the compose host.

The spins of July 27 were a specific run tested to reproduce this issue.
The reason the bug was filed is that there might be a bug roaming around that isn't making itself known.

I wonder now if there is something specific to the nightly compose machine that is corrupted. 

I tried reproducing on Fedora 13 i686 with squashfs-tools and e2fsprogs compiled from RAWHIDE and couldn't reproduce the issue. I also tested with a completely RAWHIDE system (Fedora RAWHIDE x86_64 on 64-bit hardware) and could not reproduce it there either.

Perhaps someone should investigate the compose host directly.

Comment 20 Jasper O'neal Hartline 2010-07-29 22:26:52 UTC
Sorry, I left out important information that Eric or others may have known already: Kevin says he may be able to get more information from the compose host. He may also be the one to talk to about investigating; I believe he can reproduce the issue reliably.

Comment 21 Eric Sandeen 2010-07-29 22:36:33 UTC
I've also only been able to repro on the compose host.

And I retract my allegations against squashfs, that doesn't seem to be the problem.

Comment 22 Eric Sandeen 2010-07-30 02:41:31 UTC
I retract my retraction ;)

[root@spin01 tmp]# md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img 
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img

[root@spin01 tmp]# mksquashfs squashdir/ mysquashfs.img &>/dev/null

[root@spin01 tmp]# mount -o loop mysquashfs.img mnt/
mount: warning: mnt/ seems to be mounted read-only.

[root@spin01 tmp]# md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs.img 
28ea62d79234e4e458d96179f12f3190  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img

so the squashed & unsquashed images have different md5sums ...!

[root@spin01 tmp]# ls -lh squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
-rwxr-xr-x. 1 root root 4.0G Jul 30 02:13 squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img

[root@spin01 tmp]# du -hc squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
1.5G	squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
1.5G	total

ok so it's sparse ... let's make a non-sparse copy and see if it fares better:

[root@spin01 tmp]# cp --sparse=never squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img

[root@spin01 tmp]# md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs*
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img

ok same...

[root@spin01 tmp]# mksquashfs squashdir/ mysquashfs2.img &>/dev/null

[root@spin01 tmp]# mount -o loop mysquashfs2.img  mnt/
mount: warning: mnt/ seems to be mounted read-only.

[root@spin01 tmp]# md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs*
a0ff1a8402e65cbafe1b7abfa0d595f3  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img
28ea62d79234e4e458d96179f12f3190  mnt/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img

different!

uh... ok now I'm totally confused.  the original image now has the correct md5sum, while the new nonsparse image has the prior bad md5sum?

Anyway... looks like a squashfs bug to me.

-Eric
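
The md5sum comparison above boils down to "the file read back from the squashfs must hash the same as the file that went in". A minimal sketch of that check, using only hashlib; `md5_of` and `same_content` are illustrative names and the paths you pass in are whatever the original and loop-mounted copies happen to be.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original, copy):
    """Return True if both files hash to the same MD5 digest."""
    return md5_of(original) == md5_of(copy)
```

For example, `same_content("squashdir/.../ext3fs.img", "mnt/.../ext3fs.img")` should be True for a correct squashfs round trip; on spin01 it would come back False.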

Comment 23 Jasper O'neal Hartline 2010-07-30 03:52:04 UTC
Ok can we reassign this to the squashfs-tools maintainer?

Comment 24 Jasper O'neal Hartline 2010-07-30 03:58:05 UTC
I have reassigned it to squashfs-tools.

Comment 25 Eric Sandeen 2010-07-30 04:33:30 UTC
It could possibly be the kernel code too I suppose, need a bit more investigation, maybe some testing of older code, I guess.

Comment 26 Eric Sandeen 2010-07-30 04:53:58 UTC
unsquashfsing also gives us a corrupted file:

[root@spin01 tmp]# sudo unsquashfs mysquashfs2.img 
Parallel unsquashfs: Using 4 processors
3 inodes (65546 blocks) to write
[================================================================================================================-] 65546/65546 100%
created 3 files
created 3 directories
created 0 symlinks
created 0 devices
created 0 fifos

[root@spin01 tmp]# md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs* squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs*
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img
a0ff1a8402e65cbafe1b7abfa0d595f3  squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs.img
28ea62d79234e4e458d96179f12f3190  squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img

Comment 27 Bruno Wolff III 2010-07-30 04:57:56 UTC
There have been squashfs related changes going into the kernel during this time period. squashfs-tools appeared to work without problem for a month. So while there could be a bug in it triggered by another change, I'd be more inclined to think kernel.
I did do a sync up to the latest upstream development just before the branch. It's possible if there was some incompatible change that was made to the kernel support (Lougher added xattr support for 2.6.35) that syncing up might help.

Comment 28 Jasper O'neal Hartline 2010-07-30 05:22:16 UTC
In reply to comment 27, can you provide the git commands to clone the latest development branch?

Comment 29 Bruno Wolff III 2010-07-30 05:24:54 UTC
Is this only happening on x86_64?
I can compress stuff and loop mount it or uncompress it, and things look fine.
That might provide a clue as to where to look for problems.

Comment 30 Bruno Wolff III 2010-07-30 05:27:58 UTC
If you are talking upstream, it's:
cvs -d:pserver:anonymous.sourceforge.net:/cvsroot/squashfs export -D 2010-07-27 squashfs
I haven't done a git check out since the switch yet. Things were still cvs earlier in the week.

Comment 31 Phillip Lougher 2010-07-30 05:52:57 UTC
Eric

Can you give me a link to that ext3fs.img that gives bad md5sums on Squashfs?

Also the output from mksquashfs -version would be useful as that gives me the approximate date the code was checked into CVS.

From the info it looks like a squashfs-tools bug because the md5sum should never differ between the original and the squashfs version, especially so for the sparse and non-sparse file in the Squashfs filesystem.

Incidentally, the fact that the md5sum switches in the different squashfs filesystems (so that the original image now has the correct md5sum in the second filesystem) is significant.  It points to a bug in the code that determines whether an inode is an "extended file inode" or not, and that code has changed in the last couple of months (the code changed to accommodate the fact that a file with xattrs is an extended file inode).

Thanks

Phillip

Comment 32 Bruno Wolff III 2010-07-30 05:58:08 UTC
I also tried the f14 mksquashfs, unsquashfs and loop mount on an otherwise f13 x86_64 system and I got the same sha1sum for the original, the unsquashfs version, and the loop mount version.
I tested on a 700 MB file.

Comment 33 Bruno Wolff III 2010-07-30 06:03:54 UTC
I checked out versions on June 7th and July 27th. The latest in rawhide is from July 27th. I think that this is currently the same as the latest version in your cvs repo.
Two patches are applied. One to use the Fedora standard gcc options and the other to use xz for lzma support.

Comment 34 Jasper O'neal Hartline 2010-07-30 10:01:39 UTC
Looking at logs from the compose host, and from my testing locally, I do notice an error message which emanates from lines 137 and 154 of squashfs/squashfs-tools/xattr.c in that CVS checkout of squashfs: it prints "llistxattr failed in read_attrs" right after the image files are produced. I don't know enough about the xattr code to say more, so this is just a reference in case it seems important. You can see it in the logs of any failing run on the nightly-compose host. On my machines I do not see that message, and the images are fine and can be mounted correctly.

Here also are the URLs for download of a broken squashfs.img and ext3fs.img of the security spin from the nightly-compose host around the 27th of July:
http://autopsy.liveprojects.info/external/extra/squashfs.img
http://autopsy.liveprojects.info/external/extra/ext3fs.img

Comment 35 Bruno Wolff III 2010-07-30 13:18:25 UTC
I see that warning at least some of the time, but am still getting good images.
I am not sure how to reproduce the problem that people are seeing.
Are people still having the ext3 image mounted when they try squashing it?

Comment 36 Jasper O'neal Hartline 2010-07-30 14:08:23 UTC
In response to Comment 35
Not sure. Try creating a sparse Ext4 filesystem and setting it up with losetup. Copy some /etc/yum.repos.d files to $mntpoint/etc/yum.repos.d/ and hardcode $releasever and/or $basearch to a valid value, rawhide or otherwise. Populate it with yum --installroot=/mnt/ext4 install kernel bash filesystem and run resize2fs -M on it. Then squash it with SquashFS tools 4.1 from the CVS checkout.

Does it appear corrupt, or does mounting it as type squashfs and running dumpe2fs show an invalid journal magic for it?

Comment 37 Adam Williamson 2010-07-30 16:21:30 UTC
In so far as this bug is the cause of 615443, please be aware that this needs to be resolved in some way or another - we need to be able to generate working live images for x86-64 and i686 - by Tuesday 2010-08-03, or the Alpha will slip.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 38 Eric Sandeen 2010-07-30 16:24:39 UTC
(In reply to comment #37)
> In so far as this bug is the cause of 615443, please be aware that this needs
> to be resolved in some way or another - we need to be able to generate working
> live images for x86-64 and i686 - by Tuesday 2010-08-03, or the Alpha will
> slip.

Given that other hosts seem to work, maybe doing the compose elsewhere could be a stopgap measure, unless there is some reason that fedora livecds need to be self-hosted/composed... :)

Comment 39 Kevin Fenzi 2010-07-30 16:32:55 UTC
Yes, I will see about getting a kvm or other based box somewhere today to test/try out.

Comment 40 Phillip Lougher 2010-07-30 16:45:11 UTC
Can someone gzip/bzip2 this ext3.img? Otherwise it's going to take a long time
to download on my link.

http://autopsy.liveprojects.info/external/extra/ext3fs.img

Comment 41 Bruno Wolff III 2010-07-30 16:49:40 UTC
I think I was able to duplicate this (I get the same error message) by loop mounting a good ext3 image, updating it and then running mksquashfs on the backing ext3 image file. When I later loop mounted the ext3 image inside the squashfs image, I got:
[root@bruno sq]# mount -o loop test.img /mnt/iso
mount: wrong fs type, bad option, bad superblock on /dev/loop3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

Comment 42 Eric Sandeen 2010-07-30 16:51:02 UTC
(In reply to comment #31)
> Eric
> 
> Can you give me a link to that ext3fs.img that gives bad md5sums on Squashfs?

http://alt.fedoraproject.org/pub/alt/nightly-composes/ext3fs.img.bz2

Not sure the image is the problem but maybe we can at least verify that.

Thanks,
-Eric

Comment 43 Kevin Fenzi 2010-07-30 19:04:49 UTC
I downgraded spin01 to: 

squashfs-tools-4.0-4.fc14.x86_64

and still see the issue: 

$ md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img 
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
$ mksquashfs /tmp/squashdir/ mysquashfs.img
$ sudo mount -o loop mysquashfs.img mnt/
$ md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs.img 
45fcb40fdcf9ac6a2f94f677807c73d4  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img

Comment 44 Phillip Lougher 2010-07-30 19:55:18 UTC
Cannot reproduce here (Ubuntu 9.10, x86_64)

Ext3.img file, stored both sparsely and non-sparsely

root@logopolis:/stripe/redhat# ls -hs LiveOS/*
1.5G LiveOS/ext3fs.img  4.1G LiveOS/ext3fs-nosparse.img

They both have the expected md5sum

root@logopolis:/stripe/redhat# md5sum LiveOS/*
be9cb2dfe5d61aca6614759175eb7df3  LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  LiveOS/ext3fs-nosparse.img

Verify we're using the latest CVS version of Mksquashfs

root@logopolis:/stripe/redhat# /mksquashfs -version| head -1
mksquashfs version 4.1-CVS (2010/07/19)

Do the squashing...

root@logopolis:/stripe/redhat# /mksquashfs LiveOS test.sqsh -keep-as-directory -no-progress
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on test.sqsh, block size 131072.
llistxattr failed in read_attrs

Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
	compressed data, compressed metadata, compressed fragments, compressed xattrs
	duplicates are removed
Filesystem size 470680.46 Kbytes (459.65 Mbytes)
	5.61% of uncompressed filesystem size (8388864.48 Kbytes)
Inode table size 64534 bytes (63.02 Kbytes)
	24.60% of uncompressed inode table size (262386 bytes)
Directory table size 74 bytes (0.07 Kbytes)
	76.29% of uncompressed directory table size (97 bytes)
Number of duplicate files found 1
Number of inodes 4
Number of files 2
Number of fragments 0
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 2
Number of ids (unique uids + gids) 1
Number of uids 1
	root (0)
Number of gids 1
	root (0)

Note Mksquashfs determines that the non-sparse file is a duplicate of the sparse file (Number of duplicate files found 1)

root@logopolis:/stripe/redhat# mount -t squashfs test.sqsh mnt -o loop

Check sparse handling is correct ...

root@logopolis:/stripe/redhat# ls -hs mnt/LiveOS/*
1.5G mnt/LiveOS/ext3fs.img  4.0G mnt/LiveOS/ext3fs-nosparse.img

Md5sums are correct too ...

root@logopolis:/stripe/redhat# md5sum mnt/LiveOS/*
be9cb2dfe5d61aca6614759175eb7df3  mnt/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  mnt/LiveOS/ext3fs-nosparse.img

Double check by seeing what Unsquashfs makes of the file system

root@logopolis:/stripe/redhat# /unsquashfs -no-progress test.sqsh
Parallel unsquashfs: Using 4 processors
2 inodes (65536 blocks) to write

created 2 files
created 2 directories
created 0 symlinks
created 0 devices
created 0 fifos

root@logopolis:/stripe/redhat# ls -hs squashfs-root/LiveOS/*
1.5G squashfs-root/LiveOS/ext3fs.img  4.1G squashfs-root/LiveOS/ext3fs-nosparse.img

root@logopolis:/stripe/redhat# md5sum squashfs-root/LiveOS/*
be9cb2dfe5d61aca6614759175eb7df3  squashfs-root/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashfs-root/LiveOS/ext3fs-nosparse.img

Everything works for Unsquashfs too...

Comment 45 Eric Sandeen 2010-07-30 21:13:18 UTC
(In reply to comment #41)
> I think I was able to duplicate this (I get the same error message) by loop
> mounting a good ext3 image, updating it and then running mksquashfs on the
> backing ext3 image file.

Bruno, that is actually very much expected, but it's not the bug we're seeing here I think.  (a mounted backing file is not consistent unless you freeze the filesystem).

The reproducer of squashfs'ing/unsquashfs'ing an unmounted image file is what's relevant here, I think.

Comment 46 Jasper O'neal Hartline 2010-07-30 21:47:06 UTC
In response to the request to bzip2 the ext3fs.img in Comment 40 by Phillip, this is the ext3fs.img bzipped, should be about 57 MB.

http://autopsy.liveprojects.info/external/extra/ext3fs.img.bz2

Comment 47 Jasper O'neal Hartline 2010-07-30 21:58:19 UTC
I would like to remind everyone that this is an Ext4 filesystem.
The name is ext3fs.img, but do not let that confuse you.
This is dealing with Ext4, not Ext3.

Also, Kevin, you might try spinning up a few discs, or at least one, using --fstype ext3 in the spin kickstarts to see whether those are corrupt. That would give us some data and might even alleviate the immediate problem. Results from the KVM would also be desirable in this report; care to give that a try so we can at least narrow it down to Ext4/SquashFS?
(The compose host's kickstart files use --fstype ext4, I'd imagine.)

Comment 48 Eric Sandeen 2010-07-30 22:24:56 UTC
Jasper, I don't think the contents of the file matter, so whether it's ext3, ext4, or leaked army documents probably doesn't change the outcome.  :)

FWIW sparseness doesn't seem to matter at all.  An image from a dir with just my non-sparse ext3 image shows the same problem on the spin01 host.

Kevin also tested on a kvm host w/ no problems.
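
For anyone repeating the sparse vs. non-sparse comparison, the `ls -lh` / `du` check used earlier can be sketched with a single stat call. This assumes Linux, where `st_blocks` is counted in 512-byte units; `sparse_info` is an illustrative name.

```python
import os


def sparse_info(path):
    """Return (apparent_size, allocated_bytes, looks_sparse) for a file.

    Assumes st_blocks is in 512-byte units, as on Linux. A file whose
    allocated bytes are below its apparent size has holes in it.
    """
    st = os.stat(path)
    allocated = st.st_blocks * 512
    return st.st_size, allocated, allocated < st.st_size
```

On the spin01 image this would report roughly 4.0G apparent size against 1.5G allocated, matching the ls and du output.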

Comment 49 Kevin Fenzi 2010-07-30 22:28:48 UTC
The contents of the file seem to not matter at all. 

I am unable to duplicate this with the same kernel/arch/squashfs-tools on a machine here that's a kvm guest. :( This is really starting to look like corruption or problems on spin01 (possibly xen guest issues?)

I'm going to try and get another compose machine setup and test from there.

Comment 50 Eric Sandeen 2010-07-30 22:34:53 UTC
FWIW the first 0s in the unsquashed corrupt file start at an interesting offset and continue to the end of the file:

# cmp -bl squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img  | more
 264241153   0 ^@    42 "

[root@spin01 tmp]# bc
obase=16
264241153
FC00001

Looking at a hexdump of the corrupt unsquashed file, it's all 0s from that point on:

0fc00000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
100000000

that's a rather large swath of 0s ...

Maybe this rings a bell for Phillip.
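
A note on the numbers, with a small sketch to reproduce them. cmp prints 1-based offsets, so 264241153 corresponds to the 0-based offset 264241152 = 0xFC00000, which is where the hexdump shows the zeros starting. Assuming the default 131072-byte squashfs block size (the size Mksquashfs reports elsewhere in this bug), that offset is exactly 2016 whole data blocks into the file. `first_difference` below is an illustrative helper, not an existing tool.

```python
def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the 0-based offset of the first differing byte, or None.

    If one file is a strict prefix of the other, the offset returned is the
    length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                return offset + min(len(a), len(b))
            if not a:
                return None
            offset += len(a)


# Worked arithmetic for the offset reported by cmp above.
CMP_OFFSET = 264241153            # 1-based, as printed by cmp -l
ZERO_BASED = CMP_OFFSET - 1       # 0xFC00000
ASSUMED_BLOCK_SIZE = 131072       # assumption: default mksquashfs block size
BLOCKS, REMAINDER = divmod(ZERO_BASED, ASSUMED_BLOCK_SIZE)
```

Here `BLOCKS` is 2016 and `REMAINDER` is 0, so the corruption begins on a data-block boundary rather than in the middle of a block.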

Comment 51 Phillip Lougher 2010-07-30 23:13:50 UTC
Eric

Previously you wrote about creating a Squashfs file with both sparse and non-sparse versions of the ext3fs.img...

======================
[root@spin01 tmp]# mksquashfs squashdir/ mysquashfs2.img &>/dev/null

[root@spin01 tmp]# mount -o loop mysquashfs2.img  mnt/
mount: warning: mnt/ seems to be mounted read-only.

[root@spin01 tmp]# md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs*
a0ff1a8402e65cbafe1b7abfa0d595f3  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img
28ea62d79234e4e458d96179f12f3190  mnt/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img
=======================

It would be interesting to know what that output of Mksquashfs was, i.e. whether it found any duplicates.  That will help to indicate what's going wrong - we know  in the filesystem that the files are different (md5sums differ), but we don't know if Mksquashfs reported any duplicates.  If it *did* then we know the files when read into and compressed by Mksquashfs were identical (pointing to filesystem output problem), but if it didn't find any duplicates it points to a problem at read time...

Thanks

Phillip

Comment 52 Phillip Lougher 2010-07-30 23:21:51 UTC
Forgot to mention...  The point is both files should be duplicates and therefore they point to the *same* data.  Which makes the different md5sums impossible unless the inode contents (i.e. data block start, block-list) are corrupt, and this would be easy to verify if I had the image here....

Phillip

Comment 53 Bruno Wolff III 2010-07-31 01:16:02 UTC
Where I was going in comment 41 was that maybe there is a bug in livecd-creator where the ext file system doesn't get unmounted before mksquashfs is run under some circumstances.
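
One way to test that theory would be to check, just before mksquashfs runs, whether the image is still attached to a loop device. The sketch below is not livecd-creator code. It assumes a kernel that exposes `/sys/block/loopN/loop/backing_file` (not every kernel of this era does), and `loop_devices_backing` is a hypothetical helper name.

```python
import glob
import os


def loop_devices_backing(image_path, sysfs_root="/sys/block"):
    """Return the names of loop devices whose backing file is image_path.

    Assumes the kernel exposes <sysfs_root>/loopN/loop/backing_file; on
    kernels without that attribute the result is simply an empty list.
    """
    target = os.path.realpath(image_path)
    users = []
    pattern = os.path.join(sysfs_root, "loop*", "loop", "backing_file")
    for attr in sorted(glob.glob(pattern)):
        try:
            with open(attr) as f:
                backing = f.read().strip()
        except OSError:
            continue  # device went away or attribute is unreadable
        if os.path.realpath(backing) == target:
            # attr is .../loopN/loop/backing_file, so go up two levels.
            users.append(os.path.basename(os.path.dirname(os.path.dirname(attr))))
    return users
```

A non-empty result right before squashing would suggest the image is still in use and may not be consistent on disk.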

Comment 54 Jasper O'neal Hartline 2010-07-31 03:08:27 UTC
In reply to Comment 53: I thought about that also. I did test this to see whether I would get not only an unmountable image but also the invalid journal error from dumpe2fs.

What I found was that it in fact couldn't be mounted. dmesg states "recovery required on readonly filesystem", then "write access not available".

This is a possible cause for the inability to mount.

However, what it didn't show was the dumpe2fs invalid journal superblock magic error. So let me send you a patch for that on the list, Bruno, to try to unmount normally and, if we fall back, at least log verbosely that some directories could not be unmounted.

Comment 55 Phillip Lougher 2010-07-31 03:21:38 UTC
Jasper O'neal Hartline wrote:

===========
Looking at logs from the compose host, and from my testing locally I do notice
an error message which emanates from lines 137 and 154 of xattr.c from that CVS
checkout of squashfs in squashfs/squashfs-tools/xattr.c where it prints
llistxattr failed in read_attrs right after the image files are produced.
However not knowing enough about the xattr code here this is just a reference
in case it seems important
===========

I've looked into this, and it's a non issue.  It is a bug but nothing to do with anything here.

For the curious: mksquashfs creates a dummy top-level directory for the cases where there are multiple sources specified on the command line (or the -keep-as-directory option is set).  This directory obviously doesn't exist, and therefore on being passed to llistxattr it was failing.

A fix is in CVS if anyone wants to pick it up.
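
To illustrate the failure mode only (this is not the fix that went into CVS): listing xattrs on a path that does not exist fails with ENOENT, so a caller that may be handed a synthetic directory has to treat that case as "no xattrs". A minimal sketch, assuming Linux; `read_xattr_names` is a made-up name, and `os.listxattr(..., follow_symlinks=False)` is the Python counterpart of llistxattr.

```python
import errno
import os


def read_xattr_names(path):
    """Return the xattr names on path, or None if the path does not exist.

    Uses follow_symlinks=False so symlinks themselves are inspected, as
    llistxattr does. Errors other than ENOENT are re-raised.
    """
    try:
        return os.listxattr(path, follow_symlinks=False)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return None
        raise
```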

Comment 56 Bruno Wolff III 2010-07-31 03:34:04 UTC
Thanks for the fix. I am not going to do a new build right now just for that. It's a bad time due to the pending alpha and my pending vacation. When I get back I'll grab it and probably anything else you have committed by then. If we do find a squashfs bug for the ext image issue, then I'll also probably grab it, since I'll need to do another build anyway.

Comment 57 Phillip Lougher 2010-07-31 03:37:17 UTC
Jasper O'neal Hartline wrote:

=========================
Here also are
the URLs for download of a broken image of squashfs.img and ext3fs.img of
security spin from the nightly-compose host around the 27th of July:
http://autopsy.liveprojects.info/external/extra/squashfs.img and
http://autopsy.liveprojects.info/external/extra/ext3fs.img    
=========================

The standalone ext3fs.img and the copy inside squashfs.img check out as identical

root@logopolis:/stripe/redhat/bad# mount -t squashfs squashfs.img mnt -o loop
root@logopolis:/stripe/redhat/bad# md5sum ext3fs.img mnt/LiveOS/ext3fs.img
a840f7180545ba8ee60a9ecb9657ba1c  ext3fs.img
a840f7180545ba8ee60a9ecb9657ba1c  mnt/LiveOS/ext3fs.img

Comment 58 Jasper O'neal Hartline 2010-07-31 04:02:46 UTC
In my test referenced in Comment 54, the unmountable Ext4 image had a different md5sum:

[root@localhost tmp]# md5sum squash-mountedext4.img 
985297f03fc668103461df07c6a15a20  squash-mountedext4.img
[root@localhost tmp]# mount -t squashfs -oloop squash-mountedext4.img squash-mounted
[root@localhost tmp]# md5sum squash-mounted/ext3.img 
d1c24fcdc4998079307949a308061699  squash-mounted/ext3.img
[root@localhost tmp]# mount -t ext4 -oloop squash-mounted/ext3.img /tmp/ext4-mounted/
mount: wrong fs type, bad option, bad superblock on /dev/loop3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

[root@localhost tmp]#


Furthermore, the test I did where the image can be mounted works fine, but it also has a differing checksum:

[root@localhost tmp]# mount -t squashfs -oloop squash.img squash
[root@localhost tmp]# md5sum squash/ext3.img
f2d4592af3590070070eb0efbfb20a91  squash/ext3.img
[root@localhost tmp]# mount -t ext4 -oloop squash/ext3.img /tmp/ext3
[root@localhost tmp]#

Is this strange?

Comment 59 Jasper O'neal Hartline 2010-07-31 04:14:49 UTC
I was in a hurry to post my results and clipped off the md5sum there. Here is what I found; the md5sums differ on both images, using
[root@localhost tmp]# rpm -qa squashfs-tools
squashfs-tools-4.1-0.2.20100727.fc13.i686
[root@localhost tmp]#

Look strange yet:
[root@localhost tmp]# file squash.img squash-mountedext4.img 
squash.img:             Squashfs filesystem, little endian, version 4.0, 59657153280 bytes, 55112 inodes, blocksize: 14 bytes, created: Fri Mar 20 21:48:00 1970
squash-mountedext4.img: Squashfs filesystem, little endian, version 4.0, 59657291776 bytes, 55115 inodes, blocksize: 14 bytes, created: Sat Mar 21 22:41:20 1970
[root@localhost tmp]# md5sum squash.img squash-mountedext4.img
25c37cbdf568c8f46f1f93d2eb2b4c1a  squash.img
985297f03fc668103461df07c6a15a20  squash-mountedext4.img
[root@localhost tmp]# mount -t squashfs -oloop squash.img squash
[root@localhost tmp]# mount -t squashfs -oloop squash-mountedext4.img squash-mounted
[root@localhost tmp]# file squash/ext3.img squash-mounted/ext3.img 
squash/ext3.img:         Linux rev 1.0 ext4 filesystem data (extents) (huge files)
squash-mounted/ext3.img: Linux rev 1.0 ext4 filesystem data (needs journal recovery) (extents) (huge files)
[root@localhost tmp]# md5sum squash/ext3.img squash-mounted/ext3.img
f2d4592af3590070070eb0efbfb20a91  squash/ext3.img
d1c24fcdc4998079307949a308061699  squash-mounted/ext3.img
[root@localhost tmp]#

Comment 60 Eric Sandeen 2010-07-31 04:22:45 UTC
(In reply to comment #51)
> Eric
> 
> Previously you wrote about creating a Squashfs file with both sparse and
> non-sparse versions of the ext3fs.img...

...

> It would be interesting to know what that output of Mksquashfs was, i.e.
> whether it found any duplicates.  That will help to indicate what's going wrong
> - we know  in the filesystem that the files are different (md5sums differ), but
> we don't know if Mksquashfs reported any duplicates.  If it *did* then we know
> the files when read into and compressed by Mksquashfs were identical (pointing
> to filesystem output problem), but if it didn't find any duplicates it points
> to a problem at read time...

Phillip, this is what it said when compressing the image containing sparse &
non-sparse:

# mksquashfs squashdir testquash.img
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on testquash.img, block size 131072.
[===================-] 65546/65546 100%
Exportable Squashfs 4.0 filesystem, data block size 131072
 compressed data, compressed metadata, compressed fragments
 duplicates are removed
Filesystem size 470635.78 Kbytes (459.61 Mbytes)
 5.61% of uncompressed filesystem size (8390032.58 Kbytes)
Inode table size 10887 bytes (10.63 Kbytes)
 4.15% of uncompressed inode table size (262490 bytes)
Directory table size 109 bytes (0.11 Kbytes)
 71.71% of uncompressed directory table size (152 bytes)
Number of duplicate files found 1
Number of inodes 6
Number of files 3
Number of fragments 0
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3
Number of ids (unique uids + gids) 1
Number of uids 1
 root (0)
Number of gids 1
 root (0)

So I guess it read matching files.

-Eric

Comment 61 Jasper O'neal Hartline 2010-07-31 04:26:32 UTC
Furthermore, Phillip, I am not sure whether you have checked the md5sums of the wrong files or whether I uploaded the wrong files; they differ in size significantly, so I am not sure I uploaded the wrong files.

Here is my test on a security ISO from the 27th:
[root@localhost images]# mount -t iso9660 security-i386-20100727.16.iso -oloop iso
[root@localhost images]# md5sum iso/LiveOS/squashfs.img 
b2d8200283828cd49dbf2b34318acfb0  iso/LiveOS/squashfs.img
[root@localhost images]# mount -t squashfs -oloop iso/LiveOS/squashfs.img squash
[root@localhost images]# md5sum squash/LiveOS/ext3fs.img 
fad58f2eb57e4e99becc62f7aee2530b  squash/LiveOS/ext3fs.img
[root@localhost images]# mount -t ext4 -oloop squash/LiveOS/ext3fs.img ext3
mount: wrong fs type, bad option, bad superblock on /dev/loop4,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

[root@localhost images]#


The md5sums of the squashfs file and the ext3 file differ, and the ext3 image also cannot be mounted.

Comment 62 Jasper O'neal Hartline 2010-07-31 04:32:15 UTC
(uiserver):u40872554:~/LIVE/external/extra > md5sum squashfs.img 
6bb831c9dcf67af2a0597946a32bd94b  squashfs.img
(uiserver):u40872554:~/LIVE/external/extra > cp ext3fs.img.bz2 ext3.img.bz2
(uiserver):u40872554:~/LIVE/external/extra > bzip2 -d ext3.img.bz2 
(uiserver):u40872554:~/LIVE/external/extra > md5sum ext3.img 
a840f7180545ba8ee60a9ecb9657ba1c  ext3.img
(uiserver):u40872554:~/LIVE/external/extra >

Comment 63 Phillip Lougher 2010-07-31 04:50:35 UTC
(In reply to comment #61)
> Further more I am not sure Phillip if you have checked the md5sums of the wrong
> files or if I uploaded the wrong files, they differ in size significantly so I
> am not sure I uploaded the wrong files. 
> 

Thanks a lot - nothing's to be gained by assuming I've made a mistake simply because my results differ from yours.  So far all my tests on Squashfs have proved negative, but I've still spent my time continuing to look into your problem.

Comment 64 Phillip Lougher 2010-07-31 05:05:22 UTC
(In reply to comment #62)
> (uiserver):u40872554:~/LIVE/external/extra > md5sum squashfs.img 
> 6bb831c9dcf67af2a0597946a32bd94b  squashfs.img
> (uiserver):u40872554:~/LIVE/external/extra > cp ext3fs.img.bz2 ext3.img.bz2
> (uiserver):u40872554:~/LIVE/external/extra > bzip2 -d ext3.img.bz2 
> (uiserver):u40872554:~/LIVE/external/extra > md5sum ext3.img 
> a840f7180545ba8ee60a9ecb9657ba1c  ext3.img
> (uiserver):u40872554:~/LIVE/external/extra >    

root@slackware:/mnt/redhat/bad# md5sum ext3fs.img squashfs.img 
a840f7180545ba8ee60a9ecb9657ba1c  ext3fs.img
6bb831c9dcf67af2a0597946a32bd94b  squashfs.img


root@slackware:/mnt/redhat/bad# mount -t squashfs squashfs.img mnt -o loop

The files within the squashfs.img have an selinux label

root@slackware:/mnt/redhat/bad# xattr mnt/LiveOS/ext3fs.img  
0: name security.selinux, size 17, value unconfined_u:object_r:livecd_tmp_t:s0, vsize 38

This couldn't have come from anywhere but from the Redhat build host.  It certainly didn't come from any of my systems, because they don't use selinux.

And yes, now on a Slackware virtual guest, the md5sum still checks out OK.

root@slackware:/mnt/redhat/bad# md5sum mnt/LiveOS/ext3fs.img 
a840f7180545ba8ee60a9ecb9657ba1c  mnt/LiveOS/ext3fs.img

Cheers

Comment 65 Jasper O'neal Hartline 2010-07-31 05:08:49 UTC
It's not actually a problem I can reproduce using livecd-tools which uses
squashfs-tools or ext4 filesystems. So far I see only the compose host with the
issue, and Bruno and my test results show that the image cannot be mounted,
when mksquashfs runs on a mounted filesystem which could be a fault of
livecd-creator and more specifically a lazy unmounting method we need so that
failures don't eat up loop devices, eventually causing livecd-creator to never
be able to run again.

However the testing I did shows the image cannot be mounted, but dumpe2fs does
not give me an error about the journal superblock magic, which the images from
the compose host do give me. 

I think there may be several bugs looming around this collective problem.


In reply to comment 64, that is exactly where they came from. 
This is the initial reason this bug report was created.

Comment 66 Phillip Lougher 2010-07-31 05:31:39 UTC
(In reply to comment #60)
> (In reply to comment #51)
> Creating 4.0 filesystem on testquash.img, block size 131072.
> [===================-] 65546/65546 100%
> Exportable Squashfs 4.0 filesystem, data block size 131072
>  compressed data, compressed metadata, compressed fragments
>  duplicates are removed
> Filesystem size 470635.78 Kbytes (459.61 Mbytes)
>  5.61% of uncompressed filesystem size (8390032.58 Kbytes)
> Inode table size 10887 bytes (10.63 Kbytes)
>  4.15% of uncompressed inode table size (262490 bytes)
> Directory table size 109 bytes (0.11 Kbytes)
>  71.71% of uncompressed directory table size (152 bytes)
> Number of duplicate files found 1
> Number of inodes 6
> Number of files 3
> Number of fragments 0
> Number of symbolic links  0
> Number of device nodes 0
> Number of fifo nodes 0
> Number of socket nodes 0
> Number of directories 3
> Number of ids (unique uids + gids) 1
> Number of uids 1
>  root (0)
> Number of gids 1
>  root (0)
> 
> So I guess it read matching files.

And yet on your system they give different md5sums within the Squashfs file system, and mksquashfs has definitely only stored one copy of the data, otherwise the image would be much larger.  Very strange.

I'd like that image please.

Thanks

> 
> -Eric

Comment 67 Eric Sandeen 2010-07-31 05:36:51 UTC
Jasper, if the scripts are squashfsing a mounted backing file that's a bug, but I don't think it's this bug.  From the first investigations, we're seeing large swaths of 0s in the file; this isn't what we'd see with an inconsistent filesystem image (as would happen if we squashfs'd an image while it was mounted).

-Eric
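
The distinction drawn above (long zeroed ranges versus an ordinary inconsistent image) can be checked mechanically. The following is only a sketch, assuming a slice of the image has already been read into memory; the function name longest_zero_run is hypothetical and not part of any tool mentioned in this bug. A long run is a hint rather than proof, since ext images legitimately contain unused zeroed areas.

```c
#include <stddef.h>

/* Find the longest run of consecutive zero bytes in buf[0..len).
 * Stores the run's starting offset in *start (if start is non-NULL)
 * and returns the run length; returns 0 when there is no zero byte. */
size_t longest_zero_run(const unsigned char *buf, size_t len, size_t *start)
{
    size_t best_len = 0, best_start = 0;
    size_t run_len = 0, run_start = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {
            if (run_len == 0)
                run_start = i;      /* a new run begins here */
            run_len++;
            if (run_len > best_len) {
                best_len = run_len;
                best_start = run_start;
            }
        } else {
            run_len = 0;            /* run broken by a non-zero byte */
        }
    }
    if (start != NULL)
        *start = best_start;
    return best_len;
}
```

Comparing the reported offset and length against the 128K Squashfs block size would show whether whole data blocks were zero-filled.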

Comment 68 Phillip Lougher 2010-07-31 05:38:55 UTC
One obvious comment.  I've only recently added xattr support to Squashfs, and that's been the only major change since last year.  I notice some of the squashfs.img images being generated have selinux labels on the files, and obviously before Squashfs xattr support they wouldn't have had them.

Has anyone tried to generate the squashfs images with xattr support disabled ( -no-xattrs option on mksquashfs), and does this make any difference?

Phillip

Comment 69 Phillip Lougher 2010-07-31 05:55:37 UTC
(In reply to comment #67)
> Jasper, if the scripts are squashfsing a mounted backing file that's a bug, but
> I don't think it's this bug.  From the first investigations, we're seeing large
> swaths of 0s in the file; this isn't what we'd see with an inconsistent
> filesystem image (as would happen if we squashfs'd an image while it was
> mounted).
> 

The way you'd get large runs of 0's in the files is either because

  1. Mksquashfs read them from the source file (obviously), or
  2. The inode block list is being corrupted, and a block is being incorrectly marked as sparse, which would obviously cause kernel squashfs and unsquashfs to zero fill the block.

Incidentally, 0xFC00001 may be a significant number.  0xFC00000 is on a 128K block boundary, which might or might not be significant (i.e. one of the blocks in the block list could be bad and interpreted as sparse).
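
The boundary arithmetic is easy to verify. This is a small sketch, assuming the 131072-byte block size reported by mksquashfs in comment 60; the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* 128K data block size, as printed by mksquashfs in comment 60. */
#define SQUASH_BLOCK_SIZE 131072u

/* Index of the data block that contains the given byte offset. */
uint64_t offset_block_index(uint64_t offset)
{
    return offset / SQUASH_BLOCK_SIZE;
}

/* Position of the given byte offset inside its data block. */
uint64_t offset_within_block(uint64_t offset)
{
    return offset % SQUASH_BLOCK_SIZE;
}
```

With these helpers, offset 0xFC00000 lands at position 0 of block 2016, so 0xFC00001 is the second byte of that block.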

This is why it would be useful to have an image generated by RedHat which shows this problem.  So far none of my generated images have shown up anything wrong, and the one I've got from RedHat checks out OK on my system.

I'm not saying there isn't a bug somewhere in mksquashfs, but so far I've had nothing to debug.

Cheers


  
> -Eric

Comment 70 Eric Sandeen 2010-07-31 06:01:18 UTC
Phillip, I'll find a place to host that image, right now I've got nowhere w/ enough quota...

Comment 71 Eric Sandeen 2010-07-31 06:03:20 UTC
Phillip, regarding xattrs ... I'm seeing this even with

# mksquashfs -version
mksquashfs version 4.0 (2009/04/05)

which has no xattr support AFAICT ... really weird, I'm not sure what's going on here.

Comment 72 Jasper O'neal Hartline 2010-07-31 06:13:12 UTC
In reply to comment 69
This is a broken ISO:
http://alt.fedoraproject.org/pub/alt/nightly-composes/security/security-i386-20100727.16.iso

Not sure if it is worth it to download that at 680MB.. it's your call.
Booting this image says EXT4-FS(0) error loading journal

Comment 73 Phillip Lougher 2010-07-31 06:35:19 UTC
(In reply to comment #72)
> In reply to comment 69
> This is a broken ISO:
> http://alt.fedoraproject.org/pub/alt/nightly-composes/security/security-i386-20100727.16.iso
> 
> Not sure if it is worth it to download that at 680MB.. it's your call.
> Booting this image says EXT4-FS(0) error loading journal    

It's worth a try - as it's 7:32 AM here, I don't think I'm going to do much more whilst it's downloading.

Comment 74 Eric Sandeen 2010-07-31 15:24:07 UTC
Phillip, I suppose Jasper's image should contain a corrupted squashfs image you can look at; if not ping me and I'll find a spot to put the squashfs image I've been referring to in the bug (the one with the large swath of 0s).  Just be sure Jasper's has a large zeroed range, I think, to be sure it's not the mounted-image-file problem.

Comment 75 Jasper O'neal Hartline 2010-07-31 15:32:09 UTC
Comment 3 confirms the compose host's images are the ones with the large block of 0 data, which are the only ones I've been linking to in my posts, so he should have the corrupt images.

Comment 76 Eric Sandeen 2010-07-31 16:30:19 UTC
Jasper, ok good deal, thanks.

Comment 77 Jasper O'neal Hartline 2010-08-02 00:39:29 UTC
Was anyone able to come up with any new information on this?

Comment 78 Phillip Lougher 2010-08-02 06:52:11 UTC
(In reply to comment #77)
> Was anyone able to come up with any new information on this?    

Analysis of the problem images shows there is a bug in Mksquashfs.
I think I have found the bug - it is in legacy code unchanged since
2002. Why the bug has never shown up until now is a mystery...

I'll check a fix in and write an analysis later, it is rather late
here.  Due to the unreproducible nature of the bug, it has
taken a considerable amount of analytical effort to track this down,
in fact all weekend.

Comment 79 Bruno Wolff III 2010-08-02 07:35:03 UTC
Thanks!
I'll probably be able to do another squashfs-tools update Monday very late.

Comment 80 Eric Sandeen 2010-08-02 15:37:09 UTC
Phillip,

http://squashfs.cvs.sourceforge.net/viewvc/squashfs/squashfs/squashfs-tools/mksquashfs.c?view=patch&r1=1.194&r2=1.195

I guess?  I can run a test with that in place.

Comment 81 Phillip Lougher 2010-08-02 18:25:20 UTC
(In reply to comment #80)
> Phillip,
> 
> http://squashfs.cvs.sourceforge.net/viewvc/squashfs/squashfs/squashfs-tools/mksquashfs.c?view=patch&r1=1.194&r2=1.195
> 
> I guess?  I can run a test with that in place.    

No, that was just to fix the 'llistxattr failed' warning that
people were seeing.

I have just checked the file system corruption fix into CVS now.

Comment 82 Jasper O'neal Hartline 2010-08-03 00:43:32 UTC
Thanks Phillip.
We have been testing some spins using the changes from CVS.
It seems to have corrected the problem. 

I think I will give this one a few days and then close it up.
I appreciate your help, and Sandeen's too.

Comment 83 Jasper O'neal Hartline 2010-08-03 14:43:32 UTC
Bruno, can you rebuild squashfs-tools and provide it as an update?
We need it to close this bug.

Comment 84 satellitgo 2010-08-03 14:55:58 UTC
http://alt.fedoraproject.org/pub/alt/nightly-composes/xfce/xfce-i386-20100802.23.iso works. I think nightly-composes got new squashfs-tools last night

Comment 85 Bruno Wolff III 2010-08-03 15:23:13 UTC
I was too tired to safely do the package update last night. I have a light day here today and will get an update done today.

Comment 86 Fedora Update System 2010-08-03 22:40:24 UTC
squashfs-tools-4.1-0.3.20100803.fc14 has been submitted as an update for Fedora 14.
http://admin.fedoraproject.org/updates/squashfs-tools-4.1-0.3.20100803.fc14

Comment 87 Jens Petersen 2010-08-04 03:33:19 UTC
Nirik says this works for him on the nightly spin compose box.

Comment 88 Kevin Fenzi 2010-08-04 04:15:15 UTC
Yep. Seems to work here fine on the compose host. 

I am still puzzled as to why it was only really happening on our compose machine. 
Perhaps it's due to it being a Xen guest and some strange memory handling there. 

In any case, this patch fixes things here. Job well done everyone in helping to track this down. 

Thanks for all the testing Jasper, and thanks for finding this old and hard to reproduce bug Phillip!

Comment 89 Jens Petersen 2010-08-04 05:59:54 UTC
I also tested on F14 alpha TC2 x86_64 on kvm guest and successfully
spun a F14 Desktop Live image which booted ok.  Thanks everyone!

Comment 90 Jasper O'neal Hartline 2010-08-05 06:51:09 UTC
*** Bug 615443 has been marked as a duplicate of this bug. ***

Comment 91 Fedora Update System 2010-08-14 20:19:33 UTC
squashfs-tools-4.0-5.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/squashfs-tools-4.0-5.fc13

Comment 92 Bruno Wolff III 2010-08-14 20:30:30 UTC
I'll also be backporting a fix to F12 once my commit access to that branch is approved.

Comment 93 Fedora Update System 2010-08-16 14:14:14 UTC
squashfs-tools-4.0-4.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/squashfs-tools-4.0-4.fc12

Comment 94 Fedora Update System 2010-08-19 01:06:11 UTC
squashfs-tools-4.1-0.3.20100803.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 95 Fedora Update System 2010-08-20 02:15:08 UTC
squashfs-tools-4.0-5.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 96 Fedora Update System 2010-10-11 19:22:37 UTC
squashfs-tools-4.0-4.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 97 Alan Pevec 2011-02-17 22:30:26 UTC
For the record, the patch is:
--- squashfs-tools/mksquashfs.c.orig	2009-04-05 16:22:48.000000000 -0500
+++ squashfs-tools/mksquashfs.c	2010-08-14 14:07:28.000000000 -0500
@@ -938,7 +938,7 @@
 			(unsigned short *) (inode_table + inode_bytes), 1);
 		inode_bytes += SQUASHFS_COMPRESSED_SIZE(c_byte) + BLOCK_OFFSET;
 		total_inode_bytes += SQUASHFS_METADATA_SIZE + BLOCK_OFFSET;
-		memcpy(data_cache, data_cache + SQUASHFS_METADATA_SIZE,
+		memmove(data_cache, data_cache + SQUASHFS_METADATA_SIZE,
 			cache_bytes - SQUASHFS_METADATA_SIZE);
 		cache_bytes -= SQUASHFS_METADATA_SIZE;
 	}
@@ -1579,7 +1579,7 @@
 		directory_bytes += SQUASHFS_COMPRESSED_SIZE(c_byte) +
 			BLOCK_OFFSET;
 		total_directory_bytes += SQUASHFS_METADATA_SIZE + BLOCK_OFFSET;
-		memcpy(directory_data_cache, directory_data_cache +
+		memmove(directory_data_cache, directory_data_cache +
 			SQUASHFS_METADATA_SIZE, directory_cache_bytes -
 			SQUASHFS_METADATA_SIZE);
 		directory_cache_bytes -= SQUASHFS_METADATA_SIZE;
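
Both hunks shift the unwritten tail of a cache down to the start of the same buffer. When that tail is longer than the chunk just removed, source and destination overlap, which the C standard leaves undefined for memcpy but defines for memmove. The sketch below illustrates the pattern only; discard_front and CHUNK_SIZE are hypothetical names, and 8192 is assumed here to correspond to SQUASHFS_METADATA_SIZE.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative chunk size; assumed to correspond to
 * SQUASHFS_METADATA_SIZE, but chosen here only for the example. */
#define CHUNK_SIZE 8192

/* Drop the first CHUNK_SIZE bytes of cache and slide the remaining
 * bytes down to offset 0. Returns the new number of valid bytes. */
size_t discard_front(unsigned char *cache, size_t cache_bytes)
{
    if (cache_bytes < CHUNK_SIZE)
        return cache_bytes;         /* less than one chunk: nothing to drop */

    /* The two regions overlap whenever the tail is longer than
     * CHUNK_SIZE, so memmove is required; memcpy would be undefined. */
    memmove(cache, cache + CHUNK_SIZE, cache_bytes - CHUNK_SIZE);
    return cache_bytes - CHUNK_SIZE;
}
```

A memcpy implementation that happens to copy forwards gives the right answer for this layout, which may be why the old code appeared to work on most machines.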

Comment 98 Alan Pevec 2011-02-17 22:31:09 UTC
> The problem images after analysis show there is a bug in Mksquashfs.
> I think I have found the bug - it is in legacy code unchanged since
> 2002, why the bug has never shown up until now is a mystery...

Try valgrind.

==24196== Source and destination overlap in memcpy(0x4dcbfb0, 0x4dcdfb0, 24688)
==24196==    at 0x4A06A3A: memcpy (mc_replace_strmem.c:497)
==24196==    by 0x40719B: get_inode (string3.h:52)
==24196==    by 0x407631: create_inode (mksquashfs.c:1319)
==24196==    by 0x407ED0: write_dir (mksquashfs.c:1593)
==24196==    by 0x40B081: dir_scan3 (mksquashfs.c:3590)
==24196==    by 0x40C7DB: dir_scan (mksquashfs.c:3327)
==24196==    by 0x40DC8D: main (mksquashfs.c:4754)
==24196==
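
The numbers in the valgrind report can be checked by hand: the two addresses are 0x2000 (8192) bytes apart while 24688 bytes are copied, so the regions must share bytes. A minimal sketch of that check follows; ranges_overlap is a hypothetical helper, not something valgrind or mksquashfs provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when [dst, dst+n) and [src, src+n) share at least one byte. */
bool ranges_overlap(uintptr_t dst, uintptr_t src, size_t n)
{
    uintptr_t lo = dst < src ? dst : src;
    uintptr_t hi = dst < src ? src : dst;

    /* The ranges overlap when the gap between their start addresses
     * is smaller than the number of bytes copied. */
    return n > 0 && hi - lo < n;
}
```

Called with the values from the report, ranges_overlap(0x4dcbfb0, 0x4dcdfb0, 24688) returns true.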

