Description of problem:
dumpe2fs says "Journal superblock magic number invalid!" on the ext4 filesystem from the compose host's livecds.

Version-Release number of selected component (if applicable):
e2fsprogs-1.41.12-5.fc14.i686.rpm

How reproducible:
Spin a live disc on the RAWHIDE compose host.

Steps to Reproduce:
1.
2.
3.

Actual results:
ext3fs.img: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)

ext3fs.img is produced where dumpe2fs says this about the filesystem:

[root@localhost tmp]# dumpe2fs ext3fs.img
dumpe2fs 1.41.10 (10-Feb-2009)
Filesystem volume name:   _desktop-i386-20
Last mounted on:          /var/tmp/imgcreate-c2iaYH/install_root
Filesystem UUID:          b58bc2a5-c70b-4ee6-8c72-4dd8367fd941
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              196608
Block count:              786432
Reserved block count:     7863
Free blocks:              252857
Free inodes:              116471
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      191
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Tue Jul 27 21:32:01 2010
Last mount time:          Tue Jul 27 21:32:06 2010
Last write time:          Tue Jul 27 21:52:04 2010
Mount count:              0
Maximum mount count:      -1
Last checked:             Tue Jul 27 21:52:04 2010
Check interval:           0 (<none>)
Lifetime writes:          597 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      059ef793-ed97-4322-ae50-e24387831f55
Journal backup:           inode blocks
Journal superblock magic number invalid!
[root@localhost tmp]#

Expected results:

Additional info:
Any chance this is a big-endian arch? Hm, no: i686. Can you do:

# debugfs ext3fs.img
debugfs: stat <8>

(attach or paste that output)

debugfs: dump <8> /some/path/to/file

which will dump out the journal file, and either attach that file if not too big, or hexdump -C the first part of it and attach the first few lines? Curious to see what's there.
I'm a little rusty on livecd generation, if you can either point me to docs, or tell me what commands to run to reproduce, that'd be helpful too. Thanks, -Eric
I can provide info on the nightly compose machine... from the last image made there:

debugfs: stat <8>
Inode: 8   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Size: 67108864
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 131072
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
 atime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
 mtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
crtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 22:32:02 2010
Size of extra inode fields: 28
EXTENTS:
(0-16383): 360448-376831

The file is indeed pretty large, but looks like nulls. hexdump -C shows:

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
04000000
Eric, to create a livecd:

$ yum -y install livecd-tools
$ git clone git://git.fedorahosted.org/spin-kickstarts.git
# livecd-creator -c fedora-livecd-desktop.ks -f desktop-20100728
soas-i386-20100727.16.iso burned to CD: boots and does initial grub, then:

mount: wrong fs type, bad option, bad superblock on /dev/mapper/live-rw
sleeping forever

Can't mount root filesystem = bug 619020. It looks like bug 615443 got fixed, as the CD does boot to grub.
OK thanks guys, will look into it. Why is it that every new release of e2fsprogs breaks livecd-tools (or is it vice-versa?) ;)
I did try both: e2fsprogs-1.41.12-3.fc14.x86_64.rpm and e2fsprogs-1.41.12-4.fc14.x86_64.rpm instead of the current e2fsprogs-1.41.12-5.fc14.x86_64.rpm And got the same results as far as I could tell. ;( This problem may have started on 2010-07-03 or 2010-07-17... it's hard to pinpoint. :(
Getting livecd build failures in the repo, it seems:

Retrieving http://download.fedora.devel.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/repodata/repomd.xml ...OK
Retrieving http://download.fedora.devel.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/repodata/9cb7284f28f18da5200736822748af32795899f71aa54faa7eeb0232471c7087-primary.sqlite.bz2 ...OK
Retrieving http://download.fedora.devel.redhat.com/pub/fedora/linux/development/rawhide/x86_64/os/repodata/f55403032212d1990822018c3401d1480b2e8c466ae74f31c8ec3aa2351983de-comps-rawhide.xml.gz ...OK
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return unicode(self.message)
Error creating Live CD : Failed to build transaction :
  gnote-0.7.2-1.fc14.x86_64 requires libboost_system-mt.so.1.41.0()(64bit)
  totem-2.30.2-2.fc14.x86_64 requires libpython2.6.so.1.0()(64bit)
  gnome-dvb-daemon-0.1.20-1.fc14.x86_64 requires python(abi) = 2.6
  notify-python-0.1.1-8.fc12.x86_64 requires python(abi) = 2.6
  gnote-0.7.2-1.fc14.x86_64 requires libboost_filesystem-mt.so.1.41.0()(64bit)

now what?
I'm going to try and see if I can reproduce on the F13 kickstarts, I'll let you know.
Yeah, rawhide is broken now since boost just landed. ;(
The machine it is being run on is x86_64 which is what Kevin mentioned. The target machine the ISO is built for is i686. I will attach this information you requested.
Here is the file and data:

[root@localhost tmp]# debugfs ext3fs.img
debugfs 1.41.10 (10-Feb-2009)
debugfs: stat <8>
Inode: 8   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Size: 67108864
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 131072
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
 atime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
 mtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
crtime: 0x4c4fb2c2:00000000 -- Tue Jul 27 21:32:02 2010
Size of extra inode fields: 28
EXTENTS:
(0-16383): 360448-376831
debugfs: dump <8> /tmp/debugfs-dump-8.txt
debugfs: quit
[root@localhost tmp]# ls -l /tmp/debugfs-dump-8.txt
-rw-r--r-- 1 root root 67108864 Jul 28 11:52 /tmp/debugfs-dump-8.txt
[root@localhost tmp]#

debugfs dump of <8>: http://autopsy.liveprojects.info/external/extra/debugfs-dump-8.txt
Jasper, ok, same thing: totally zeroed log (perhaps you should have zipped it, I bet it'd compress nicely ;)

I added dumpe2fs to the e2fsck() function in the python-imgcreate code:

def e2fsck(fs):
    logging.debug("Checking filesystem %s" % fs)
    rc = subprocess.call(["/sbin/e2fsck", "-f", "-y", fs])
    if rc != 0:
        return rc
    rc = subprocess.call(["/sbin/dumpe2fs", "-h", fs])
    return rc

and ran against f13 since rawhide is busted. I don't see the problem here, yet, but then I'm having a hard time updating my rawhide system as well. I do have the e2fsprogs version in question installed, though.
Created attachment 435140 [details] livecd creation & test session From the attachment you can see that the process did fsck & dumpe2fs w/o error, but the image inside squashfs is corrupt; I'm inclined to blame squashfs, esp. since I could not hit this when composing under a rhel6 kernel. I'll do some further investigation.
Interesting find. I was looking at the possibility of it being the SquashFS at the beginning, but staring at two screens of hexdump on the machine, one of a good and one of a bad image, I couldn't make much more of it. Thanks for looking into it further.
Here is the Koji package changelog listing for reference purposes. http://koji.fedoraproject.org/koji/buildinfo?buildID=186681
I'm now inclined to blame squashfs userspace rather than the kernel, but testing it...
Grr scratch that, older squashfs-tools works fine too. And yet new kernel / new squashfs-tools / new e2fsprogs works just fine on -my- test box...
I was unable to reproduce the issue either. I know bugs sometimes shouldn't be filed when the filer cannot reproduce them, but so far this has only been reliably reproduced on the compose host; the spins of July 27 were a specific run intended to reproduce it. The bug was filed because there may be a bug roaming around that isn't making itself known. I wonder now if something specific to the nightly compose machine is corrupted. I tried reproducing on Fedora 13 i686 with squashfs-tools and e2fsprogs compiled from RAWHIDE and couldn't reproduce the issue. I also tested with a completely RAWHIDE system (Fedora RAWHIDE x86_64 on 64-bit hardware) and could not reproduce. Perhaps someone should investigate the compose host directly.
Sorry, I left out important information I assumed Eric or others may have known already: Kevin says he may be able to get more information from the compose host, and he may also be the one to talk to about investigating it. I believe he can reproduce the issue reliably.
I've also only been able to repro on the compose host. And I retract my allegations against squashfs, that doesn't seem to be the problem.
I retract my retraction ;)

[root@spin01 tmp]# md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
[root@spin01 tmp]# mksquashfs squashdir/ mysquashfs.img &>/dev/null
[root@spin01 tmp]# mount -o loop mysquashfs.img mnt/
mount: warning: mnt/ seems to be mounted read-only.
[root@spin01 tmp]# md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs.img
28ea62d79234e4e458d96179f12f3190  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img

so the squashed & unsquashed images have different md5sums...!

[root@spin01 tmp]# ls -lh squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
-rwxr-xr-x. 1 root root 4.0G Jul 30 02:13 squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
[root@spin01 tmp]# du -hc squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
1.5G    squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
1.5G    total

ok so it's sparse... let's make a non-sparse copy and see if it fares better:

[root@spin01 tmp]# cp --sparse=never squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img
[root@spin01 tmp]# md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs*
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img

ok same...

[root@spin01 tmp]# mksquashfs squashdir/ mysquashfs2.img &>/dev/null
[root@spin01 tmp]# mount -o loop mysquashfs2.img mnt/
mount: warning: mnt/ seems to be mounted read-only.
[root@spin01 tmp]# md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs*
a0ff1a8402e65cbafe1b7abfa0d595f3  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img
28ea62d79234e4e458d96179f12f3190  mnt/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img

different! uh... ok, now I'm totally confused. The original image now has the correct md5sum, while the new non-sparse image has the prior bad md5sum?

Anyway... looks like a squashfs bug to me.

-Eric
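As an aside, the ls -lh vs. du -hc sparseness check above can be done programmatically. A minimal sketch (a standalone illustration, not livecd-tools code; the filename is made up) comparing allocated blocks with the apparent file size:

```python
import os

def is_sparse(path):
    # A file is sparse when the blocks actually allocated on disk
    # (st_blocks, in 512-byte units) cover less than its logical size.
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size

# Build a 1 MiB file that is a hole apart from a few bytes at the start.
with open("sparse-demo.img", "wb") as f:
    f.write(b"data")
    f.truncate(1024 * 1024)

print(is_sparse("sparse-demo.img"))
```

On any filesystem with hole support (ext4, xfs, tmpfs, ...) the truncated region allocates no blocks, so the check reports the file as sparse; a copy made with cp --sparse=never would report False.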
Ok can we reassign this to the squashfs-tools maintainer?
I have reassigned it to squashfs-tools.
It could possibly be the kernel code too I suppose, need a bit more investigation, maybe some testing of older code, I guess.
unsquashfsing also gives us a corrupted file:

[root@spin01 tmp]# sudo unsquashfs mysquashfs2.img
Parallel unsquashfs: Using 4 processors
3 inodes (65546 blocks) to write
[================================================================================================================-] 65546/65546 100%
created 3 files
created 3 directories
created 0 symlinks
created 0 devices
created 0 fifos
[root@spin01 tmp]# md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs* squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs*
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img
a0ff1a8402e65cbafe1b7abfa0d595f3  squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs.img
28ea62d79234e4e458d96179f12f3190  squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img
There have been squashfs-related changes going into the kernel during this time period, while squashfs-tools appeared to work without problem for a month. So while there could be a bug in it triggered by another change, I'd be more inclined to suspect the kernel. I did do a sync up to the latest upstream development just before the branch. If some incompatible change was made to the kernel support (Lougher added xattr support for 2.6.35), syncing up might help.
In reply to comment 27: can you provide the git commands to clone the latest development branch?
Is this only happening on x86_64? I can compress stuff and loop mount it or uncompress it, and things look fine. That might provide a clue as to where to look for problems.
If you are talking upstream its: cvs -d:pserver:anonymous.sourceforge.net:/cvsroot/squashfs export -D 2010-07-27 squashfs I haven't done a git check out since the switch yet. Things were still cvs earlier in the week.
Eric,

Can you give me a link to that ext3fs.img that gives bad md5sums on Squashfs? Also the output from mksquashfs -version would be useful, as that gives me the approximate date the code was checked into CVS.

From the info it looks like a squashfs-tools bug, because the md5sum should never differ between the original and the squashfs version, especially so for the sparse and non-sparse file in the Squashfs filesystem.

Incidentally, the fact that the md5sum switches in the different squashfs filesystems (so that the original image now has the correct md5sum in the second filesystem) is significant. It points to a bug in the code that determines whether an inode is an "extended file inode" or not, and that code has changed in the last couple of months (the code changed to accommodate the fact that a file with xattrs is an extended file inode).

Thanks

Phillip
I also tried the f14 mksquashfs, unsquashfs and loop mount on an otherwise f13 x86_64 system, and I got the same sha1sum for the original, the unsquashfs version, and the loop-mounted version. I tested on a 700MB file.
I checked out versions on June 7th and July 27th. The latest in rawhide is from July 27th. I think that this is currently the same as the latest version in your cvs repo. Two patches are applied. One to use the Fedora standard gcc options and the other to use xz for lzma support.
Looking at logs from the compose host, and from my testing locally, I notice an error message that emanates from lines 137 and 154 of xattr.c in that CVS checkout of squashfs (squashfs/squashfs-tools/xattr.c): it prints "llistxattr failed in read_attrs" right after the image files are produced. Not knowing enough about the xattr code, this is just a reference in case it seems important. You can see it in the logs of the nightly-compose host on any of the failing runs; on my machines I do not see that message, and the images are fine and can be mounted correctly.

Here also are the URLs for download of a broken squashfs.img and ext3fs.img of the security spin from the nightly-compose host around the 27th of July:

http://autopsy.liveprojects.info/external/extra/squashfs.img
and
http://autopsy.liveprojects.info/external/extra/ext3fs.img
I see that warning at least some of the time, but still get good images. I am not sure how to reproduce the problem that people are seeing. Are people still having the ext3 image mounted when they try squashing it?
In response to comment 35: not sure. Try creating a sparse ext4 filesystem and setting it up with losetup; copy some of /etc/yum.repos.d to $mntpoint/etc/yum.repos.d/ (hardcoding $releasever and/or $basearch to a valid value, rawhide or otherwise); populate it with yum --installroot=/mnt/ext4 install kernel bash filesystem; and run resize2fs -M on it. Then squash it with SquashFS tools 4.1 from the CVS checkout. Does it appear corrupt, or does mounting it as type squashfs and running dumpe2fs show an invalid journal magic for it?
In so far as this bug is the cause of 615443, please be aware that this needs to be resolved in some way or another - we need to be able to generate working live images for x86-64 and i686 - by Tuesday 2010-08-03, or the Alpha will slip. -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers
(In reply to comment #37)
> In so far as this bug is the cause of 615443, please be aware that this needs
> to be resolved in some way or another - we need to be able to generate working
> live images for x86-64 and i686 - by Tuesday 2010-08-03, or the Alpha will
> slip.

Given that other hosts seem to work, maybe doing the compose elsewhere could be a stopgap measure, unless there is some reason that fedora livecds need to be self-hosted/composed... :)
Yes, I will see about getting a kvm-based or other box somewhere today to test/try out.
Can someone gzip/bzip2 this ext3.img? Otherwise it's going to take a long time to download on my link:
http://autopsy.liveprojects.info/external/extra/ext3fs.img
I think I was able to duplicate this (I get the same error message) by loop mounting a good ext3 image, updating it, and then running mksquashfs on the backing ext3 image file. When I later loop mounted the ext3 image inside the squashfs image, I got:

[root@bruno sq]# mount -o loop test.img /mnt/iso
mount: wrong fs type, bad option, bad superblock on /dev/loop3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
(In reply to comment #31)
> Eric
>
> Can you give me a link to that ext3fs.img that gives bad md5sums on Squashfs?

http://alt.fedoraproject.org/pub/alt/nightly-composes/ext3fs.img.bz2

Not sure the image is the problem but maybe we can at least verify that.

Thanks,
-Eric
I downgraded spin01 to squashfs-tools-4.0-4.fc14.x86_64 and still see the issue:

$ md5sum squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashdir/tmp-ZKZYsE/LiveOS/ext3fs.img
$ mksquashfs /tmp/squashdir/ mysquashfs.img
$ sudo mount -o loop mysquashfs.img mnt/
$ md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs.img
45fcb40fdcf9ac6a2f94f677807c73d4  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img
Cannot reproduce here (Ubuntu 9.10, x86_64).

Ext3.img file, stored both sparsely and non-sparsely:

root@logopolis:/stripe/redhat# ls -hs LiveOS/*
1.5G LiveOS/ext3fs.img
4.1G LiveOS/ext3fs-nosparse.img

They both have the expected md5sum:

root@logopolis:/stripe/redhat# md5sum LiveOS/*
be9cb2dfe5d61aca6614759175eb7df3  LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  LiveOS/ext3fs-nosparse.img

Verify we're using the latest CVS version of Mksquashfs:

root@logopolis:/stripe/redhat# /mksquashfs -version | head -1
mksquashfs version 4.1-CVS (2010/07/19)

Do the squashing...

root@logopolis:/stripe/redhat# /mksquashfs LiveOS test.sqsh -keep-as-directory -no-progress
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on test.sqsh, block size 131072.
llistxattr failed in read_attrs
Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
        compressed data, compressed metadata, compressed fragments, compressed xattrs
        duplicates are removed
Filesystem size 470680.46 Kbytes (459.65 Mbytes)
        5.61% of uncompressed filesystem size (8388864.48 Kbytes)
Inode table size 64534 bytes (63.02 Kbytes)
        24.60% of uncompressed inode table size (262386 bytes)
Directory table size 74 bytes (0.07 Kbytes)
        76.29% of uncompressed directory table size (97 bytes)
Number of duplicate files found 1
Number of inodes 4
Number of files 2
Number of fragments 0
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 2
Number of ids (unique uids + gids) 1
Number of uids 1
        root (0)
Number of gids 1
        root (0)

Note Mksquashfs determines that the non-sparse file is a duplicate of the sparse file (Number of duplicate files found 1).

root@logopolis:/stripe/redhat# mount -t squashfs test.sqsh mnt -o loop

Check sparse handling is correct...

root@logopolis:/stripe/redhat# ls -hs mnt/LiveOS/*
1.5G mnt/LiveOS/ext3fs.img
4.0G mnt/LiveOS/ext3fs-nosparse.img

Md5sums are correct too...

root@logopolis:/stripe/redhat# md5sum mnt/LiveOS/*
be9cb2dfe5d61aca6614759175eb7df3  mnt/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  mnt/LiveOS/ext3fs-nosparse.img

Double check by seeing what Unsquashfs makes of the filesystem:

root@logopolis:/stripe/redhat# /unsquashfs -no-progress test.sqsh
Parallel unsquashfs: Using 4 processors
2 inodes (65536 blocks) to write
created 2 files
created 2 directories
created 0 symlinks
created 0 devices
created 0 fifos
root@logopolis:/stripe/redhat# ls -hs squashfs-root/LiveOS/*
1.5G squashfs-root/LiveOS/ext3fs.img
4.1G squashfs-root/LiveOS/ext3fs-nosparse.img
root@logopolis:/stripe/redhat# md5sum squashfs-root/LiveOS/*
be9cb2dfe5d61aca6614759175eb7df3  squashfs-root/LiveOS/ext3fs.img
be9cb2dfe5d61aca6614759175eb7df3  squashfs-root/LiveOS/ext3fs-nosparse.img

Everything works for Unsquashfs too...
(In reply to comment #41) > I think I was able to duplicate this (I get the same error message) by loop > mounting a good ext3 image, updating it and then running mksquashfs on the > backing ext3 image file. Bruno, that is actually very much expected, but it's not the bug we're seeing here I think. (a mounted backing file is not consistent unless you freeze the filesystem). The reproducer of squashfs'ing/unsquashfs'ing an unmounted image file is what's relevant here, I think.
In response to the request to bzip2 the ext3fs.img in Comment 40 by Phillip, this is the ext3fs.img bzipped, should be about 57 MB. http://autopsy.liveprojects.info/external/extra/ext3fs.img.bz2
I would like to remind everyone this is an ext4 filesystem. The name is ext3fs.img, but do not let that confuse you: this is ext4, not ext3. Also, Kevin, you might try spinning up a few discs (or at least one) using --fstype ext3 in the spin kickstarts and see if those are corrupt, both to gather data and perhaps to alleviate the immediate problem. Results from the KVM host would also be desirable on this report; care to give that a try so we can at least narrow it down to ext4 vs. SquashFS? (The compose host's kickstart files use --fstype ext4 in them, I'd imagine.)
Jasper, I don't think the contents of the file matter, so whether it's ext3, ext4, or leaked army documents probably doesn't change the outcome. :) FWIW sparseness doesn't seem to matter at all. An image from a dir with just my non-sparse ext3 image shows the same problem on the spin01 host. Kevin also tested on a kvm host w/ no problems.
The contents of the file seem to not matter at all. I am unable to duplicate this with the same kernel/arch/squashfs-tools on a machine here that's a kvm guest. :( This is really starting to look like corruption or problems on spin01 (possibly xen guest issues?). I'm going to try and get another compose machine set up and test from there.
FWIW, the first instance of 0s in the unsquashed corrupt file starts at an interesting offset and continues to the end of the file:

# cmp -bl squashfs-root/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img squashdir/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img | more
264241153   0 ^@    42 "

[root@spin01 tmp]# bc
obase=16
264241153
FC00001

Looking at a hexdump of the corrupt unsquashed file, it's all 0s from that point on:

0fc00000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
100000000

that's a rather large swath of 0s... Maybe this rings a bell for Phillip.
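Worth noting: mksquashfs reports a 131072-byte data block size for these images, and the offset where the zeros begin is an exact multiple of it, so the corruption starts on a squashfs data block boundary. A quick sanity check of that arithmetic (just redoing the bc calculation above, not squashfs code):

```python
# cmp reports the first differing byte 1-based, so the zeroed region
# starts one byte earlier.
first_diff = 264241153          # from the cmp -bl output
offset = first_diff - 1

assert hex(offset) == "0xfc00000"

block_size = 131072             # data block size reported by mksquashfs
print(offset % block_size)      # 0: zeros start exactly on a block boundary
print(offset // block_size)     # 2016: index of the first all-zero block
```

That the damage is block-aligned is consistent with whole squashfs data blocks being written (or read back) as zeros, rather than byte-level corruption.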
Eric,

Previously you wrote about creating a Squashfs file with both sparse and non-sparse versions of the ext3fs.img...

======================
[root@spin01 tmp]# mksquashfs squashdir/ mysquashfs2.img &>/dev/null
[root@spin01 tmp]# mount -o loop mysquashfs2.img mnt/
mount: warning: mnt/ seems to be mounted read-only.
[root@spin01 tmp]# md5sum mnt/tmp-ZKZYsE/LiveOS/ext3fs*
a0ff1a8402e65cbafe1b7abfa0d595f3  mnt/tmp-ZKZYsE/LiveOS/ext3fs.img
28ea62d79234e4e458d96179f12f3190  mnt/tmp-ZKZYsE/LiveOS/ext3fs-nosparse.img
=======================

It would be interesting to know what the output of Mksquashfs was, i.e. whether it found any duplicates. That will help indicate what's going wrong: we know the files in the filesystem are different (md5sums differ), but we don't know if Mksquashfs reported any duplicates. If it *did*, then we know the files were identical when read in and compressed by Mksquashfs (pointing to a filesystem output problem); if it didn't find any duplicates, it points to a problem at read time...

Thanks

Phillip
Forgot to mention... The point is both files should be duplicates and therefore point to the *same* data, which makes the different md5sums impossible unless the inode contents (i.e. data block start, block list) are corrupt; this would be easy to verify if I had the image here...

Phillip
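Phillip's argument can be sketched as a toy model (purely illustrative; this is not squashfs's actual on-disk layout): in a content-deduplicated store, a file flagged as a duplicate shares the stored data of the earlier file, so the two can only read back with different checksums if the per-file metadata pointing at that data is wrong.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: one stored copy per distinct content."""
    def __init__(self):
        self.data = {}    # digest -> stored bytes
        self.inodes = {}  # filename -> digest (the "block list" in this model)

    def add(self, name, contents):
        digest = hashlib.md5(contents).hexdigest()
        was_duplicate = digest in self.data   # "Number of duplicate files found"
        self.data[digest] = contents
        self.inodes[name] = digest
        return was_duplicate

    def read(self, name):
        return self.data[self.inodes[name]]

store = DedupStore()
store.add("ext3fs.img", b"identical image contents")
dup = store.add("ext3fs-nosparse.img", b"identical image contents")

# If the writer reported a duplicate, both names point at the same data,
# so differing md5sums after extraction imply corrupt inode metadata.
print(dup)
print(store.read("ext3fs.img") == store.read("ext3fs-nosparse.img"))
```

In this model the only way the two reads can differ is if an entry in `inodes` is corrupted, which is exactly the inference Phillip draws about the extended-file-inode code.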
Where I was going in comment 41 was that maybe there is a bug in livecd-creator where the ext file system doesn't get unmounted before mksquashfs is run under some circumstances.
In reply to comment 53: I thought about that also. I tested to see whether I could not only fail to mount it, but also get an invalid journal error from dumpe2fs. What I found was that it in fact couldn't be mounted; dmesg states "recovery required on readonly filesystem" and then that write access is not available. This is a possible cause of the inability to mount. However, it didn't show the dumpe2fs invalid journal superblock magic error. So let me send you a patch for that on the list, Bruno: try to unmount normally, and if we fall back, at least print to the log, verbosely, that some directories were unmountable.
Jasper O'neal Hartline wrote: =========== Looking at logs from the compose host, and from my testing locally I do notice an error message which eminates from lines 137 and 154 of xattr.c from that CVS checkout of squashfs in squashfs/squashfs-tools/xattr.c where it prints llistxattr failed in read_attrs right after the image files are produced. However not knowing enough about the xattr code here this is just a reference in case it seems important =========== I've looked into this, and it's a non issue. It is a bug but nothing to do with anything here. For the curious mksquashfs creates a dummy top-level directory for the cases where there are multiple sources specified on the command line (or the -keep-as-directory option is set). This directory obviously doesn't exist, and therefore on being passed to llistxattr it was failing. A fix is in CVS if anyone wants to pick it up.
Thanks for the fix. I am not going to do a new build right now just for that. It's a bad time due to the pending alpha and my pending vacation. When I get back I'll grab it and probably anything else you have committed by then. If we do find a squashfs bug for the ext image issue, then I'll also probably grab it, since I'll need to do another build anyway.
Jasper O'neal Hartline wrote:
=========================
Here also are the URLs for download of a broken image of squashfs.img and ext3fs.img of security spin from the nightly-compose host around the 27th of July:
http://autopsy.liveprojects.info/external/extra/squashfs.img
and
http://autopsy.liveprojects.info/external/extra/ext3fs.img
=========================

The ext3fs.img and squashfs.img check out as identical:

root@logopolis:/stripe/redhat/bad# mount -t squashfs squashfs.img mnt -o loop
root@logopolis:/stripe/redhat/bad# md5sum ext3fs.img mnt/LiveOS/ext3fs.img
a840f7180545ba8ee60a9ecb9657ba1c  ext3fs.img
a840f7180545ba8ee60a9ecb9657ba1c  mnt/LiveOS/ext3fs.img
In my test referenced in Comment 54, the unmountable ext4 image had a different md5sum:

[root@localhost tmp]# md5sum squash-mountedext4.img
985297f03fc668103461df07c6a15a20  squash-mountedext4.img
[root@localhost tmp]# mount -t squashfs -oloop squash-mountedext4.img squash-mounted
[root@localhost tmp]# md5sum squash-mounted/ext3.img
d1c24fcdc4998079307949a308061699  squash-mounted/ext3.img
[root@localhost tmp]# mount -t ext4 -oloop squash-mounted/ext3.img /tmp/ext4-mounted/
mount: wrong fs type, bad option, bad superblock on /dev/loop3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
[root@localhost tmp]#

Furthermore, in the test where the image can be mounted, it works fine but also has a differing checksum:

[root@localhost tmp]# mount -t squashfs -oloop squash.img squash
[root@localhost tmp]# md5sum squash/ext3.img
f2d4592af3590070070eb0efbfb20a91  squash/ext3.img
[root@localhost tmp]# mount -t ext4 -oloop squash/ext3.img /tmp/ext3
[root@localhost tmp]#

Is this strange?
I was in a hurry to post my results and clipped off the md5sum there; here is what I found. On both, the md5sum differs, using:

[root@localhost tmp]# rpm -qa squashfs-tools
squashfs-tools-4.1-0.2.20100727.fc13.i686
[root@localhost tmp]#

Look strange yet:

[root@localhost tmp]# file squash.img squash-mountedext4.img
squash.img:             Squashfs filesystem, little endian, version 4.0, 59657153280 bytes, 55112 inodes, blocksize: 14 bytes, created: Fri Mar 20 21:48:00 1970
squash-mountedext4.img: Squashfs filesystem, little endian, version 4.0, 59657291776 bytes, 55115 inodes, blocksize: 14 bytes, created: Sat Mar 21 22:41:20 1970
[root@localhost tmp]# md5sum squash.img squash-mountedext4.img
25c37cbdf568c8f46f1f93d2eb2b4c1a  squash.img
985297f03fc668103461df07c6a15a20  squash-mountedext4.img
[root@localhost tmp]# mount -t squashfs -oloop squash.img squash
[root@localhost tmp]# mount -t squashfs -oloop squash-mountedext4.img squash-mounted
[root@localhost tmp]# file squash/ext3.img squash-mounted/ext3.img
squash/ext3.img:         Linux rev 1.0 ext4 filesystem data (extents) (huge files)
squash-mounted/ext3.img: Linux rev 1.0 ext4 filesystem data (needs journal recovery) (extents) (huge files)
[root@localhost tmp]# md5sum squash/ext3.img squash-mounted/ext3.img
f2d4592af3590070070eb0efbfb20a91  squash/ext3.img
d1c24fcdc4998079307949a308061699  squash-mounted/ext3.img
[root@localhost tmp]#
(In reply to comment #51)
> Eric
>
> Previously you wrote about creating a Squashfs file with both sparse and
> non-sparse versions of the ext3fs.img...
...
> It would be interesting to know what that output of Mksquashfs was, i.e.
> whether it found any duplicates. That will help to indicate what's going wrong
> - we know in the filesystem that the files are different (md5sums differ), but
> we don't know if Mksquashfs reported any duplicates. If it *did* then we know
> the files when read into and compressed by Mksquashfs were identical (pointing
> to filesystem output problem), but if it didn't find any duplicates it points
> to a problem at read time...

Phillip, this is what it said when compressing the image containing sparse & non-sparse:

# mksquashfs squashdir testquash.img
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on testquash.img, block size 131072.
[===================-] 65546/65546 100%
Exportable Squashfs 4.0 filesystem, data block size 131072
        compressed data, compressed metadata, compressed fragments
        duplicates are removed
Filesystem size 470635.78 Kbytes (459.61 Mbytes)
        5.61% of uncompressed filesystem size (8390032.58 Kbytes)
Inode table size 10887 bytes (10.63 Kbytes)
        4.15% of uncompressed inode table size (262490 bytes)
Directory table size 109 bytes (0.11 Kbytes)
        71.71% of uncompressed directory table size (152 bytes)
Number of duplicate files found 1
Number of inodes 6
Number of files 3
Number of fragments 0
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3
Number of ids (unique uids + gids) 1
Number of uids 1
        root (0)
Number of gids 1
        root (0)

So I guess it read matching files.

-Eric
Furthermore, I am not sure, Phillip, whether you checked the md5sums of the wrong files or I uploaded the wrong files; they differ in size significantly, so I am not sure I uploaded the wrong files. Here is my test on a security ISO from the 27th:

[root@localhost images]# mount -t iso9660 security-i386-20100727.16.iso -oloop iso
[root@localhost images]# md5sum iso/LiveOS/squashfs.img
b2d8200283828cd49dbf2b34318acfb0  iso/LiveOS/squashfs.img
[root@localhost images]# mount -t squashfs -oloop iso/LiveOS/squashfs.img squash
[root@localhost images]# md5sum squash/LiveOS/ext3fs.img
fad58f2eb57e4e99becc62f7aee2530b  squash/LiveOS/ext3fs.img
[root@localhost images]# mount -t ext4 -oloop squash/LiveOS/ext3fs.img ext3
mount: wrong fs type, bad option, bad superblock on /dev/loop4,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
[root@localhost images]#

The squashfs and ext3 files' md5sums differ... and the image also cannot be mounted.
(uiserver):u40872554:~/LIVE/external/extra > md5sum squashfs.img
6bb831c9dcf67af2a0597946a32bd94b  squashfs.img
(uiserver):u40872554:~/LIVE/external/extra > cp ext3fs.img.bz2 ext3.img.bz2
(uiserver):u40872554:~/LIVE/external/extra > bzip2 -d ext3.img.bz2
(uiserver):u40872554:~/LIVE/external/extra > md5sum ext3.img
a840f7180545ba8ee60a9ecb9657ba1c  ext3.img
(uiserver):u40872554:~/LIVE/external/extra >
(In reply to comment #61)
> Further more I am not sure Phillip if you have checked the md5sums of the
> wrong files or if I uploaded the wrong files, they differ in size
> significantly so I am not sure I uploaded the wrong files.

Thanks a lot - nothing's to be gained by assuming I've made a mistake simply
because my results differ from yours. So far all my tests on Squashfs have
proved negative, but I've still spent my time continuing to look into your
problem.
(In reply to comment #62)
> (uiserver):u40872554:~/LIVE/external/extra > md5sum squashfs.img
> 6bb831c9dcf67af2a0597946a32bd94b  squashfs.img
> (uiserver):u40872554:~/LIVE/external/extra > cp ext3fs.img.bz2 ext3.img.bz2
> (uiserver):u40872554:~/LIVE/external/extra > bzip2 -d ext3.img.bz2
> (uiserver):u40872554:~/LIVE/external/extra > md5sum ext3.img
> a840f7180545ba8ee60a9ecb9657ba1c  ext3.img
> (uiserver):u40872554:~/LIVE/external/extra >

root@slackware:/mnt/redhat/bad# md5sum ext3fs.img squashfs.img
a840f7180545ba8ee60a9ecb9657ba1c  ext3fs.img
6bb831c9dcf67af2a0597946a32bd94b  squashfs.img
root@slackware:/mnt/redhat/bad# mount -t squashfs squashfs.img mnt -o loop

The files within the squashfs.img have an selinux label:

root@slackware:/mnt/redhat/bad# xattr mnt/LiveOS/ext3fs.img
0: name security.selinux, size 17,
   value unconfined_u:object_r:livecd_tmp_t:s0, vsize 38

This couldn't have come from anywhere but the Redhat build host. It certainly
didn't come from any of my systems, because they don't use selinux. And yes,
now on a Slackware virtual guest, the md5sum still checks out OK.

root@slackware:/mnt/redhat/bad# md5sum mnt/LiveOS/ext3fs.img
a840f7180545ba8ee60a9ecb9657ba1c  mnt/LiveOS/ext3fs.img

Cheers
It's not actually a problem I can reproduce using livecd-tools, which uses
squashfs-tools and ext4 filesystems. So far only the compose host shows the
issue. Bruno's and my test results show that the image cannot be mounted when
mksquashfs runs on a mounted filesystem, which could be a fault of
livecd-creator - more specifically, the lazy unmounting method we need so
that failures don't eat up loop devices and eventually leave livecd-creator
unable to run at all. However, in my testing the image cannot be mounted but
dumpe2fs does not give me an error about the journal superblock magic, which
the images from the compose host do give me. I think there may be several
bugs looming around this collective problem.

In reply to comment 64: that is exactly where they came from. This is the
initial reason this bug report was created.
(In reply to comment #60)
> (In reply to comment #51)
> Creating 4.0 filesystem on testquash.img, block size 131072.
> [===================-] 65546/65546 100%
> Exportable Squashfs 4.0 filesystem, data block size 131072
> 	compressed data, compressed metadata, compressed fragments
> 	duplicates are removed
> Filesystem size 470635.78 Kbytes (459.61 Mbytes)
> 	5.61% of uncompressed filesystem size (8390032.58 Kbytes)
> Inode table size 10887 bytes (10.63 Kbytes)
> 	4.15% of uncompressed inode table size (262490 bytes)
> Directory table size 109 bytes (0.11 Kbytes)
> 	71.71% of uncompressed directory table size (152 bytes)
> Number of duplicate files found 1
> Number of inodes 6
> Number of files 3
> Number of fragments 0
> Number of symbolic links 0
> Number of device nodes 0
> Number of fifo nodes 0
> Number of socket nodes 0
> Number of directories 3
> Number of ids (unique uids + gids) 1
> Number of uids 1
> 	root (0)
> Number of gids 1
> 	root (0)
>
> So I guess it read matching files.

And yet on your system they give different md5sums within the Squashfs
filesystem, and mksquashfs has definitely only stored one copy of the data,
otherwise the image would be much larger. Very strange. I'd like that image
please. Thanks

> -Eric
Jasper, if the scripts are squashfsing a mounted backing file that's a bug, but I don't think it's this bug. From the first investigations, we're seeing large swaths of 0s in the file; this isn't what we'd see with an inconsistent filesystem image (as would happen if we squashfs'd an image while it was mounted). -Eric
One obvious comment. I've only recently added xattr support to Squashfs, and that's been the only major change since last year. I notice some of the squashfs.img images being generated have selinux labels on the files, and obviously before Squashfs xattr support they wouldn't have had them. Has anyone tried generating the squashfs images with xattr support disabled (the -no-xattrs option to mksquashfs), and does this make any difference? Phillip
(In reply to comment #67)
> Jasper, if the scripts are squashfsing a mounted backing file that's a bug,
> but I don't think it's this bug. From the first investigations, we're
> seeing large swaths of 0s in the file; this isn't what we'd see with an
> inconsistent filesystem image (as would happen if we squashfs'd an image
> while it was mounted).

The way you'd get large runs of 0s in the files is either because:

1. Mksquashfs read them from the source file (obviously), or
2. The inode block list is being corrupted and a block is being incorrectly
   marked as sparse, which would cause kernel squashfs and unsquashfs to
   zero-fill the block.

Incidentally, 0xFC00001 may be a significant number. 0xFC00000 is on a 128K
block boundary, which might or might not be significant (i.e. one of the
blocks in the block list could be bad and interpreted as sparse). This is why
it would be useful to have an image generated by RedHat which shows this
problem. So far none of my generated images have shown up anything wrong, and
the one I've got from RedHat checks out OK on my system. I'm not saying there
isn't a bug somewhere in mksquashfs, but so far I've had nothing to debug.

Cheers

> -Eric
Phillip, I'll find a place to host that image, right now I've got nowhere w/ enough quota...
Phillip, regarding xattrs ... I'm seeing this even with

# mksquashfs -version
mksquashfs version 4.0 (2009/04/05)

which has no xattr support AFAICT ... really weird, I'm not sure what's going
on here.
In reply to comment 69:

This is a broken ISO:
http://alt.fedoraproject.org/pub/alt/nightly-composes/security/security-i386-20100727.16.iso

Not sure if it is worth downloading at 680MB.. it's your call. Booting this
image says:

EXT4-FS(0) error loading journal
(In reply to comment #72)
> In reply to comment 69
> This is a broken ISO:
> http://alt.fedoraproject.org/pub/alt/nightly-composes/security/security-i386-20100727.16.iso
>
> Not sure if it is worth it to download that at 680MB.. it's your call.
> Booting this image says EXT4-FS(0) error loading journal

It's worth a try - as it's 7.32 AM here I don't think I'm going to do much
more whilst it's downloading.
Phillip, I suppose Jasper's image should contain a corrupted squashfs image you can look at; if not, ping me and I'll find a spot to put the squashfs image I've been referring to in this bug (the one with the large swath of 0s). Just check that Jasper's has a large zeroed range, to be sure it's not the mounted-image-file problem.
Comment 3 confirms the compose host's images are the ones with the large block of 0 data, which are the only ones I've been linking to in my posts, so he should have the corrupt images.
Jasper, ok good deal, thanks.
Was anyone able to come up with any new information on this?
(In reply to comment #77)
> Was anyone able to come up with any new information on this?

Analysis of the problem images shows there is a bug in Mksquashfs. I think I
have found it - it is in legacy code unchanged since 2002; why the bug has
never shown up until now is a mystery... I'll check a fix in and write an
analysis later, it is rather late here. Due to the unreproducible nature of
the bug, it has taken a considerable amount of analytical effort to track it
down - in fact all weekend.
Thanks! I'll probably be able to do another squashfs-tools update Monday very late.
Phillip, http://squashfs.cvs.sourceforge.net/viewvc/squashfs/squashfs/squashfs-tools/mksquashfs.c?view=patch&r1=1.194&r2=1.195 I guess? I can run a test with that in place.
(In reply to comment #80)
> Phillip,
>
> http://squashfs.cvs.sourceforge.net/viewvc/squashfs/squashfs/squashfs-tools/mksquashfs.c?view=patch&r1=1.194&r2=1.195
>
> I guess? I can run a test with that in place.

No, that was just to fix the 'llistxattr failed' warning that people were
seeing. I have just checked the filesystem corruption fix into CVS.
Thanks, Phillip. We have been testing some spins using the changes from CVS, and they seem to have corrected the problem. I will give this one a few days and then close it up. I appreciate your help, and sandeen's too.
Bruno, can you rebuild squashfs-tools and provide it as an update? We need it to close this bug.
http://alt.fedoraproject.org/pub/alt/nightly-composes/xfce/xfce-i386-20100802.23.iso works. I think nightly-composes got new squashfs-tools last night
I was too tired to safely do the package update last night. I have a light day here today and will get an update done today.
squashfs-tools-4.1-0.3.20100803.fc14 has been submitted as an update for Fedora 14. http://admin.fedoraproject.org/updates/squashfs-tools-4.1-0.3.20100803.fc14
Nirik says this works for him on the nightly spin compose box.
Yep, it seems to work fine here on the compose host. I'm still puzzled as to why it was only really happening on our compose machine - perhaps it's due to it being a Xen guest and some strange memory handling there. In any case, this patch fixes things here. Job well done, everyone, in helping to track this down. Thanks for all the testing, Jasper, and thanks for finding this old and hard-to-reproduce bug, Phillip!
I also tested on F14 alpha TC2 x86_64 on kvm guest and successfully spun a F14 Desktop Live image which booted ok. Thanks everyone!
*** Bug 615443 has been marked as a duplicate of this bug. ***
squashfs-tools-4.0-5.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/squashfs-tools-4.0-5.fc13
I'll also be backporting a fix to F12 once my commit access to that branch is approved.
squashfs-tools-4.0-4.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/squashfs-tools-4.0-4.fc12
squashfs-tools-4.1-0.3.20100803.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.
squashfs-tools-4.0-5.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.
squashfs-tools-4.0-4.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.
For the record, the patch is:

--- squashfs-tools/mksquashfs.c.orig	2009-04-05 16:22:48.000000000 -0500
+++ squashfs-tools/mksquashfs.c	2010-08-14 14:07:28.000000000 -0500
@@ -938,7 +938,7 @@
 			(unsigned short *) (inode_table + inode_bytes), 1);
 		inode_bytes += SQUASHFS_COMPRESSED_SIZE(c_byte) + BLOCK_OFFSET;
 		total_inode_bytes += SQUASHFS_METADATA_SIZE + BLOCK_OFFSET;
-		memcpy(data_cache, data_cache + SQUASHFS_METADATA_SIZE,
+		memmove(data_cache, data_cache + SQUASHFS_METADATA_SIZE,
 			cache_bytes - SQUASHFS_METADATA_SIZE);
 		cache_bytes -= SQUASHFS_METADATA_SIZE;
 	}
@@ -1579,7 +1579,7 @@
 		directory_bytes += SQUASHFS_COMPRESSED_SIZE(c_byte) +
 			BLOCK_OFFSET;
 		total_directory_bytes += SQUASHFS_METADATA_SIZE + BLOCK_OFFSET;
-		memcpy(directory_data_cache, directory_data_cache +
+		memmove(directory_data_cache, directory_data_cache +
 			SQUASHFS_METADATA_SIZE, directory_cache_bytes -
 			SQUASHFS_METADATA_SIZE);
 		directory_cache_bytes -= SQUASHFS_METADATA_SIZE;
> The problem images after analysis show there is a bug in Mksquashfs.
> I think I have found the bug - it is in legacy code unchanged since
> 2002, why the bug has never shown up until now is a mystery...

Try valgrind.

==24196== Source and destination overlap in memcpy(0x4dcbfb0, 0x4dcdfb0, 24688)
==24196==    at 0x4A06A3A: memcpy (mc_replace_strmem.c:497)
==24196==    by 0x40719B: get_inode (string3.h:52)
==24196==    by 0x407631: create_inode (mksquashfs.c:1319)
==24196==    by 0x407ED0: write_dir (mksquashfs.c:1593)
==24196==    by 0x40B081: dir_scan3 (mksquashfs.c:3590)
==24196==    by 0x40C7DB: dir_scan (mksquashfs.c:3327)
==24196==    by 0x40DC8D: main (mksquashfs.c:4754)
==24196==