Bug 676904 - osmin.img is sometimes not a valid squashfs on livecd images
Summary: osmin.img is sometimes not a valid squashfs on livecd images
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: livecd-tools
Version: rawhide
Hardware: All
OS: All
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Bruno Wolff III
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On: 672265
Blocks: F15Alpha, F15AlphaBlocker
Reported: 2011-02-11 18:37 UTC by Bruno Wolff III
Modified: 2018-04-11 11:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 672265
Environment:
Last Closed: 2011-02-18 18:24:15 UTC
Type: ---
Embargoed:



Description Bruno Wolff III 2011-02-11 18:37:30 UTC
+++ This bug was initially created as a clone of Bug #672265 +++

Description of problem: "Install to harddrive" from a LiveCD of rawhide for Fedora 15 fails with the message "Can't do live image installation unless running from a live image."



Version-Release number of selected component (if applicable):
livecd-tools-15.2-2.fc15.x86_64

How reproducible: every time


Steps to Reproduce:
1. Download and burn http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/desktop-x86_64-20110123.17.iso
2. Boot and log in to GNOME Shell
3. Activities > Applications > Install to ...
  
Actual results: "Can't do live image installation unless running from a live image."  The same message appears when running /usr/bin/liveinst from a text shell on VT2; the shell also reports a Segmentation violation from zenity.



Expected results: install to harddrive


Additional info:

--- Additional comment from bcl on 2011-01-26 14:31:14 CST ---

This error is because there is no /dev/mapper/live-osimg-min which is created by the dracut script in /usr/share/dracut/modules.d/90dmsquash-live/dmsquash-live-root

This line appears to be failing:

echo "0 $( blockdev --getsz $BASE_LOOPDEV ) snapshot $BASE_LOOPDEV $OSMIN_LOOPDEV p 8" | dmsetup create --readonly live-osimg-min
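For readers unfamiliar with the table format, the string piped to dmsetup above can be broken into its fields. This is a minimal Python sketch, assuming the usual device-mapper snapshot table layout (start sector, length in 512-byte sectors, target name, origin device, COW device, persistence flag, chunk size in sectors). The function name and the example sector count are illustrative and are not taken from the dracut script.

```python
def snapshot_table(size_sectors, origin_dev, cow_dev, chunk_sectors=8):
    """Build a device-mapper table line for a persistent snapshot target.

    Mirrors the string echoed in dmsquash-live-root:
    "0 <size> snapshot <origin> <cow> p 8"
    """
    if size_sectors <= 0:
        raise ValueError("size_sectors must be positive")
    # "p" requests a persistent COW store, as in the script's own line.
    return "0 %d snapshot %s %s p %d" % (
        size_sectors, origin_dev, cow_dev, chunk_sectors)


# Hypothetical size; the script obtains it from `blockdev --getsz $BASE_LOOPDEV`.
print(snapshot_table(8388608, "/dev/loop3", "/dev/loop1"))
```

The fourth field is the origin (the ext4 image) and the fifth is the COW store, so a "Cannot get COW device" error points at the fifth field.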

I've attached the output of losetup -a and /var/log/messages, the relevant part is:

Jan 26 18:18:44 localhost kernel: [    3.506154] device-mapper: table: 253:1: snapshot: Cannot get COW device
Jan 26 18:18:44 localhost kernel: [    3.506157] device-mapper: ioctl: error adding target to table


I tried to manually set up the snapshot and got the same result, except that it said table: 253:3: instead of :1:
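When checking several images, it can help to pull these device-mapper table errors out of /var/log/messages automatically. The following is a rough Python sketch; the regular expression is an assumption based only on the two kernel lines quoted above, and the function name is made up for illustration.

```python
import re

# Matches kernel lines such as:
#   device-mapper: table: 253:1: snapshot: Cannot get COW device
DM_TABLE_ERROR = re.compile(
    r"device-mapper: table: (?P<major>\d+):(?P<minor>\d+): "
    r"(?P<target>[\w-]+): (?P<message>.+)$"
)


def find_dm_table_errors(lines):
    """Return (major, minor, target, message) tuples for dm table errors."""
    errors = []
    for line in lines:
        match = DM_TABLE_ERROR.search(line)
        if match:
            errors.append((
                int(match.group("major")),
                int(match.group("minor")),
                match.group("target"),
                match.group("message").strip(),
            ))
    return errors


sample = [
    "Jan 26 18:18:44 localhost kernel: [    3.506154] device-mapper: table: 253:1: snapshot: Cannot get COW device",
    "Jan 26 18:18:44 localhost kernel: [    3.506157] device-mapper: ioctl: error adding target to table",
]
print(find_dm_table_errors(sample))
```

Only the first sample line is reported; the ioctl line is a follow-on message without a major:minor pair.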

This is using the nightly livecd from 20110123.17

The dracut differences between f14 and rawhide are fairly minimal where livecd is concerned so I think the problem lies with device-mapper or something deeper.

--- Additional comment from bcl on 2011-01-26 14:31:58 CST ---

Created attachment 475479 [details]
/var/log/messages from the livecd

--- Additional comment from bcl on 2011-01-26 14:32:28 CST ---

Created attachment 475480 [details]
losetup -a from the livecd

--- Additional comment from mbroz on 2011-01-26 14:47:36 CST ---

"Cannot get COW device"
means that the second device parameter, $OSMIN_LOOPDEV, is wrong (it points to a nonexistent device)

can you post:
echo $BASE_LOOPDEV ; blkid -p $BASE_LOOPDEV
echo $OSMIN_LOOPDEV ; blkid -p $OSMIN_LOOPDEV
?

--- Additional comment from bcl on 2011-01-26 15:04:32 CST ---

I'm pretty sure BASE_LOOPDEV is /dev/loop3 and OSMIN_LOOPDEV is /dev/loop1


/dev/loop3: LABEL="_desktop-x86_64-" UUID="a0516eda-8620-4fb1-ab01-5f06b78686d4" VERSION="1.0" TYPE="ext4" USAGE="filesystem"

And there is no output for /dev/loop1

--- Additional comment from bcl on 2011-01-27 22:43:20 CST ---

*** Bug 673395 has been marked as a duplicate of this bug. ***

--- Additional comment from Kenny.Strawn on 2011-01-28 07:37:44 CST ---

[liveuser@localhost ~]$ echo $BASE_LOOPDEV; blkid -p $BASE_LOOPDEV

The low-level probing mode requires a device
[liveuser@localhost ~]$ echo $OSMIN_LOOPDEV; blkid -p $OSMIN_LOOPDEV

The low-level probing mode requires a device
[liveuser@localhost ~]$ 

In my case, running from a USB key with the latest (20110126.16) image, both variables are empty.

--- Additional comment from Kenny.Strawn on 2011-01-28 07:49:55 CST ---

And that previous comment is from a 32-bit Rawhide 20110126.16, not a 64-bit one.

--- Additional comment from mbroz on 2011-01-28 13:08:13 CST ---

osmin.img in that livecd is corrupted, or in some format which is not recognized; it is unmountable. (I tried one from F14 and it works.)

not sure where to reassign it, trying LiveCD component.

--- Additional comment from Kenny.Strawn on 2011-01-28 22:58:12 CST ---

Well then I guess I will try F14 and preupgrade to Rawhide as a workaround for now.

--- Additional comment from bcl on 2011-01-30 07:56:48 CST ---

*** Bug 673430 has been marked as a duplicate of this bug. ***

--- Additional comment from bruno on 2011-01-31 09:05:46 CST ---

This is possibly an artifact of being built on an F14 system with only a rawhide squashfs. Kevin is going to be looking at moving the build system to rawhide shortly. (Possibly a few days after he gets back from FUDCON.)
osmin.img is compressed using xz which 2.6.38 kernels can mount. The liveOS uses a 2.6.38 kernel which is why the system runs at all (otherwise squashfs.img couldn't be mounted). I am not sure why the file wouldn't be mounted when liveinst is run. This may or may not be related to xz compression.

--- Additional comment from bruno on 2011-01-31 09:53:10 CST ---

I was able to confirm that osmin.img is mountable from the live image. I didn't get to trying to do it in a way that liveinst should be able to use as that needed more time than I have right now.

--- Additional comment from bruno on 2011-01-31 09:54:40 CST ---

Would running the F14 version of dracut break things?

--- Additional comment from mbroz on 2011-01-31 10:17:48 CST ---

I am not sure what dracut is doing, but if you are able to mount osmin.img, it must also contain a valid snapshot COW device in a file (it starts with the SnAp signature).

(If the file has no such signature, then snapshot creation fails with "Cannot get COW device" when accessing it through loop as mentioned above.)

--- Additional comment from bruno on 2011-01-31 11:21:14 CST ---

[root@games1 mnt1]# od -c osmin|more
0000000   S   n   A   p 001  \0  \0  \0 001  \0  \0  \0  \b  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
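The 16 bytes in the od output above can be decoded as a quick sanity check. This is a minimal Python sketch; it assumes the persistent snapshot store begins with four little-endian 32-bit fields (magic, valid, version, chunk_size), as in the kernel's dm-snap-persistent code, and should be checked against the kernel source in use. The function name is illustrative.

```python
import struct

SNAP_MAGIC = b"SnAp"


def parse_snapshot_header(data):
    """Decode the first 16 bytes of a persistent snapshot COW store.

    Returns a dict of header fields, or None if the data is too short
    or the SnAp signature is missing.
    """
    if len(data) < 16:
        return None
    magic, valid, version, chunk_size = struct.unpack("<4sIII", data[:16])
    if magic != SNAP_MAGIC:
        return None
    return {"valid": valid, "version": version, "chunk_size": chunk_size}


# The first 16 bytes from the od dump (\b is byte 0x08).
header = b"SnAp\x01\x00\x00\x00\x01\x00\x00\x00\x08\x00\x00\x00"
print(parse_snapshot_header(header))
```

Under that assumed layout the dump decodes to valid=1, version=1 and a chunk size of 8 sectors, which agrees with the trailing 8 in the dmsetup table line quoted earlier in this report.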

--- Additional comment from clumens on 2011-02-01 11:56:14 CST ---

*** Bug 674220 has been marked as a duplicate of this bug. ***

--- Additional comment from meyering on 2011-02-03 08:20:13 CST ---

FYI, I too have just encountered this bug.
Needed rawhide to test very latest btrfs against failing coreutils test...

--- Additional comment from mcepl on 2011-02-03 12:13:03 CST ---

Me too. Hoped to reinstall completely busted Rawhide system :(

--- Additional comment from bruno on 2011-02-07 10:16:28 CST ---

I spent some time this weekend looking at this, but I didn't understand enough about how osmin is used and the device mapper to be able to isolate what is happening.
It would be nice if a dracut or dm expert could take a look at this.
While live images don't seem to be release blockers, I think this would be nice to have fixed for the alpha release.

--- Additional comment from Kenny.Strawn on 2011-02-07 21:01:36 CST ---

I think the problem is the coding of the scripts. Apparently, the scripts somehow use variables instead of the paths to the files when the variables themselves don't point to them.

I think a workaround would be:

Mount the live image in /mnt, then set the variables in the script to equal the paths to the live image.

Example patch for this would be (near the beginning of the script):

+++ mkdir /mnt/{osmin,base}
+++ mount -t btrfs /mnt/live/base.img /mnt/base -o loop #don't know for certain if it is base.img
+++ mount -t btrfs /mnt/live/osmin.img /mnt/osmin -o loop
+++ BASE_LOOPDEV=/mnt/loop0
+++ OSMIN_LOOPDEV=/dev/loop1

I will test this out when I boot the image and see what happens.

--- Additional comment from Kenny.Strawn on 2011-02-07 21:07:01 CST ---

Sorry, here's the corrected proposed patch that just might work (when I test it; again near the beginning of the script):

+++ mkdir /mnt/{osmin,base}
+++ mount -t btrfs /mnt/live/base.img /mnt/base -o loop
+++ mount -t btrfs /mnt/live/osmin.img /mnt/osmin -o loop
+++ BASE_LOOPDEV=/dev/loop0
+++ OSMIN_LOOPDEV=/dev/loop1

I am currently downloading the image right now; will test this and see if it actually works. Unless, of course, mounting the loop devices automatically assigns the variables. Will test that too.

--- Additional comment from bcl on 2011-02-09 13:10:24 CST ---

Those variables are set up in the /usr/share/dracut/modules.d/90dmsquash-live/dmsquash-live-root script, so they are not available in a login shell. The problem is that the dmsetup snapshot isn't working on the provided images.
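Because BASE_LOOPDEV and OSMIN_LOOPDEV exist only inside the dracut script, a shell on the running live system has to discover the loop devices some other way; the losetup -a output attached earlier is one option. As a rough alternative, here is a Python sketch that assumes the kernel exposes /sys/block/loopN/loop/backing_file. The function name and the sysfs_root parameter are illustrative.

```python
import glob
import os


def loop_backing_files(sysfs_root="/sys/block"):
    """Map loop device names (e.g. "loop1") to their backing file paths.

    Reads <sysfs_root>/loopN/loop/backing_file. Loop devices that are
    not bound to a file have no such attribute and are skipped.
    """
    result = {}
    pattern = os.path.join(sysfs_root, "loop*", "loop", "backing_file")
    for path in sorted(glob.glob(pattern)):
        name = path.split(os.sep)[-3]  # .../loopN/loop/backing_file
        with open(path) as handle:
            result[name] = handle.read().strip()
    return result


if __name__ == "__main__":
    for name, backing in loop_backing_files().items():
        print("/dev/%s -> %s" % (name, backing))
```

The device whose backing file is the osmin image is the one to pass to blkid -p in place of $OSMIN_LOOPDEV.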

--- Additional comment from mbroz on 2011-02-09 13:30:49 CST ---

(In reply to comment #23)
> The problem is that the dmsetup snapshot isn't working on the provided images.

The DM snapshot target works perfectly; the COW image is either broken or the script mangles it somehow.

When I tried it, loop was properly set up, variables were also set but the mapped COW image was not correct and snapshot target (correctly) rejected it.

--- Additional comment from bruno on 2011-02-09 14:05:44 CST ---

Any ideas on how it is mangled? Does it look truncated, or is there anything else relatively obvious? That might help us figure out where to look.

--- Additional comment from bruno on 2011-02-09 19:58:50 CST ---

I did notice that there is a fixed size of 64 MiB for what looks to be the osmin image in fs.py. Is it possible that osmin has outgrown that limit? (It might be that I don't really understand the code.)

--- Additional comment from bruno on 2011-02-09 22:33:59 CST ---

I found that some recent builds had messed-up osmin.img files. I am grabbing a more current nightly desktop build to see if that is still happening.

--- Additional comment from bruno on 2011-02-10 06:21:04 CST ---

I double-checked a recent Desktop nightly, and the osmin.img file doesn't appear to be a proper squashfs image.
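One way to spot-check an osmin.img copied out of an ISO without mounting it is to read the squashfs superblock directly. This is a minimal Python sketch, assuming the squashfs 4.0 little-endian superblock layout (magic "hsqs", then inode count, mkfs time, block size and fragment count as 32-bit fields, followed by 16-bit compression id, block log, flags, id count, major and minor version). The function name is illustrative, and the check only covers the first 32 bytes, so it cannot detect damage further into the image.

```python
import struct

SQUASHFS_MAGIC = b"hsqs"  # little-endian squashfs magic
COMPRESSORS = {1: "gzip", 2: "lzma", 3: "lzo", 4: "xz"}


def squashfs_info(path):
    """Return basic superblock fields, or None if path is not squashfs."""
    with open(path, "rb") as handle:
        raw = handle.read(32)
    if len(raw) < 32 or raw[:4] != SQUASHFS_MAGIC:
        return None
    # magic, inodes, mkfs_time, block_size, fragments (u32 each), then
    # compression, block_log, flags, no_ids, major, minor (u16 each).
    fields = struct.unpack("<4s4I6H", raw)
    compression, major, minor = fields[5], fields[9], fields[10]
    return {
        "block_size": fields[3],
        "compression": COMPRESSORS.get(
            compression, "unknown (%d)" % compression),
        "version": "%d.%d" % (major, minor),
    }
```

Calling squashfs_info on the extracted file returns None when the magic is wrong or the file is shorter than 32 bytes, and otherwise reports the compressor, which also shows whether a given build used xz.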

--- Additional comment from bruno on 2011-02-10 08:09:49 CST ---

I just checked a live image I built this morning and osmin.img looks OK. I'll test it on a usb drive later so that I know I am testing one with a mountable image.
There may be multiple problems. I am pretty sure I had a problem with one with a good osmin.img before, so it looks like there may be two issues.

--- Additional comment from bruno on 2011-02-10 11:06:59 CST ---

I checked the live image I rebuilt and it fails to mount osmin, but I can manually mount osmin.img.
So it does seem that there are two problems. One is that in some cases osmin.img is bad and in some cases there is a problem with the osmin even when osmin.img seems OK.

--- Additional comment from awilliam on 2011-02-11 02:35:24 CST ---

Leaving notes on proposed blockers as I won't be at the meeting tomorrow most likely:

+1 blocker, hits the intersection of "The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media" and "In most cases, the installed system must boot to a functional graphical environment without user intervention (see Blocker_Bug_FAQ)" (implication of the criteria is clearly that live install should be possible at Alpha stage).

--- Additional comment from johannbg on 2011-02-11 12:29:09 CST ---

It was asked at the alpha blocker meeting whether the corruption could be related to the change to xz.

Comment 1 Bruno Wolff III 2011-02-11 18:54:08 UTC
Brian remembers the xz change happening after the problem with live installs. However there may be some bug in squashfs-tools-4.2 that is triggered in special circumstances. I haven't seen squashfs.img corrupted nor has the test case (https://fedoraproject.org/wiki/QA:Testcase_squashfs-tools_compression) shown any problems when I run it.
I'm going to see if I can get Kevin Fenzi to run the test case on the same machines that do the builds in case there is some arch related issue.

Comment 2 Kevin Fenzi 2011-02-12 20:42:12 UTC
I don't see any problems in the test script run on the spin01 compose machine: 

Using existing data directory.
Building squashfs image using gzip compression.
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on ./test-squashfs/sq.img, block size 131072.
[==============================================================================/] 232/232 100%
Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
	compressed data, compressed metadata, compressed fragments, compressed xattrs
	duplicates are removed
Filesystem size 12354.88 Kbytes (12.07 Mbytes)
	44.47% of uncompressed filesystem size (27783.54 Kbytes)
Inode table size 314 bytes (0.31 Kbytes)
	12.53% of uncompressed inode table size (2506 bytes)
Directory table size 246 bytes (0.24 Kbytes)
	37.27% of uncompressed directory table size (660 bytes)
Xattr table size 54 bytes (0.05 Kbytes)
	100.00% of uncompressed xattr table size (54 bytes)
Number of duplicate files found 2
Number of inodes 29
Number of files 28
Number of fragments 1
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 1
Number of ids (unique uids + gids) 1
Number of uids 1
	root (0)
Number of gids 1
	root (0)
Testing unmounted extract using gzip compression.
Parallel unsquashfs: Using 4 processors
28 inodes (232 blocks) to write

[==============================================================================|] 232/232 100%
created 28 files
created 1 directories
created 0 symlinks
created 0 devices
created 0 fifos
Testing mounted image using gzip compression.
Building squashfs image using lzo compression.
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on ./test-squashfs/sq.img, block size 131072.
[==============================================================================-] 232/232 100%
Exportable Squashfs 4.0 filesystem, lzo compressed, data block size 131072
	compressed data, compressed metadata, compressed fragments, compressed xattrs
	duplicates are removed
Filesystem size 12354.99 Kbytes (12.07 Mbytes)
	44.47% of uncompressed filesystem size (27783.54 Kbytes)
Inode table size 387 bytes (0.38 Kbytes)
	15.44% of uncompressed inode table size (2506 bytes)
Directory table size 287 bytes (0.28 Kbytes)
	43.48% of uncompressed directory table size (660 bytes)
Xattr table size 54 bytes (0.05 Kbytes)
	100.00% of uncompressed xattr table size (54 bytes)
Number of duplicate files found 2
Number of inodes 29
Number of files 28
Number of fragments 1
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 1
Number of ids (unique uids + gids) 1
Number of uids 1
	root (0)
Number of gids 1
	root (0)
Testing unmounted extract using lzo compression.
Parallel unsquashfs: Using 4 processors
28 inodes (232 blocks) to write

[==============================================================================|] 232/232 100%
created 28 files
created 1 directories
created 0 symlinks
created 0 devices
created 0 fifos
Testing mounted image using lzo compression.
Building squashfs image using lzma compression.
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on ./test-squashfs/sq.img, block size 131072.
[==============================================================================/] 232/232 100%
Exportable Squashfs 4.0 filesystem, lzma compressed, data block size 131072
	compressed data, compressed metadata, compressed fragments, compressed xattrs
	duplicates are removed
Filesystem size 12354.84 Kbytes (12.07 Mbytes)
	44.47% of uncompressed filesystem size (27783.54 Kbytes)
Inode table size 277 bytes (0.27 Kbytes)
	11.05% of uncompressed inode table size (2506 bytes)
Directory table size 237 bytes (0.23 Kbytes)
	35.91% of uncompressed directory table size (660 bytes)
Xattr table size 54 bytes (0.05 Kbytes)
	100.00% of uncompressed xattr table size (54 bytes)
Number of duplicate files found 2
Number of inodes 29
Number of files 28
Number of fragments 1
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 1
Number of ids (unique uids + gids) 1
Number of uids 1
	root (0)
Number of gids 1
	root (0)
Testing unmounted extract using lzma compression.
Parallel unsquashfs: Using 4 processors
28 inodes (232 blocks) to write

[==============================================================================|] 232/232 100%
created 28 files
created 1 directories
created 0 symlinks
created 0 devices
created 0 fifos
Building squashfs image using xz compression.
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on ./test-squashfs/sq.img, block size 131072.
[==============================================================================/] 232/232 100%
Exportable Squashfs 4.0 filesystem, xz compressed, data block size 131072
	compressed data, compressed metadata, compressed fragments, compressed xattrs
	duplicates are removed
Filesystem size 12354.97 Kbytes (12.07 Mbytes)
	44.47% of uncompressed filesystem size (27783.54 Kbytes)
Inode table size 322 bytes (0.31 Kbytes)
	12.85% of uncompressed inode table size (2506 bytes)
Directory table size 282 bytes (0.28 Kbytes)
	42.73% of uncompressed directory table size (660 bytes)
Xattr table size 54 bytes (0.05 Kbytes)
	100.00% of uncompressed xattr table size (54 bytes)
Number of duplicate files found 2
Number of inodes 29
Number of files 28
Number of fragments 1
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 1
Number of ids (unique uids + gids) 1
Number of uids 1
	root (0)
Number of gids 1
	root (0)
Testing unmounted extract using xz compression.
Parallel unsquashfs: Using 4 processors
28 inodes (232 blocks) to write

[==============================================================================|] 232/232 100%
created 28 files
created 1 directories
created 0 symlinks
created 0 devices
created 0 fifos
Testing mounted image using xz compression.

Comment 3 Bruno Wolff III 2011-02-14 03:03:50 UTC
I checked desktop-i386-20110212.21.iso from the nightly composes and osmin.img is a valid squashfs image. I'll keep spot checking these as they come out. It's possible that whatever was causing the bad squashfs images isn't present any more.

Comment 4 John Reiser 2011-02-14 04:59:17 UTC
The 64-bit version http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/desktop-x86_64-20110212.21.iso boots but goes into a loop erasing and re-displaying the automatic login dialog box.  Going to VT2, logging in as "liveuser" and requesting "ps" shows dozens of "pam: gdm-password" processes, each in Sleep state waiting on poll_s, and each with the same parent.  Trying "export DISPLAY=:0.0; liveinst" fails, complaining 1) libpk-gtk-module.so shared library not found; 2) AT-SPI: Accessability bus not found; 3) Unknown property: GtkDialog.has-separator.  So, earlier failures stand in the way of getting good information about liveinst.

Comment 5 Bruno Wolff III 2011-02-14 05:22:51 UTC
That's a different bug that is being looked into separately. You can actually work around it, but it won't help as the main issue is a kernel bug that is being discussed on lkml. You need to go to run level 3 and then run startx.

This subbug is for the issue where osmin.img was getting corrupted in some builds. We don't know what caused that so I want to keep on the lookout for more occurrences.

Comment 6 James Laska 2011-02-16 15:30:23 UTC
@Bruno ... do you mind giving a quick update on this bug?  Is this still an issue, what is the exposure?

Comment 7 Bruno Wolff III 2011-02-16 15:49:36 UTC
I'll have a better idea in a couple of hours. I have both a nightly desktop image and an image I composed last night I want to look at today. I have a meeting in a few minutes, but will probably be able to take a time out to look at them after that. If both show osmin.img as a valid squashfs I think the risk is low.

Comment 8 Bruno Wolff III 2011-02-16 18:32:12 UTC
Both images had osmin.img as a valid squashfs image. So while I worry a little that we don't know the cause of the bad images, I don't think this is likely to block the release even if the glitch shows up again.

Comment 9 Robyn Bergeron 2011-02-18 17:50:52 UTC
Per 2011-02-18 Meeting:

#agreed 676904 - accepted as Alpha Blocker.  Appears to be resolved with latest F15 stable kernel

Comment 10 Bruno Wolff III 2011-02-18 18:24:15 UTC
Since the problem hasn't shown up again, I suspect this was probably a kernel issue which was previously fixed, or perhaps a variant manifestation of the cow problem.

