Description of problem:
"Install to harddrive" from a LiveCD of rawhide for Fedora 15 fails with the message "Can't do live image installation unless running from a live image."
Version-Release number of selected component (if applicable):
livecd-tools-15.2-2.fc15.x86_64
How reproducible:
Every time
Steps to Reproduce:
1. Download and burn http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/desktop-x86_64-20110123.17.iso
2. Boot and log in to GNOME Shell
3. Activities > Applications > Install to ...
Actual results:
"Can't do live image installation unless running from a live image." The same message appears when running /usr/bin/liveinst from a text shell on VT2; the shell also reports a Segmentation violation from zenity.
Expected results:
Install to harddrive
Additional info:
This error occurs because there is no /dev/mapper/live-osimg-min, which is created by the dracut script in /usr/share/dracut/modules.d/90dmsquash-live/dmsquash-live-root. This line appears to be failing:
    echo "0 $( blockdev --getsz $BASE_LOOPDEV ) snapshot $BASE_LOOPDEV $OSMIN_LOOPDEV p 8" | dmsetup create --readonly live-osimg-min
I've attached the output of losetup -a and /var/log/messages; the relevant part is:
    Jan 26 18:18:44 localhost kernel: [ 3.506154] device-mapper: table: 253:1: snapshot: Cannot get COW device
    Jan 26 18:18:44 localhost kernel: [ 3.506157] device-mapper: ioctl: error adding target to table
I tried to manually set up the snapshot and got the same result, except that it said table: 253:3: instead of :1:. This is using the nightly livecd from 20110123.17. The dracut differences between f14 and rawhide are fairly minimal where livecd is concerned, so I think the problem lies with device-mapper or something deeper.
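For readers following the failing line, here is a minimal sketch of how that device-mapper table line is assembled. The helper name snapshot_table is hypothetical and not part of dracut; the field layout (start sector, length, target name, origin device, COW device, "p" for a persistent store, chunk size 8) is taken from the line quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of dracut): print the one-line device-mapper
# table that the quoted dmsquash-live-root line pipes into dmsetup.
#   $1 = origin size in 512-byte sectors (e.g. from: blockdev --getsz "$BASE_LOOPDEV")
#   $2 = origin (base) device
#   $3 = COW device
snapshot_table() {
    # "p" selects a persistent snapshot store; "8" is the chunk size in sectors.
    printf '0 %s snapshot %s %s p 8\n' "$1" "$2" "$3"
}

# Usage on a live system (needs root, so it is left commented out here):
# snapshot_table "$(blockdev --getsz "$BASE_LOOPDEV")" "$BASE_LOOPDEV" "$OSMIN_LOOPDEV" \
#     | dmsetup create --readonly live-osimg-min
```

Printing the table before piping it to dmsetup makes it easy to see whether either variable expanded to an empty string.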
Created attachment 475479 [details] /var/log/messages from the livecd
Created attachment 475480 [details] losetup -a from the livecd
"Cannot get COW device" means that the second parameter, $OSMIN_LOOPDEV, is wrong (it points to a nonexistent device). Can you post the output of:
    echo $BASE_LOOPDEV ; blkid -p $BASE_LOOPDEV
    echo $OSMIN_LOOPDEV ; blkid -p $OSMIN_LOOPDEV
I'm pretty sure BASE_LOOPDEV is /dev/loop3 and OSMIN_LOOPDEV is /dev/loop1.
    /dev/loop3: LABEL="_desktop-x86_64-" UUID="a0516eda-8620-4fb1-ab01-5f06b78686d4" VERSION="1.0" TYPE="ext4" USAGE="filesystem"
And there is no output for /dev/loop1.
*** Bug 673395 has been marked as a duplicate of this bug. ***
    [liveuser@localhost ~]$ echo $BASE_LOOPDEV; blkid -p $BASE_LOOPDEV
    The low-level probing mode requires a device
    [liveuser@localhost ~]$ echo $OSMIN_LOOPDEV; blkid -p $OSMIN_LOOPDEV
    The low-level probing mode requires a device
    [liveuser@localhost ~]$
In my case, on a USB key running the latest (20110126.16) image, it returns nothing.
And that previous comment is from a 32-bit Rawhide 20110126.16, not a 64-bit one.
osmin.img in that livecd is corrupted or in some format which is not recognized; it is unmountable. (I tried one from F14 and it works.) I'm not sure where to reassign this, so I'm trying the LiveCD component.
Well then I guess I will try F14 and preupgrade to Rawhide as a workaround for now.
*** Bug 673430 has been marked as a duplicate of this bug. ***
This is possibly an artifact of being built on an F14 system with only a rawhide squashfs. Kevin is going to be looking at moving the build system to rawhide shortly. (Possibly a few days after he gets back from FUDCON.) osmin.img is compressed using xz which 2.6.38 kernels can mount. The liveOS uses a 2.6.38 kernel which is why the system runs at all (otherwise squashfs.img couldn't be mounted). I am not sure why the file wouldn't be mounted when liveinst is run. This may or may not be related to xz compression.
I was able to confirm that osmin.img is mountable from the live image. I didn't get to trying to do it in a way that liveinst should be able to use as that needed more time than I have right now.
Would running the F14 version of dracut break things?
I am not sure what dracut is doing, but if you are able to mount osmin.img, it must also contain a valid snapshot COW device in a file (it starts with the SnAp signature). (If the file has no such signature, then snapshot creation fails with "Cannot get COW device" when accessing it through loop, as mentioned above.)
    [root@games1 mnt1]# od -c osmin|more
    0000000   S   n   A   p 001  \0  \0  \0 001  \0  \0  \0  \b  \0  \0  \0
    0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
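The signature check described above can be scripted. This is a minimal sketch, assuming only that a valid COW file begins with the four bytes "SnAp" as stated in the thread; the helper name has_snap_magic is hypothetical, and a matching signature does not prove the rest of the store is intact.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file (or device) in $1 begins with
# the 4-byte "SnAp" signature of a device-mapper snapshot COW store.
has_snap_magic() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "SnAp" ]
}

# Usage, assuming the COW file is named osmin as in the od output above:
# has_snap_magic osmin && echo "snapshot header present"
```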
*** Bug 674220 has been marked as a duplicate of this bug. ***
FYI, I too have just encountered this bug. Needed rawhide to test very latest btrfs against failing coreutils test...
Me too. Hoped to reinstall completely busted Rawhide system :(
I spent some time this weekend looking at this, but I didn't understand enough about how osmin is used and the device mapper to be able to isolate what is happening. It would be nice if a dracut or dm expert could take a look at this. While live images don't seem to be release blockers, I think this would be nice to have fixed for the alpha release.
I think the problem is the coding of the scripts. Apparently, the scripts somehow use variables instead of the paths to the files when the variables themselves don't point to them. I think a workaround would be: mount the live image in /mnt, then set the variables in the script to equal the paths to the live image. An example patch for this would be (near the beginning of the script):
+++ mkdir /mnt/{osmin,base}
+++ mount -t btrfs /mnt/live/base.img /mnt/base -o loop #don't know for certain if it is base.img
+++ mount -t btrfs /mnt/live/osmin.img /mnt/osmin -o loop
+++ BASE_LOOPDEV=/mnt/loop0
+++ OSMIN_LOOPDEV=/dev/loop1
I will test this out when I boot the image and see what happens.
Sorry, here's the corrected proposed patch that just might work (when I test it; again near the beginning of the script):
+++ mkdir /mnt/{osmin,base}
+++ mount -t btrfs /mnt/live/base.img /mnt/base -o loop
+++ mount -t btrfs /mnt/live/osmin.img /mnt/osmin -o loop
+++ BASE_LOOPDEV=/dev/loop0
+++ OSMIN_LOOPDEV=/dev/loop1
I am downloading the image right now; I will test this and see if it actually works. Unless, of course, mounting the loop devices automatically assigns the variables. I will test that too.
Those variables are set up in the /usr/share/dracut/modules.d/90dmsquash-live/dmsquash-live-root script. The problem is that the dmsetup snapshot isn't working on the provided images.
(In reply to comment #23)
> The problem is that the dmsetup snapshot isn't working on the provided images.
The DM snapshot target works perfectly; the COW image is either broken or the script mangles it somehow. When I tried it, the loop device was properly set up and the variables were also set, but the mapped COW image was not correct and the snapshot target (correctly) rejected it.
Any ideas on how it is mangled? Does it look truncated, or is there anything else relatively obvious? That might help us figure out where to look.
I did notice that there is a fixed size of 64 MiB for what looks to be the osmin image in fs.py. Is it possible that osmin has outgrown that limit? (It might be that I don't really understand the code.)
I found that some recent builds had messed-up osmin.img files. I am grabbing a more current nightly desktop build to see if that is still happening.
I double-checked a recent Desktop nightly and the osmin.img file doesn't appear to be a proper squashfs image.
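A quick first test for "proper squashfs image" is the superblock magic: little-endian squashfs 4.x images begin with the bytes "hsqs". The sketch below checks only that; the helper name looks_like_squashfs is hypothetical, and a matching magic says nothing about whether the rest of the image is intact.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file in $1 starts with the
# little-endian squashfs superblock magic "hsqs".
looks_like_squashfs() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "hsqs" ]
}

# Usage, assuming osmin.img has been copied out of the live image:
# looks_like_squashfs osmin.img || echo "osmin.img has no squashfs magic"
```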
I just checked a live image I built this morning and osmin.img looks OK. I'll test it on a usb drive later so that I know I am testing one with a mountable image. There may be multiple problems. I am pretty sure I had a problem with one with a good osmin.img before, so it looks like there may be two issues.
I checked the live image I rebuilt and it fails to mount osmin, but I can manually mount osmin.img. So it does seem that there are two problems. One is that in some cases osmin.img is bad and in some cases there is a problem with the osmin even when osmin.img seems OK.
Leaving notes on proposed blockers as I won't be at the meeting tomorrow most likely: +1 blocker, hits the intersection of "The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media" and "In most cases, the installed system must boot to a functional graphical environment without user intervention (see Blocker_Bug_FAQ)" (implication of the criteria is clearly that live install should be possible at Alpha stage).
It was asked at the alpha blocker meeting whether the corruption could be related to the change to xz.
I'd really like it if some DM expert could look at osmin when osmin.img is a valid squashfs image and note if it is corrupted or if there is some other problem. osmin does seem to start with the correct magic word, but that is about as far as I can check.
Brian's memory is that the problem predates the switch to xz. For the issue of osmin.img not being a valid squashfs image, please follow up in the cloned bug 676904. Squashfs got updated to 4.2 before xz was being used for builds and perhaps we have found a bug there. (Though I haven't seen squashfs.img get corrupted, at least not in the test cases I have.)
Per 2011-02-11 alpha blocker meeting:
* AGREED: 672265 - Accepted as F15Alpha blocker, impacts *all* live image installs. (jlaska, 18:27:11)
* Could use some extra eyes to determine what might be causing this issue (possibly introduction of xz compression)
I tested the latest F14 livecd-creator and the install to hard drive command got past where it had been stuck. (I didn't do an actual install since I didn't have a place to do so.) There were some patches right after that which were only committed to master. One that looks like a possible cause is the addition of the --uuid option to dmsetup. When I started grabbing an F15 local repo I removed my rawhide repo (as with the mass rebuild almost every package was an update) and I am now waiting for my rsync to complete. Probably this will be done in around 12 hours. At that point I am going to do another test compose without the commit mentioned above. If that still fails, I will try to build an F14 image on an F15 system and see if that also has the problem. (That might be the case if a kernel or dm related tool upgrade since F14 is causing an issue.) And I'll also check to see if gzip compression doesn't show the same problem.
(Side comment about speed of rebuild, in reply to comment #36)
> When I started grabbing an F15 local repo I removed my rawhide repo (as with
> the mass rebuild almost every package was an update) and I am now waiting for
> my rsync to complete. Probably this will be done in around 12 hours.
Taking 12 hours seems slow to me. When I composed an install DVD using pungi at 0500 UTC today (9pm PST Friday), my demand-driven download of 1939 packages from rawhide took less than one hour. I use a cable modem (max. 10Mbit/s to 20Mbit/s) and connect directly to the .redhat. server to avoid version skew in the mirror system. In the .ks file I comment out all the @Languages, which reduces the .iso from 3.6 GB to 2.5 GB. My daily pungi run to compose a new .iso with a few dozen updated packages often takes less than 15 minutes.
I am in the process of getting a full copy of i386 F15 that started very late Thursday night. I have a T1 so I only get 1.5 Mb/s. Though the last part seems to be going a bit better than I expected. I am up to vegastrike-data. So it might end up being somewhat sooner before it finishes.
I checked that image again (desktop-x86_64-20110123.17.iso; the latest doesn't work for me at all, it seems GNOME is crashing during boot). The problem is that the loop device (loop1 / osimage COW) is mounted read-only and the kernel refuses to create a snapshot with a read-only COW. (Using a read-write loop of the same image works.) I am not sure if it is a regression (with dmsetup --readonly it should work IMHO), but I'll check using some old version.
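The read-only state of the COW loop device can be confirmed from sysfs, where each block device exposes an "ro" attribute containing 1 when the device is read-only. This is a minimal sketch, not something taken from the dracut script: the helper name loop_is_readonly and its optional second argument (an alternate sysfs root, useful only for trying the function without a real device) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if block device $1 (e.g. "loop1") is flagged
# read-only according to <sysfs_root>/<dev>/ro.
#   $2 = optional sysfs root, defaults to /sys/block (assumption for testing)
loop_is_readonly() {
    sysfs_root=${2:-/sys/block}
    [ "$(cat "$sysfs_root/$1/ro" 2>/dev/null)" = "1" ]
}

# Usage on the live system (loop1 is the COW device named in this thread):
# loop_is_readonly loop1 && echo "loop1 is read-only"
# An equivalent check with util-linux: blockdev --getro /dev/loop1
```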
I tested not using the --uuid option and that didn't affect the problem. I am now building an F14 image on F15 and will be testing it out shortly.
I am almost sure it is a kernel bug, caused by 2.6.38 changes in the block layer (block device read-only flag handling). I already have a reproducer and the bisected commit. I'll post more here later.
Reported here: https://lkml.org/lkml/2011/2/12/209
Unfortunately, it cannot be easily reverted (it was part of more complex changes).
Thank you for tracking this down!
Could this also explain why sometimes osmin.img got corrupted? P.S. I'll echo Brian's thanks for figuring this out.
I also saw the loop device errors today that you reported. It was messing up my live image building. I ended up rebooting before doing each new build.
(In reply to comment #42)
> Reported here https://lkml.org/lkml/2011/2/12/209
> Unfortunately it cannot be easily reverted (it was part of more complex
> changes).
Nice work gang! Any thoughts on how to proceed for F15 Alpha with this issue? Do we need input from the kernel folks for resolving this properly? Is there an interim workaround to try? Should we reassign this bug to the kernel?
It probably makes sense that the component be changed to kernel. The rest of us can still watch for progress. Milan probably has a better idea of whether having other Fedora kernel guys look at this would actually be helpful. There is ongoing discussion on lkml, but I am not seeing a quick resolution. I don't know how long we would want to hold up the alpha for this fix. We may be able to do a temporary workaround and not set the COW device as read-only. I am not sure how well that would work. I think that part is in dracut.
We could slip this for alpha and move it to a beta blocker instead, note that in the release notes for alpha, and point out to people that the live image is only "live" and that they have to use the DVD image if they want to install alpha. I'm not sure how strict we need to be for the live image, since we have a working install on the DVD image?
(In reply to comment #47)
> It probably makes sense that the component be changed to kernel. The rest of us
> can still watch for progress. Milan probably has a better idea of whether
> having other Fedora kernel guys look at this would actually be helpful.
I would prefer to add a patch to the Fedora kernel for now.
> We may be able to do a temporary work around and not set the cow device as readonly.
Better not to do that: the COW device is very small and if it is read-write, it can very easily be invalidated (by some random access which updates atime or so).
(In reply to comment #49)
> Better do not do that - the COW device is very small and if it is be
> read-write, it can very easily be invalidated (by some random access which
> updates atime or so).
Can the specific problem of atime be avoided by mounting with 'noatime'?
(In reply to comment #50)
> Can the specific problem of atime can be avoided by mounting with 'noatime'?
Sure, but that was just an example.
(In reply to comment #48)
> I'm not sure how strict we need to be for the live image since we have a
> working install on the dvd image?
We are pretty strict about having a working live install experience in all releases.
http://fedoraproject.org/wiki/Fedora_15_Alpha_Release_Criteria
"The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media"
What I am proposing here is to build a kernel with this patch, https://lkml.org/lkml/2011/2/14/119, and then try to rebuild the liveCD with it. It should be enough for this case. (But it is possible that the patch is still not complete.)
Note that for the last couple of releases we haven't produced official live images until release. Before that, people are supposed to use the nightly composes. Those aren't checked the same way as the official install images, nor is the one from the release date kept. Given that, I don't think we want to hold up the alpha for very long based on problems with the live images.
It looks like some consensus on how to attack the issue has formed in the discussion. I can't tell for sure, but it seems we may have a fix in something on the order of a few days.
"Note that for the last couple of releases we haven't produced official live images until release." That's not correct. We shipped live images with both Alpha and Beta of F14.
I may not be clearly remembering the situation. Possibly this was a subset of the spins? I thought there were dependency issues that were blocking at least the games spin builds for at least one of the alpha or beta.
yeah, it is a subset, just the desktop spins - gnome, kde, xfce, lxde.
Yeah, it looks like there were releases for the 4 desktop live spins and for the rest people were pointed to the nightlies. I was mostly worrying about the nightly compose issues and didn't notice there were also official releases for some of them.
Still in TC2 Fedora-15-Alpha-x86_64-Live-Desktop.iso
Discussion on lkml continues. Some messages are copied to the dm-devel list, but most if not all of these are copied to lkml. There is discussion about possibly reverting the feature and delaying it to 2.6.39.
Before the discussion finishes, please add this patch to the Fedora kernel; it should help in this case.
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-fix-opening-log-and-cow-devices-for-read-only-tables.patch
While the risk of the sub-bug was asked about, I do want to note that I think this bug has a significant risk of delaying the alpha. Getting a kernel person to look at and evaluate what Milan has done so far in supplying patches and starting discussions on dm-devel and lkml would be really nice at this time.
It looks like a patch might have been added to Fedora's kernel to solve this issue. There hasn't been a new kernel build submitted with this change yet, but I'll keep an eye out for it and test after it shows up.
+# rhbz#672265
+Patch12442: revert-block-check-bdev-readonly.patch
kernel-2.6.38-0.rc5.git1.1.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.38-0.rc5.git1.1.fc15
(In reply to comment #64)
> +# rhbz#672265
> +Patch12442: revert-block-check-bdev-readonly.patch
This is now upstream as commit e51900f7d38cbcfb481d84567fd92540e7e1d23a, so we're not going off on our own with this fix.
Thanks. I tested a live image using kernel-2.6.38-0.rc5.git1.1.fc15 and install to hard drive properly started. I only went far enough to make sure it got past the problem as I didn't really want to do a reinstall. So it looks like the fix does solve the problem we were having.
Downloading the nightly 20110217.00 image right now; will see what it does.
(In reply to comment #68)
It does not work with the xfce nightly.
The nightlies won't work until kernel-2.6.38-0.rc5.git1.1.fc15 or later hits stable. For the RC that kernel will be forced in. You can enable updates-testing or use a local repo to test this early.
kernel-2.6.38-0.rc5.git1.1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 678767 has been marked as a duplicate of this bug. ***
Works for me with xfce-i386-20110219.iso nightly, thank you!
Also confirmed fix with anaconda-15.20.1-1.fc15
This is back in the 20110409.16 XFCE nightly (to date, the latest to compile on koji). If liveinst is run, there is an error: "Can't do live image installation unless running from a live image." If the desktop link is clicked, or anaconda --liveinst is run, it quickly fails with an unhandled exception. This is with anaconda-15.27-1 and kernel 2.6.38.2-9. If this was fixed at one time, it appears to be back, at least in the XFCE spin.
This is unlikely to be the same bug, even though the systems are similar. I think it would be better to open a new bug rather than reopen this one. It should also be tentatively marked as a final blocker.
That should have been symptoms, not systems.
I have also seen this failure with F15 live desktop CDs. It seems to occur if liveinst has failed for another reason and then liveinst or anaconda is tried a second time on the running system. The first attempt appears to have changed the file system to a non-live one.
The new reports of similar behavior are probably related to bug 694712, which is fixed in F15Beta.