In the installer DVD in dracut, we randomly see: mount -n -t squashfs -o ro /dev/loop0 /run/initramfs/squashfs mount: unknown filesystem type 'squashfs' /proc/modules is empty. This works sometimes (and the squashfs kernel module is loaded) and sometimes not. Either mount or modprobe or kernel bug. Please advise how to debug.
The impact of this is that it causes F16 Beta TC1 install (not just DVD, but boot.iso too, apparently) to fail to boot sometimes. I suspect this might be a race, as some people seem to see it always, some see it sometimes, and some haven't seen it at all yet? Proposing as Beta blocker - dupe was proposed.
*** Bug 734830 has been marked as a duplicate of this bug. ***
All DVD and netinst images from here affected http://alt.fedoraproject.org/pub/alt/stage/16-Beta.TC1/Fedora/ But looks like images from here http://alt.fedoraproject.orgpub/alt/stage/16-Alpha.RC5/Fedora/ seems to be fine.
I probably see the same error on s390x when dracut tries to mount the squashfs.img from initrd.img. Setting the number of CPUs to 1 (from 4) in the VM made this error to disappear.
Clyde, Jurgen - can you clarify whether you are seeing this at *every* boot, or just occasionally? Some other reporters (Andre, Hongqing) say they've seen this but only occasionally, and most boots work fine. If you're able reliably to reproduce it, can you please do this: Could you add "rd.debug rd.live.debug" to the boot line and attach /tmp/liveroot.*.out? (if that doesn't produce anything, try rd.live.debug by itself.) thanks! (for the record, I haven't seen this yet myself, and I've been testing in VMs on my desktop, which is quite powerful and itself is running F16. I'll try with my laptop, which is less speedy and is running F15, and see if I can hit it.)
Created attachment 521088 [details] /run/initramfs/init.log Bug randomly appears here on VM with 1 i686 and x86_64 CPU's. When squashfs module missing in /proc/modules it can be loaded manually with modprobe or mount -t squashfs from dracut# prompt.
(In reply to comment #4) > I probably see the same error on s390x when dracut tries to mount the > squashfs.img from initrd.img. Setting the number of CPUs to 1 (from 4) in the > VM made this error to disappear. can you reproduce this reliably? that might be worth to test for others, too
(In reply to comment #7) > (In reply to comment #4) > > I probably see the same error on s390x when dracut tries to mount the > > squashfs.img from initrd.img. Setting the number of CPUs to 1 (from 4) in the > > VM made this error to disappear. > > can you reproduce this reliably? that might be worth to test for others, too When I tried the installs yesterday (or the day before), the mount failed in 4 out of 5 cases in VM with 4 CPUs. Then I started to think about workarounds. kernel-3.0.0-1.fc16.s390x util-linux-2.19.1-2.fc16.s390x dracut-013-7.fc16.noarch
so grabbing http://download.fedora.redhat.com/pub/fedora/linux/development/16/x86_64/os/images/pxeboot/initrd.img and running lsinitrd on it (after doing s/xz/XZ/ on line 30 of lsinitrd) shows this.. ... lrwxrwxrwx 1 root root 19 Aug 31 12:09 lib64 -> run/initramfs/lib64 lrwxrwxrwx 1 root root 18 Aug 31 12:09 sbin -> run/initramfs/sbin lrwxrwxrwx 1 root root 17 Aug 31 12:09 tmp -> run/initramfs/tmp xzcat: initrd.img: Compressed data is corrupt ======================================================================== What exactly is going on with the creation of the initrd ? It doesn't even get as far as unpacking the kernel modules. Could this be the root of this issue ? That the kernel is getting just as confused when trying to unpack this ?
(In reply to comment #9) > so grabbing > http://download.fedora.redhat.com/pub/fedora/linux/development/16/x86_64/os/images/pxeboot/initrd.img > and running lsinitrd on it (after doing s/xz/XZ/ on line 30 of lsinitrd) shows > this.. > > ... > lrwxrwxrwx 1 root root 19 Aug 31 12:09 lib64 -> > run/initramfs/lib64 > lrwxrwxrwx 1 root root 18 Aug 31 12:09 sbin -> > run/initramfs/sbin > lrwxrwxrwx 1 root root 17 Aug 31 12:09 tmp -> run/initramfs/tmp > xzcat: initrd.img: Compressed data is corrupt > ======================================================================== > > > What exactly is going on with the creation of the initrd ? It doesn't even get > as far as unpacking the kernel modules. Could this be the root of this issue ? > That the kernel is getting just as confused when trying to unpack this ? I think this is a an image created by concatenating several xz compressed cpio images. The kernel handles those just fine by unpacking each of them over the last one.
> I think this is a an image created by concatenating several xz compressed cpio > images. The kernel handles those just fine by unpacking each of them over the > last one. could you add a script to dracut to unpack them so we can investigate the contents for debugging ?
Created attachment 521098 [details] /tmp/liveroot.138.out
(In reply to comment #12) > > I think this is a an image created by concatenating several xz compressed cpio > > images. The kernel handles those just fine by unpacking each of them over the > > last one. > > could you add a script to dracut to unpack them so we can investigate the > contents for debugging ? Hmm, some anaconda tool created it... dracut only created the first part, which some anaconda tool tool and prepended to the anaconda installer image.
might be reproducible on an installed system, where this bug sometimes occurs, by running in a loop: while mount -t squashfs -o loop squashfs.img /mnt;do umount /mnt modprobe -r squashfs done
(In reply to comment #15) > might be reproducible on an installed system, where this bug sometimes occurs, > by running in a loop: > > while mount -t squashfs -o loop squashfs.img /mnt;do > umount /mnt > modprobe -r squashfs > done If we're going to go that route, shouldn't we do it exactly how dracut is? Which would be more like: losetup -d /dev/loop0 losetup -r /dev/loop0 /squashfs.img mkdir -m 0755 -p /run/initramfs/squashfs mount -n -t squashfs -o ro /dev/loop0 /run/initramfs/squashfs umount /mnt modprobe -r squashfs
(In reply to comment #5) > Clyde, Jurgen - can you clarify whether you are seeing this at *every* boot, or > just occasionally? Some other reporters (Andre, Hongqing) say they've seen this > but only occasionally, and most boots work fine. > > If you're able reliably to reproduce it, can you please do this: > > Could you add "rd.debug rd.live.debug" to the boot line and attach > /tmp/liveroot.*.out? (if that doesn't produce anything, try rd.live.debug by > itself.) > on cold iron, 11 tries, 1 worked. On the failures, kernel panic occurs and so I can't get anything. On the one success, not really knowing where to look, I did find an init.log, but it doesn't seem to show me anything except dracut debug output. Attaching shortly. Also, tried to get some digital shots of the screen at failure time. If any are any good will attach shortly.
Created attachment 521112 [details] init log from successful boot of FC16betaTC1??
Sorry, screen shots no good. How do I get the /tmp/liveroot.*.out file from a failed VM attempt?
we don't need pictures, we know what the error looks like (andre attached a screenshot in his bug). I suppose you might be able to get the log out using virsh?
(In reply to comment #20) > we don't need pictures, we know what the error looks like (andre attached a > screenshot in his bug). Yeah, but with rd.debug there is a lot more interesting output that is available prior to dracut giving up and that was what I was trying to take pictures of. Oh well.
ahh, okay. one idea would be to take *video* of it, then you can isolate the interesting frames from the video? or you can try using netconsole.
(In reply to comment #16) > (In reply to comment #15) > > might be reproducible on an installed system, where this bug sometimes occurs, > > by running in a loop: > > > > while mount -t squashfs -o loop squashfs.img /mnt;do > > umount /mnt > > modprobe -r squashfs > > done > > If we're going to go that route, shouldn't we do it exactly how dracut is? > Which would be more like: > > losetup -d /dev/loop0 > losetup -r /dev/loop0 /squashfs.img > mkdir -m 0755 -p /run/initramfs/squashfs > mount -n -t squashfs -o ro /dev/loop0 /run/initramfs/squashfs > umount /mnt > modprobe -r squashfs OK, I ran this script (ignore the bash ugliness) after bootup: #!/bin/bash run () { $@ if [ $? != 0 ] then exit 1 fi } while true ; do losetup -d /dev/loop0 losetup -r /dev/loop0 /tmp/squashfs.img run "mount -n -t squashfs -o ro /dev/loop0 /mnt" run "umount /mnt" run "modprobe -r squashfs" done on a fully updated KVM guest with 2 CPUs, but using 3.1.0-rc3.git0 since that is what TC5 seems to be using. It did 68430 loops in just under an hour with no mount errors. There were a bunch of EBUSY errors for the losetup steps, but I didn't make them fatal in the script. I'm guessing it has something to do with the initramfs environment, so booting (DVD or PXE) repeatedly is probably going to be better for a recreate. I'd be happy to be proven an idiot and have an easy recreate at runtime.
(In reply to comment #23) > on a fully updated KVM guest with 2 CPUs, but using 3.1.0-rc3.git0 since that > is what TC5 seems to be using. It did 68430 loops in just under an hour with That should have been "...what TC1 seems to be using."
I seem to be able to recreate this (I think) fairly regularly with the TC1 netinstalliso by chosing 'Rescue', hitting tab and removing the quiet option. However, any attempt to redirect output to a console or also setting ignore_loglevel on the cmdline seems to make it succeed. Harald, is there a way to get dracut to not actually die when it can't mount the live image? rd.shell, rd.debug, etc all wind up with a pause and then a panic. Getting a shell at this point would be helpful.
(In reply to comment #21) > (In reply to comment #20) > > we don't need pictures, we know what the error looks like (andre attached a > > screenshot in his bug). > > Yeah, but with rd.debug there is a lot more interesting output that is > available prior to dracut giving up and that was what I was trying to take > pictures of. Oh well. with rd.debug all dracut output is logged in /run/initramfs/init.log
(In reply to comment #25) > I seem to be able to recreate this (I think) fairly regularly with the TC1 > netinstalliso by chosing 'Rescue', hitting tab and removing the quiet option. > However, any attempt to redirect output to a console or also setting > ignore_loglevel on the cmdline seems to make it succeed. > > Harald, is there a way to get dracut to not actually die when it can't mount > the live image? rd.shell, rd.debug, etc all wind up with a pause and then a > panic. Getting a shell at this point would be helpful. "rd.shell rd.break=pre-mount"
(In reply to comment #21) > (In reply to comment #20) > > we don't need pictures, we know what the error looks like (andre attached a > > screenshot in his bug). > > Yeah, but with rd.debug there is a lot more interesting output that is > available prior to dracut giving up and that was what I was trying to take > pictures of. Oh well. Anyway... the dracut output is not interesting anymore, because I know now, that the mount failed, because the kernel module is not loaded. So the kernel output and dmesg is much more interesting.
Maybe this is now redundant but...F16 beta installer start fine every time in a KVM VM with 2CPUs/2048GB memory. I've attached the dmesg output, maybe this yields some needed info. I always fails on my PC and macmini (both EFI and legacy boot).
Created attachment 521172 [details] working installer in KVM, 2CPUs/2048MB mem, dmesg output
managed to get 2 successful boots on my macmini with maxcpus=1, once using legacy boot and one time with EFI boot.
(In reply to comment #27) > (In reply to comment #25) > > I seem to be able to recreate this (I think) fairly regularly with the TC1 > > netinstalliso by chosing 'Rescue', hitting tab and removing the quiet option. > > However, any attempt to redirect output to a console or also setting > > ignore_loglevel on the cmdline seems to make it succeed. > > > > Harald, is there a way to get dracut to not actually die when it can't mount > > the live image? rd.shell, rd.debug, etc all wind up with a pause and then a > > panic. Getting a shell at this point would be helpful. > > "rd.shell rd.break=pre-mount" That doesn't work. In the failing case, it hangs for a few seconds after failing to mount the squashfs.img then exits and causes a kernel panic. In a working case, the squashfs.img is already mounted at that point so it's pointless to have a shell.
"rd.break=pre-trigger rd.shell=1" then you'll have to start some things by hand # udevadm trigger --action=add # /sbin/dmsquash-live-root /squashfs.img
(In reply to comment #33) > "rd.break=pre-trigger rd.shell=1" > > then you'll have to start some things by hand > # udevadm trigger --action=add > # /sbin/dmsquash-live-root /squashfs.img if you want dracut debug here: # echo " rd.debug " >> /etc/cmdline
Created attachment 521241 [details] dmesg of boot failure dmesg of failed boot on cold iron before kernel panic (thanks Harald for the methodology). Not shown in dmesg was a msg on screen like: can't delete loop device loop0. device or resource busy. Only one file in /tmp, root.info: root='live:/squashfs.img' rflags='ro' fstype='auto' netroot='' NEWROOT='/sysroot'
I even got further... copied vmlinuz and initrd.img from the .iso $ qemu-kvm -smp 2 -nographic -m 1024 -kernel vmlinuz -initrd initrd.img -append "rd.break=pre-trigger rd.shell root=live:/squashfs.img rescue console=ttyS0,115200n81" .... [ 3.810947] dracut Warning: Break before pre-trigger dracut Warning: Break before pre-trigger Dropping to debug shell. ... pre-trigger:/# udevadm trigger --action=add [ 116.285083] cdrom_id used greatest stack depth: 3952 bytes left pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img [ 14.299293] dracut: FATAL: cannot mount live image (unknown filesystem type) [ 14.301417] dracut: Refusing to continue ... so, it can be triggered in a "-smp 2" qemu-kvm jwb, even managed to install a modprobe shell wrapper, which _didn't_ log the kernel trying to "modprobe squashfs"
(In reply to comment #36) > jwb, even managed to install a modprobe shell wrapper, which _didn't_ log the > kernel trying to "modprobe squashfs" Yep, here's a log of that. I didn't run the udevadm step, but it doesn't seem to matter at all for my case. All that gets added is that cdrom device and that isn't accessed in the failing paths. [ 2.167841] dracut Warning: Break before pre-trigger dracut Warning: Break before pre-trigger Dropping to debug shell. sh: cannot set terminal process group (-1): Inappropriate ioctl for device sh: no job control in this shell pre-trigger:/# [ 2.229229] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 [ 2.496862] Refined TSC clocksource calibration: 2691.265 MHz. pre-trigger:/# cp -a /sbin/modprobe /sbin/modprobe.bin pre-trigger:/# cat >/sbin/modprobe <<EOF > #!/bin/sh > echo modprobe "\$@" >> /tmp/mod.log > exec /sbin/modprobe.bin "\$@" >>/tmp/mod.log 2>&1 > EOF pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img [ 69.286300] dracut: FATAL: cannot mount live image (unknown filesystem type) [ 69.288107] dracut: Refusing to continue pre-trigger:/# cat /tmp/mod.log cat: /tmp/mod.log: No such file or directory pre-trigger:/#
Very odd, I can install from the ISO DVD with SMP set to 2 (see comment #31) but when I try the kvm commands from comment #36 by hand with -smp 2 it breaks. Without smp I get: [ 2.215940] mknod used greatest stack depth: 5136 bytes left [ 2.358591] dracut: dracut-013-4.fc16 [ 2.373810] dracut: root was live:/squashfs.img, liveroot is now live:/squashfs.img [ 2.395409] initqueue used greatest stack depth: 4864 bytes left [ 2.410718] udevd[69]: starting version 173 [ 2.427894] dracut Warning: Break before pre-trigger dracut Warning: Break before pre-trigger Dropping to debug shell. sh: cannot set terminal process group (-1): Inappropriate ioctl for device sh: no job control in this shell pre-trigger:/# [ 2.474705] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2 [ 2.722752] Refined TSC clocksource calibration: 2672.726 MHz. pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img [ 35.235713] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 35.379940] blkid used greatest stack depth: 4656 bytes left [ 35.426303] dmsquash-live-r used greatest stack depth: 4640 bytes left pre-trigger:/#
Giving the dmsquash-live-root command twice yields: pre-trigger:/# [ 2.922513] Refined TSC clocksource calibration: 2672.726 MHz. pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img [ 35.764073] dracut: FATAL: cannot mount live image (unknown filesystem type) [ 35.766566] dracut: Refusing to continue pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img [ 48.532348] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 48.653581] blkid used greatest stack depth: 4656 bytes left this is with -smp 2
ok, I found one bugfix for "cannot mount live image (unknown filesystem type)" so, to rule out the "cannot mount live image (unknown filesystem type)" and get a reproducer for "mount: unknown filesystem type 'squashfs'", you should run: # sed -i -e 's# _fs=#udevadm settle; _fs=#g' /sbin/dmsquash-live-root and then: # udevadm trigger --action=add; udevadm settle # /sbin/dmsquash-live-root /squashfs.img This is to find a reproducer for: mount: unknown filesystem type 'squashfs' umount: /run/initramfs/squashfs: not mounted mount: unknown filesystem type 'DM_snapshot_cow' in a loop and finally: dracut Warning: Can't mount root filesystem
dracut-013-8.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/dracut-013-8.fc16
Discussed at 2011-09-02 blocker review meeting. Agreed this is a Beta blocker under criterion "The installer must be able to complete an installation using the text, graphical and VNC installation interfaces". Tim will build a boot.iso with the proposed fix for testing, stay tuned!
dracut-013-8.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.