Bug 735199 - "mount -t squashfs" fails
"mount -t squashfs" fails
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dracut (Show other bugs)
16
All Linux
unspecified Severity unspecified
: ---
: ---
Assigned To: Harald Hoyer
Fedora Extras Quality Assurance
AcceptedBlocker
: Triaged
: 734830 (view as bug list)
Depends On:
Blocks: F16Beta/F16BetaBlocker
  Show dependency treegraph
 
Reported: 2011-09-01 15:11 EDT by Harald Hoyer
Modified: 2011-09-06 23:21 EDT (History)
19 users (show)

See Also:
Fixed In Version: dracut-013-8.fc16
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-09-06 23:21:08 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
/run/initramfs/init.log (47.17 KB, text/plain)
2011-09-01 16:06 EDT, nucleo
no flags Details
/tmp/liveroot.138.out (10.06 KB, text/plain)
2011-09-01 16:52 EDT, nucleo
no flags Details
init log from successful boot of FC16betaTC1?? (31.82 KB, text/plain)
2011-09-01 19:10 EDT, Clyde E. Kunkel
no flags Details
working installer in KVM, 2CPUs/2048MB mem, dmesg output (59.03 KB, text/plain)
2011-09-02 04:28 EDT, Jurgen Kramer
no flags Details
dmesg of boot failure (57.08 KB, text/plain)
2011-09-02 10:21 EDT, Clyde E. Kunkel
no flags Details

  None (edit)
Description Harald Hoyer 2011-09-01 15:11:24 EDT
In the installer DVD in dracut, we randomly see:

mount -n -t squashfs -o ro /dev/loop0 /run/initramfs/squashfs
mount: unknown filesystem type 'squashfs'

/proc/modules is empty.

This works sometimes (and the squashfs kernel module is loaded) and sometimes not.

Either mount or modprobe or kernel bug.

Please advise how to debug.
Comment 1 Adam Williamson 2011-09-01 15:20:27 EDT
The impact of this is that it causes F16 Beta TC1 install (not just DVD, but boot.iso too, apparently) to fail to boot sometimes.

I suspect this might be a race, as some people seem to see it always, some see it sometimes, and some haven't seen it at all yet?

Proposing as Beta blocker - dupe was proposed.
Comment 2 Adam Williamson 2011-09-01 15:20:47 EDT
*** Bug 734830 has been marked as a duplicate of this bug. ***
Comment 3 nucleo 2011-09-01 15:26:29 EDT
All DVD and netinst images from here affected http://alt.fedoraproject.org/pub/alt/stage/16-Beta.TC1/Fedora/

But looks like images from here http://alt.fedoraproject.orgpub/alt/stage/16-Alpha.RC5/Fedora/ seems to be fine.
Comment 4 Dan Horák 2011-09-01 15:52:52 EDT
I probably see the same error on s390x when dracut tries to mount the squashfs.img from initrd.img. Setting the number of CPUs to 1 (from 4) in the VM made this error to disappear.
Comment 5 Adam Williamson 2011-09-01 16:03:49 EDT
Clyde, Jurgen - can you clarify whether you are seeing this at *every* boot, or just occasionally? Some other reporters (Andre, Hongqing) say they've seen this but only occasionally, and most boots work fine.

If you're able reliably to reproduce it, can you please do this:

Could you add "rd.debug rd.live.debug" to the boot line and attach
/tmp/liveroot.*.out? (if that doesn't produce anything, try rd.live.debug by
itself.)

thanks!

(for the record, I haven't seen this yet myself, and I've been testing in VMs on my desktop, which is quite powerful and itself is running F16. I'll try with my laptop, which is less speedy and is running F15, and see if I can hit it.)
Comment 6 nucleo 2011-09-01 16:06:11 EDT
Created attachment 521088 [details]
/run/initramfs/init.log

Bug randomly appears here on VM with 1 i686 and x86_64 CPU's.

When squashfs module missing in /proc/modules it can be loaded manually with
modprobe or mount -t squashfs from dracut# prompt.
Comment 7 Harald Hoyer 2011-09-01 16:14:11 EDT
(In reply to comment #4)
> I probably see the same error on s390x when dracut tries to mount the
> squashfs.img from initrd.img. Setting the number of CPUs to 1 (from 4) in the
> VM made this error to disappear.

can you reproduce this reliably? that might be worth to test for others, too
Comment 8 Dan Horák 2011-09-01 16:22:50 EDT
(In reply to comment #7)
> (In reply to comment #4)
> > I probably see the same error on s390x when dracut tries to mount the
> > squashfs.img from initrd.img. Setting the number of CPUs to 1 (from 4) in the
> > VM made this error to disappear.
> 
> can you reproduce this reliably? that might be worth to test for others, too

When I tried the installs yesterday (or the day before), the mount failed in 4 out of 5 cases in VM with 4 CPUs. Then I started to think about workarounds.

kernel-3.0.0-1.fc16.s390x
util-linux-2.19.1-2.fc16.s390x
dracut-013-7.fc16.noarch
Comment 9 Dave Jones 2011-09-01 16:34:00 EDT
so grabbing http://download.fedora.redhat.com/pub/fedora/linux/development/16/x86_64/os/images/pxeboot/initrd.img and running lsinitrd on it (after doing s/xz/XZ/ on line 30 of lsinitrd) shows this..

...
lrwxrwxrwx   1 root     root           19 Aug 31 12:09 lib64 -> run/initramfs/lib64
lrwxrwxrwx   1 root     root           18 Aug 31 12:09 sbin -> run/initramfs/sbin
lrwxrwxrwx   1 root     root           17 Aug 31 12:09 tmp -> run/initramfs/tmp
xzcat: initrd.img: Compressed data is corrupt
========================================================================


What exactly is going on with the creation of the initrd ? It doesn't even get as far as unpacking the kernel modules.  Could this be the root of this issue ? That the kernel is getting just as confused when trying to unpack this ?
Comment 10 Harald Hoyer 2011-09-01 16:39:17 EDT
(In reply to comment #9)
> so grabbing
> http://download.fedora.redhat.com/pub/fedora/linux/development/16/x86_64/os/images/pxeboot/initrd.img
> and running lsinitrd on it (after doing s/xz/XZ/ on line 30 of lsinitrd) shows
> this..
> 
> ...
> lrwxrwxrwx   1 root     root           19 Aug 31 12:09 lib64 ->
> run/initramfs/lib64
> lrwxrwxrwx   1 root     root           18 Aug 31 12:09 sbin ->
> run/initramfs/sbin
> lrwxrwxrwx   1 root     root           17 Aug 31 12:09 tmp -> run/initramfs/tmp
> xzcat: initrd.img: Compressed data is corrupt
> ========================================================================
> 
> 
> What exactly is going on with the creation of the initrd ? It doesn't even get
> as far as unpacking the kernel modules.  Could this be the root of this issue ?
> That the kernel is getting just as confused when trying to unpack this ?

I think this is a an image created by concatenating several xz compressed cpio images. The kernel handles those just fine by unpacking each of them over the last one.
Comment 12 Dave Jones 2011-09-01 16:52:06 EDT
> I think this is a an image created by concatenating several xz compressed cpio
> images. The kernel handles those just fine by unpacking each of them over the
> last one.

could you add a script to dracut to unpack them so we can investigate the contents for debugging ?
Comment 13 nucleo 2011-09-01 16:52:19 EDT
Created attachment 521098 [details]
/tmp/liveroot.138.out
Comment 14 Harald Hoyer 2011-09-01 17:12:05 EDT
(In reply to comment #12)
> > I think this is a an image created by concatenating several xz compressed cpio
> > images. The kernel handles those just fine by unpacking each of them over the
> > last one.
> 
> could you add a script to dracut to unpack them so we can investigate the
> contents for debugging ?

Hmm, some anaconda tool created it... dracut only created the first part, which some anaconda tool tool and prepended to the anaconda installer image.
Comment 15 Harald Hoyer 2011-09-01 17:32:54 EDT
might be reproducible on an installed system, where this bug sometimes occurs, by running in a loop:

while mount -t squashfs -o loop squashfs.img /mnt;do 
   umount /mnt
   modprobe -r squashfs
done
Comment 16 Josh Boyer 2011-09-01 17:38:20 EDT
(In reply to comment #15)
> might be reproducible on an installed system, where this bug sometimes occurs,
> by running in a loop:
> 
> while mount -t squashfs -o loop squashfs.img /mnt;do 
>    umount /mnt
>    modprobe -r squashfs
> done

If we're going to go that route, shouldn't we do it exactly how dracut is?  Which would be more like:

losetup -d /dev/loop0
losetup -r /dev/loop0 /squashfs.img
mkdir -m 0755 -p /run/initramfs/squashfs
mount -n -t squashfs -o ro /dev/loop0 /run/initramfs/squashfs
umount /mnt
modprobe -r squashfs
Comment 17 Clyde E. Kunkel 2011-09-01 19:08:48 EDT
(In reply to comment #5)
> Clyde, Jurgen - can you clarify whether you are seeing this at *every* boot, or
> just occasionally? Some other reporters (Andre, Hongqing) say they've seen this
> but only occasionally, and most boots work fine.
> 
> If you're able reliably to reproduce it, can you please do this:
> 
> Could you add "rd.debug rd.live.debug" to the boot line and attach
> /tmp/liveroot.*.out? (if that doesn't produce anything, try rd.live.debug by
> itself.)
> 

on cold iron, 11 tries, 1 worked.  On the failures, kernel panic occurs and so I can't get anything.  On the one success, not really knowing where to look, I did find an init.log, but it doesn't seem to show me anything except dracut debug output.  Attaching shortly.  Also, tried to get some digital shots of the screen at failure time. If any are any good will attach shortly.
Comment 18 Clyde E. Kunkel 2011-09-01 19:10:13 EDT
Created attachment 521112 [details]
init log from successful boot of FC16betaTC1??
Comment 19 Clyde E. Kunkel 2011-09-01 19:13:30 EDT
Sorry, screen shots no good.

How do I get the /tmp/liveroot.*.out file from a failed VM attempt?
Comment 20 Adam Williamson 2011-09-01 19:28:24 EDT
we don't need pictures, we know what the error looks like (andre attached a screenshot in his bug).

I suppose you might be able to get the log out using virsh?
Comment 21 Clyde E. Kunkel 2011-09-01 19:42:13 EDT
(In reply to comment #20)
> we don't need pictures, we know what the error looks like (andre attached a
> screenshot in his bug).

Yeah, but with rd.debug there is a lot more interesting output that is available prior to dracut giving up and that was what I was trying to take pictures of.  Oh well.
Comment 22 Adam Williamson 2011-09-01 19:59:39 EDT
ahh, okay. one idea would be to take *video* of it, then you can isolate the interesting frames from the video?

or you can try using netconsole.
Comment 23 Josh Boyer 2011-09-01 20:18:57 EDT
(In reply to comment #16)
> (In reply to comment #15)
> > might be reproducible on an installed system, where this bug sometimes occurs,
> > by running in a loop:
> > 
> > while mount -t squashfs -o loop squashfs.img /mnt;do 
> >    umount /mnt
> >    modprobe -r squashfs
> > done
> 
> If we're going to go that route, shouldn't we do it exactly how dracut is? 
> Which would be more like:
> 
> losetup -d /dev/loop0
> losetup -r /dev/loop0 /squashfs.img
> mkdir -m 0755 -p /run/initramfs/squashfs
> mount -n -t squashfs -o ro /dev/loop0 /run/initramfs/squashfs
> umount /mnt
> modprobe -r squashfs

OK, I ran this script (ignore the bash ugliness) after bootup:

#!/bin/bash
run () {
        $@
        if [ $? != 0 ]
        then
                exit 1
        fi
}

while true ; do
	losetup -d /dev/loop0
	losetup -r /dev/loop0 /tmp/squashfs.img
	run "mount -n -t squashfs -o ro /dev/loop0 /mnt"
	run "umount /mnt"
	run "modprobe -r squashfs"
done

on a fully updated KVM guest with 2 CPUs, but using 3.1.0-rc3.git0 since that is what TC5 seems to be using.  It did 68430 loops in just under an hour with no mount errors.  There were a bunch of EBUSY errors for the losetup steps, but I didn't make them fatal in the script.

I'm guessing it has something to do with the initramfs environment, so booting (DVD or PXE) repeatedly is probably going to be better for a recreate.

I'd be happy to be proven an idiot and have an easy recreate at runtime.
Comment 24 Josh Boyer 2011-09-01 20:21:23 EDT
(In reply to comment #23) 
> on a fully updated KVM guest with 2 CPUs, but using 3.1.0-rc3.git0 since that
> is what TC5 seems to be using.  It did 68430 loops in just under an hour with

That should have been "...what TC1 seems to be using."
Comment 25 Josh Boyer 2011-09-01 21:23:14 EDT
I seem to be able to recreate this (I think) fairly regularly with the TC1 netinstalliso by chosing 'Rescue', hitting tab and removing the quiet option.  However, any attempt to redirect output to a console or also setting ignore_loglevel on the cmdline seems to make it succeed.

Harald, is there a way to get dracut to not actually die when it can't mount the live image?  rd.shell, rd.debug, etc all wind up with a pause and then a panic.  Getting a shell at this point would be helpful.
Comment 26 Harald Hoyer 2011-09-02 02:57:48 EDT
(In reply to comment #21)
> (In reply to comment #20)
> > we don't need pictures, we know what the error looks like (andre attached a
> > screenshot in his bug).
> 
> Yeah, but with rd.debug there is a lot more interesting output that is
> available prior to dracut giving up and that was what I was trying to take
> pictures of.  Oh well.

with rd.debug all dracut output is logged in /run/initramfs/init.log
Comment 27 Harald Hoyer 2011-09-02 03:00:33 EDT
(In reply to comment #25)
> I seem to be able to recreate this (I think) fairly regularly with the TC1
> netinstalliso by chosing 'Rescue', hitting tab and removing the quiet option. 
> However, any attempt to redirect output to a console or also setting
> ignore_loglevel on the cmdline seems to make it succeed.
> 
> Harald, is there a way to get dracut to not actually die when it can't mount
> the live image?  rd.shell, rd.debug, etc all wind up with a pause and then a
> panic.  Getting a shell at this point would be helpful.

"rd.shell rd.break=pre-mount"
Comment 28 Harald Hoyer 2011-09-02 03:09:56 EDT
(In reply to comment #21)
> (In reply to comment #20)
> > we don't need pictures, we know what the error looks like (andre attached a
> > screenshot in his bug).
> 
> Yeah, but with rd.debug there is a lot more interesting output that is
> available prior to dracut giving up and that was what I was trying to take
> pictures of.  Oh well.

Anyway... the dracut output is not interesting anymore, because I know now, that the mount failed, because the kernel module is not loaded.

So the kernel output and dmesg is much more interesting.
Comment 29 Jurgen Kramer 2011-09-02 04:27:26 EDT
Maybe this is now redundant but...F16 beta installer start fine every time in a KVM VM with 2CPUs/2048GB memory. I've attached the dmesg output, maybe this yields some needed info.
I always fails on my PC and macmini (both EFI and legacy boot).
Comment 30 Jurgen Kramer 2011-09-02 04:28:31 EDT
Created attachment 521172 [details]
working installer in KVM, 2CPUs/2048MB mem, dmesg output
Comment 31 Jurgen Kramer 2011-09-02 04:44:47 EDT
managed to get 2 successful boots on my macmini with maxcpus=1, once using legacy boot and one time with EFI boot.
Comment 32 Josh Boyer 2011-09-02 07:29:39 EDT
(In reply to comment #27)
> (In reply to comment #25)
> > I seem to be able to recreate this (I think) fairly regularly with the TC1
> > netinstalliso by chosing 'Rescue', hitting tab and removing the quiet option. 
> > However, any attempt to redirect output to a console or also setting
> > ignore_loglevel on the cmdline seems to make it succeed.
> > 
> > Harald, is there a way to get dracut to not actually die when it can't mount
> > the live image?  rd.shell, rd.debug, etc all wind up with a pause and then a
> > panic.  Getting a shell at this point would be helpful.
> 
> "rd.shell rd.break=pre-mount"

That doesn't work.  In the failing case, it hangs for a few seconds after failing to mount the squashfs.img then exits and causes a kernel panic.  In a working case, the squashfs.img is already mounted at that point so it's pointless to have a shell.
Comment 33 Harald Hoyer 2011-09-02 08:31:59 EDT
"rd.break=pre-trigger rd.shell=1"

then you'll have to start some things by hand
# udevadm trigger  --action=add
# /sbin/dmsquash-live-root /squashfs.img
Comment 34 Harald Hoyer 2011-09-02 08:36:23 EDT
(In reply to comment #33)
> "rd.break=pre-trigger rd.shell=1"
> 
> then you'll have to start some things by hand
> # udevadm trigger  --action=add
> # /sbin/dmsquash-live-root /squashfs.img

if you want dracut debug here:

# echo " rd.debug " >> /etc/cmdline
Comment 35 Clyde E. Kunkel 2011-09-02 10:21:56 EDT
Created attachment 521241 [details]
dmesg of boot failure

dmesg of failed boot on cold iron before kernel panic (thanks Harald for the methodology).

Not shown in dmesg was a msg on screen like:

can't delete loop device loop0. device or resource busy.

Only one file in /tmp, root.info:

root='live:/squashfs.img'
rflags='ro'
fstype='auto'
netroot=''
NEWROOT='/sysroot'
Comment 36 Harald Hoyer 2011-09-02 10:49:15 EDT
I even got further... copied vmlinuz and initrd.img from the .iso

$ qemu-kvm -smp 2 -nographic  -m 1024 -kernel vmlinuz -initrd initrd.img -append "rd.break=pre-trigger rd.shell root=live:/squashfs.img  rescue console=ttyS0,115200n81"
....
[    3.810947] dracut Warning: Break before pre-trigger
dracut Warning: Break before pre-trigger

Dropping to debug shell.
...
pre-trigger:/# udevadm trigger  --action=add
[  116.285083] cdrom_id used greatest stack depth: 3952 bytes left
pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img
[   14.299293] dracut: FATAL: cannot mount live image (unknown filesystem type)
[   14.301417] dracut: Refusing to continue

... so, it can be triggered in a "-smp 2" qemu-kvm

jwb, even managed to install a modprobe shell wrapper, which _didn't_ log the kernel trying to "modprobe squashfs"
Comment 37 Josh Boyer 2011-09-02 10:57:01 EDT
(In reply to comment #36)
> jwb, even managed to install a modprobe shell wrapper, which _didn't_ log the
> kernel trying to "modprobe squashfs"

Yep, here's a log of that.  I didn't run the udevadm step, but it doesn't seem to matter at all for my case.  All that gets added is that cdrom device and that isn't accessed in the failing paths.

[    2.167841] dracut Warning: Break before pre-trigger
dracut Warning: Break before pre-trigger

Dropping to debug shell.

sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
pre-trigger:/# [    2.229229] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[    2.496862] Refined TSC clocksource calibration: 2691.265 MHz.

pre-trigger:/# cp -a /sbin/modprobe /sbin/modprobe.bin
pre-trigger:/# cat >/sbin/modprobe <<EOF
> #!/bin/sh
> echo modprobe "\$@" >> /tmp/mod.log
> exec /sbin/modprobe.bin "\$@" >>/tmp/mod.log 2>&1
> EOF
pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img
[   69.286300] dracut: FATAL: cannot mount live image (unknown filesystem type)
[   69.288107] dracut: Refusing to continue
pre-trigger:/# cat /tmp/mod.log
cat: /tmp/mod.log: No such file or directory
pre-trigger:/#
Comment 38 Jurgen Kramer 2011-09-02 11:22:55 EDT
Very odd, I can install from the ISO DVD with SMP set to 2 (see comment #31) but when I try the kvm commands from comment #36 by hand with -smp 2 it breaks. Without smp I get:

[    2.215940] mknod used greatest stack depth: 5136 bytes left
[    2.358591] dracut: dracut-013-4.fc16
[    2.373810] dracut: root was live:/squashfs.img, liveroot is now live:/squashfs.img
[    2.395409] initqueue used greatest stack depth: 4864 bytes left
[    2.410718] udevd[69]: starting version 173


[    2.427894] dracut Warning: Break before pre-trigger
dracut Warning: Break before pre-trigger

Dropping to debug shell.

sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
pre-trigger:/# [    2.474705] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[    2.722752] Refined TSC clocksource calibration: 2672.726 MHz.

pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img
[   35.235713] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[   35.379940] blkid used greatest stack depth: 4656 bytes left
[   35.426303] dmsquash-live-r used greatest stack depth: 4640 bytes left
pre-trigger:/#
Comment 39 Jurgen Kramer 2011-09-02 11:26:34 EDT
Giving the dmsquash-live-root command twice yields:

pre-trigger:/# [    2.922513] Refined TSC clocksource calibration: 2672.726 MHz.

pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img
[   35.764073] dracut: FATAL: cannot mount live image (unknown filesystem type)
[   35.766566] dracut: Refusing to continue
pre-trigger:/# /sbin/dmsquash-live-root /squashfs.img
[   48.532348] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[   48.653581] blkid used greatest stack depth: 4656 bytes left

this is with -smp 2
Comment 40 Harald Hoyer 2011-09-02 11:45:51 EDT
ok, I found one bugfix for "cannot mount live image (unknown filesystem type)"

so, to rule out the "cannot mount live image (unknown filesystem type)" and get a reproducer for "mount: unknown filesystem type 'squashfs'", you should run:

# sed -i -e 's# _fs=#udevadm settle;  _fs=#g'  /sbin/dmsquash-live-root

and then:

# udevadm trigger  --action=add; udevadm settle
# /sbin/dmsquash-live-root /squashfs.img


This is to find a reproducer for:

mount: unknown filesystem type 'squashfs'
umount: /run/initramfs/squashfs: not mounted
mount: unknown filesystem type 'DM_snapshot_cow'
in a loop and finally:
dracut Warning: Can't mount root filesystem
Comment 41 Fedora Update System 2011-09-02 13:47:04 EDT
dracut-013-8.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/dracut-013-8.fc16
Comment 42 Adam Williamson 2011-09-02 14:24:49 EDT
Discussed at 2011-09-02 blocker review meeting. Agreed this is a Beta blocker under criterion "The installer must be able to complete an installation using the text, graphical and VNC installation interfaces". Tim will build a boot.iso with the proposed fix for testing, stay tuned!
Comment 43 Fedora Update System 2011-09-06 23:20:54 EDT
dracut-013-8.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.