Bug 475495

Summary: ext3 root mount failure with mkinitrd 6.0.71 and later
Product: [Fedora] Fedora
Reporter: Sean Middleditch <sean>
Component: mkinitrd
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: low
Version: 10
CC: ben, dcantrell, fschwarz, hdegoede, james, katzj, kernel-maint, mwoehlke.floss, pjones, quintela, stu, thomas.mey, wtogami
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-05-06 16:34:55 UTC Type: ---

Description Sean Middleditch 2008-12-09 14:56:17 UTC
When booting with any recent Fedora 10 kernel, boot fails very early on with the following message:

mount: error mounting /dev/root on /sysroot as ext3: Invalid argument

The last kernel to work is 2.6.27.4-68.fc10.  All kernel updates after that one, including the latest, fail with the above message.  This is an x86-64 machine, Fedora 10 final.  The mkinitrd version is mkinitrd-6.0.71-2.fc10.x86_64.

My fstab is as follows:
UUID=20cbf463-7ef5-49db-93f4-c8be81c10f74 /                       ext3    defaults,relatime	1 1
UUID=305bcc65-573d-423a-a7cb-5a4654a65400 /home                   ext4    defaults,relatime	1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
UUID=7884defb-7e74-48b4-a99c-16d4e050049e swap                    swap    defaults        0 0

Yes, I have tried this without the relatime option in there.  Exact same error.  I've also tried removing the /home mount (which is ext4; the root partition has always been ext3), and still the same message.
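
Later comments in this report attribute the failure to the relatime/norelatime handling in nash. As a rough illustration only, the sketch below pulls the mount options for the "/" entry out of fstab-style text and flags the options named in this thread. The function names and the SUSPECT_OPTIONS set are hypothetical, chosen for this example; they are not part of mkinitrd or nash, and the set is not an exhaustive list of problematic options.

```python
# Options this bug report names as problematic for nash (assumption: illustrative only).
SUSPECT_OPTIONS = {"relatime", "norelatime"}


def root_mount_options(fstab_text):
    """Return the option list of the "/" entry, or None if no such entry exists."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == "/":
            return fields[3].split(",")
    return None


def suspect_root_options(fstab_text):
    """Return the root mount options that appear in SUSPECT_OPTIONS."""
    options = root_mount_options(fstab_text) or []
    return [opt for opt in options if opt in SUSPECT_OPTIONS]


# Abbreviated copy of the first two fstab entries quoted above.
SAMPLE_FSTAB = """\
UUID=20cbf463-7ef5-49db-93f4-c8be81c10f74 / ext3 defaults,relatime 1 1
UUID=305bcc65-573d-423a-a7cb-5a4654a65400 /home ext4 defaults,relatime 1 2
"""

print(suspect_root_options(SAMPLE_FSTAB))
```

Only the "/" entry is inspected, because the failing mount in this report is the root filesystem; the /home entry is ignored even though it carries the same option.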

In case it helps (known driver bug or something), here is the output of lspci:
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: ASRock Incorporation Device 9602
00:09.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 4)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3200 Graphics
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
05:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev c0)

If there are any other packages that you need the version for, please let me know; I'm not sure what else could be at fault here, but I'm definitely eager to help debug this.  :)

Comment 1 Dave Jones 2008-12-09 19:02:37 UTC
possibly a dupe of 470628

Comment 2 Sean Middleditch 2008-12-09 20:17:28 UTC
You're the expert, Dave, but to me the bug doesn't sound related.  This isn't a pause... it just flat out won't boot at all... or if it is a pause, it's a _lot_ longer than 10 seconds.  Bug 470628 seems to have started at an earlier kernel than the one that breaks for me.

Bug 470628 does sound like it might be related to a different bug I filed, bug 468800.

I'll try the mkinitrd fixes though and see if that helps.

Comment 3 Ben Webb 2008-12-10 02:08:15 UTC
You could try adding --with=scsi_wait_scan to your mkinitrd invocation, or adding
MODULES="scsi_wait_scan"
to /etc/sysconfig/mkinitrd. I get a boot failure similar to yours on one of my machines (root filesystem handled by aic7xxx), except that I get (IIRC) "No such file or directory" rather than "Invalid argument". It seems to happen because the initrd tries to mount the root filesystem before the SCSI modules have finished starting up. Adding scsi_wait_scan works for me, but I don't know whether a) you have the same problem or b) this is a problem with the kernel or with mkinitrd.

Comment 4 Ben Webb 2008-12-10 02:35:51 UTC
Could be related to bug 466534 ?

Comment 5 Sean Middleditch 2008-12-11 18:34:56 UTC
kernel-2.6.27.7-134.fc10.x86_64 does not fix this.

mkinitrd-6.0.71-3.fc10.x86_64 (via bug 466534) does not fix this; yes I regenerated the initrd.

Going to try both the scsi_wait_scan trick and the suggestion from bug 470628 right now.

Comment 6 Sean Middleditch 2008-12-11 19:14:50 UTC
Ah ha.  The working initrd I have for the older kernel was generated with mkinitrd 6.0.69.  I downloaded 6.0.70 from Koji and regenerated the initrd for the newer kernel (2.6.27.7-134.fc10.x86_64) and the system booted up just fine!

So it's something in mkinitrd 6.0.71 that breaks things for me.  It seems that -2 was the first package for f10 in that series; the -1 package revision in Koji is tagged as f11, so I'm not sure it's safe to test that.

Maybe whatever causes bug 468800 is also related to why the newer mkinitrd doesn't work?  It seems only me and one other person are affected by that one (that are reporting it, anyway), and it doesn't sound like anybody else is having this specific issue.

Let me know what else I can do to help fix this.  For now I'm just going to yum exclude mkinitrd/nash/libbdevid-python so I can update without breaking boot.

Comment 7 Sean Middleditch 2008-12-20 22:00:38 UTC
I tried mkinitrd-6.0.73-7 (rebuilt on F10 from the source RPM so it works with Python 2.5) and the error still occurs.  Every mkinitrd release after 6.0.70 that I have tried generates unbootable initrd images.

Comment 8 Thomas Meyer 2009-01-05 20:09:10 UTC
I guess I'm also hitting this mkinitrd bug:
I use a self-compiled kernel built from Linus' git tree, with nearly everything compiled in. Booting without the "quiet" option, I see:

Setting up hotplug.
Creating block device nodes.
Creating character device nodes.
Making device-mapper control node
Setting up disk encryption: /dev/sda4
key slot 0 unlocked.
Command successful.
Setting up disk encryption: /dev/sda5
key slot 0 unlocked.
Command successful.
Unable to access resume device (LABEL=SWAP)
Creating root device.
Mounting root filesystem.
mount: error mounting /dev/root on /sysroot as ext3: No such file or directory

The rootfs is encrypted and is a labeled ext3 partition (name=ROOT). The swap partition is labeled "SWAP".
The funny thing is that cryptsetup seems to complete successfully, but the mount of the rootfs fails. The question is why?

Comment 9 Thomas Meyer 2009-01-05 20:21:59 UTC
My mkinitrd version is:
$ yum info mkinitrd
Loaded plugins: refresh-packagekit
Installed Packages
Name       : mkinitrd
Arch       : i386
Version    : 6.0.71
Release    : 3.fc10
Size       : 142 k
Repo       : installed

Relevant init "script" from initrd is:
"(cut)
mknod /dev/ttyS3 c 4 67
/lib/udev/console_init tty0
daemonize --ignore-missing /bin/plymouthd
plymouth --show-splash
echo Setting up hotplug.
hotplug
echo Creating block device nodes.
mkblkdevs
echo Creating character device nodes.
mkchardevs
echo Making device-mapper control node
mkdmnod
modprobe scsi_wait_scan
rmmod scsi_wait_scan
mkblkdevs
echo Setting up disk encryption: /dev/sda4
plymouth ask-for-password --command "cryptsetup luksOpen /dev/sda4 luks-sda4"
echo Setting up disk encryption: /dev/sda5
plymouth ask-for-password --command "cryptsetup luksOpen /dev/sda5 luks-swap"
resume LABEL=SWAP
echo Creating root device.
mkrootdev -t ext3 -o defaults,noatime,ro LABEL=ROOT
echo Mounting root filesystem.
mount /sysroot
cond -ne 0 plymouth --hide-splash
echo Setting up other filesystems.
setuproot
loadpolicy
plymouth --newroot=/sysroot
echo Switching to new root and running init.
switchroot
echo Booting has failed.
sleep -1"

Boot command line is "
ro root=LABEL=ROOT rhgb lapic_timer_c2_ok resume=LABEL=SWAP usbcore.autosuspend=1 nomodeset initcall_debug"

The root= and resume= options are pretty useless for an encrypted disk, as they are hard-coded into the init script of the initrd... But that's another story.

By the way: why is the initrd file called ".img" and not ".cpio.gz", which is way more Midnight Commander friendly?

Comment 10 Hans de Goede 2009-01-06 10:11:56 UTC
I see from the copied init file that you've got noatime in the options field for / in fstab. This is known to break nash (and is never put there by any Fedora tools; you've put that there yourself).

Removing the noatime option from your entry for / in fstab should fix this.

Closing.

*** This bug has been marked as a duplicate of bug 296361 ***

Comment 11 Hans de Goede 2009-01-06 10:13:33 UTC
Erm, scrap my last remark: the known issue is with norelatime, not with noatime. Reopening, sorry for the spam.

Comment 12 Sean Middleditch 2009-01-07 18:30:14 UTC
For what it's worth, I tried both with and without relatime.  (I have not tried the noatime/nodiratime options.)

Is there any other information I should provide to help?  I'm at a total loss as to what to do, and so far even the F11 packages remain broken for me.  I was tempted to try a fresh F10 reinstall, but given it looks like that would just result in a completely unbootable system for me, I'm not really keen on risking it.  Same with F11 pre-releases.

Are there tweaks I should try making to the mkinitrd code, or is there a way to enable more verbose logging that would show what is causing the error message I'm getting?

Comment 13 Stu Tomlinson 2009-01-21 18:11:08 UTC
Did you rebuild your initrd after removing relatime? See bug #296361 comment #4

Comment 14 Sean Middleditch 2009-01-29 18:36:49 UTC
I was pretty sure I did, but I tried again with a newer version of mkinitrd and now it works when relatime is not there.  So I was wrong, and it is just the same bug about relatime not being supported, apparently.  My apologies.
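
The init script quoted in comment 9 shows the root mount options written into a `mkrootdev ... -o ...` line inside the initrd, which is consistent with the question in comment 13: editing fstab alone would not change an already generated initrd. The following is a minimal sketch, assuming the init script has already been extracted from the initrd image into a string. The helper names are hypothetical and do not correspond to any mkinitrd or nash interface.

```python
def mkrootdev_options(init_script_text):
    """Return the -o option list from the first mkrootdev line, or None."""
    for line in init_script_text.splitlines():
        words = line.split()
        if words and words[0] == "mkrootdev" and "-o" in words:
            index = words.index("-o")
            if index + 1 < len(words):
                return words[index + 1].split(",")
    return None


def option_still_in_initrd(init_script_text, option):
    """True if the given mount option is still baked into the init script."""
    options = mkrootdev_options(init_script_text) or []
    return option in options


# Hypothetical init script fragment, modelled on the one quoted in comment 9.
SAMPLE_INIT = """\
echo Creating root device.
mkrootdev -t ext3 -o defaults,relatime,ro LABEL=ROOT
echo Mounting root filesystem.
mount /sysroot
"""

if option_still_in_initrd(SAMPLE_INIT, "relatime"):
    print("relatime is still present in the initrd; regenerate it after editing fstab")
```

If the option still shows up after it has been removed from fstab, the initrd was most likely not regenerated, or was regenerated for a different kernel version.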

Comment 15 Thomas Meyer 2009-02-07 11:12:19 UTC
(In reply to comment #9)
> echo Setting up disk encryption: /dev/sda4
> plymouth ask-for-password --command "cryptsetup luksOpen /dev/sda4 luks-sda4"
> echo Setting up disk encryption: /dev/sda5
> plymouth ask-for-password --command "cryptsetup luksOpen /dev/sda5 luks-swap"
> resume LABEL=SWAP
> echo Creating root device.
> mkrootdev -t ext3 -o defaults,noatime,ro LABEL=ROOT
> echo Mounting root filesystem.
> mount /sysroot
> cond -ne 0 plymouth --hide-splash
> echo Setting up other filesystems.
> setuproot
> loadpolicy
> plymouth --newroot=/sysroot
> echo Switching to new root and running init.
> switchroot
> echo Booting has failed.
> sleep -1"
> 
> Boot command line is "
> ro root=LABEL=ROOT rhgb lapic_timer_c2_ok resume=LABEL=SWAP
> usbcore.autosuspend=1 nomodeset initcall_debug"
> 
> The root= and resume= options are pretty useless for encrypted disk as they are
> hard-coded into the init script of the initrd... But that's another story.

Maybe the above statement is not totally true:
I need to change my boot command line to

"ro root=/dev/mapper/luks-sda2 rhgb lapic_timer_c2_ok resume=LABEL=SWAP usbcore.autosuspend=1 nomodeset fastboot"

to make boot work again for my self-compiled kernel from the latest Linus git tree, which has nearly no modules, i.e. everything is compiled in.

Who's responsible for creating the /dev/root device?

I guess the boot command line parameter "root=xxx" is directly involved in the creation of the /dev/root device?!

Comment 16 Jeremy Katz 2009-05-06 16:34:55 UTC
Closing as a dupe based on comment #14 from Sean (original reporter)

Thomas -- /dev/root is created based on root=, yes.
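
To make the role of root= concrete, here is a small sketch that splits a kernel command line such as the ones quoted in comments 9 and 15 into parameters and reads back the root= and resume= values. It is an illustration under simplifying assumptions (whitespace-separated words, no quoting) and does not reproduce how nash or the kernel actually parse the command line; `parse_cmdline` is a hypothetical helper name.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for word in cmdline.split():
        # Split on the first "=" only, so "root=LABEL=ROOT" keeps "LABEL=ROOT" intact.
        key, sep, value = word.partition("=")
        params[key] = value if sep else None
    return params


# Command lines taken from comments 9 and 15 (shortened).
OLD_CMDLINE = "ro root=LABEL=ROOT rhgb resume=LABEL=SWAP usbcore.autosuspend=1 nomodeset"
NEW_CMDLINE = "ro root=/dev/mapper/luks-sda2 rhgb resume=LABEL=SWAP nomodeset fastboot"

for cmdline in (OLD_CMDLINE, NEW_CMDLINE):
    params = parse_cmdline(cmdline)
    print("root =", params.get("root"), "| resume =", params.get("resume"))
```

The two sample lines differ only in how the root device is named, by filesystem label in the first and by device-mapper path in the second, which is the change Thomas reports making in comment 15.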

*** This bug has been marked as a duplicate of bug 296361 ***