480667 – nash unable to find dm devs by uuid or label causing boot to fail

Summary: nash unable to find dm devs by uuid or label causing boot to fail

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mkinitrd
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Peter Jones
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (5):	480761 481037 484474 484508 485274
Depends On:
Blocks:	F11Alpha, F11AlphaBlocker

Reported:	2009-01-19 18:48 UTC by Jesse Keating
Modified:	2013-01-10 03:26 UTC
CC List:	19 users
Fixed In Version:	6.0.71-4.fc10
Clone Of:
Environment:
Last Closed:	2009-02-25 16:07:26 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
JPG Screenshot at kernel freeze (597.57 KB, image/jpeg) 2009-01-25 13:59 UTC, Frank Murphy
serial console log of boot failure on Lenovo desktop (18.59 KB, text/plain) 2009-02-24 23:58 UTC, Clark Williams

Description Jesse Keating 2009-01-19 18:48:08 UTC

I've got a new install on an nvidia sata controller where / exists in an LVM volume group.  At boot time, the boot messages show that we're not able to find the volume group, and the root mount then fails with: mount: error mounting /dev/root on /sysroot as ext3: No such file or directory

If I boot with boot_delay=5 it'll work fine.  Booting with scsi_scan=sync also fails.  Both i386 and x86_64 installs on this system exhibit the problem.

Comment 1 Jesse Keating 2009-01-19 19:29:21 UTC

More information.  When doing encrypted LVM, I haven't yet been able to delay long enough to make this work.  However, I can clearly see that the "scsi" disk is detected, and I get prompted for the passphrase.  The passphrase is accepted, then the LVM volume group and logical volumes are created, but maybe not fast enough, as the mount still fails.  Now I think we're missing some delay or settle or something when making LVM volumes active.  I'm going to repeat this test but without LVM involved.

Comment 2 Jesse Keating 2009-01-19 19:48:53 UTC

Creating a / filesystem that was not on LVM indeed works like a charm.  Even more evidence that something is wrong in LVM land.

Comment 3 Jesse Keating 2009-01-19 21:53:16 UTC

And more data: an encrypted FS without LVM also fails to mount, so the issue lies in whatever encrypted FS and LVM have in common.

Comment 4 Jeremy Katz 2009-01-19 22:47:22 UTC

It doesn't look so much like we're racing as that we're just not seeing uevents in nash for the dm devices, so we don't add them to our block dev cache and thus are never able to match the UUID.  The devices do exist, though, and the dm-* devs are even in sysfs.
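The sysfs observation above can be checked by hand from any shell that has a populated /sys, for example a rescue environment. The following is only a minimal sketch: `list_dm_devs` is a hypothetical helper name, not part of nash or mkinitrd, and it assumes the usual layout where device-mapper nodes appear as dm-N entries under /sys/block.

```shell
# Hypothetical helper (not part of nash or mkinitrd): print the names of
# device-mapper nodes that the kernel has registered in sysfs.
# $1 = directory to scan; defaults to /sys/block (assumed standard layout).
list_dm_devs() {
    dir="${1:-/sys/block}"
    for d in "$dir"/dm-*; do
        # If the glob matched nothing, $d is the literal pattern; skip it.
        [ -e "$d" ] || continue
        basename "$d"
    done
}
```

Running `list_dm_devs` with no argument prints one dm-N name per line. If names are printed while the UUID lookup still fails, that fits the description above: the devices exist in the kernel, but were never added to the block dev cache.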

Also, using root=/dev/VolGroup00/RootVol (or whatever) seems to work fine on the box I've got failing.  
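As an illustration of this workaround, the root= argument on the kernel command line can be swapped for the device-mapper path. This is a sketch only: `rewrite_root` is a hypothetical helper, and the UUID in the usage note is a placeholder rather than a value from any affected system. The volume group and logical volume names must match the actual install.

```shell
# Hypothetical helper: replace the root= argument in a kernel command line.
# $1 = existing kernel command line
# $2 = new root device path (must not contain '#' or '&')
rewrite_root() {
    # Match root= only at the start of the line or after whitespace,
    # so arguments such as "foo_root=..." are left alone.
    printf '%s\n' "$1" | sed -E "s#(^|[[:space:]])root=[^[:space:]]+#\\1root=$2#"
}
```

For example, `rewrite_root "ro root=UUID=<placeholder> rhgb quiet" /dev/VolGroup00/RootVol` leaves the other arguments in place and changes only the root= entry. The result can be typed in at the bootloader prompt for a one-off boot before making any permanent change.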

Reassigning to mkinitrd for now

Comment 5 Jesse Keating 2009-01-21 19:56:01 UTC

We may not be able to fix this one by Alpha, in which case we'll have to announce the workarounds with Alpha.

Comment 6 Luke Macken 2009-01-21 21:36:47 UTC

I can confirm that specifying the root volume group, as opposed to the uuid, works.

Comment 7 Kevin R. Page 2009-01-21 22:49:11 UTC

Also confirm that changing uuid to root volume group works - but note this is for F10 test build kernel-2.6.28.1-9.rc2.fc10.i686 (so this should probably block any push of this kernel toward F10 too?)

Rebuilding an initrd for kernel-2.6.27.9-159.fc10.i686 doesn't suffer from this on the same machine, btw.

Comment 8 Kevin R. Page 2009-01-21 22:50:15 UTC

*** Bug 480761 has been marked as a duplicate of this bug. ***

Comment 9 Jesse Keating 2009-01-21 23:35:14 UTC

(In reply to comment #7)
> Also confirm that changing uuid to root volume group works - but note this is
> for F10 test build kernel-2.6.28.1-9.rc2.fc10.i686 (so this should probably
> block any push of this kernel toward F10 too?)
> 
> Rebuilding an initrd for kernel-2.6.27.9-159.fc10.i686 doesn't suffer from this
> on the same machine, btw.

Please confirm the mkinitrd version being used.

Comment 10 Kevin R. Page 2009-01-22 17:57:56 UTC

(In reply to comment #9)
> Please confirm the mkinitrd version being used.

mkinitrd-6.0.71-3.fc10.i386

Bug #480761 has some other details (fstab, etc.)

Comment 11 Frank Murphy 2009-01-25 09:26:38 UTC

Had F10-LXDE encrypted /,/home working

preupgrade to F11 didn't boot.
rescue mode fails loading /mnt/sysimage (errors),
so can't fix anything.

Downloaded
 boot.iso   23-Jan-2009 07:53  120M
Booted from it as install\ug, won't boot
intel_iommu=on\off makes no difference.

Stuck at Virtual Kernel Memory Layout.
I will get some sort of screen image as
soon as I find the old camera.

It's an old P3 800 MHz with 512 MB SDRAM (its max)
2x 250 GB IDE HDs

Comment 12 Frank Murphy 2009-01-25 09:27:24 UTC

PS: ext4 except /boot

Comment 13 Frank Murphy 2009-01-25 13:59:13 UTC

Created attachment 329936
JPG Screenshot at kernel freeze

Comment 14 Hans de Goede 2009-02-04 20:50:48 UTC

This is fixed in mkinitrd-6.0.76, which will be in the next rawhide push.

Comment 15 Hans de Goede 2009-02-08 13:35:53 UTC

*** Bug 484508 has been marked as a duplicate of this bug. ***

Comment 16 Eric Sandeen 2009-02-09 02:58:59 UTC

*** Bug 484474 has been marked as a duplicate of this bug. ***

Comment 17 Hans de Goede 2009-02-12 11:16:22 UTC

*** Bug 481037 has been marked as a duplicate of this bug. ***

Comment 18 Hans de Goede 2009-02-12 18:25:57 UTC

*** Bug 485274 has been marked as a duplicate of this bug. ***

Comment 19 Clark Williams 2009-02-24 23:56:08 UTC

I have two systems experiencing this problem: a T60 laptop and a Lenovo desktop system. Both are running up-to-date rawhide.

I did a clean reinstall of F10 on the desktop box (which worked fine), then enabled the rawhide repo and upgraded to rawhide via yum. On reboot the system hung trying to mount the rootfs.

I tried booting with both UUID and VG as the root, with and without boot_delay=5 (or 10 even) with no success. 

If my T60 (intel chipset) didn't exhibit the same problem, I'd say it was NV SATA, but that's not the case. 

I've got a serial console log of the boot that I'll attach

Comment 20 Clark Williams 2009-02-24 23:58:19 UTC

Created attachment 333103
serial console log of boot failure on Lenovo desktop

This is a console log of the boot failure on the Lenovo desktop box. At the point where the output stops, the kernel is still alive (i.e. sysrq works).

Comment 21 Hans de Goede 2009-02-25 07:54:18 UTC

(In reply to comment #20)
> Created an attachment (id=333103) [details]
> serial console log of boot failure on Lenovo desktop
> 
> This is a console log of the boot failure on the Lenovo desktop box. At the
> point where the output stops, the kernel is still alive (i.e. sysrq works).

What does "rpm -q mkinitrd" say? Do you have version 6.0.76 or later?
If you do, please try booting with "root=/dev/VolGroup00/LogVol00" as the kernel cmdline argument (replacing the existing root= argument). If that still does not boot, you are seeing another problem; in that case please create a new bug.

When creating a new bug please also do (as root):
zcat /boot/initrd-<failing-kern-version>.img | cpio -i init

This will extract the init script from the initrd into the current directory; please attach that init script to the new bug.

Thanks!

Comment 22 Clark Williams 2009-02-25 16:07:26 UTC

$ rpm -q mkinitrd
mkinitrd-6.0.78-1.fc11.x86_64

Actually, the boot line on the desktop box already refers to the volgroup rather than a label or a UUID, so it's probably another bug. I'll close this one (again) and open a new one.

Comment 23 Fedora Update System 2009-03-09 23:10:31 UTC

mkinitrd-6.0.71-4.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.
