Bug 526752 - Must set rd_* parameters
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: grubby
Version: rawhide
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Hans de Goede
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-10-01 16:58 UTC by David Zeuthen
Modified: 2009-10-01 19:25 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-10-01 19:03:29 UTC
Type: ---
Embargoed:



Description David Zeuthen 2009-10-01 16:58:53 UTC
Without any rd_* options, the assembly code (and by 'assemble' I mean assembling RAID arrays, unlocking LUKS volumes, and setting up LVM) in dracut is very optimistic and will just assemble anything it sees. This won't work for a multitude of reasons:

 1. Consider the box booting from a SAN with thousands of LUNs
 2. Consider multiple initiators in a SAN
 3. Consider disks not used for booting

I ran into this problem because I happened to plug a 2.5" HDD into my server. This HDD used to sit in my laptop before I upgraded the laptop to an SSD. The HDD has a LUKS encrypted partition. When I rebooted my server, I was greeted with a password prompt in Plymouth (it didn't even tell me which disk/filesystem it was for, but that's another bug).

Actually, I pointed out this problem some time ago and Harald added the rd_* parameters to dracut. These parameters instruct dracut to only assemble what is needed. But it seems we are not using them.

Comment 1 David Zeuthen 2009-10-01 16:59:24 UTC
From IRC

<davidz> haraldh: uh, latest rawhide asks for a password for a luks partition - for a partition on a disk that is not related to booting at all - is that supposed to happen?
 haraldh: s/rawhide/dracut in rawhide/
<haraldh> yes
 if you don't want that
 rd_NO_LUKS
<-- steffen_ has quit ("Leaving")
<davidz> right
<haraldh> if you have no encrypted partitions at all 
<davidz> how about computing the rd_* variables before adding the new boot entry?
<haraldh> or rd_LUKS_UUID=<luks uuid> if you only want to be asked for that specific partition
<davidz> I mean, this is what we talked about in portland - we just cannot autoassemble/autounlock the world
 it won't work in SAN environments for example
 and not in this simple example either - I mean, the use case for me...
<haraldh> # dracut-gencmdline
 rd_DM_UUID=isw_bfadchbffa_Volume0 rd_LVM_VG=VolGroup00 KEYTABLE=de-latin1-nodeadkeys SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 root=LABEL=root rd_plytheme=charge 
<davidz> I just had happened to plug in an old laptop HDD (that I replaced with a SSD) some time ago
<haraldh> but hansg doesn't do that in new-kernel-pkg :-/
<davidz> and then I forgot about that and rebooted
 well, we should really do this
 I mean, this is almost a stop-ship thing
<haraldh> poke hansg :)
<davidz> well, I can live with this - but I guarantee you many F12 users will run into this....
<haraldh> yes
<davidz> hansg: why won't you add rd_* stuff in /sbin/new-kernel-pkg ?
 hansg: I mean, it's not trivial, I know that... you will need to calculate a lot of things
 hansg: on the other hand, we really really really cannot do it like this
* davidz files a bug
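
To make the semantics discussed in the log concrete, here is a minimal Python sketch of the decision described above: with rd_NO_LUKS nothing is unlocked, with one or more rd_LUKS_UUID=<luks uuid> entries only those volumes are asked about, and with neither everything is tried. This is only an illustration, not dracut's actual code. The function names `parse_cmdline` and `should_unlock` are hypothetical, and exact-match comparison of UUIDs is an assumption.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into (key, value) pairs.

    Bare flags such as rd_NO_LUKS get the value None. Only the first '='
    splits, so root=LABEL=root becomes ("root", "LABEL=root").
    """
    params = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.append((key, value if sep else None))
    return params


def should_unlock(cmdline, luks_uuid):
    """Decide whether a LUKS volume would be unlocked (illustrative only)."""
    params = parse_cmdline(cmdline)
    keys = {key for key, _ in params}
    if "rd_NO_LUKS" in keys:
        # LUKS handling is switched off entirely.
        return False
    wanted = [value for key, value in params if key == "rd_LUKS_UUID"]
    if not wanted:
        # No filter given: the optimistic "unlock everything" default.
        return True
    # Assumption: UUIDs are compared for exact equality.
    return luks_uuid in wanted
```

With the command line pasted above, which has no LUKS-related option, `should_unlock` returns True for any UUID, which matches the behaviour this bug describes.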

Comment 2 David Zeuthen 2009-10-01 17:01:08 UTC
Assigning to Hans G as per what Harald said.

Comment 3 David Zeuthen 2009-10-01 17:02:50 UTC
Adding to F12 blocker bug. I also think this is a stop-ship bug.

Comment 4 Hans de Goede 2009-10-01 17:14:03 UTC
Ok, so this is definitely not a stop-ship bug, esp given that doing this
will be very tricky and *we are frozen*

You are the first person to complain about this. And frankly your complaint makes no sense: what we do in dracut is no different from what rc.sysinit does, so all the examples you give are doomed anyway, as rc.sysinit will do the exact same thing you want dracut to not do, right after dracut is done.

Granted, rc.sysinit does not try to activate all crypto volumes, but it will assemble all VolGroups it can find, and probe each and every disk for BIOS RAID metadata, and ...

We are just beginning to implement device filtering (what this is) in anaconda for F-13, once we've figured it out there, we can try to translate what this means for the initrd and rc.sysinit.

Also, new-kernel-pkg is a convenience wrapper around grubby, and grubby is:
"command line tool for configuring grub, lilo, and elilo"

So I seriously do not think this belongs inside new-kernel-pkg.

This all feels very wrong, we end up re-creating mkinitrd (which has hardcoded
inside which volgroups / luks uuid / raidsets to activate), but then in an
asynchronous manner to make it more "fun".

Comment 5 David Zeuthen 2009-10-01 17:40:25 UTC
(In reply to comment #4)
> Ok, so this is definitely not a stop-ship bug, esp given that doing this
> will be very tricky and *we are frozen*

I don't think a stop-ship bug depends on whether we are frozen or not. Whether it's stop-ship or not, that's not really my call though. I just filed a bug because I thought, wow, this is pretty broken that we can't boot because you plug in a random harddisk.

> And frankly your complaint makes no sense

I'm sorry that you think my bug is a "complaint". And I'm sorry it doesn't make sense to you. For the record, I still can't boot my box with the 2.5" HDD attached. But I can personally live with that. I'll try to stay out of your way.

Comment 6 Hans de Goede 2009-10-01 17:58:48 UTC
(In reply to comment #5)
> For the record, I still can't boot my box with the 2.5" HDD
> attached. But I can personally live with that. I'll try to stay out of your
> way.  

Note that your original bug report does *not mention* that you cannot boot; it
complains that you are presented with a password screen. What happens if you type the correct password? Or if you press ESC or type a wrong password?

Comment 7 Matthias Clasen 2009-10-01 18:01:15 UTC
Let's discuss this bug in the F12 blocker meeting then, if we cannot agree on the stop-ship-ness here.

Comment 8 Hans de Goede 2009-10-01 18:12:36 UTC
(In reply to comment #7)
> Let's discuss this bug in the F12 blocker meeting then, if we cannot agree on
> the stop-ship-ness here.  

There really is nothing to discuss, David has hit a very obscure scenario, which normal users will almost certainly never hit. Otherwise there certainly would be
more bugs about this.

And although we could debate for hours over how obscure (or not) this scenario is, there is no straightforward fix. The dracut-gencmdline tool, which could
be (a part of) the fix, does not work correctly atm: it failed at the very first test run done, and in its current incarnation, even if it were to work and be integrated into new-kernel-pkg, it still does not fix this particular bug.

At the same time, this means making big changes to the way the initrd works after the freeze, causing many regressions of the "system does not boot" kind with
a certainty approaching 100%.

So the choices are:

1) live with this bug which only happens in rare circumstances
2) Don't ship F-12 (unfreeze, make changes, add a month of testing at least)
3) Ship an F-12 which won't boot on an unknown but large amount of systems.

Comment 9 Harald Hoyer 2009-10-01 18:19:21 UTC
(In reply to comment #8)
> There is no straightforward fix, the dracut-gencmdline tool which could
> be (a part of) the fix, does not work correctly atm, it failed at the very
> first test run done, and in its current incarnation even if it were to work and
> be integrated into new-kernel-pkg, it still does not fix this particular bug.

try dracut-gencmdline from dracut-002-11.gita8a3ca51.fc12

http://koji.fedoraproject.org/koji/taskinfo?taskID=1722466

Comment 10 David Zeuthen 2009-10-01 18:23:45 UTC
(In reply to comment #8)
> 1) live with this bug which only happens in rare circumstances
> 2) Don't ship F-12 (unfreeze, make changes, add a month of testing at least)
> 3) Ship an F-12 which won't boot on an unknown but large amount of systems.  

FWIW, I just tried reproducing it on my laptop. The laptop has an ext3 rootfs (/boot and / on the same fs) on the first partition of the SSD. Nothing fancy at all. I tried two things:

     1. having a usb stick with a LUKS partition
     2. having a LUKS partition on the SSD

and neither brought up a password dialog and I could boot. I don't know why my other box is hitting this - it requires more debugging.

Comment 11 David Zeuthen 2009-10-01 19:03:29 UTC
OK, I debugged this some more. So all this drama was caused by two issues

 1. mdadm.conf now requires the rootfs array to be listed - I didn't
    have to do that with an August-vintage dracut.

 2. luks dialog for drives not related to boot are popping up.

I resolved 1. by adding an entry to mdadm.conf (I remember Harald and Hans saying this is now needed) and then I could boot without the 2.5" HDD attached.

With the 2.5" HDD attached I still get the password dialog at bootup... but dismissing it three times makes it go away and booting continues. While this LUKS dialog is an inconvenience, it is hardly a blocker and, in some extreme, not even a bug. So I'm closing this bug as WORKSFORME.

We probably still want to set the rd_* parameters in the future but that is outside the scope of this bug. Sorry for jumping the gun early and making this a blocker.
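
As a rough illustration of what "setting the rd_* parameters" could look like, here is a minimal Python sketch that turns an already-known description of the boot devices into kernel command line arguments. It uses only parameter names that appear in this bug (rd_NO_LUKS, rd_LUKS_UUID, rd_LVM_VG, rd_DM_UUID). The function `build_rd_args` is hypothetical, and the hard part, working out which devices the root filesystem actually depends on, is deliberately left out.

```python
def build_rd_args(luks_uuids=(), lvm_vgs=(), dm_uuids=()):
    """Build rd_* arguments for the devices needed to reach the rootfs.

    The inputs are assumed to be known already; this sketch does not
    probe the system in any way.
    """
    args = []
    if luks_uuids:
        # Only prompt for the LUKS volumes that booting depends on.
        args += ["rd_LUKS_UUID=" + uuid for uuid in luks_uuids]
    else:
        # No encrypted volume is needed for booting: never prompt.
        args.append("rd_NO_LUKS")
    args += ["rd_LVM_VG=" + vg for vg in lvm_vgs]
    args += ["rd_DM_UUID=" + uuid for uuid in dm_uuids]
    return " ".join(args)


if __name__ == "__main__":
    # Values taken from the dracut-gencmdline output in comment 1.
    print(build_rd_args(lvm_vgs=["VolGroup00"],
                        dm_uuids=["isw_bfadchbffa_Volume0"]))
```

For a box like mine, where the rootfs is not on LUKS, the result would include rd_NO_LUKS, so the stray 2.5" HDD would not trigger a password dialog.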

Comment 12 Harald Hoyer 2009-10-01 19:25:42 UTC
(In reply to comment #11)
> OK, I debugged this some more. So all this drama was caused by two issues
> 
>  1. mdadm.conf now requires the rootfs array to be listed - I didn't
>     have to do that with an August-vintage dracut.

add "rd_NO_MDADMCONF" if you don't want that :)

