Bug 128175

Summary: diskless support for mkinitrd
Product: [Fedora] Fedora
Component: mkinitrd
Reporter: Mark McLoughlin <markmc>
Assignee: Peter Jones <pjones>
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Version: rawhide
CC: andy, barryn, bastiaan, dgunchev
Keywords: FutureFeature
Hardware: All
OS: Linux
Doc Type: Enhancement
Last Closed: 2006-02-24 19:16:15 UTC
Bug Blocks: 124120
Attachments (description, flags):
  diskless-mkinitrd          none
  diskless-linuxrc           none
  diskless-dhclient-script   none

Description Mark McLoughlin 2004-07-19 16:56:20 UTC
Attached is a diskless-mkinitrd script (also an init script and
dhclient script for the initrd)

This feature really belongs in mkinitrd itself rather than a
separately maintained script.

So, the initrd generated by diskless-mkinitrd does the following:

  - scans the PCI bus for an ethernet card, consults pcitable for
    the appropriate module, and loads it and its dependencies
  - loads the NFS module and its dependencies
  - brings the appropriate network interface up
  - runs DHCP against that interface
  - NFS mounts the root directory and does the usual pivot_root
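
The pcitable lookup in the first step can be sketched in plain shell. This is
a minimal sketch, not the attached linuxrc: lookup_module is a hypothetical
helper, and it assumes pcitable lines are whitespace separated as vendor ID,
device ID and then the quoted module name. The table entry in the example is
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the attached scripts.
# Assumed pcitable line format (whitespace separated):
#   0x8086  0x1229  "e100"  "description"

# lookup_module VENDOR DEVICE TABLE
# Prints the module name for a PCI vendor/device pair (hex, no 0x prefix).
# Prints nothing and returns 1 if the pair is not listed.
lookup_module() {
    awk -v v="0x$1" -v d="0x$2" '
        ($1 "") == (v "") && ($2 "") == (d "") {
            gsub(/"/, "", $3)   # strip the quotes around the module name
            print $3
            found = 1
            exit
        }
        END { exit !found }
    ' "$3"
}

# Example with a throwaway table; a real run would point at the pcitable
# copied into the initrd.
table=$(mktemp)
printf '0x8086\t0x1229\t"e100"\t"Example NIC"\n' > "$table"
lookup_module 8086 1229 "$table"    # prints: e100
rm -f "$table"
```

The vendor and device IDs would come from the lspci output; how that output
is laid out depends on the lspci version, so that parsing step is left out
here.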

Everything else is (or at least should be) pretty much the same
as stock mkinitrd.

In order to do all this we have to do some nasty things:

  - include all network card drivers
  - include modprobe, modules.dep and modules.conf.inst
  - include busybox.anaconda for a bunch of utilities the
    init script needs. Specifically:
      + lspci for scanning the pci bus
      + sort, awk, grep, cut for parsing the lspci output
        and comparing it with pcitable
      + ash for the init script itself
      + ifconfig for configuring the network interface
  - include dhclient
      + we also include our own dhclient script because it
        will only need to do a small fixed set of tasks and
        doesn't need to be as complex (and have as many
        dependencies) as the stock dhclient-script
  - purposely leave the DHCP lease in /tmp for the initscripts
    to pick up and re-use
  - include ld.so, libc, libm and libselinux for the various
    binaries above.

So, I guess we could have a statically linked modprobe and the
PCI scanning/module lookup stuff in nash but I'm not sure
what we'd do about dhclient.

(Purposely ignoring anything which should be resolved by using
udev, by the way)

Comment 1 Mark McLoughlin 2004-07-19 16:57:11 UTC
Created attachment 102048 [details]
diskless-mkinitrd

Comment 2 Mark McLoughlin 2004-07-19 16:57:52 UTC
Created attachment 102049 [details]
diskless-linuxrc

Comment 3 Mark McLoughlin 2004-07-19 16:58:39 UTC
Created attachment 102050 [details]
diskless-dhclient-script

Comment 4 Mark McLoughlin 2004-07-19 17:02:43 UTC
Background for others on why this shouldn't go in pretty much as is: as
it is now, diskless-mkinitrd is quite different from stock mkinitrd in
that the initrd contains a different set of binaries and the init
script is a proper shell script rather than a nash script.

This bug is essentially about how to resolve those differences sanely
so that diskless support can be added to stock mkinitrd without having
a big divergence in behaviour for the diskless support.

Comment 5 Jeremy Katz 2004-07-19 17:28:56 UTC
My initial thought to avoid needing to pull in lots of modules and all
of the modprobe infrastructure would be that, similar to how we do for
scsi modules, mkinitrd should only put in the network modules needed
for the machine it's being run on.  And then you can have a
--with-net-module or some such to add additional modules if you really
want.

Also, we could easily go with no dhclient script this early probably
and let the later interface initialization do anything more
complicated that needs to be done.  

Those are just my quick thoughts from just reading the description of
the changes so that I make sure to have put something here before
leaving for OLS :)

Comment 6 Mark McLoughlin 2004-07-19 17:57:27 UTC
(thanks for the comments)

> include only the network modules for the machine it's being run on

I did consider this, because it is obviously the approach that stock
mkinitrd uses. However, the use case is very different here.
You'd be typically running this on the server not on the thin-client -
i.e. to do it on the thin-client you would need to boot it (with a
floppy, CD, or PXE with an initrd that *does* have all the modules),
run mkinitrd and copy the initrd back to the server.

The way I think this would work is that you'd have a system image on
the server which can be NFS mounted, you chroot into that, run
mkinitrd and then copy the initrd into /tftpboot.
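
As a rough sketch of that server-side workflow: the function below only
prints the commands, since the real steps need root and an existing client
image. plan_initrd_build, the image path and the kernel version are all
placeholders; the mkinitrd call uses the usual "mkinitrd <image> <version>"
form.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# All paths and the kernel version are placeholders.

# plan_initrd_build IMAGE_ROOT KERNEL_VERSION TFTP_DIR
plan_initrd_build() {
    image_root=$1 kver=$2 tftp_dir=$3
    initrd="/boot/initrd-$kver.img"
    # 1. Build the initrd inside the client image so it picks up that
    #    image's kernel modules rather than the server's.
    echo "chroot $image_root mkinitrd $initrd $kver"
    # 2. Publish the result where the TFTP server can hand it to clients.
    echo "cp $image_root$initrd $tftp_dir/"
}

plan_initrd_build /srv/diskless/root 2.6.9-example /tftpboot
```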

Ideally, you don't need to go and figure out what network card each
client has and have a different initrd for each of those ...

And, FWIW, this is the approach LTSP takes too.

> Also, we could easily go with no dhclient script this early probably
> and let the later interface initialization do anything more 
> complicated that needs to be done.

We need to have the interface fully initialised to mount the NFS root,
though.

Comment 7 Arkadiusz Miskiewicz 2004-07-19 21:43:06 UTC
We at PLD are using busybox utils: ifconfig, route, udhcpc which 
makes initrd very small. The problem is of course with net drivers.

http://svn.pld-linux.org/cgi-bin/viewsvn/geninitrd/

Comment 8 Mark McLoughlin 2004-07-20 11:20:45 UTC
I spent some time looking into whether we could get pxelinux to append
all the network details from DHCP necessary to bring up the interface
and not have dhclient on the initrd.

Good thing is that pxelinux has an "ipappend 1" option which will
append those details using the form:

  ip=<client-ip>:<boot-server-ip>:<gw-ip>:<netmask>

The one worrying thing here is that we don't have all the details from
the DHCP lease, and we need dhclient not to re-configure the network
interface (during the real init) after we've mounted the NFS root.
Perhaps we could create a dummy lease which will expire immediately,
though.

Also, we should be looking to use the "ipappend 2" option which will
append the hardware address of the network interface we booted from so
we know which interface to bring up.
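
A minimal sketch of how an init script might pick these values off the
kernel command line. parse_ipappend is a hypothetical helper. The ip= layout
is the one quoted above; the BOOTIF= layout (01-aa-bb-cc-dd-ee-ff, where the
leading 01 is the Ethernet hardware type) is an assumption based on the
pxelinux documentation, so check it against the pxelinux version in use.

```shell
#!/bin/sh
# Sketch: pull the pxelinux "ipappend" details out of a kernel command line.

# parse_ipappend CMDLINE
# Sets client_ip, server_ip, gateway_ip, netmask and boot_mac.
# Any value that is not present on the command line is left empty.
parse_ipappend() {
    client_ip= server_ip= gateway_ip= netmask= boot_mac=
    for arg in $1; do
        case $arg in
            ip=*)
                # Split the value on ":" into the positional parameters.
                old_ifs=$IFS
                IFS=:
                set -- ${arg#ip=}
                IFS=$old_ifs
                client_ip=$1 server_ip=$2 gateway_ip=$3 netmask=$4
                ;;
            BOOTIF=*)
                # Drop the hardware-type prefix, then turn "-" into ":".
                mac=${arg#BOOTIF=}
                boot_mac=$(echo "${mac#*-}" | tr '-' ':')
                ;;
        esac
    done
}

# In an initrd this would typically be: parse_ipappend "$(cat /proc/cmdline)"
parse_ipappend "ip=192.168.0.10:192.168.0.1:192.168.0.254:255.255.255.0"
echo "client=$client_ip gateway=$gateway_ip netmask=$netmask"
```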

Comment 9 Andy Green 2004-07-20 11:40:21 UTC
Can I put in a word for the Linux Network Block Device (NBD)?  This  
works excellently and exposes a filesystem in a file on the server  
as a /dev/nd0 on a client booted over PXE.  The main thing you need  
is the nbd module in the initrd, after that you have it come up in  
ext3 but on /dev/nd0.  
  
For more on NBD see:  
http://atrey.karlin.mff.cuni.cz/~pavel/nbd/nbd.html  
  

Comment 10 Mark McLoughlin 2004-07-20 12:07:37 UTC
> Can I put in a word for the Linux Network Block Device (NBD)?

NBD for the root fs? Interesting, I'd heard of people using NBD for
swap but not for the root filesystem.

Another suggestion being bandied about is iSCSI:

  http://linux-iscsi.sourceforge.net

Comment 11 Andy Green 2004-07-20 18:44:55 UTC
> NBD for the root fs? Interesting, I'd heard of people  
> using NBD for swap but not for the root filesystem. 
 
It's really fast and reliable for local networks... for most of a 
year I ran a fanless and diskless EPIA-M off it based on RH9.  
Unfortunately I got it working initially with some severe hacking 
based on my own initrd.  (In retrospect the weeks of trouble I had 
getting it booting on a 2.6 kernel were due to the Via C3 bug...) I 
migrated it to Fedora 1 here using the default initrd with minor 
changes to get the NBD module up inside the initrd (it is built in 
the RH kernels already), it ended up a very slim initrd+kernel 
package built by scripts into /tftpboot/<ip-based-dir> and is very 
cool to see it netboot. 
 
You can read some out of date notes and partially updated 
information here 
 
http://warmcat.com/silentcat 
 
In the end I needed a local 24/7 mailserver + rsync backup and 
changed the EPIA to have a local 200G HDD, so I never updated the 
website with the (working) FC1 implementation. 
 
Warning!  Opinionated rant ahead! 
 
Two things it left me with are a really strong desire to see the 
minimum interdependent package set really reduced, so this "Fedora 
Core" can be called *CORE* (the CORE should not be having X, just a 
thin usermode over the kernel; even the kernel has this concept of a 
minimal build for constrained devices, why not FCx?), and a 
disrespect for the OS "install" concept (again "Installing" FCx 
should just be setting partitions, copying over the kernel, doing 
grub and minimum packageset copying and rebooting... the rest can be 
installed, and a modular anaconda run, from the actual installed OS) 
 

Comment 12 Jeremy Katz 2004-07-28 21:46:35 UTC
Using ipappend and the dummy lease file sounds like a fairly sane
approach to me.

As far as the module stuff goes, I would really prefer to just load the
modules for this machine.  There are easy command line options that
can be used for cases where people want to have more modules (try to)
load.  If you're building an initrd on another machine than the one
that you're actually using, you have to do some special tweaks, that's
always been the case and moving away from that makes things a lot more
complicated for very little gain in the general case.

Comment 13 Mark McLoughlin 2004-07-29 08:42:17 UTC
The use case is the important thing to consider here.

AFAICS, mkinitrd works the way it does now for good reason - its
primary use cases are all about "make an initrd to boot this machine
with".

The use case here is that an admin buys a bunch of thin clients like
these:

http://www.disklessworkstations.com/cgi-bin/web/info/ltsp_t150.html

and wants to set them up as X terminals.

The initrd cannot be generated on the client, because it can only boot
with PXE. So, the initrd must be generated on the server at some
point. If you require that mkinitrd be explicitly given the name of
the network card's driver, you're going to have a UI like this:

  [Add Diskless Client]

     MAC Address:
     Network Card: [ (drop down list of known cards) ]

                                             [Cancel] [Add]

and you require that the admin go through each terminal he's bought
and add these details.

The other option is that on the server you generate an initrd that can
be used on all clients and the terminals all just work out of the box.

Out of the box workingness is what we should be looking for. Making
admins choose a network card from a drop down list just seems like a
giant step backwards.

Comment 14 Jeremy Katz 2004-07-29 18:20:50 UTC
Then instead of your "choose a network card" widget, you have the
diskless admin program just add options to add all of the network
modules.  

Then they all just try to load, the ones that can load successfully do so,
and Things Work without any added complications to
mkinitrd itself and at the same time, people who are using mkinitrd
now and count on its current behavior (generalized out to network
stuff) have it continue working.
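
One way an admin tool might build that option list, as a sketch:
net_module_opts is a hypothetical helper that turns every driver under a
directory into a --with=<module> option for mkinitrd. The directory layout
is an assumption; on a 2.6 kernel the network drivers normally live under
/lib/modules/<version>/kernel/drivers/net.

```shell
#!/bin/sh
# Sketch: generate one --with=<module> option per driver file.

# net_module_opts DRIVER_DIR
net_module_opts() {
    find "$1" -name '*.ko' | sort | while read -r path; do
        mod=${path##*/}           # strip the directory
        echo "--with=${mod%.ko}"  # strip the .ko suffix
    done
}

# Possible use (not run here, needs root and a real kernel tree):
#   mkinitrd $(net_module_opts /lib/modules/$kver/kernel/drivers/net) \
#       /boot/initrd-$kver.img $kver
```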

Comment 15 Mark McLoughlin 2004-07-30 10:18:24 UTC
Okay, I'll give the "throw all network drivers at it and see what
falls out" approach a go, but I'm dubious ... it's certainly not
pretty... :-)

And I'm not proposing changing how normal mkinitrds would work - the
diskless case is different, and I think we need the different
behaviour for that case and that case only. If you don't want to
complicate mkinitrd with the different behaviour for the diskless case
maybe it's just indicative that a separate diskless-mkinitrd actually
is the right approach.

However, I'm not sure the different behaviour for diskless will
complicate matters too much so long as we make it possible for the
diskless init script to be a nash script without many differences from
the stock init script.

The only part I'm worried about is modprobe - we need to be able to
somehow handle module dependencies for all the different network cards
and the install scriptlets associated with at least the NFS modules.
Perhaps we could figure out a way to hardcode all that in the init
script and remove the need for modprobe, though. One possibility to
make that easy would be to have something like modprobe -nv where we
could generate the scriptlet required to load a module. (-nv won't do
that now, though, if the module is loaded on the machine generating
the initrd)
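
As an alternative to modprobe -nv, the load order could be read straight
from modules.dep at initrd build time. The sketch below is one possible
approach, not what mkinitrd does. insmod_lines is a hypothetical helper, and
it assumes that each modules.dep line reads "<module path>: <dep> <dep> ..."
and that the dependencies must be loaded right to left (last one listed
first), followed by the module itself. It does not handle install
scriptlets.

```shell
#!/bin/sh
# Sketch: print the insmod commands needed to load one module.

# insmod_lines MODULE MODULES_DEP
insmod_lines() {
    awk -v want="$1.ko" '
        {
            target = $1
            sub(/:$/, "", target)      # drop the trailing colon
            n = split(target, parts, "/")
            if (parts[n] != want) next
            for (i = NF; i >= 2; i--)  # dependencies, last listed first
                print "insmod " $i
            print "insmod " target
            exit
        }
    ' "$2"
}

# Possible use while generating the init script:
#   insmod_lines nfs /lib/modules/$kver/modules.dep >> init
```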

Comment 16 Jeremy Katz 2004-07-30 14:17:19 UTC
Somehow, I don't think you really want to have it be a different
script   unless you also want to keep up with the flavor of the week
kernel change.  :)

And my argument is that diskless is just a generalization of the
normal initrd case, not some special case that should be handled
massively differently in its module loading.  And there are already
people using it as such (cf bug 128832 that got filed last night). 
And, every module that is in the initrd takes up (valuable) unswappable
kernel memory and thus increases resource requirements on what we're
trying to say are thin clients.

modprobe won't matter as mkinitrd already discovers module
dependencies and ensures things get loaded in the right order. 
Scripts shouldn't be required for doing NFS mounts -- they're needed
for the server side, but not the client AFAIK.

Comment 17 bastiaan 2005-09-27 12:52:57 UTC
Mark, how is what you want different from the diskless client support in
system-config-netboot (if you remove the read-write snapshot dir stuff)?
It runs on the boot server, has all the network modules, etc.

I've modified system-config-netboot's updateDiskless and disklessrc to set up
iSCSI rather than NFS, for booting our diskless servers. It has some extras for
redundancy: multiple boot interfaces, ethernet bonding and RAID1 over two SANs.
It could probably be adapted for NBD without much effort as well.

Unlike the NFS client setup, we want to create the initrd on the diskless server
instead of the boot server: the boot server cannot access the filesystem of the
diskless client (unlike the NFS case), therefore it doesn't have access to the
right kernel modules, etc. Also, we want to make sure that the diskless server
never gets booted with a newer kernel than it has modules for. So, the kernel
and initrd on the boot server have to be updated upon install of a new kernel
RPM on the diskless server.

To accomplish this I've modified new-kernel-pkg to call mkinitrd-${BOOTMETHOD}
and grubby-${BOOTMETHOD} instead of the regular ones if /etc/sysconfig/kernel
contains a variable BOOTMETHOD. This way we can create all kinds of mkinitrd
scripts without having to modify the standard mkinitrd package each time.

Should I add the patch to this bug or create a new entry in Bugzilla?
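
The BOOTMETHOD dispatch described above might look roughly like this. It is
a sketch, not the actual new-kernel-pkg patch: pick_tool is a hypothetical
helper, and the config file is assumed to contain plain shell assignments.

```shell
#!/bin/sh
# Sketch: choose between the stock tool and a boot-method-specific one.

# pick_tool BASE_NAME CONFIG_FILE
# Prints BASE_NAME-$BOOTMETHOD if the config file sets BOOTMETHOD,
# otherwise plain BASE_NAME.
pick_tool() {
    (
        # Run in a subshell so sourcing the config cannot leak variables.
        BOOTMETHOD=
        # Give CONFIG_FILE as a path containing a slash; a bare name
        # would be searched for in PATH by ".".
        [ -r "$2" ] && . "$2"
        if [ -n "$BOOTMETHOD" ]; then
            echo "$1-$BOOTMETHOD"
        else
            echo "$1"
        fi
    )
}

# Possible use inside a kernel install hook:
#   "$(pick_tool mkinitrd /etc/sysconfig/kernel)" "$initrd" "$kver"
```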



Comment 18 Jeremy Katz 2006-02-24 19:16:15 UTC
mkinitrd as of 5.0.22 has some pretty basic support for nfs root somewhat
inspired by these long-ago discussions.  Currently, we're linking libpump into
nash for getting the IP instead of parsing the pxelinux ipappend stuff and
there's nothing magic for dhclient (which could well cause problems later, but
we'll get there).  The default model of operation matches the norm for mkinitrd
where it uses the modules for the specific machine, but there's also support for
putting MODULES="e1000 pcnet32" or whatever in /etc/sysconfig/mkinitrd and all
of those modules will be pulled into the initrd.
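
For illustration, the configuration fragment mentioned above would look
something like this. The two module names are the ones from the comment;
which drivers to list depends on the client hardware.

```shell
# /etc/sysconfig/mkinitrd
# Extra modules to pull into the initrd in addition to the ones mkinitrd
# detects for the machine it runs on.
MODULES="e1000 pcnet32"
```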