Bug 135411

Summary: diskless: Following instructions does not result in a correct install / no troubleshooting info
Product: Red Hat Enterprise Linux 3
Component: redhat-config-netboot
Version: 3.0
Hardware: i386
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Graham Leggett <minfrin>
Assignee: Jason Vas Dias <jvdias>
CC: bastiaan, hgarcia, kreilly, tao
Fixed In Version: RHBA-2005-486
Doc Type: Bug Fix
Last Closed: 2005-09-28 19:36:45 UTC
Bug Blocks: 156320
Attachments:
- adds kernel modules /lib subdir to initrd to solve unloadable pcnet32.o problem (flags: none)

Description Graham Leggett 2004-10-12 15:48:05 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3)
Gecko/20040925

Description of problem:
If the instructions included inside the rpm are followed to produce a
diskless client, and an attempt is made to boot the client, the
attempt fails with the error:

- Kernel panic: No init found

There is no troubleshooting section in the manual to describe what to
do if things go wrong, and as NFS does not support logging of any kind
(that I can find, anyway), troubleshooting the problem is a shot in the
dark at best.

The documentation for net booting needs serious improvement.


Version-Release number of selected component (if applicable):
redhat-config-netboot-0.1.1-22

How reproducible:
Always

Steps to Reproduce:
xxx

Additional info:

Comment 1 Graham Leggett 2004-10-12 16:06:06 UTC
Further info on this.

It seems that the initrd.img file produced by redhat-config-netboot is
an empty filesystem, thus the missing disklessrc file.


Comment 2 Graham Leggett 2004-10-12 16:52:36 UTC
Further problems: while trying to delete the install and start again
by selecting the "delete" option from the setup program, it seems that
the directories that are created during the install process are not
cleaned up correctly.

For example, deleting an installation called "Fedora Core 2" does not
delete the directory /tftpboot/linux-install/Fedora_Core_2 and its
contents. These stray directories are possibly responsible for the
instability of the package.
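
For anyone checking for leftovers by hand, the directory name can be derived from the OS name and tested for. This is a minimal sketch, assuming the tool simply maps spaces to underscores (inferred only from the "Fedora Core 2" example above); both function names are hypothetical and not part of redhat-config-netboot.

```shell
# Hypothetical helper: map an OS display name to its directory name.
# Assumption: spaces become underscores ("Fedora Core 2" -> Fedora_Core_2).
os_dir_name() {
    printf '%s\n' "$1" | tr ' ' '_'
}

# Report whether a directory for the given OS name still exists.
# $1 = OS name, $2 = base directory (defaults to /tftpboot/linux-install)
stray_os_dir() {
    base=${2:-/tftpboot/linux-install}
    dir="$base/$(os_dir_name "$1")"
    if [ -d "$dir" ]; then
        printf 'stray directory: %s\n' "$dir"
        return 0
    fi
    return 1
}
```

Running `stray_os_dir "Fedora Core 2"` after a delete prints the path if the directory was left behind and returns non-zero otherwise.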


Comment 3 Graham Leggett 2004-10-12 18:39:25 UTC
Much googling later, and virtually no info is found except for this
message on the Fedora list describing the same problem:

http://www.redhat.com/archives/fedora-list/2004-June/msg04532.html

There were no followups. Waiting to see if the person who posted the
message ever solved the problem.


Comment 4 Graham Leggett 2004-10-12 19:01:59 UTC
Experimenting with different kernels:

- Using vmlinuz-2.6.8-1.521 (from the client OS) allows the ramdisk to
work, but refuses to load init, ending with a kernel panic as
described in this bug report.

- Using vmlinuz-2.4.21-20.EL (from the server OS) causes the kernel to
panic claiming the ramdisk is too big.

In all cases it seems the "init=<init>" option is ineffective -
regardless of what it is set to, and regardless of whether the file it
points to exists or not, the kernel bombs out with "Kernel panic: init
not found".

It seems that the kernel is not trying to load disklessrc from the
initrd, but rather from somewhere else. (The kernel error message is
too vague to be useful: it doesn't specify what the init parameter is
set to.)

Changing redhat-config-netboot (from RHEL3) to system-config-netboot
(from Fedora Core 2) makes no difference - the kernel still panics on
startup.

The only thing I can think of is that the kernel as shipped with
Fedora Core 2 (the client OS) is not compatible with netboot. Has
netboot been tested with the most recent Fedora shipped kernels?


Comment 5 Graham Leggett 2004-10-12 20:24:37 UTC
Just tried to install the original kernel-2.6.5-1.358.i586.rpm kernel
and use that to boot from - also a no show.

Just tried to set up a diskless client on a completely separate client
machine - also a no show.

Tried uninstalling redhat-config-netboot completely, manually deleting
all the directories it created and reinstalled it again from scratch -
still a no show.

Seems RHEL3 cannot be a server to diskless clients.


Comment 6 Graham Leggett 2004-10-12 21:44:24 UTC
Progress at last:

Turns out neither the busybox nor busybox-anaconda packages were
installed on the server machine. It was expected that the
redhat-config-netboot RPM would have depended on these two packages,
but this dependency is missing.

- Bug #1: missing dependencies on busybox and busybox-anaconda in
redhat-config-netboot.

- Bug #2: the fact that busybox was missing did not throw up an error
when the initrd was created. Missing files should be a fatal error at
this stage, but redhat-config-netboot silently succeeds, producing a
bogus initrd.
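
The behaviour asked for in Bug #2 could look roughly like the following. It is only a sketch of a copy step that stops on a missing source file; the function name and argument layout are hypothetical and do not reflect how updateDiskless is actually written.

```shell
# Hypothetical helper: copy one file from a source root into an initrd
# staging tree, treating a missing source as an error instead of
# silently continuing.
# $1 = source root, $2 = path relative to that root, $3 = staging directory
copy_required() {
    src="$1/$2"
    dest="$3/$2"
    if [ ! -e "$src" ]; then
        printf 'fatal: required file missing: %s\n' "$src" >&2
        return 1
    fi
    # Create the destination directory, then copy preserving mode and times.
    mkdir -p "$(dirname "$dest")" && cp -p "$src" "$dest"
}
```

A caller would abort the whole initrd build on failure, for example `copy_required "$root" sbin/busybox "$staging" || exit 1` (the path shown is illustrative).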

Fixing the busybox problem did not solve the problem. Digging further,
it was found that the /lib directory contained a whole lot of symlinks
to library files, but the actual library files were not there. This
caused bash to not run, and therefore the disklessrc script to not run,
thus the "Kernel Panic: init not found" error message.

- Bug #3: The initrd creation script does not copy the libraries to
the correct place.
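
Dangling links of this kind can be found mechanically in an unpacked initrd tree. The sketch below uses only standard `find` and `test`; the function name is hypothetical. One caveat: absolute link targets are resolved against the host filesystem, so the check is only reliable for relative links unless it is run inside a chroot.

```shell
# Hypothetical check: list symbolic links under a directory whose
# targets do not exist. Returns 1 if any dangling link is found.
# $1 = directory to scan (for example, lib/ of an unpacked initrd)
find_dangling_links() {
    # "test -e" follows the link, so it fails when the target is missing.
    out=$(find "$1" -type l ! -exec test -e {} \; -print)
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
        return 1
    fi
    return 0
}
```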

Having copied the libraries, and copied busybox manually, the
disklessrc script now runs, but then fails with the error that
modprobe.conf.dist cannot be found.

- Bug #4: Add modprobe.conf.dist to the etc directory in initrd.

Copying modprobe.conf.dist into the ramdisk causes the modules to be
installed, but these modules complain about missing symbols. Busy
trying to figure out where the modules came from.

The investigation continues.


Comment 7 Graham Leggett 2004-10-12 22:36:38 UTC
More bugs.

When the time comes for the nfs module to be installed, it complains
of missing symbols, and the install fails. The script then terminates
instead of spawning a copy of ash, causing the kernel to panic (and in
the process preventing the option of further debugging).

Looking further at the nfs module, the installation of this module
fails because the installation of lockd fails, and in turn the
installation of sunrpc fails. This is seemingly due to the install
command attached to sunrpc, which is "install sunrpc /sbin/modprobe
--first-time --ignore-install sunrpc && { /bin/mount -t rpc_pipefs
sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; }".

The rpc_pipefs directory as specified does not exist in the initrd,
but creating this directory does not fix the problem: the modprobe
still fails, saying there was an error while running the install
script. It neglects to mention what that error is, however.


Comment 8 Graham Leggett 2004-10-12 22:42:50 UTC
Even more bugs: Even with busybox and busybox-anaconda installed, the
busybox binary is not copied to the initrd.


Comment 9 Simon Roberts 2004-11-17 11:50:31 UTC
Exactly the same problems here with FC2

Comment 10 Simon Roberts 2004-11-17 23:21:51 UTC
Further progress:

- The reason the sunrpc module fails to load is that modutils is
having problems with the modprobe.conf line for it (perhaps ash
doesn't support this syntax?):

    install sunrpc /sbin/modprobe --first-time --ignore-install sunrpc && { /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; }

  I hacked around it by removing the line from modprobe.conf, then
manually mounting sunrpc in disklessrc right after the nfs modules are
loaded.

- Next, /sbin/dhclient and /sbin/dhclient-script weren't copied to the
initrd (they weren't installed on the source machine, but no error was
emitted).
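
The modprobe.conf workaround described in this comment can be sketched as follows. The function name is hypothetical, and the exact place in disklessrc for the manual mount is an assumption; only the mount arguments are taken from the install line quoted in this report.

```shell
# Hypothetical helper: write a copy of a modprobe.conf-style file with
# the "install sunrpc ..." line removed, leaving other lines untouched.
# $1 = input file, $2 = output file
strip_sunrpc_install() {
    # Delete lines that begin (after optional whitespace) with
    # "install sunrpc " and nothing else.
    sed '/^[[:space:]]*install[[:space:]][[:space:]]*sunrpc[[:space:]]/d' "$1" > "$2"
}

# The manual mount that would then be added to disklessrc after the nfs
# modules are loaded (left as a comment: it needs root and rpc_pipefs):
#   mkdir -p /var/lib/nfs/rpc_pipefs
#   mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
```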

Comment 12 Alan 2005-04-06 00:55:14 UTC
some notes I've seen running under FC2:

- The problem in adding libraries seems to be that the updateDiskless file hardcodes
the CVERS variable to 2.3.2, so any updated system would not contain the library
versions the program was looking for. libz is also hardcoded to 1.1.4.
Fixing these two problems in the script erases the kernel panic (but leads to
the other problems mentioned).

- the libselinux.so.1 library was also not copied over, throwing a bunch of errors.

- sunrpc does not seem to be failing to load because of a bash/ash problem.
When I issue the same long install command on the command line it works fine,
but when it is done simply as "modprobe sunrpc" it fails.

- busybox copied fine on my system (pre-installed).

- when finished the kernel and initrd get put in their own subfolder of
linux-install under /tftpboot.  this seems like the wrong nomenclature, since it
is a remote boot, not an install.

-  when finished the files get put in the tftpboot tree with no permissions
check.  if root has a restrictive mask, the file permissions will need to be
changed to be usable.
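
As an illustration of avoiding the hardcoded library version, the version could be read from the client root instead. This is a sketch only, assuming glibc is installed as lib/libc-<version>.so inside the client root; the function name is hypothetical and this is not the fix that was shipped.

```shell
# Hypothetical helper: print the glibc version found in a client root by
# looking at the versioned library file name (lib/libc-<version>.so).
# $1 = client root directory
detect_libc_version() {
    for f in "$1"/lib/libc-*.so; do
        [ -f "$f" ] || continue      # skip the unexpanded glob pattern
        name=${f##*/}                # strip the directory part
        name=${name#libc-}           # strip the "libc-" prefix
        printf '%s\n' "${name%.so}"  # strip the ".so" suffix
        return 0
    done
    printf 'no lib/libc-*.so found under %s\n' "$1" >&2
    return 1
}
```

A script could then set `CVERS=$(detect_libc_version "$root") || exit 1` rather than assuming 2.3.2.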

Comment 14 Jason Vas Dias 2005-05-31 16:22:11 UTC
This is now fixed with redhat-config-netboot-0.1.16-1, which should be in 
RHEL-3-U6, but which meanwhile can be downloaded from:
 http://people.redhat.com/~jvdias/redhat-config-netboot/

Comment 18 Jason Vas Dias 2005-06-14 23:04:08 UTC
I cannot reproduce this problem. I have just tested booting a rhel-3 kernel
from a rhel-4 boot server.

It sounds like you may be trying to boot an initrd created by the older, broken
version of system-config-netboot . Please ensure you have installed the latest
version:
 # wget http://people.redhat.com/jvdias/system-config-netboot/system-config-netboot-0.1.18-1.noarch.rpm
 # rpm -Uvh --force system-config-netboot-0.1.18-1.noarch.rpm

Then please delete and recreate the OS and clients you are trying to boot with 
system-config-netboot.

Ensure that /tftpboot/linux-install/${OS}/initrd.img has a modification time
from AFTER you deleted and recreated ${OS}.

If the problem still occurs, please send the initrd.img and client configuration
file (/tftpboot/linux-install/pxelinux.cfg/{HEX IP}) to: jvdias.
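
To locate the {HEX IP} file for a given client, the address can be converted by hand or with a small helper. pxelinux names per-client configuration files after the client's IPv4 address written as eight uppercase hexadecimal digits; the function name below is hypothetical.

```shell
# Print the pxelinux-style file name for a dotted-quad IPv4 address:
# each octet becomes two uppercase hexadecimal digits.
# $1 = IPv4 address, for example 192.168.0.10
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

For example, `ip_to_hex 192.168.0.10` prints `C0A8000A`, which is the file name to look for in the pxelinux.cfg directory.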


Comment 19 Jason Vas Dias 2005-06-15 00:53:22 UTC
I've just discovered a problem with e2fsprogs that could cause this problem:
RHEL-3 clients are unable to boot from an initrd created by RHEL-4 e2fsprogs.
It seems the rhel-3 initrd I tested as described in comment #12 was created on
RHEL-3. 
I just tried creating an initrd on RHEL-4, and a RHEL-3 client cannot read it.
It looks like there is still a problem with creating RHEL-3 initrds on RHEL-4
systems - please ignore comment #12. I'm contacting the e2fsprogs maintainers
to see if they think this is an e2fsprogs bug.

Comment 20 Jason Vas Dias 2005-06-15 01:24:36 UTC
The problem is that symbolic links created by releases later than RHEL-3 are
not readable by RHEL-3. So the updateDiskless script must create the initrd
within a chroot inside the client root directory, and all client roots must
have the e2fsprogs utilities installed. I'll make a version that implements
this tomorrow.
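
The requirement that every client root carry e2fsprogs could be checked before attempting a chroot build. This is a sketch of such a pre-flight check, not the code in updateDiskless; the function name is hypothetical, and the location sbin/mke2fs inside the client root is an assumption.

```shell
# Hypothetical pre-flight check: verify that a client root contains the
# filesystem tool needed to build the initrd inside a chroot.
# Assumption: mke2fs is installed as sbin/mke2fs in the client root.
# $1 = client root directory
client_root_has_e2fsprogs() {
    if [ -f "$1/sbin/mke2fs" ]; then
        return 0
    fi
    printf 'e2fsprogs not found in %s (expected sbin/mke2fs)\n' "$1" >&2
    return 1
}
```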

Comment 21 Jason Vas Dias 2005-06-15 22:44:13 UTC
RE: Comment #17 - Comment #20:

The problem mentioned in Comment #17 is specific to RHEL-4 systems with SELinux
enabled, as explained in bug #149000 . 

redhat-config-netboot-0.1.18-1_EL3 is the latest version for RHEL-3 and no
problems have been found with it so far. This version can be downloaded from:
   http://people.redhat.com/~jvdias/redhat-config-netboot/

Comment 23 bastiaan 2005-07-21 07:15:09 UTC
If the user has forgotten to install busybox-anaconda on the client root,
it would be nice if updateDiskless would indicate that, instead of simply
complaining it cannot find the file in some obscure location. I suspect I'm not
the only person to fail to notice the relevant instruction in the System
Administration Guide :-)
 

Comment 24 bastiaan 2005-07-21 07:28:49 UTC
redhat-config-netboot-0.1.18-1_EL3 (and earlier, I guess) creates an unusable
initrd if you have a pcnet32 network card (and possibly other cards as well),
because it doesn't include the crc32 module needed by pcnet32.o.
Solved by adding the kernel modules /lib subdir to the initrd; see attached patch.
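
A more general way to catch this class of problem is to look up what a network module depends on before building the initrd. The sketch below reads a modules.dep file; it assumes each entry is a single line of the form "module: dep1 dep2 ..." (entries wrapped over several lines with backslashes are not handled), and the function name is hypothetical. It is not the attached patch.

```shell
# Hypothetical helper: print the dependencies of one module as listed in
# a modules.dep file, one path per line.
# Assumption: one line per module, "module: dep1 dep2 ...".
# $1 = modules.dep path, $2 = module file name (for example pcnet32.o)
module_deps() {
    awk -v mod="$2" '
        {
            target = $1
            sub(/:$/, "", target)          # drop the trailing colon
            n = split(target, parts, "/")  # last component is the file name
            if (parts[n] == mod)
                for (i = 2; i <= NF; i++) print $i
        }
    ' "$1"
}
```

Each path printed for the network driver would then also need to be copied into the initrd.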


Comment 25 bastiaan 2005-07-21 07:30:41 UTC
Created attachment 117014
adds kernel modules /lib subdir to initrd to solve unloadable pcnet32.o problem

Comment 26 Bastien Nocera 2005-07-21 09:18:47 UTC
Bastiaan, please create 2 other bugs if you want those particular bugs fixed.
The original bug report is already being worked on.

Comment 27 bastiaan 2005-07-22 11:46:01 UTC
OK, I'll create a separate report for the network card issue. Do you want a
separate one for the missing busybox-anaconda issue as well? It is in response
to comment #6 and its follow-ups.
 

Comment 28 Bastien Nocera 2005-07-22 11:58:37 UTC
Yes please. It makes it easier for maintainers to track their own bugs if
they're in their products, rather than spread across.

Comment 29 Jason Vas Dias 2005-07-22 20:43:57 UTC
The last issues in Comment #23 and Comment #24 are now fixed with 
redhat-config-netboot-0.1.20-1_EL3 for Bug 163951, Bug 163954 .


Comment 30 Red Hat Bugzilla 2005-09-28 19:36:45 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-486.html