Bug 135411
| Field | Value |
|---|---|
| Summary | diskless: Following instructions does not result in a correct install / no troubleshooting info |
| Product | Red Hat Enterprise Linux 3 |
| Component | redhat-config-netboot |
| Version | 3.0 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Graham Leggett <minfrin> |
| Assignee | Jason Vas Dias <jvdias> |
| CC | bastiaan, hgarcia, kreilly, tao |
| Fixed In Version | RHBA-2005-486 |
| Doc Type | Bug Fix |
| Last Closed | 2005-09-28 19:36:45 UTC |
| Bug Blocks | 156320 |

Description
Graham Leggett
2004-10-12 15:48:05 UTC
Further info on this. It seems that the initrd.img file produced by redhat-config-netboot is an empty filesystem, hence the missing disklessrc file.

Further problems: when trying to delete the install and start again by selecting the "delete" option in the setup program, the directories created during the install process are not cleaned up correctly. For example, deleting an installation called "Fedora Core 2" does not delete the directory /tftpboot/linux-install/Fedora_Core_2 and its contents. These stray directories are possibly responsible for the instability of the package.

Much googling later, virtually no info turned up except for this message on the Fedora list describing the same problem: http://www.redhat.com/archives/fedora-list/2004-June/msg04532.html There were no followups. Waiting to see if the person who posted the message ever solved the problem.

Experimenting with different kernels:

- Using vmlinuz-2.6.8-1.521 (from the client OS) allows the ramdisk to work, but the kernel refuses to load init, ending with a kernel panic as described in this bug report.
- Using vmlinuz-2.4.21-20.EL (from the server OS) causes the kernel to panic, claiming the ramdisk is too big.

In all cases the "init=<init>" option seems to be ineffective: regardless of what it is set to, and regardless of whether the file it points to exists, the kernel bombs out with "Kernel panic: init not found". It seems that the kernel is not trying to load disklessrc from the initrd, but from somewhere else. The kernel error message is too vague to be useful, since it does not say what the init parameter is set to.

Changing redhat-config-netboot (from RHEL3) to system-config-netboot (from Fedora Core 2) makes no difference: the kernel still panics on startup. The only thing I can think of is that the kernel as shipped with Fedora Core 2 (the client OS) is not compatible with netboot. Has netboot been tested with the most recent Fedora shipped kernels?

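The leftover directories described above can be checked for after a "delete" in the setup program. The following is a minimal sketch, not part of redhat-config-netboot: the function names are hypothetical, and the mapping from installation name to directory name (spaces replaced by underscores) is an assumption inferred from the single "Fedora Core 2" example in this report.

```shell
#!/bin/sh
# Sketch only: report a per-OS directory left behind after a "delete".
# The default root is the path quoted in this report; override TFTP_ROOT to test.
TFTP_ROOT="${TFTP_ROOT:-/tftpboot/linux-install}"

# Map an installation name to its presumed directory name.
# ASSUMPTION: spaces become underscores ("Fedora Core 2" -> Fedora_Core_2).
os_dir_name() {
    printf '%s\n' "$1" | tr ' ' '_'
}

# Print the leftover directory and return 0 if it still exists, else return 1.
find_stray_dir() {
    dir="$TFTP_ROOT/$(os_dir_name "$1")"
    if [ -d "$dir" ]; then
        printf '%s\n' "$dir"
        return 0
    fi
    return 1
}
```

For example, `find_stray_dir "Fedora Core 2"` prints the directory if it survived the delete, so it can be inspected and removed by hand.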
Just tried to install the original kernel-2.6.5-1.358.i586.rpm kernel and boot from that: also a no-show.

Just tried to set up a diskless client on a completely separate client machine: also a no-show.

Tried uninstalling redhat-config-netboot completely, manually deleting all the directories it created, and reinstalling it from scratch: still a no-show. It seems RHEL3 cannot be a server to diskless clients.

Progress at last: it turns out that neither the busybox nor the busybox-anaconda package was installed on the server machine. It was expected that the redhat-config-netboot RPM would depend on these two packages, but this dependency is missing.

- Bug #1: missing dependencies on busybox and busybox-anaconda in redhat-config-netboot.
- Bug #2: the missing busybox did not raise an error when the initrd was created. Missing files should be a fatal error at this stage, but redhat-config-netboot silently succeeds, producing a bogus initrd.

Fixing the busybox problem did not solve the problem. Digging further, it was found that the /lib directory contained a whole lot of symlinks to library files, but the actual library files were not there. This prevented bash from running, and therefore the disklessrc script from running, hence the "Kernel Panic: init not found" error message.

- Bug #3: The initrd creation script does not copy the libraries to the correct place.

Having copied the libraries and busybox manually, the disklessrc script now runs, but then fails with the error that modprobe.conf.dist cannot be found.

- Bug #4: Add modprobe.conf.dist to the etc directory in the initrd.

Copying modprobe.conf.dist into the ramdisk causes the modules to be installed, but these modules complain about missing symbols. Busy trying to figure out where the modules came from. The investigation continues.

More bugs. When the time comes for the nfs module to be installed, it complains of missing symbols, and the install fails.
The script then terminates instead of spawning a copy of ash, causing the kernel to panic (and in the process removing any chance of further debugging). Looking further at the nfs module, its installation fails because the installation of lockd fails, which in turn fails because the installation of sunrpc fails. This is seemingly due to the install command attached to sunrpc, which is "install sunrpc /sbin/modprobe --first-time --ignore-install sunrpc && { /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; }". The rpc_pipefs directory as specified does not exist in the initrd, but creating this directory does not fix the problem: modprobe still fails, saying there was an error while running the install script. It neglects to mention what that error is, however.

Even more bugs: even with busybox and busybox-anaconda installed, the busybox binary is not copied to the initrd.

Exactly the same problems here with FC2.

Further progress:

- The reason the sunrpc module fails to load is that modutils is having problems with the modprobe.conf line for it (perhaps ash doesn't support this syntax?):
  install sunrpc /sbin/modprobe --first-time --ignore-install sunrpc && { /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; }
  I hacked around it by removing the line from modprobe.conf, then manually mounting sunrpc in disklessrc right after the nfs modules are loaded.
- Next, /sbin/dhclient and /sbin/dhclient-script weren't copied to the initrd (they weren't installed on the source machine, but no error was emitted).

Some notes I've seen running under FC2:

- The problem in adding libraries seems to be that the updateDiskless file hardcodes the CVERS variable to 2.3.2, so any updated system would not contain the library versions the program was looking for.
  libz is also hardcoded to 1.1.4. Fixing these two problems in the script eliminates the kernel panic (but leads to the other problems mentioned).
- The libselinux.so.1 library was also not copied over, throwing a bunch of errors.
- sunrpc does not seem to be failing to load because of a bash/ash problem. When I issue the same long install command on the command line it works fine, but when it is done simply as "modprobe sunrpc" it fails.
- busybox copied fine on my system (pre-installed).
- When finished, the kernel and initrd get put in their own subfolder of linux-install under /tftpboot. This seems like the wrong nomenclature, since it is a remote boot, not an install.
- When finished, the files get put in the tftpboot tree with no permissions check. If root has a restrictive umask, the file permissions will need to be changed for the files to be usable.

This is now fixed with redhat-config-netboot-0.1.16-1, which should be in RHEL-3-U6, but which meanwhile can be downloaded from: http://people.redhat.com/~jvdias/redhat-config-netboot/

I cannot reproduce this problem. I have just tested booting a rhel-3 kernel from a rhel-4 boot server. It sounds like you may be trying to boot an initrd created by the older, broken version of system-config-netboot. Please ensure you have installed the latest version:

# wget http://people.redhat.com/jvdias/system-config-netboot/system-config-netboot-0.1.18-1.noarch.rpm
# rpm -Uvh --force system-config-netboot-0.1.18-1.noarch.rpm

Then please delete and recreate the OS and clients you are trying to boot with system-config-netboot. Ensure that /tftpboot/linux-install/${OS}/initrd.img has a modification time from AFTER you deleted and recreated ${OS}. If the problem still occurs, please send the initrd.img and the client configuration file (/tftpboot/linux-install/pxelinux.cfg/{HEX IP}) to: jvdias.

I've just discovered a problem with e2fsprogs that could cause this problem: RHEL-3 clients are unable to boot from an initrd created by RHEL-4 e2fsprogs.
It seems the rhel-3 initrd I tested as described in comment #12 was created on RHEL-3. I just tried creating an initrd on RHEL-4, and a RHEL-3 client cannot read it. It looks like there is still a problem with creating RHEL-3 initrds on RHEL-4 systems, so please ignore comment #12. I'm contacting the e2fsprogs maintainers to see if they think this is an e2fsprogs bug.

The problem is that symbolic links created by releases later than RHEL-3 are not readable by RHEL-3. So the updateDiskless script must create the initrd within a chroot inside the client root directory, and all client roots must have the e2fsprogs utilities installed. I'll make a version that implements this tomorrow.

RE: Comment #17 - Comment #20: The problem mentioned in Comment #17 is specific to RHEL-4 systems with SELinux enabled, as explained in bug #149000. redhat-config-netboot-0.1.18-1_EL3 is the latest version for RHEL-3 and no problems have been found with it so far. This version can be downloaded from: http://people.redhat.com/~jvdias/redhat-config-netboot/

If the user has forgotten to install busybox-anaconda on the client root, it would be nice if updateDiskless would indicate that, instead of simply complaining it cannot find the file in some obscure location. I suspect I'm not the only person to fail to notice the relevant instruction in the System Administration Guide :-)

redhat-config-netboot-0.1.18-1_EL3 (and earlier, I guess) creates an unusable initrd if you have a pcnet32 network card (and possibly other cards as well), because it doesn't include the crc32 module needed by pcnet32.o. Solved by adding the kernel modules /lib subdir to the initrd, see attached patch.

Created attachment 117014
adds kernel modules /lib subdir to initrd to solve unloadable pcnet32.o problem
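
The pcnet32 failure above comes from a module's dependency (crc32) not being copied into the initrd. As a rough illustration of how such dependencies could be listed before building an initrd, here is a minimal sketch. The function name `module_deps` is hypothetical and this is not the approach taken by the attached patch. It assumes a modules.dep file in the one-line "module: dep1 dep2" layout and does not handle continuation lines.

```shell
#!/bin/sh
# Sketch only: print the dependencies recorded for one module in modules.dep.
# ASSUMPTION: each entry is a single line "path/to/module: dep1 dep2 ...";
# entries continued over several lines with a trailing backslash are not handled.
# Usage: module_deps /path/to/modules.dep pcnet32.o
module_deps() {
    depfile="$1"
    module="$2"
    awk -v m="$module" '
        {
            # Split "target: deps" at the first colon.
            i = index($0, ":")
            if (i == 0) next
            target = substr($0, 1, i - 1)
            # Compare only the file name part of the target path.
            n = split(target, parts, "/")
            if (parts[n] == m) {
                k = split(substr($0, i + 1), d, " ")
                for (j = 1; j <= k; j++) print d[j]
            }
        }
    ' "$depfile"
}
```

Each printed path is a module that would also need to be present in the initrd for the requested module to load.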
Bastiaan, please create 2 other bugs if you want those particular bugs fixed. The original bug report is already being worked on.

OK, I'll create a separate report for the network card issue. Do you want a separate one for the missing busybox-anaconda issue as well? It is in response to comment #6 and its follow-ups.

Yes please. It makes it easier for maintainers to track their own bugs if they're in their products, rather than spread across.

The last issues in Comment #23 and Comment #24 are now fixed with redhat-config-netboot-0.1.20-1_EL3 for Bug 163951, Bug 163954.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-486.html
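
Several comments in this report trace back to the initrd being built without files it needs (busybox, the real library files behind the /lib symlinks, modprobe.conf.dist, /sbin/dhclient) and no error being raised. A minimal sketch of a post-build check follows, assuming the initrd contents have been unpacked or loop-mounted at a directory. The function name `check_initrd_tree` is hypothetical and not part of redhat-config-netboot, and the list of required paths is supplied by the caller.

```shell
#!/bin/sh
# Sketch only: sanity-check an unpacked or loop-mounted initrd tree.
# Usage: check_initrd_tree ROOT REL_PATH...
# Prints one line per problem and returns non-zero if any problem was found.
# LIMITATION: symlinks with absolute targets are resolved against the host
# filesystem, not against ROOT, so only relative links are checked reliably.
check_initrd_tree() {
    root="$1"
    shift
    status=0
    # Every required path must exist (following symlinks) and be non-empty.
    for rel in "$@"; do
        if [ ! -s "$root/$rel" ]; then
            echo "missing: $rel"
            status=1
        fi
    done
    # List symlinks whose target does not exist.
    dangling=$(find "$root" -type l ! -exec test -e {} \; -print)
    if [ -n "$dangling" ]; then
        printf '%s\n' "$dangling" | sed 's/^/dangling symlink: /'
        status=1
    fi
    return "$status"
}
```

A call such as `check_initrd_tree /mnt/initrd sbin/dhclient etc/modprobe.conf.dist` (the mount point is an example) would turn the silent omissions described in this report into a visible failure.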