Bug 149000
Summary: | system-config-netboot creates a broken initrd for diskless clients | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Ean J. Price <ejprice> |
Component: | system-config-netboot | Assignee: | Jason Vas Dias <jvdias> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | dmilburn, hgarcia, kreilly, pkmartin, tao |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2005-484 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-10-05 13:47:00 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Bug Depends On: | 165735, 165772, 166217 | ||
Bug Blocks: | 156320, 156322 |
Description
Ean J. Price
2005-02-17 20:39:22 UTC
I am a dumb ass. The correct bug reference is #144240.

I added this Bugzilla report to the RHEL4-U2 Proposed Blocker based on the current Priority Score of 13 in its associated Issue Tracker ticket #68011. For completeness, this problem is also reported against RHEL3 and so I have added Bugzilla report #135411 (Issue Tracker ticket #68195) to the RHEL3-U6 Proposed Blocker.

Ken Reilly
2-3906

This bug is now fixed with system-config-netboot-0.1.16-1, which should be in RHEL-4-U2 and which meanwhile can be downloaded from:
http://people.redhat.com/~jvdias/system-config-netboot
Please try it out and let me know of any issues - thanks.

(In reply to comment #7)
> This bug is now fixed with system-config-netboot-0.1.16-1, which
> should be in RHEL-4-U2 and which meanwhile can be downloaded from:
> http://people.redhat.com/~jvdias/system-config-netboot
> Please try it out and let me know of any issues - thanks.
>
Works for me. Thank you sir.

I cannot reproduce this problem. I have just tested booting a rhel-3 kernel from a rhel-4 boot server. It sounds like you may be trying to boot an initrd created by the older, broken version of system-config-netboot. Please ensure you have installed the latest version:
# wget http://people.redhat.com/jvdias/system-config-netboot/system-config-netboot-0.1.18-1.noarch.rpm
# rpm -Uvh --force system-config-netboot-0.1.18-1.noarch.rpm
Then please delete and recreate the OS and clients you are trying to boot with system-config-netboot. Ensure that the /tftpboot/linux.install/${OS}/initrd.img has a modification time from AFTER you deleted and recreated ${OS}. If the problem still occurs, please send the initrd.img and client configuration file ( /tftpboot/linux.install/pxelinux.cfg/{HEX IP} ) to: jvdias.

(In reply to comment #12)
> I cannot reproduce this problem. I have just tested booting a rhel-3 kernel
> from a rhel-4 boot server.
>
Hi Jason,

As per my previous post - using the version you put up on people.redhat.com, diskless booting works on RHEL4. I tested it with a RHEL4 client with the NFS server running RHEL3. As far as I am concerned the bug is resolved.

Thanks again,
Ean

I've just discovered a problem with e2fsprogs that could cause this problem: RHEL-3 clients are unable to boot from an initrd created by RHEL-4 e2fsprogs. It seems the rhel-3 initrd I tested as described in comment #12 was created on RHEL-3. I just tried creating an initrd on RHEL-4, and a RHEL-3 client cannot read it. It looks like there is still a problem with creating RHEL-3 initrds on RHEL-4 systems - please ignore comment #12. I'm contacting the e2fsprogs maintainers to see if they think this is an e2fsprogs bug.

The problem is that symbolic links created by releases later than RHEL-3 are not readable by RHEL-3. So the updateDiskless script must create the initrd within a chroot inside the client root directory, and all client roots must have the e2fsprogs utilities installed. I'll make a version that implements this tomorrow.

RE: Comment #11, Comment #15:
The problem with creating RHEL-3 initrd images on RHEL-4 was that if SELinux was enabled (enforcing or permissive), a "mount" on a newly created ext2 image turned on SELinux xattr labelling of the filesystem; this results in symbolic links being of a non-zero size, which 2.4 kernels have only recently been patched not to crash on. Even so, they complain. An 'ls' of such a link produces this error message:
'
ls: cannot read symbolic link bin: Input/output error
'
and the kernel emits these error messages to the log when such an image is mounted:
'
kernel: attempt to access beyond end of device
kernel: 07:01: rw=0, want=1768042287, limit=64
'
That renders all files accessed via a link created under SELinux unreadable to RHEL-3 kernels.
Mounting the filesystem with the mount options
'
-o loop,context=system_u:object_r:removable_t
'
under an SELinux enabled kernel is the only way to allow a 2.4 kernel to read it correctly. So system-config-netboot-0.1.20-1's updateDiskless now does this and can create valid RHEL-3 initrd.img images under RHEL-4 with SELinux enabled. Please try out the new version, available from:
http://people.redhat.com/~jvdias/system-config-netboot/

Thank You!

*** Bug 161394 has been marked as a duplicate of this bug. ***

RE: Comment #21 ---- Additional Comments From pkmartin.com 2005-08-09 13:55 EDT

Let me deal with each point raised:

> I believe I have found several issues with the supplied fix. Root is
> remounted as RW later in the booting process.

The root device is never mounted RW by default, unless you changed the RC scripts to do so. Say the root installation you want clients to mount (RO) is under $ROOT/root on the boot server. The $ROOT/root directory is always mounted read-only. The $ROOT/snapshot/${HOST} directory is always mounted read-write, under /.snapshot. Each file listed in $ROOT/snapshot/files is mounted RW in '--bind' mode - see the documentation of mount's '--bind' / '-o bind' argument.

What may be confusing is that the output of the 'mount' command reports:
# mount
rootfs on / type rootfs (rw)
...
The rootfs filesystem is a special pseudo-filesystem and is always mounted RW, even if the underlying device is mounted RO - the next line of mount output from the command above shows the device:
# mount
rootfs on / type rootfs (rw)
${SERVER}/${ROOT}/root on / type nfs (ro,v3,rsize=8192,wsize=8192,hard,udp,nolock,addr=${SERVER})
# touch /test
touch: creating `/test': Read-only file system

Unless you've changed /usr/share/system-config-netboot/diskless/disklessrc the root NFS filesystem will always be mounted RO.
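The distinction between the rootfs line and the real root device line can be checked mechanically. The following is only a sketch, assuming 'mount' prints lines in the "DEVICE on / type FSTYPE (OPTIONS)" form shown above; the function names root_mount_opts and root_is_readonly are made up for this illustration and are not part of system-config-netboot.

```shell
# Print the mount options of the last non-rootfs filesystem mounted on "/".
# Reads text in the format printed by `mount` on standard input.
# (Hypothetical helper, assumes the "DEV on / type FS (OPTS)" line layout.)
root_mount_opts() {
    awk '$2 == "on" && $3 == "/" && $5 != "rootfs" {
             opts = $6
             gsub(/[()]/, "", opts)   # strip the surrounding parentheses
         }
         END { if (opts != "") print opts }'
}

# Succeed (exit status 0) if the real root device carries the "ro" option.
root_is_readonly() {
    case ",$(root_mount_opts)," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

On a boot client this could be used as 'mount | root_is_readonly && echo "root is read-only"'. The rootfs pseudo-filesystem line is skipped on purpose, since it always reports (rw).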
The $ROOT/snapshot directory is mounted RW under /.snapshot, and each file or directory in /.snapshot/files is mounted with '-o bind' from /.snapshot/${file} to /${file}. If you find any of the above is not the case for you, please append the output of 'mount' to this bug report.

> The Init script seems to grab only
> "snapshot" and will not grab other snapshots for additional hosts (as in
> a Blade chassis). The host name "reache" is what the machine boots to, not
> the name designated in the DHCP server.

I don't understand what you are expecting here. Each host has its own snapshot directory under $ROOT/snapshot/$HOST - by default, if you do not specify a snapshot location in the GUI, this will be the hostname associated with the DNS PTR record for the IP address granted by the DHCP server. I.e., when the DHCP server grants a lease for 192.168.2.1 to the boot client, the disklessrc script does a DNS lookup for '1.2.168.192.in-addr.arpa. PTR?' and will use the name returned as its host name and snapshot directory name, unless the snapshot name is set in the GUI.

If you are having problems with generation of the per-host snapshot directories, please append the output of these commands run on the boot client to this bug report:
# hostname
# host $IP     ( where $IP is the address of your IP boot interface )
# df /.snapshot

What may be confusing is that when the disklessrc script does a
# mount -n -o bind /.snapshot/${HOST}/${file} ${file}
the mountpoint is listed in /proc/mounts as
${SERVER}:${ROOT}/snapshot ${FILE}
This is because of the '-o bind' option - it is not a real mount, just an alias for that path under ${SERVER}:${ROOT}/snapshot. So each host DOES have unique snapshot files mounted rw on ${SERVER}:${ROOT}/snapshot/${HOST}/${FILE}, but /proc/mounts lists them as /snapshot/${HOST}/${FILE}.
( On the client, you can do:
# touch /var/log/test
# cp /var/log/test /.snapshot/${HOST}/var/log/test
cp: `/var/log/test' and `/.snapshot/${HOST}/var/log/test' are the same file
).

> As noted before there are two errors at log on,

Sorry - I didn't see these errors - please can you attach them to this bug report?

> one gnome can't find the reache hostname, and two XKB shows an error.

It sounds like you may have a DNS problem. The problems you mentioned above could be explained by a lack of a DNS PTR record for the IP address allocated to the client - please append the output of the 'host' command above to this bug, and details of the "XKB error".

> The fix seems to mount a single iteration just fine now but multiples
> seem to be problematic.
>

Please clarify - do you mean the first boot off the NFS $ROOT succeeds, but subsequent boots fail? How do they fail? What error messages, if any, are generated during the subsequent boots? What are the contents of the /var/log/messages file after the boot fails? I can't reproduce any such problems here.

The issues identified in Comment #23 have been investigated, bugs were raised, and are now fixed with system-config-netboot-0.1.26-1_EL4, available from:
http://people.redhat.com/~jvdias/system-config-netboot

1. Bug 165772 : read-only root filesystem reported as mounted with "rw" option
It appears that the initscripts package of RHEL-4 requires a "READONLY=yes" option in /etc/sysconfig/init in the client root for rc.sysinit not to attempt to mount it read-write.

2. Bug 165735 : diskless clients cannot cope with no DNS PTR record for their IP address
It was possible for multiple clients who were unable to determine their host name to mount the same snapshot directory - this is now impossible. If clients are unable to determine their host name and have empty SNAPSHOT settings, the IP address is used for the snapshot directory.
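The fallback just described can be sketched as a small shell function. This is an illustration only: the function name snapshot_dir_name is hypothetical, and the order of precedence (explicit SNAPSHOT setting, then host name, then IP address) is inferred from the descriptions in this report rather than copied from the disklessrc script.

```shell
# Pick the snapshot directory name for a boot client.
#   $1  explicit SNAPSHOT setting (may be empty)
#   $2  host name the client managed to determine (may be empty)
#   $3  IP address of the client's boot interface
# ASSUMPTION: the precedence below is inferred from this bug report,
# not taken from the real disklessrc.
snapshot_dir_name() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # snapshot name set explicitly
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # host name from DHCP or DNS
    else
        printf '%s\n' "$3"      # no name available: fall back to the IP
    fi
}
```

Because the IP address is unique per client, two clients that both fail to determine a host name can no longer end up sharing one snapshot directory.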
It is also now possible to set the client host names with DHCP options as described in Bug 165735, and DNS names do not have to be defined for them.

In reply to pkmartin's last comment:

> 1) building diskless boot os fails with a popup message

The initrd is being made too small for the kernel-smp and the required modules. Bug 166217 covers this issue and will be fixed in the next release.

> 2) Adding additional hosts with gui tool, does not seem to reread the
> dhcpd.conf

The GUI tool does not read the dhcpd.conf. As the documentation states, users are expected to configure the DHCP server themselves. In the future, system-config-netboot may interface with another GUI tool for DHCP configuration (enhancement request in Bug 166218).

The dhcpd.conf configuration you quote would NOT be sending the host-name option to boot clients; you'd need to specify the dhcpd.conf option:
'use-host-decl-names on;'
OR modify the host declarations to contain eg.:
'host blade4 { ... option host-name "blade4"; ... }'
to make dhcpd send the host declaration names (ie. 'blade4', 'blade5', 'blade6' in your example) as the host-name option in leases.

I cannot explain how your host ends up with the name 'reached' if you are using the current system-config-netboot-0.1.26-1_EL4 version, which is available from:
http://people.redhat.com/~jvdias/system-config-netboot/RHEL-4/ .
The previous system-config-netboot versions did not take any account of whatever dhcp host-name option may have been sent, and would blindly use the last word emitted by the 'host $IP_ADDRESS' output, regardless of whether the lookup succeeded or not.
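To illustrate why taking the last word of the 'host' output is fragile, here is a rough sketch of the two approaches. Both function names are invented for this example, the code is not taken from any system-config-netboot version, and the "domain name pointer" line format is assumed to be what the 'host' utility prints for a successful PTR lookup.

```shell
# Old behaviour as described in this report: print the last word of whatever
# text arrives on standard input, whether or not it is a successful answer.
last_word() {
    awk 'NF { w = $NF } END { print w }'
}

# More careful variant (hypothetical): accept only a line of the assumed form
# "... domain name pointer NAME." and print NAME without the trailing dot.
# Prints nothing if no such line is present.
ptr_host_name() {
    awk '/ domain name pointer / {
             name = $NF
             sub(/\.$/, "", name)
             print name
             exit
         }'
}
```

Feeding the message ';; connection timed out; no servers could be reached' through last_word yields the word 'reached', while ptr_host_name prints nothing for it, so the caller can fall back to another source for the name.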
So it sounds like you are not using the 0.1.26-1 version, which fixes this problem ( Bug 165735 ), and moreover, you supply no DHCP domain-name-servers option, so the clients can reach no nameservers and 'host' emits the message:
'
;; connection timed out; no servers could be reached
'
and the broken system-config-netboot version you are using uses 'reached' as the host name.

TO FIX: use the fixed 0.1.26-1 version, and make the dhcp server send the host-name option, or specify the domain-name-servers option to contain the IP addresses of DNS servers with PTR records for the clients.

RE: ---- Additional Comments From pkmartin.com 2005-08-24 16:48 EDT -------

> Running with 0.1.30-1EL4; the SMP option seems to work fine at this time. have
> the host name issues worked out with the DHCPD.conf file. (thanks for the
> pointer :) ).
> There is still an XKB config error at login

You mean:
(EE) Couldn't load XKB keymap, falling back to pre-XKB keymap

It sounds like your /etc/X11/xorg.conf file may be missing or misconfigured. This file is mounted as a modifiable file under the ${CLIENT_ROOT_DIR}/snapshot/${HOST_NAME} directory. Check that the /etc/X11/xorg.conf file is there on the boot client and is correctly configured for the keyboard that is attached to the client - see 'man xorg.conf'.

>
> and a GNOME error (looking for the
> hostname/ip) at login, but these go away with the click of a button and so far I
> can't find any long term down side or impact from these errors.
>

Yes, GNOME will always do a host lookup for the local host name. As you do not use DNS, this fails. If the boot client's host name as assigned by dhcp is in /etc/hosts on the client root, this error should also go away.

>
> Only thing I would like to quibble about at this time is the /home directory is
> mounted as R/O, so a vanilla user (non root) will be unable to log in with this
> default config, even though he would probably have the home dir mounted else
> where in an NFS environment.
>
> Please let me know what you think, and thanks for the help..........

You can change how the /home directory is mounted by modifying /etc/fstab in the client root - this is also a per-client modifiable file in the snapshot directory, and by default contains no separate mount point for the /home directory, which is therefore on / . You can add an /etc/fstab entry like:
'
server:/home_directory /home nfs rw 0 0
'
I'll raise an enhancement bug that the GUI should support configuration of the home directory mount point.

(In reply to comment #29)
> RE:
> > There is still an XKB config error at login
>
> You mean:
> (EE) Couldn't load XKB keymap, falling back to pre-XKB keymap
>
> It sounds like your /etc/X11/xorg.conf file may be missing or misconfigured.
> This file is mounted as a modifiable file under the
> ${CLIENT_ROOT_DIR}/snapshot/${HOST_NAME}
> directory.
> Check that the /etc/X11/xorg.conf file is there on the boot client and
> is correctly configured for the keyboard that is attached to the client -
> see 'man xorg.conf'

I get this error as well. The xorg.conf file was created on the client machine, then the whole OS was rsync'ed to the NFS server. When the client boots diskless and a user logs in, you get this error. If I boot off of the hard disk the error does not come up. The error is somehow caused by booting diskless. Maybe the xorg.conf file needs to be R/W since Xorg can write its own conf file now? I will check that out.

> > Only thing I would like to quibble about at this time is the /home directory is
> > mounted as R/O, so a vanilla user (non root) will be unable to log in with this
> > default config, even though he would probably have the home dir mounted else
> > where in an NFS environment.
> >
> > Please let me know what you think, and thanks for the help..........
>
> You can change how the /home directory is mounted by modifying /etc/fstab
> in the client root - this is also a per-client modifiable file in the snapshot
> directory, and by default contains no separate mount point
> for the /home directory, which is therefore on / .
> You can add an /etc/fstab entry like:
> '
> server:/home_directory /home nfs rw 0 0
> '
> I'll raise an enhancement bug that the GUI should support configuration of the
> home directory mount point.

I just wanted to weigh in on this one - the automounter is a cleaner way to go and works flawlessly on diskless clients.

Regards,
Ean

In reply to Comment #30:

RE: xorg.conf:
The /etc/X11/xorg.conf file is mounted RW on the per-client snapshot directory. It is up to users to configure this file correctly for their boot client (xorg-x11 currently has no xf86config replacement to automate this configuration).

RE: /home mount points:
Yes, the automounter could also be used to mount the /home directories, provided users configure the required maps. When bug 166727 is fixed, the implementation should give users the opportunity either to use NFS or to configure automount maps for the /home directories.

(In reply to comment #31)
> RE: xorg.conf:
> The /etc/X11/xorg.conf file is mounted RW on the per-client snapshot directory.

Yup. I didn't realize that but I see it now. So then that's not the problem. I just diff'ed the xorg.conf on the client and the diskless version on the NFS server is the same as the one that is on the harddrive. I re-rsynced just to be certain.

> It is up to users to configure this file correctly for their boot client
> (xorg-x11 currently has no xf86config replacement to automate this configuration) .

'system-config-display --noui' will do the trick. It automagically builds an xorg.conf using the maximum resolution that your display adapter and monitor will support ( new in RHEL4,FC2/3 ).

An advisory has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2005-484.html