Bug 149000 - system-config-netboot creates a broken initrd for diskless clients
Summary: system-config-netboot creates a broken initrd for diskless clients
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: system-config-netboot
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Jason Vas Dias
QA Contact:
URL:
Whiteboard:
Depends On: 165735 165772 166217
Blocks: 156320 156322
Reported: 2005-02-17 20:39 UTC by Ean J. Price
Modified: 2007-11-30 22:07 UTC
CC List: 5 users

Fixed In Version: RHBA-2005-484
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-05 13:47:00 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2005:484
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Updated system-config-netboot package
Last Updated: 2005-10-05 04:00:00 UTC

Description Ean J. Price 2005-02-17 20:39:22 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
The initrd created for diskless clients is completely broken.  Libraries are missing from /lib.  The nfs module doesn't load without hacking /disklessrc.  After fixing all of this in the initrd, the kernel still panics ( trying to kill init ) when disklessrc exits.  I haven't figured out how to get around that yet.

See bug #148022 for FC3 for a more complete description since system-config-netboot is obviously broken in FC3 as well.

Version-Release number of selected component (if applicable):
system-config-netboot-0.1.8-1

How reproducible:
Always

Steps to Reproduce:
1. Run system-config-netboot

Actual Results:  Broken, unbootable initrd.

Expected Results:  The system should have booted using NFS.

Additional info:

Comment 1 Ean J. Price 2005-02-17 20:42:29 UTC
I am a dumb ass.  The correct bug reference is #144240.

Comment 6 Ken Reilly 2005-05-04 16:22:43 UTC
I added this Bugzilla report to the RHEL4-U2 Proposed Blocker based on the
current Priority Score of 13 in its associated Issue Tracker ticket #68011. 

For completeness, this problem is also reported against RHEL3 and so I have
added Bugzilla report #135411 (Issue Tracker ticket #68195) to the RHEL3-U6
Proposed Blocker. 


Ken Reilly
2-3906

Comment 7 Jason Vas Dias 2005-05-27 20:31:13 UTC
This bug is now fixed with system-config-netboot-0.1.16-1, which
should be in RHEL-4-U2 and which meanwhile can be downloaded from:
 http://people.redhat.com/~jvdias/system-config-netboot 
Please try it out and let me know of any issues - thanks.



Comment 8 Ean J. Price 2005-05-31 17:02:17 UTC
(In reply to comment #7)
> This bug is now fixed with system-config-netboot-0.1.16-1, which
> should be in RHEL-4-U2 and which meanwhile can be downloaded from:
>  http://people.redhat.com/~jvdias/system-config-netboot 
> Please try it out and let me know of any issues - thanks.
> 

Works for me.  Thank you sir.



Comment 12 Jason Vas Dias 2005-06-14 23:05:55 UTC
I cannot reproduce this problem. I have just tested booting a rhel-3 kernel
from a rhel-4 boot server.

It sounds like you may be trying to boot an initrd created by the older, broken
version of system-config-netboot. Please ensure you have installed the latest
version:
 # wget http://people.redhat.com/jvdias/system-config-netboot/system-config-netboot-0.1.18-1.noarch.rpm
 # rpm -Uvh --force system-config-netboot-0.1.18-1.noarch.rpm

Then please delete and recreate the OS and clients you are trying to boot with 
system-config-netboot.

Ensure that the /tftpboot/linux.install/${OS}/initrd.img has a modification time
from AFTER you deleted and recreated ${OS} .
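One way to make the timestamp check above unambiguous is to create a marker file just before deleting and recreating the OS, then compare the initrd against it. This is a minimal bash sketch; the function name initrd_is_fresh and the marker path are hypothetical, and ${OS} is whatever OS name you created in system-config-netboot.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (0) if the initrd is newer than a marker file
# created just before the OS was deleted and recreated; returns 1 if it is
# older and 2 if either file is missing.
initrd_is_fresh() {
    local initrd="$1" marker="$2"
    [ -e "$initrd" ] && [ -e "$marker" ] || return 2
    [ "$initrd" -nt "$marker" ]    # bash: true if $initrd has a later mtime
}

# Usage (hypothetical marker path):
#   touch /tmp/before-recreate
#   ... delete and recreate ${OS} in system-config-netboot ...
#   initrd_is_fresh /tftpboot/linux.install/${OS}/initrd.img /tmp/before-recreate \
#       && echo "initrd.img was regenerated"
```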

If the problem still occurs, please send the initrd.img and client configuration
file (/tftpboot/linux.install/pxelinux.cfg/{HEX IP}) to: jvdias.

Comment 13 Ean J. Price 2005-06-15 00:32:07 UTC
(In reply to comment #12)
> I cannot reproduce this problem. I have just tested booting a rhel-3 kernel
> from a rhel-4 boot server.
> 

Hi Jason,

As per my previous post - using the version you put up on people.redhat.com,
diskless booting works on RHEL4.  I tested it with a RHEL4 client with the NFS
server running RHEL3.  As far as I am concerned the bug is resolved.

Thanks again,
Ean

Comment 14 Jason Vas Dias 2005-06-15 00:52:04 UTC
I've just discovered a problem with e2fsprogs that could explain this failure:
RHEL-3 clients are unable to boot from an initrd created by RHEL-4 e2fsprogs.
It seems the rhel-3 initrd I tested as described in comment #12 was created on
RHEL-3. 
I just tried creating an initrd on RHEL-4, and a RHEL-3 client cannot read it.
It looks like there is still a problem with creating RHEL-3 initrds on RHEL-4
systems - please ignore comment #12. I'm contacting the e2fsprogs maintainers
to see if they think this is an e2fsprogs bug.



Comment 15 Jason Vas Dias 2005-06-15 01:28:42 UTC
The problem is that symbolic links created by releases later than RHEL-3 are
not readable by RHEL-3. So the updateDiskless script must create the initrd
within a chroot inside the client root directory, and all client roots must
have the e2fsprogs utilities installed. I'll make a version that implements
this tomorrow.

Comment 17 Jason Vas Dias 2005-06-15 22:40:04 UTC
RE: Comment #11, Comment #15:
 
  The problem with creating RHEL-3 initrd images on RHEL-4 was that
  if SELinux was enabled (enforcing or permissive), a "mount" on a
  newly created ext2 image turned on SELinux xattr labelling of the
  filesystem; this results in symbolic links being of a non-zero size,
  which 2.4 kernels have only recently been patched not to crash on.
  Even so, they complain: 
  An 'ls' of such a link produces this error message:
  '
     ls: cannot read symbolic link bin: Input/output error
  '
  and emit these kernel error messages to the log when such an image is mounted:
  '
     kernel: attempt to access beyond end of device
     kernel: 07:01: rw=0, want=1768042287, limit=64
  '
  This renders all files accessed via a link created under SELinux
  unreadable to RHEL-3 kernels.
  Mounting the filesystem with the mount options
  '
     -o loop,context=system_u:object_r:removable_t
  '
  is the only way, under an SELinux-enabled kernel, to produce an image
  that a 2.4 kernel can read correctly.
 
  So system-config-netboot-0.1.20-1's updateDiskless now does this and can
  create valid RHEL-3 initrd.img images under RHEL-4 with SELinux enabled.
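For anyone building such an image by hand, the sequence can be sketched as below. This is not the actual updateDiskless code: the function name, the DRY_RUN switch and the image size are hypothetical, and the step that populates the image is left as a comment. The only detail taken from the explanation above is the loop,context=system_u:object_r:removable_t mount option.

```shell
#!/bin/bash
# Sketch of building an ext2 initrd image readable by RHEL-3 (2.4) kernels on
# an SELinux-enabled RHEL-4 host. With DRY_RUN=1 commands are only printed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_rhel3_initrd_image() {
    local image="$1" mnt="$2" size_kb="$3"
    run dd if=/dev/zero of="$image" bs=1k count="$size_kb"
    run mke2fs -F -q "$image"
    # Per the comment above: mounting with an explicit context= avoids the
    # SELinux xattr labelling that leaves symlinks unreadable to 2.4 kernels.
    run mount -o loop,context=system_u:object_r:removable_t "$image" "$mnt"
    # ... populate "$mnt" (kernel modules, disklessrc, etc.) here ...
    run umount "$mnt"
}

# Example: DRY_RUN=1 make_rhel3_initrd_image /tmp/initrd.img /mnt/initrd 4096
```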

  Please try out the new version, available from:
   http://people.redhat.com/~jvdias/system-config-netboot/

Thank You!


Comment 19 Jason Vas Dias 2005-07-06 13:48:15 UTC
*** Bug 161394 has been marked as a duplicate of this bug. ***

Comment 23 Jason Vas Dias 2005-08-09 20:31:21 UTC
RE: Comment #21 ---- Additional Comments From pkmartin.com  2005-08-09 13:55 EDT

Let me deal with each point raised :

> I believe I have found several issues with the supplied fix.  Root is
> remounted as RW later in the booting process. 

The root device is never mounted RW by default, unless you changed the
RC scripts to do so. 

Say the root installation you want clients to mount (RO) is under $ROOT/root 
on the boot server.

The $ROOT/root directory is always mounted read-only.

The $ROOT/snapshot directory is always mounted read-write under /.snapshot,
so each host's $ROOT/snapshot/${HOST} directory appears as /.snapshot/${HOST}.

Each file listed in $ROOT/snapshot/files is mounted RW in '--bind' mode;
see the documentation of mount's '--bind' / '-o bind' option.

What may be confusing is that the output of the 'mount' command reports:
# mount
rootfs on / type rootfs (rw)
...

The rootfs filesystem is a special pseudo-filesystem and is always mounted
RW, even if the underlying device is mounted RO - the next line of mount
output from the command above shows the device:
# mount
rootfs on / type rootfs (rw)
${SERVER}:${ROOT}/root on / type nfs (ro,v3,rsize=8192,wsize=8192,hard,udp,nolock,addr=${SERVER})
# touch /test
touch: creating `/test': Read-only file system

Unless you've changed /usr/share/system-config-netboot/diskless/disklessrc
the root NFS filesystem will always be mounted RO .

The $ROOT/snapshot directory is mounted RW under /.snapshot, and each file
or directory listed in /.snapshot/files is mounted with '-o bind' from
/.snapshot/${HOST}/${file} to /${file}.
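The bind-mount step amounts to a loop over the files list. The sketch below is a hypothetical simplification, not the disklessrc code: it assumes one path per line in /.snapshot/files and per-host copies under /.snapshot/${HOST}, and the SNAP and DRY_RUN variables exist only so the loop can be tried without root.

```shell
#!/bin/bash
# Simplified sketch of the per-host bind mounts described above.
#   $1 = the client's snapshot (host) name
#   SNAP    = snapshot mount point (default /.snapshot)
#   DRY_RUN = 1 to print the mount commands instead of running them
bind_snapshot_files() {
    local snap="${SNAP:-/.snapshot}" host="$1" file
    while IFS= read -r file; do
        [ -n "$file" ] || continue        # skip blank lines
        file="/${file#/}"                 # normalise to an absolute path
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo mount -n -o bind "${snap}/${host}${file}" "${file}"
        else
            mount -n -o bind "${snap}/${host}${file}" "${file}"
        fi
    done < "${snap}/files"
}
```

Because each entry is a bind mount rather than a new NFS mount, the client's view of /${file} and the server's $ROOT/snapshot/${HOST}/${file} are the same object, which is why /proc/mounts shows the snapshot export as the source.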

If you find any of the above is not the case for you, please append the
output of 'mount' to this bug report.


> The Init script seems to grab only
> "snapshot" and will not grab other snapshots for additional hosts (as in
> a Blade chassis).  The host name "reache" is what the machine boots to, not 
> the name  designated in the DHCP server.

I don't understand what you are expecting here.

Each host has its own snapshot directory under $ROOT/snapshot/$HOST. By
default, if you do not specify a snapshot location in the GUI, this will
be the host name associated with the DNS PTR record for the IP address granted
by the DHCP server.

For example, when the DHCP server grants a lease for 192.168.2.1 to the boot
client, the disklessrc script does a DNS lookup for '1.2.168.192.in-addr.arpa. PTR?'
and will use the name returned as its host name and snapshot directory name,
unless the snapshot name is set in the GUI.
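The PTR query name is just the four octets of the address in reverse order under in-addr.arpa. The small bash function below (hypothetical name ptr_name) builds it, which can help when checking which record your DNS server must hold for a given client.

```shell
#!/bin/bash
# Build the reverse-DNS (PTR) query name for a dotted-quad IPv4 address.
ptr_name() {
    local IFS=.
    local -a o
    read -r -a o <<< "$1"            # split "192.168.2.1" on dots
    [ "${#o[@]}" -eq 4 ] || return 1  # not a dotted quad
    echo "${o[3]}.${o[2]}.${o[1]}.${o[0]}.in-addr.arpa."
}

# Example: ptr_name 192.168.2.1   prints 1.2.168.192.in-addr.arpa.
```

On a client, 'host 192.168.2.1' sends this same PTR query.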

If you are having problems with generation of the per-host snapshot
directories, please append the output of these commands run on the boot client
to this bug report:
# hostname
# host $IP
( where $IP is the address of your IP boot interface )
# df  /.snapshot

What may be confusing is that when the disklessrc script does a
  # mount -n -o bind /.snapshot/${HOST}/${file} ${file}
the mountpoint is listed in /proc/mounts as 
          ${SERVER}:${ROOT}/snapshot ${FILE}
This is because of the '-o bind' option - it is not a real mount,
just an alias for that path under ${SERVER}:${ROOT}/snapshot .

So each host DOES have unique snapshot files mounted rw on
${SERVER}:${ROOT}/snapshot/${HOST}/${FILE} , but /proc/mounts
lists them as /snapshot/${HOST}/${FILE} .
( On the client, you can do:
  # touch /var/log/test
  # cp /var/log/test /.snapshot/${HOST}/var/log/test
  cp: `/var/log/test' and `/.snapshot/${HOST}/var/log/test' are the same file
).
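The cp message above comes from cp noticing that both paths have the same device and inode numbers. The same check can be made directly; this is a small bash sketch with a hypothetical function name, using the shell's -ef test and GNU stat to display the numbers.

```shell
#!/bin/bash
# Report whether two paths are the same file (same device and inode numbers).
check_same_file() {
    if [ "$1" -ef "$2" ]; then
        echo "same file: $(stat -L -c 'dev=%d inode=%i' -- "$1")"
    else
        echo "different files"
        return 1
    fi
}

# On a diskless client:
#   check_same_file /var/log/test /.snapshot/${HOST}/var/log/test
```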

> As noted before there are two errors at log on,

Sorry - I didn't see these errors - please can you attach them to this bug
report ?

> one gnome cant find the reache hostname, and two XKB shows an error.  

It sounds like you may have a DNS problem. The problems you mentioned above
could be explained by a lack of a DNS PTR record for the IP address allocated
to the client. Please append the output of the 'host' command above to this
bug, and details of the "XKB error".

> The fix seems to mount a single iteration just fine now but multiples 
> seem to be problematic. 
> 
Please clarify: do you mean that the first boot off the NFS $ROOT works, but
subsequent boots fail? How do they fail? What error messages, if any,
are generated during the subsequent boots? What are the contents of the
/var/log/messages file after the boot fails? I can't reproduce any such
problems here.



Comment 25 Jason Vas Dias 2005-08-12 01:54:21 UTC
The issues identified in Comment #23 have been investigated; bugs were raised
for them, and they are now fixed in
system-config-netboot-0.1.26-1_EL4, available from:
  http://people.redhat.com/~jvdias/system-config-netboot

1. Bug 165772 : read-only root filesystem reported as mounted with "rw" option
  It appears that the initscripts package of RHEL-4 requires a "READONLY=yes"
  option in /etc/sysconfig/init in the client root for rc.sysinit not to attempt
  to mount it read-write. 

2. Bug 165735 : diskless clients cannot cope with no DNS PTR record for their IP address
  It was possible for multiple clients who were unable to determine their 
  host name to mount the same snapshot directory. This is now impossible:
  if clients are unable to determine their host name and have empty 
  SNAPSHOT settings, the IP address is used for the snapshot directory.
  
  It is also now possible to set the client host names with DHCP options
  as described in Bug 165735, and DNS names do not have to be defined for them.
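Both fixes can be sketched in bash as below. Neither function is taken from system-config-netboot; the names are hypothetical. set_readonly_root edits a client root's /etc/sysconfig/init so it contains READONLY=yes (item 1), and snapshot_name illustrates the fallback in item 2. The precedence between a DHCP host-name and a DNS name is an assumption; only the final fallback to the IP address is stated above.

```shell
#!/bin/bash
# Item 1: ensure READONLY=yes in the client root's /etc/sysconfig/init,
# replacing any existing READONLY= line so repeated runs are harmless.
set_readonly_root() {
    local f="$1/etc/sysconfig/init"
    if grep -q '^READONLY=' "$f" 2>/dev/null; then
        sed -i 's/^READONLY=.*/READONLY=yes/' "$f"
    else
        echo 'READONLY=yes' >> "$f"
    fi
}

# Item 2: choose a snapshot directory name (illustrative precedence).
#   $1 = SNAPSHOT setting (may be empty)   $2 = DHCP host-name (may be empty)
#   $3 = name from a successful PTR lookup  $4 = client IP address
snapshot_name() {
    local snapshot="$1" dhcp_name="$2" dns_name="$3" ip="$4"
    if [ -n "$snapshot" ]; then
        echo "$snapshot"
    elif [ -n "$dhcp_name" ]; then
        echo "$dhcp_name"
    elif [ -n "$dns_name" ]; then
        echo "$dns_name"
    else
        echo "$ip"    # unique per client, so no two clients share a directory
    fi
}

# Example: set_readonly_root "$ROOT/root"
```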


Comment 27 Jason Vas Dias 2005-08-17 22:11:35 UTC
In reply to pkmartin's last comment:

>  1) building diskless boot os fails with a popup message
 
The initrd is being made too small for the kernel-smp and the required modules.

Bug 166217 covers this issue and will be fixed in the next release.

>  2) Adding additional hosts with gui tool, does not seem to reread the 
>     dhcpd.conf

The GUI tool does not read dhcpd.conf. As the documentation states, users
are expected to configure the DHCP server themselves. In the future,
system-config-netboot may interface with another GUI tool for DHCP configuration
(enhancement request in Bug 166218).

The dhcpd.conf configuration you quote would NOT be sending the host-name 
option to boot clients; you'd need to specify the dhcpd.conf option:
   'use-host-decl-names on;'
OR modify the host declarations to contain e.g.:
   'host blade4  { ... option host-name "blade4"; ... }'
to make dhcpd send the host declaration names (i.e. 'blade4', 'blade5', 'blade6' 
in your example) as the host-name option in leases.

I cannot explain how your host ends up with the name 'reached' if you
are using the current system-config-netboot-0.1.26-1_EL4 version, which
is available from:
  http://people.redhat.com/~jvdias/system-config-netboot/RHEL-4/ .

The previous system-config-netboot versions did not take any account of 
whatever dhcp host-name option may have been sent, and would blindly use
the last word emitted by the 'host $IP_ADDRESS' output, regardless of 
whether the lookup succeeded or not.
So it sounds like you are not using the 0.1.26-1 version, which fixes this
problem ( Bug 165735 ), and moreover, you supply no DHCP domain-name-servers
option, so the clients can reach no nameservers and 'host' emits the message:
  ' ;; connection timed out; no servers could be reached
  '
and the broken system-config-netboot version you are using uses 'reached'
as the host name.
TO FIX : use the fixed 0.1.26-1 version, and make the dhcp server send the
         host-name option, or specify the domain-name-servers option to 
         contain the IP addresses of DNS servers with PTR records for the 
         clients.
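The difference between the old behaviour and a safe parse can be shown in a few lines of bash. This is an illustrative sketch, not the code shipped in 0.1.26-1: it accepts a name only from a 'host' output line containing "domain name pointer" and fails otherwise, whereas taking the last word of any output turns the timeout message into the host name 'reached'.

```shell
#!/bin/bash
# Read 'host $IP' output on stdin; print the PTR name (without the trailing
# dot) only if the lookup actually returned one, otherwise return 1.
ptr_from_host_output() {
    local name
    name=$(awk '/domain name pointer/ { print $NF; exit }')
    [ -n "$name" ] || return 1
    echo "${name%.}"
}

# Old behaviour for comparison: last word of whatever 'host' printed.
#   echo ';; connection timed out; no servers could be reached' | awk '{ print $NF }'
# prints "reached".
```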

Comment 29 Jason Vas Dias 2005-08-24 22:24:32 UTC
RE:
---- Additional Comments From pkmartin.com  2005-08-24 16:48 EDT -------
> Running with 0.1.30-1EL4; the SMP option seems to work fine at this time. have
> the  host name issues worked out with the DHCPD.conf file. (thanks for the
> pointer :) ).
> There is still an XKB config error at login 

You mean:
(EE) Couldn't load XKB keymap, falling back to pre-XKB keymap

It sounds like your /etc/X11/xorg.conf file may be missing or misconfigured.
This file is mounted as a modifiable file under the 
${CLIENT_ROOT_DIR}/snapshot/${HOST_NAME}
directory.
Check that the /etc/X11/xorg.conf file is there on the boot client and 
is correctly configured for the keyboard that is attached to the client -
see 'man xorg.conf' .
>
> and a GNOME errror (looking for the
> hostname/ip) at login, but these go away with the click of a button and so far I
> cant find any long term down side or impact from these errors.  
>
Yes, GNOME will always do a host lookup for the local host name. As you do not
use DNS, this fails. If the boot client's host name as assigned by dhcp is in 
/etc/hosts on the client root, this error should also go away. 
>
> Only thing I would like to quibble about at this time is the /home directory is
> mounted as R/O, so a vanilla user (non root) will be unable to log in with this
> default config, even though he would probably have the home dir mounted else
> where in an NFS environment.
> 
> Please let me know what you think, and thanks for the help..........

You can change how the /home directory is mounted by modifying /etc/fstab 
in the client root - this is also a per-client modifiable file in the snapshot 
directory, and by default contains no separate mount point 
for the /home directory, which is therefore on / .
You can add an /etc/fstab entry like:
'
server:/home_directory    /home    nfs     rw   0   0
'
I'll raise an enhancement bug that the GUI should support configuration of the
home directory mount point.


Comment 30 Ean J. Price 2005-08-25 11:00:09 UTC
(In reply to comment #29)
> RE:

> > There is still an XKB config error at login 
> 
> You mean:
> (EE) Couldn't load XKB keymap, falling back to pre-XKB keymap
> 
> It sounds like your /etc/X11/xorg.conf file may be missing or misconfigured.
> This file is mounted as a modifiable file under the 
> ${CLIENT_ROOT_DIR}/snapshot/${HOST_NAME}
> directory.
> Check that the /etc/X11/xorg.conf file is there on the boot client and 
> is correctly configured for the keyboard that is attached to the client -
> see 'man xorg.conf' 

I get this error as well.  The xorg.conf file was created on the client
machine, then the whole OS was rsync'ed to the NFS server.  When the client
boots diskless and a user logs in, you get this error.  If I boot off of the
hard disk the error does not come up.  The error is somehow caused by booting
diskless.  Maybe the xorg.conf file needs to be R/W since Xorg can write its own
conf file now?  I will check that out.

> > Only thing I would like to quibble about at this time is the /home directory is
> > mounted as R/O, so a vanilla user (non root) will be unable to log in with this
> > default config, even though he would probably have the home dir mounted else
> > where in an NFS environment.
> > 
> > Please let me know what you think, and thanks for the help..........
> 
> You can change how the /home directory is mounted by modifying /etc/fstab 
> in the client root - this is also a per-client modifiable file in the snapshot 
> directory, and by default contains no separate mount point 
> for the /home directory, which is therefore on / .
> You can add an /etc/fstab entry like:
> '
> server:/home_directory    /home    nfs     rw   0   0
> '
> I'll raise an enhancement bug that the GUI should support configuration of the
> home directory mount point.

I just wanted to weigh in on this one - the automounter is a cleaner way to go
and works flawlessly on diskless clients.

Regards,
Ean



Comment 31 Jason Vas Dias 2005-08-25 15:41:28 UTC
In reply to Comment #30:

RE: xorg.conf:
The /etc/X11/xorg.conf file is mounted RW on the per-client snapshot directory.
It is up to users to configure this file correctly for their boot client 
(xorg-x11 currently has no xf86config replacement to automate this configuration) . 

RE: /home mount points:
Yes, the automounter could also be used to mount the /home directories,
provided users configure the required maps. 
When bug 166727 is fixed, the implementation should give users the option
either to use NFS or to configure automount maps for the /home directories.

Comment 32 Ean J. Price 2005-08-25 16:10:46 UTC
(In reply to comment #31)
> RE: xorg.conf:
> The /etc/X11/xorg.conf file is mounted RW on the per-client snapshot directory.

Yup.  I didn't realize that, but I see it now, so that's not the problem.  I
just diffed xorg.conf: the diskless version on the NFS server is the same as
the one on the client's hard drive.  I re-rsynced just to be
certain.

> It is up to users to configure this file correctly for their boot client 
> (xorg-x11 currently has no xf86config replacement to automate this configuration) . 

'system-config-display --noui' will do the trick.  It automagically builds an
xorg.conf using the maximum resolution that your display adapter and monitor
will support (new in RHEL4, FC2/3).

Comment 33 Red Hat Bugzilla 2005-10-05 13:47:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-484.html


