Red Hat Bugzilla – Bug 503109
nash does not wait till dhclient is done configuring the nic
Last modified: 2009-06-01 08:43:14 EDT
Description of problem:
After upgrading an F10 system to F11 the initrd fails to boot the system from iscsi.
Version-Release number of selected component (if applicable):
How reproducible:
100% for me
Steps to Reproduce:
1. Upgrade a diskless iscsi F10 system to F11-Preview
2. Boot from the newly installed F11 kernel
3. Watch the boot hang around something like "eth0: link up".
Actual results:
Hanging boot, ending in "eth0: link up".
Further analysis shows that the "Bringing up eth0" messages (and later messages) disappear due to Bug 496895 (this is a non-graphical boot). That bug hides the fact that after the second "Bringing up eth0" a message like "dhclient (666): already running" shows up, followed by a lot of confusion because the iscsi initiator cannot connect to the target. Apparently the interface is not brought up properly.
The mentioned information shows up after removing the plymouth lines from the init script in the initrd.
I installed the 126.96.36.199-167.fc11.i586 kernel on both an fc10 and an fc11 system (actually the same system, with a different iscsi boot image). The initrd created on fc11 hangs during boot as described. The initrd created on fc10 boots flawlessly; it even boots the fc11 system.
This makes me think that the problem is in the created initrd and not in the kernel. It may be an mkinitrd problem, or a problem in the components that make up the initrd, such as nash and dhclient.
Hmm, bummer. Can you please extract the init script from the working and the non-working initrd and attach them both?
To extract the init script do:
zcat /boot/initrd-<kernel-version>.img | cpio -i init
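To keep the two extracted scripts apart, the command above can be wrapped so each init lands in its own directory. This is only a sketch: extract_init is a hypothetical helper name, the directory names are placeholders, and --quiet assumes GNU cpio.

```shell
# Hypothetical helper: extract only the "init" script from a gzip-compressed
# cpio initrd image ($1) into a separate directory ($2).
extract_init() {
    img=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # gzip -dc is equivalent to zcat; the cpio pattern "init" limits
    # extraction to that single archive member.
    gzip -dc "$img" | (cd "$dest" && cpio -i --quiet init)
}

# Intended use (image paths are placeholders, not taken from this report):
#   extract_init /path/to/initrd-made-on-f10.img f10
#   extract_init /path/to/initrd-made-on-f11.img f11
#   diff -u f10/init f11/init
```

Extracting into separate directories avoids the second init overwriting the first, so both can be attached or diffed.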
Created attachment 345934 [details]
init of initrd-188.8.131.52-167.fc11.i586.img generated by f10
Created attachment 345935 [details]
init of initrd-184.108.40.206-167.fc11.i586.img generated by f11
I did a little more research. It's not the init script: both the f10-generated and the f11-generated init scripts make the boot fail in an f11-generated initrd.
The problem seems to be nash. f11 nash has no integrated dhcp client, and apparently calls dhclient instead. It looks like (after some tweaking of the init script to get any output at all, see Bug 496895) dhclient has not finished configuring the interface at the time the first iscsistart is called. I have an example of the first iscsistart failing (so the root filesystem isn't accessible) but the second iscsistart succeeding (so the swap space is available). This suggests some sort of timing problem with the real dhclient, but not with the integrated nash dhcp client of f10.
Ah, ok. I'll update the summary of this bug and re-assign it to our
network specialist, who is also the one who made the libdhcpclient ->
use dhclient changes.
David, it looks like the libdhcpclient -> use dhclient changes in nash cause
issues with certain setups (the nash script continues while dhclient has not yet finished configuring the interface).
I tried to verify my own observations, and now it narrows down to something else: dhclient does not create a default gateway, but F10 nash does.
The observed timing issue was caused by a different target IP address for the root target and the swap target (a mistake caused by hacking the initrd to get more information). Both IPs belong to the same iscsi target, but are in different subnets.
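A missing default gateway can be confirmed from routing-table output. The sketch below is an assumption-laden illustration: has_default_route and check_routes are hypothetical helper names, the addresses are made up, and the input is assumed to be in the format printed by `ip route show` (whether that tool is available inside the initrd is not stated in this report).

```shell
# Hypothetical helper: succeeds if the routing-table text on stdin
# (format of `ip route show`) contains a default route.
has_default_route() {
    grep -q '^default '
}

# Made-up sample tables, standing in for captured output.
routes_without_gw='192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.10'
routes_with_gw='default via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.10'

# Report whether the table passed as $1 has a default route.
check_routes() {
    if printf '%s\n' "$1" | has_default_route; then
        echo "default route present"
    else
        echo "no default route"
    fi
}

# On a live system: ip route show | has_default_route || echo "no default route"
```

Without a default route, only targets in the interface's own subnet are reachable, which would match one target connecting while the one in the other subnet fails.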
(In reply to comment #6)
> I tried to verify my own observations, and now it narrows down to something
> else: dhclient does not create a default gateway, but F10 nash does.
> The observed timing issue was caused by a different target IP address for the
> root target and the swap target (a mistake caused by hacking the initrd to get
> more information). Both IPs belong to the same iscsi target, but are in different
> subnets.
Ah so this is a duplicate of bug 501033, marking it as such.
*** This bug has been marked as a duplicate of bug 501033 ***