Bug 503109 - nash does not wait till dhclient is done configuring the nic
Status: CLOSED DUPLICATE of bug 501033
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: rawhide
Hardware: i386 Linux
Priority: low
Severity: high
Assigned To: David Cantrell
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-05-28 18:09 EDT by Rolf Fokkens
Modified: 2009-06-01 08:43 EDT
Doc Type: Bug Fix
Last Closed: 2009-06-01 08:43:14 EDT


Attachments
init of initrd-2.6.29.4-167.fc11.i586.img generated by f10 (2.11 KB, text/plain)
2009-05-29 13:57 EDT, Rolf Fokkens
init of initrd-2.6.29.4-167.fc11.i586.img generated by f11 (2.44 KB, text/plain)
2009-05-29 13:57 EDT, Rolf Fokkens

Description Rolf Fokkens 2009-05-28 18:09:39 EDT
Description of problem:
After upgrading an F10 system to F11 the initrd fails to boot the system from iscsi.

Version-Release number of selected component (if applicable):
mkinitrd-6.0.85-1.fc11.i586
kernel-2.6.27.24-170.2.68.fc10.i686
nash-6.0.85-1.fc11.i586
dhclient-4.1.0-20.fc11.i586

How reproducible:
100% for me

Steps to Reproduce:
1. Upgrade a diskless iscsi F10 system to F11-Preview
2. Boot from the newly installed F11 kernel
3. Watch the boot hang around something like "eth0: link up".
  
Actual results:
Hanging boot, ending in "eth0: link up".
Further analysis shows that the "Bringing up eth0" messages (and later messages) disappear due to Bug 496895 (this is a non-graphical boot). That bug hides the fact that after the second "Bringing up eth0" a message like "dhclient (666): already running" shows up, followed by a lot of confusion because the iSCSI initiator cannot connect to the target. Apparently the interface is not brought up properly.
This information shows up after removing the plymouth lines from the init script in the initrd.
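
A minimal sketch of that workaround, assuming each plymouth call sits on its own line in init. The helper name strip_plymouth, the file names and the repacking step in the comments are illustrative assumptions, not the exact commands used here:

```shell
# Hypothetical helper: copy an init script from stdin to stdout,
# dropping every line that mentions plymouth.
strip_plymouth() {
    grep -v plymouth
}

# Possible usage inside an unpacked initrd tree (paths are assumptions):
#   strip_plymouth < init > init.new && mv init.new init && chmod +x init
#   find . | cpio -o -H newc --quiet | gzip -9 > /boot/initrd-test.img
```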

Expected results:
Flawless boot.

Additional info:
I installed the 2.6.29.4-167.fc11.i586 kernel on both an fc10 and an fc11 system (actually the same system, with a different iscsi boot image). The initrd created on fc11 hangs during boot as described. The initrd created on fc10 boots flawlessly; it even boots the fc11 system.

This makes me think that the problem is in the generated initrd and not in the kernel. It may be a mkinitrd problem, or a problem in one of the components that make up the initrd, like nash and dhclient.
Comment 1 Hans de Goede 2009-05-29 03:54:13 EDT
Hmm, bummer. Can you extract the init script from the working and the non-working initrd and attach them both, please?

To extract the init script do:
zcat /boot/initrd-<kernel-version>.img | cpio -i init
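
To compare the two scripts directly, the same command can be wrapped so that each init lands in its own directory. This is only a sketch: the helper names extract_init and compare_init and the directory names are hypothetical, and it assumes the images are gzip-compressed cpio archives, as the command above implies.

```shell
# Hypothetical helper: unpack only the init script from a gzip-compressed
# cpio initrd image ($1) into its own directory ($2).
extract_init() {
    image=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # zcat decompresses the image; cpio -i init extracts just the "init" member.
    zcat < "$image" | (cd "$dest" && cpio -i --quiet init)
}

# Hypothetical helper: show how the init scripts in two directories differ.
compare_init() {
    diff -u "$1/init" "$2/init"
}

# Possible usage (image names are placeholders):
#   extract_init initrd-from-f10.img f10
#   extract_init initrd-from-f11.img f11
#   compare_init f10 f11
```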
Comment 2 Rolf Fokkens 2009-05-29 13:57:18 EDT
Created attachment 345934 [details]
init of initrd-2.6.29.4-167.fc11.i586.img generated by f10
Comment 3 Rolf Fokkens 2009-05-29 13:57:45 EDT
Created attachment 345935 [details]
init of initrd-2.6.29.4-167.fc11.i586.img generated by f11
Comment 4 Rolf Fokkens 2009-05-29 18:54:55 EDT
I did a little more research. It's not the init script: both the f10-generated and the f11-generated init script make the boot fail in an f11-generated initrd.

The problem seems to be nash. f11 nash has no integrated dhcp client, and apparently calls dhclient instead. It looks like (after some tweaking of the init script to get any output at all, see Bug 496895) dhclient hasn't finished configuring the interface at the time the first iscsistart is called. I have an example of the first iscsistart failing (so the root filesystem isn't accessible) but the second iscsistart succeeding (so the swap space is available). This suggests some sort of timing problem with the real dhclient, but not with the integrated nash dhcp client of f10.
Comment 5 Hans de Goede 2009-05-31 05:58:02 EDT
Ah, ok. I'll update the summary of this bug and re-assign it to our
network specialist, who is also the one who made the libdhcpclient ->
use dhclient changes.

David, it looks like the libdhcpclient -> use dhclient changes in nash cause
issues with certain setups (the nash script continues while dhclient has not yet finished configuring the interface).
Comment 6 Rolf Fokkens 2009-05-31 06:59:20 EDT
I tried to verify my own observations, and now it narrows down to something else: dhclient does not create a default gateway, but F10 nash does.

The observed timing issue was caused by a different target IP address for the root target and the swap target (a mistake caused by hacking the initrd to get more information). Both IPs are from the same iscsi target, but in different subnets.
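
Since the two addresses are in different subnets, the one outside the local subnet can only be reached through a default route. A minimal sketch for checking whether one exists, assuming a shell with the iproute2 ip tool (which nash's initrd environment may not provide); the helper name has_default_route is hypothetical:

```shell
# Hypothetical helper: succeed if the routing table read from stdin
# contains a default route.
has_default_route() {
    grep -q '^default '
}

# Possible usage on a running system:
#   ip -4 route show | has_default_route || echo "no default gateway" >&2
```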
Comment 7 Hans de Goede 2009-06-01 08:43:14 EDT
(In reply to comment #6)
> I tried to verify my own observations, and now it narrows down to something
> else: dhclient does not create a default gateway, but F10 nash does.
> 
> The observed timing issue was caused by a different target IP address for the
> root target and the swap target (a mistake caused by hacking the initrd to get
> more information). Both IPs are from the same iscsi target, but in different
> subnets.  

Ah, so this is a duplicate of bug 501033; marking it as such.

*** This bug has been marked as a duplicate of bug 501033 ***