Red Hat Bugzilla – Bug 225479
initrd nash network --device eth0 --bootproto dhcp fails during PXE boot before NFS mount of root file system
Last modified: 2007-11-30 17:11:54 EST
Description of problem:
Using the latest mkinitrd, the PXE boot loader gets its IP address from DHCP
just fine, but once the kernel is running and the initrd is loaded, the nash script
init runs the command
network --device eth0 --bootproto dhcp
and we see error messages listed below under actual results
Version-Release number of selected component (if applicable):
This problem did not exist with vmlinuz-2.6.18-1.2869.fc6 and the then
concurrent version of mkinitrd
How reproducible: always
Steps to Reproduce:
1. Have a diskless booting system that loads kernel and initrd from a tftp
server using pxelinux
2. mkinitrd --with=nfs --fstab=<fstab showing NFS-mounted root FS>
3. put resulting initrd in tftp server's /tftpboot
Actual results:
Bringing up eth0
waiting for link... 0 seconds
send_packet: Network is down
Expected results:
network interface comes up, root FS mounts over NFS
Additional info:
It seems like the command "network" in the nash script "init" in the initrd
must be a built-in nash command, because there is no separate "network"
executable in the initrd. So I would assign this as a bug to nash except that
nash is not listed as a component.
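The reproduction steps above can be sketched as a small shell helper. Only the --with=nfs and --fstab= options come from the report; the fstab path, kernel version, image name, and the run/DRY_RUN wrapper are illustrative assumptions, and the trailing image and kernel-version arguments follow the usual mkinitrd calling convention.

```shell
#!/bin/sh
# Sketch only: placeholder paths and versions, not the reporter's exact setup.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_nfs_initrd FSTAB KVER [TFTP_DIR]
build_nfs_initrd() {
    fstab=$1
    kver=$2
    tftp_dir=${3:-/tftpboot}
    image="initrd-$kver.img"

    # Step 2: build an initrd with NFS support from an fstab whose
    # root entry is NFS-mounted.
    run mkinitrd --with=nfs --fstab="$fstab" "$image" "$kver" || return 1
    # Step 3: put the result where pxelinux fetches it over TFTP.
    run cp "$image" "$tftp_dir/"
}
```

Running it with DRY_RUN=1 first shows the exact commands before anything is written to the TFTP directory.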
Please try the work-around described in bug# 225363 comment #4 to see if it will
fix your problem.
I have the same problem (on i686) with a Realtek 8139. It looks like the nash network
builtin does not bring the interface up.
I added "busybox ifconfig eth0 up" before nash network and it said "Link Up",
but "nash network" put it back down. I added "busybox msh" after it and the
interface was down.
A simple "ifconfig eth0 up" and "ifconfig eth0 10.1.1.1" worked and I was able to ping.
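A "Network is down" error on send usually means the interface itself is not flagged up, which is a different condition from an interface that is up but has no link. A helper along these lines could tell the two apart from a debug shell. It is a minimal sketch that assumes sysfs is mounted at /sys and that the shell supports POSIX arithmetic (busybox ash should; msh may not); the function name and state labels are made up for illustration.

```shell
#!/bin/sh
# iface_state IFACE [SYSFS_NET_DIR]
# Prints one of: missing, admin-down, up-no-carrier, up-carrier.
iface_state() {
    iface=$1
    dir=${2:-/sys/class/net}/$iface

    [ -r "$dir/flags" ] || { echo "missing"; return 1; }
    flags=$(cat "$dir/flags")

    # IFF_UP is the lowest bit of the interface flags (value 0x1).
    if [ $(( flags & 1 )) -eq 0 ]; then
        echo "admin-down"
        return 0
    fi

    # The carrier attribute can only be read while the interface is up.
    carrier=$(cat "$dir/carrier" 2>/dev/null)
    if [ "$carrier" = "1" ]; then
        echo "up-carrier"
    else
        echo "up-no-carrier"
    fi
}
```

Calling `iface_state eth0` right after the nash network command would show whether the builtin left the interface administratively down.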
Questions for rhladik:
You say you added "busybox ifconfig eth0 up". Was this something you had as
part of your nash script before seeing this bug, or something you did as part
of bug investigation? Also, I presume by the way you brought it up eventually
that you are not using DHCP but rather a simple static IP address.
If that is true, then the common thread between the behavior you are seeing
and mine is that the NIC already is up and has an address when you reach
"network ..." in the nash script.
> Please try the work-around described in bug# 225363 comment #4 to see if it
> will fix your problem.
Thanks, but we are not using selinux at all, so that workaround is unlikely to
help (and we can not apply it since we do not have selinux).
(In reply to comment #3)
> Questions for rhladik:
> You say you added "busybox ifconfig eth0 up". Was this something you had as
> part of your nash script before seeing this bug, or something you did as part
> of bug investigation? Also, I presume by the way you brought it up eventually
> that you are not using DHCP but rather a simple static IP address.
> If that is true, then the common thread between the behavior you are seeing
> and mine is that the NIC already is up and has an address when you reach
> "network ..." in the nash script.
Short version: I added busybox as part of the investigation. I normally use
DHCP, but for this one case I used a static IP.
I'm testing boot from an iSCSI root device, so I forced the mkinitrd script to
create a ramdisk with network and iSCSI support. As it was not working, I tried
to localize the problem: I added busybox with a shell to the initrd and let it
execute between the nash network command and iscsistart. I found that
nash network prints:
bringing up eth0
sending DHCP request on LPF/eth0/....
Unable to send packet: Network down!
but the interface is actually down, so I issued "busybox ifconfig eth0 up ..."
and the network started to work. I then typed exit to continue the boot process and
iscsistart mounted root successfully.
I tried adding "busybox ifconfig static ip" before the "nash network"; it
said the interface was up (something like: Link Up, 100Mbps, FullDuplex).
But nash network was again unable to send packets :-)
So I added "ifconfig eth0 up static addr" to the initrd and moved back to testing
iSCSI. (This took me quite a while and it was not my primary task.)
I'm using DHCP, but for this simple case I used a static IP to avoid complicating it
by running a dhcp client daemon from the initrd, etc.
If iSCSI root works well enough, I will return to this problem and try to
solve it in a more correct way.
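The debugging procedure described above (unpack the initrd, add busybox, drop a shell in after the network command, repack) might look roughly like this. It is a sketch, assuming the initrd is a gzip-compressed cpio archive in newc format, which is what mkinitrd produces on FC6, and that the init script contains a line starting with "network ". The function names and all paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# unpack_initrd IMAGE DIR: extract a gzip-compressed cpio initrd into DIR.
unpack_initrd() {
    image=$1
    dir=$2
    mkdir -p "$dir" || return 1
    gzip -dc "$image" | ( cd "$dir" && cpio -id )
}

# add_after_network INIT LINE: insert LINE right after the "network ..." command.
add_after_network() {
    init=$1
    line=$2
    awk -v extra="$line" '
        { print }
        /^network / { print extra }
    ' "$init" > "$init.new" || return 1
    # Write back through the original file so its executable bit is kept.
    cat "$init.new" > "$init" && rm -f "$init.new"
}

# repack_initrd DIR IMAGE: rebuild the image from DIR.
repack_initrd() {
    dir=$1
    image=$2
    ( cd "$dir" && find . | cpio -o -H newc ) | gzip -9 > "$image"
}

# Hypothetical usage (paths are placeholders; busybox must be statically linked):
#   unpack_initrd /tftpboot/initrd.img /tmp/initrd-work
#   cp /path/to/static/busybox /tmp/initrd-work/bin/busybox
#   add_after_network /tmp/initrd-work/init 'busybox msh'
#   repack_initrd /tmp/initrd-work /tftpboot/initrd-debug.img
```

The same add_after_network call could insert a static "busybox ifconfig eth0 ..." line instead of a shell, which matches the interim workaround described in this comment.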
This sounds like exactly the same problem I am having.
In my case I have created an updated Fedora Core distribution using the latest
2.6.19 kernel and have created an ISO boot CD to boot the system and then
install across the network using an NFS mount.
In my case the installation fails when the Ethernet interface, an E1000, fails
to send packets during DHCP requests with the same error messages as reported above.
In my case I have just changed the kernel package back to a 2.6.18 and all
worked fine, no changes to nash. Note that I have also booted the same 2.6.19
kernel using a PXE boot and used busybox to load the E1000 module and bring up
the eth0 interface with ifconfig and had no problems.
So there is more to it than just the kernel. Seems like a nash/kernel
interaction problem ....
Just tried out stateless linux with the latest 2.6.19-1.2911.fc6 kernel and boot
fails as described. This is using the forcedeth network driver.
I note when successful with 2869 it outputs:
eth0: no link during initialization
eth0: link up
This is notable by its absence in the equivalent 2895 boot.
Just adding a 'me too' to this. However, we're not doing netboot of machines.
We're running into this exact issue using an updated install image using
kernel-2.6.20-1.2925.fc6 and nash-126.96.36.199.3-1 (and all other updates as of
03/22/2007) on x86_64. We're booting the kernel and initrd off cd (or usb flash
drive) and kickstarting off an NFS server and it cannot get a DHCP IP. Console
shows the same errors Alex reported. I also can confirm that the 'stock'
installer image works fine.
I can confirm that the exact same nfsroot initrd works with 2.6.18-1.2798.fc6.
However without changing anything except the kernel version to 2.6.20-1.2944.fc6
it breaks as described by the initial posting.
All works with no fuss with the current FC7 kernel (2.6.21-1.3228.fc7).
So has anybody solved this issue? Are there any workarounds?
There's the obvious upgrade (upgrade to F7). Not that nash works right even
with RHEL5. I've found --mtu 9000 fails to do anything useful, and I have to
inject busybox in there to get the job done.
The *real* solution, which I hope to see in the not so distant future, is to get
rid of nash altogether.
(In reply to comment #13)
> There's the obvious upgrade (upgrade to F7). Not that nash works right even
> with RHEL5. I've found --mtu 9000 fails to do anything useful, and I have to
> inject busybox in there to get the job done.
John, thank you very much for answering, but could you please explain exactly
how I can use busybox? What should I do? I can't upgrade to FC7 and I have
to use FC6 for a while. Thanks again.
Boris - One way I found to work around this was to install
kernel-2.6.20-1.2944.fc6 in Fedora 7 and then use the F7 mkinitrd command to
generate the initrd for the FC6 kernel. To accomplish this, I downloaded the
appropriate FC6 kernel RPM into F7, and then installed it manually with RPM. It
worked perfectly once I copied the resulting initrd to my TFTP server.
One thing to remember is to edit grub.conf after installing the FC6 kernel into
F7 so that your F7 installation doesn't try to actually boot the kernel.
Hope that helps.
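The F7-based workaround above could be scripted roughly as follows. This is a sketch under assumptions: the RPM file name, the image name, and the /tftpboot destination are placeholders, any extra mkinitrd options the setup needs (such as the --with=nfs and --fstab= options from the original report) are left out, and the run/DRY_RUN wrapper exists only so the commands can be previewed.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# initrd_for_fc6_kernel KERNEL_RPM KVER [TFTP_DIR]
# Run on the F7 host: install the FC6 kernel, then build its initrd
# with the F7 mkinitrd.
initrd_for_fc6_kernel() {
    rpm_file=$1
    kver=$2
    tftp_dir=${3:-/tftpboot}
    image="initrd-$kver.img"

    # -i installs next to the running F7 kernel rather than upgrading it.
    run rpm -ivh "$rpm_file" || return 1
    # Name the FC6 kernel version explicitly so the modules come from it,
    # not from the running F7 kernel.
    run mkinitrd "$image" "$kver" || return 1
    run cp "$image" "$tftp_dir/"
    # Remember to edit grub.conf afterwards so F7 does not boot this kernel.
}
```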