Bug 225479

Summary:	initrd nash network --device eth0 --bootproto dhcp fails during PXE boot before NFS mount of root file system
Product:	[Fedora] Fedora
Component:	mkinitrd
Version:	6
Hardware:	x86_64
OS:	Linux
Status:	CLOSED CURRENTRELEASE
Severity:	medium
Priority:	medium
Reporter:	Alexander Aminoff <aminoff>
Assignee:	Peter Jones <pjones>
QA Contact:	David Lawrence <dkl>
CC:	d.lesca, grhodes40, johnh, lkoven, rhbz, rhladik, terry1
Fixed In Version:	fc7
Doc Type:	Bug Fix
Last Closed:	2007-11-13 19:40:17 UTC

Description Alexander Aminoff 2007-01-30 20:34:58 UTC

Description of problem:

Using the latest mkinitrd, the PXE boot loader gets its IP address from DHCP 
just fine, but once the kernel is running and the initrd is loaded, the nash 
script init runs the command

network --device eth0 --bootproto dhcp

and we see the error messages listed below under "Actual results".


Version-Release number of selected component (if applicable):

kernel 2.6.19-1.2895.fc6
mkinitrd-5.1.19.0.2-1
nash-5.1.19.0.2-1

This problem did not exist with vmlinuz-2.6.18-1.2869.fc6 and the version of 
mkinitrd that was current at the time.

How reproducible: always

Steps to Reproduce:
1. Have a diskless booting system that loads kernel and initrd from a tftp 
server using pxelinux
2. mkinitrd --with=nfs --fstab=<fstab showing NFS-mounted root FS>
3. put resulting initrd in tftp server's /tftpboot
4. boot
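
Steps 2 and 3 might look roughly like the sketch below. The report does not
spell out the positional arguments; the image path and kernel version follow
the usual "mkinitrd <image> <kernel-version>" usage. The function name, the
DRY_RUN and TFTP_DIR variables, and any paths passed in are hypothetical.

```shell
# Sketch only: build an NFS-root initrd and publish it for pxelinux.
# build_nfsroot_initrd, DRY_RUN and TFTP_DIR are illustrative names, not
# part of mkinitrd. Set DRY_RUN=1 to print the commands instead of running them.
build_nfsroot_initrd() {
    kver=$1      # kernel version, e.g. 2.6.19-1.2895.fc6
    fstab=$2     # fstab whose root entry is the NFS export
    image=$3     # where to write the initrd
    tftp_dir=${TFTP_DIR:-/tftpboot}

    # Step 2: include the nfs module and take the root FS from the given fstab.
    ${DRY_RUN:+echo} mkinitrd --with=nfs --fstab="$fstab" "$image" "$kver" || return 1
    # Step 3: put the result where the TFTP server serves it.
    ${DRY_RUN:+echo} cp "$image" "$tftp_dir/"
}
```

Running it with DRY_RUN=1 first shows the exact commands without touching the
system.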
  
Actual results:

Bringing up eth0
waiting for link... 0 seconds
...
DHCPDISCOVER...
send_packet: Network is down
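
For anyone comparing boots across kernel versions, a small check like the
following can flag this failure in a saved console log. It is only a sketch:
it assumes the console output was captured to a file (for example over a
serial console), and has_dhcp_send_failure is a hypothetical helper name.

```shell
# Sketch only: succeed if a saved console log contains the error quoted above.
# has_dhcp_send_failure is a hypothetical helper, not part of nash or mkinitrd.
has_dhcp_send_failure() {
    # -F: match the message as a fixed string, -q: exit status only
    grep -F -q 'send_packet: Network is down' "$1"
}
```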

Expected results:

network interface comes up, root FS mounts over NFS

Additional info:

It seems like the command "network" in the nash script "init" in the initrd 
must be a built-in nash command, because there is no separate "network" 
executable in the initrd. So I would assign this as a bug to nash except that 
nash is not listed as a component.

Comment 1 Grady Rhodes 2007-02-03 00:10:42 UTC

Please try the work-around described in bug# 225363 comment #4 to see if it will
fix your problem.

Comment 2 Radek Hladik 2007-02-05 23:04:11 UTC

I have the same problem (on i686) with a Realtek 8139. It looks like the nash
network builtin does not bring the interface up.
I added "busybox ifconfig eth0 up" before nash network and it said "Link Up",
but "nash network" put the interface back down. I added "busybox msh" after it,
and the interface was down.
A simple "ifconfig eth0 up" followed by "ifconfig eth0 10.1.1.1" worked, and I
was able to ping.
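
One way to check the interface state from a debug shell, without relying on
ifconfig output, is to read the flags from sysfs. This is a sketch under
assumptions: sysfs must be mounted and expose class/net/<dev>/flags as a hex
bitmask, the shell must support POSIX arithmetic expansion, and iface_is_up
and SYSFS_ROOT are hypothetical names.

```shell
# Sketch only: succeed if the interface has the IFF_UP flag set.
# iface_is_up and SYSFS_ROOT are hypothetical names. Assumes sysfs exposes
# class/net/<dev>/flags as a hex bitmask whose lowest bit (0x1) is IFF_UP.
iface_is_up() {
    dev=$1
    flags_file="${SYSFS_ROOT:-/sys}/class/net/$dev/flags"
    [ -r "$flags_file" ] || return 2       # interface (or sysfs) not present
    flags=$(cat "$flags_file")             # e.g. 0x1003
    [ $(( $flags & 0x1 )) -ne 0 ]          # status 0 if up, 1 if down
}
```

Calling "iface_is_up eth0" right before and right after the nash network
command would show whether that command is what takes the interface down.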

Comment 3 Alexander Aminoff 2007-02-06 15:37:39 UTC

Questions for rhladik:

You say you added "busybox ifconfig eth0 up". Was this something you had as 
part of your nash script before seeing this bug, or something you did as part 
of bug investigation? Also, I presume by the way you brought it up eventually 
that you are not using DHCP but rather a simple static IP address.

If that is true, then the common thread between the behavior you are seeing 
and mine is that the NIC already is up and has an address when you reach 
"network ..." in the nash script.

Comment 4 Alexander Aminoff 2007-02-06 15:39:53 UTC

> Please try the work-around described in bug# 225363 comment #4 to see if it
> will fix your problem.

Thanks, but we are not using selinux at all, so that workaround is unlikely to 
help (and we can not apply it since we do not have selinux).

Comment 5 Radek Hladik 2007-02-07 01:13:15 UTC

(In reply to comment #3)
> Questions for rhladik:
> 
> You say you added "busybox ifconfig eth0 up". Was this something you had as 
> part of your nash script before seeing this bug, or something you did as part 
> of bug investigation? Also, I presume by the way you brought it up eventually 
> that you are not using DHCP but rather a simple static IP address.
> 
> If that is true, then the common thread between the behavior you are seeing 
> and mine is that the NIC already is up and has an address when you reach 
> "network ..." in the nash script.
> 

Short version: I added busybox as part of the investigation. I normally use
DHCP, but for this one case I used a static IP.

Long version:
I'm testing boot from an iSCSI root device, so I forced the mkinitrd script to
create a ramdisk with network and iSCSI support. As it was not working, I tried
to localize the problem: I added busybox with a shell to the initrd and had it
run between the nash network command and iscsistart. I found that nash network
prints:

bringing up eth0
sending DHCP request on LPF/eth0/....
Unable to send packet: Network down!

but the interface is actually down, so I issued "busybox ifconfig eth0 up ..."
and the network started to work. I then typed exit to continue the boot
process, and iscsistart mounted root successfully.
I tried adding "busybox ifconfig static ip" before the "nash network" command;
it reported the interface as up (something like: Link Up, 100Mbps, FullDuplex),
but nash network was again unable to send a packet :-)
So I added "ifconfig eth0 up static addr" to the initrd and went back to
testing iSCSI. (This took me quite a while, and it was not my primary task.)

I'm using DHCP, but for this simple case I used a static IP to avoid
complicating things by running a DHCP client daemon from the initrd, etc.

If the iSCSI root works well enough, I will return to this problem and try to
solve it in a more correct way.
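
The static-address change described above could be applied to an unpacked
initrd with something like the sketch below. It makes several assumptions:
busybox has already been added to the initrd, the init script contains a line
starting with "network --device <dev>", and unpacking and repacking the initrd
image is done separately. use_static_ifconfig is a hypothetical helper name,
and no netmask is set, so ifconfig picks its default.

```shell
# Sketch only: swap the nash "network" line in an unpacked initrd's init
# script for a static busybox ifconfig call.
# use_static_ifconfig is a hypothetical helper; it assumes busybox is already
# present in the initrd.
use_static_ifconfig() {
    init=$1   # path to the unpacked init script
    dev=$2    # e.g. eth0
    addr=$3   # static address to assign, e.g. 10.1.1.1
    tmp=$(mktemp) || return 1
    sed "s|^network --device $dev .*|busybox ifconfig $dev $addr up|" "$init" > "$tmp" || return 1
    cat "$tmp" > "$init"    # overwrite in place so the script keeps its mode bits
    rm -f "$tmp"
}
```

For example, "use_static_ifconfig ./init eth0 10.1.1.1" turns the line
"network --device eth0 --bootproto dhcp" into
"busybox ifconfig eth0 10.1.1.1 up" and leaves the rest of the script alone.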

Comment 6 Terry Barnaby 2007-02-26 20:17:50 UTC

This sounds like exactly the same problem I am having.
In my case I have created an updated Fedora Core distribution using the latest
2.6.19 kernel and have created an ISO boot CD to boot the system and then
install across the network using an NFS mount.
The installation fails when the Ethernet interface, an E1000, fails to send
packets during DHCP requests, with the same error messages as reported above.

I changed just the kernel package back to 2.6.18 and everything worked fine,
with no changes to nash. Note that I have also booted the same 2.6.19 kernel
using a PXE boot, used busybox to load the E1000 module and bring up the eth0
interface with ifconfig, and had no problems.

So there is more to it than just the kernel. It seems like a nash/kernel
interaction problem ....

Comment 7 John Hodrien 2007-03-05 09:40:59 UTC

Just tried out stateless linux with the latest 2.6.19-1.2911.fc6 kernel and
boot fails as described.  This is using the forcedeth network driver.

Comment 8 John Hodrien 2007-03-09 16:30:29 UTC

I note when successful with 2869 it outputs:

eth0: no link during initialization
eth0: link up

This is notable by its absence in the equivalent 2895 boot.

Comment 9 Leigh Koven 2007-03-23 13:30:14 UTC

Just adding a 'me too' to this. However, we're not doing netboot of machines.
We're running into this exact issue using an updated install image using
kernel-2.6.20-1.2925.fc6 and nash-5.1.19.0.3-1 (and all other updates as of
03/22/2007) on x86_64. We're booting the kernel and initrd off cd (or usb flash
drive) and kickstarting off an NFS server and it cannot get a DHCP IP. Console
shows the same errors Alex reported. I also can confirm that the 'stock'
installer image works fine.

Comment 10 Dr. Tilmann Bubeck 2007-04-29 08:54:14 UTC

I can confirm that the exact same nfsroot initrd works with 2.6.18-1.2798.fc6.
However, without changing anything except the kernel version (to
2.6.20-1.2944.fc6), it breaks as described in the initial posting.

Comment 11 John Hodrien 2007-07-20 13:03:52 UTC

All works with no fuss with the current FC7 kernel (2.6.21-1.3228.fc7).

Comment 12 Boris B. Zhmurov 2007-10-17 20:49:17 UTC

So has anybody solved this issue? Are there any workarounds?

Comment 13 John Hodrien 2007-10-18 08:11:26 UTC

There's the obvious upgrade (upgrade to F7).  Not that nash works right even
with RHEL5.  I've found --mtu 9000 fails to do anything useful, and I have to
inject busybox in there to get the job done.

The *real* solution, which I hope to see in the not so distant future, is to get
rid of nash altogether.

Comment 14 Boris B. Zhmurov 2007-10-18 12:20:10 UTC

(In reply to comment #13)
> There's the obvious upgrade (upgrade to F7).  Not that nash works right even
> with RHEL5.  I've found --mtu 9000 fails to do anything useful, and I have to
> inject busybox in there to get the job done.

John, thank you very much for answering, but could you please tell me how
exactly I can use busybox? What should I do? I can't upgrade to FC7 and I have
to use FC6 for a while. Thanks again.

Comment 15 Bryan Mason 2007-11-13 18:54:38 UTC

Boris - One way I found to work around this was to install
kernel-2.6.20-1.2944.fc6 in Fedora 7 and then use the F7 mkinitrd command to
generate the initrd for the FC6 kernel.  To accomplish this, I downloaded the
appropriate FC6 kernel RPM into F7, and then installed it manually with RPM.  It
worked perfectly once I copied the resulting initrd to my TFTP server.

One thing to remember is to edit grub.conf after installing the FC6 kernel into
F7 so that your F7 installation doesn't try to actually boot the kernel.

Hope that helps.
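
As a rough sketch, the workaround above might translate into commands like
these, run on the Fedora 7 host. The exact mkinitrd options used were not
given in the comment; the --with=nfs and --fstab options are carried over from
the original report as an assumption. build_fc6_initrd_on_f7 and DRY_RUN are
hypothetical names, and all paths are placeholders.

```shell
# Sketch only: install an FC6 kernel RPM on an F7 host and build its initrd
# with the F7 mkinitrd. build_fc6_initrd_on_f7 and DRY_RUN are hypothetical
# names. Set DRY_RUN=1 to print the commands instead of running them.
build_fc6_initrd_on_f7() {
    kernel_rpm=$1   # downloaded FC6 kernel RPM
    kver=$2         # e.g. 2.6.20-1.2944.fc6
    fstab=$3        # fstab whose root entry is the NFS export
    image=$4        # where to write the initrd
    ${DRY_RUN:+echo} rpm -ivh "$kernel_rpm" || return 1
    ${DRY_RUN:+echo} mkinitrd --with=nfs --fstab="$fstab" "$image" "$kver"
}
```

The resulting image still has to be copied to the TFTP server, and grub.conf
on the F7 host still needs the edit mentioned above.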