110036 – kickstart fails to get kickstartfile when using e1000 network driver

Summary: kickstart fails to get kickstartfile when using e1000 network driver

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	anaconda
Sub Component:
Version:	1
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jeremy Katz
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-11-14 11:02 UTC by Guenther Seybold
Modified:	2007-11-30 22:10 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-10-05 15:37:43 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
fix the bug. (2.18 KB, patch) 2003-11-14 11:08 UTC, Guenther Seybold
Patch file for pump library (422 bytes, patch) 2004-07-09 00:19 UTC, Trevin Beattie

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2004:518	0	normal	SHIPPED_LIVE	Updated anaconda and pump packages	2004-12-21 05:00:00 UTC

Description Guenther Seybold 2003-11-14 11:02:51 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
When we tried to install Fedora Core 1 via kickstart, we found
problems on two systems. Both systems use Intel Gigabit network
adapters, so we conclude that the e1000 driver exposes a weakness
in anaconda: during the initial phase of kickstart, anaconda
has insufficient network error recovery.
Anaconda uses UDP-based network services, and since UDP is an
unreliable transport by definition, anaconda is responsible for error
recovery.
The enclosed patch fixes the problem, and the logMessage calls it
activates document the problem. Please see the following excerpt from
anaconda.log (on a system with e1000):
...
* probing buses
* finished bus probing
* modules to insert e1000 aic79xx
* loaded e1000 from /modules/modules.cgz
* loaded aic79xx from /modules/modules.cgz
* inserted /tmp/e1000.o
* inserted /tmp/aic79xx.o
* load module set done
* getting kickstart file
* sending dhcp request through device eth0
* waiting for link...
* 0 seconds.
* doing kickstart... setting it up
* url is 192.67.55.2:/fedora-1/192.67.55.58-kickstart
* file location: nfs://192.67.55.2:/fedora-1/192.67.55.58-kickstart
* calling nfsmount(192.67.55.2:/fedora-1, /tmp/mnt, &flags,
&extra_opts, &mount_opt, 0)
* pmap_getmaps failed, retrying
* pmap_getmaps success on 2. attempt ...
* calling mount(192.67.55.2:/fedora-1, /tmp/mnt, nfs, c0ed0001, 0x80b4420)
* setting up kickstart
* kickstartFromNfs
...
Without the patch, pmap_getmaps fails, is never retried, and the kickstart
file will not be found.
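
A minimal sketch of the retry idea, assuming nothing about the contents of the attached patch: an unreliable call is wrapped in a bounded retry loop with a pause between attempts. Python is used here only for illustration (the loader itself is C), and `retry_call`, `make_flaky`, `max_attempts`, and `delay` are hypothetical names. The flaky operation merely simulates a query that gets no reply until the link is usable.

```python
import time


def retry_call(operation, max_attempts=10, delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Returns (result, attempt_number). Re-raises the last OSError if every
    attempt fails. All names are illustrative, not anaconda's.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(delay)  # give the link time to come up before retrying


def make_flaky(failures):
    """Simulate a UDP query that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def query():
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("simulated timeout: no reply")
        return "reply"

    return query
```

Calling `retry_call(make_flaky(1))` succeeds on the second attempt, which is the pattern the log excerpt above shows for pmap_getmaps.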

With Red Hat Linux 8.0, the problem was not observed.

With Red Hat Linux 9, the problem was reported as bug 103952 and also
as bug 104345. The patches supplied with those two bug reports appear
to have been only partially implemented for Fedora Core 1, and a
significant part of the problem still exists.

For Fedora Core 1, the enclosed patch is needed to fix this problem.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Try to kickstart via an Intel gigabit Ethernet adapter.
2. Use the NFS method to access the kickstart file.

Actual Results:  kickstart file not found

Expected Results:  kickstart file should be accessed via NFS.

Additional info:

Comment 1 Guenther Seybold 2003-11-14 11:08:05 UTC

Created attachment 95966
fix the bug.

Comment 2 Trevin Beattie 2004-07-06 22:11:56 UTC

I can confirm that this bug exists in anaconda-9.1.2 (RHEL WS3 u2) and
that the above patch does work around the problem.  The root of the
problem appears to be a long delay between the time the ethernet
interface is brought up (BCM 5704 NetXtreme) and when packets can
actually be sent over the network.

Testing reveals that anaconda brings up the interface twice: once to
fetch DHCP data to initialize the interface, and again to mount the
NFS directory holding the kickstart file.  The DHCP part worked, but
the initial NFS mount failed.

After applying the patch, anaconda finally succeeded in reading the
kickstart file.  The following message was shown on the diagnostic screen:

* pmap_getmaps success on 6. attempt ...

Comment 3 Jeremy Katz 2004-07-06 22:29:02 UTC

Can you try using the initrd located at
http://people.redhat.com/~katzj/initrd-link-2-i386.img with RHEL3 U2
(only really useful if you're doing a pxeboot; if you boot another
way, you'll need to make a new boot.iso) and see if it helps?  It
takes a different approach toward working around the problem.

Comment 4 Trevin Beattie 2004-07-07 18:09:46 UTC

Using the bcm5700 driver, all I get is the DHCP packets, then it falls
back into interactive setup.  "Failed to mount nfs source" (no other
messages).

Using your unaltered initrd (with the tg3 driver), the DHCP part is a
little different.  All I get are a couple of broadcast DHCP Discover
packets from the client, followed by a pair of DHCP Offer packets from
the server.  There is no DHCP Request from the client (which should
have followed the offer); instead, the client prompts the user to
"Configure TCP/IP".  A second attempt at setting up the interface via
DHCP succeeds, but then as before it drops into interactive setup mode.

Comment 5 Trevin Beattie 2004-07-09 00:19:21 UTC

Created attachment 101735
Patch file for pump library

I believe I have found a much simpler solution to the problem.  It works for
both NFS and HTTP kickstarts (both of which suffer the same symptoms), and only
requires changing a single line of code.

As I stated before, the loader brings up the interface twice before grabbing
the kickstart file.  This is unnecessary.  I have traced the redundancy to
pumpSetupInterface() in the pump library, which starts off by disabling the
interface, then sets the interface IP address, and brings it right back up
again.  AFAIK, it is not necessary to disable the interface before setting or
changing its IP address.  So I simply commented out the call to
pumpDisableInterface.  Then I rebuilt both pump and anaconda.

The result works great.  There is no additional delay after fetching the DHCP
configuration; instead, it reads the kickstart file immediately.  No retries
needed.  Works with both NFS and HTTP kickstarts (and I would assume FTP as
well).  Should work on any other system with an Ethernet card that takes a long
time to initialize.

So is there any reason that pumpDisableInterface should be left in there?  (I
noticed it's used in several other places as well.)
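
The reasoning in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not the pump code: `SimulatedNic`, `seconds_until_usable`, and the 30 and 35 second figures are all invented for illustration. The model assumes that each transition from down to up restarts a fixed link settle time, and that the first settle period has already elapsed by the time DHCP completes.

```python
class SimulatedNic:
    """Toy interface: after each down-to-up transition the link needs
    `settle` seconds before packets can be sent (an assumption)."""

    def __init__(self, settle):
        self.settle = settle
        self.is_up = False
        self.ready_at = None
        self.address = None

    def up(self, now):
        # Bringing up an interface that is already up changes nothing.
        if not self.is_up:
            self.is_up = True
            self.ready_at = now + self.settle

    def down(self):
        self.is_up = False
        self.ready_at = None

    def set_address(self, address):
        self.address = address


def seconds_until_usable(disable_first, settle=30.0, dhcp_done_at=35.0):
    """Return the extra wait after DHCP before the interface can send."""
    nic = SimulatedNic(settle)
    nic.up(0.0)                      # first bring-up, used for DHCP
    if disable_first:
        nic.down()                   # the step the comment proposes to skip
    nic.set_address("192.0.2.10")    # documentation-range example address
    nic.up(dhcp_done_at)             # second bring-up
    return max(0.0, nic.ready_at - dhcp_done_at)
```

In this model, disabling the interface first costs a full second settle period, while skipping the disable leaves the interface usable as soon as the address is set.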

Comment 6 Need Real Name 2004-08-07 00:05:49 UTC

I had the same problems with an HP ProLiant DL360 G3 on an HP ProCurve
2648 switch.  I've seen it with Dell PowerEdge servers and
workstations using Dell PowerConnect switches as well.

The initrd posted in this thread fixes the issue with the
ProLiant systems.  I will test on the Dell ones.

Comment 7 Need Real Name 2004-08-07 01:19:57 UTC

OK, it was working, then it failed.  It seems like a weird timeout problem
with the network, as people have mentioned previously.  I did the
following:

default bare-rhel3ws
label bare-rhel3ws
        kernel vmlinuz
        append ip=dhcp gateway=192.168.1.254 ksdevice=eth0 ks=http://192.168.1.254/kickstart/rhel3ws/bare.cfg initrd=initrd-link-2-i386.img nofb text utf8 ramdisk_size=100000 root=/dev/ram devfs=nomount

So far it is working on two consecutive installs.  Will try more next
week.

Comment 8 Joshua Weage 2004-08-10 15:20:39 UTC

I can verify that this is still a problem with anaconda-9.1.2-2.RHEL
(as shipped by Whitebox Linux, but it should be the same as RHEL U2).
 I have applied the pump patch and am rebuilding the boot cd to see if
that fixes the problem.

Comment 9 Joshua Weage 2004-08-11 13:34:39 UTC

The patch posted here for the pump library does seem to fix the
problem I had kickstarting with an e1000 interface.

Comment 10 Samuel Flory 2004-09-08 19:53:25 UTC

  I'm seeing the same issue on a Tyan 2735, but the updated initrd
doesn't seem to fix it.  Nor does RHEL AS U3.

Comment 11 Neil Horman 2004-09-14 19:49:24 UTC

This sounds a bit like bug 131475.

Comment 12 Need Real Name 2004-09-16 16:39:17 UTC

Sometimes it works on U2 for i386 with the provided initrd.
I found that if I power off the box and make sure the switch (HP ProCurve
or Dell PowerConnect) no longer has the MAC address in its table, then
power up to install, the install succeeds.

Comment 13 Jeremy Katz 2004-09-16 22:44:02 UTC

For RHEL3 U3, I have an updated initrd available at
http://people.redhat.com/~katzj/u3-test.img that might fix things. 
For Fedora, expect that the fix will percolate out in the week after
FC3 test2 is released.  Any confirmation of this helping would be
appreciated.

Comment 14 Need Real Name 2004-09-16 23:47:46 UTC

new update for dell power connect issue using U3 initrd images:

1) turn off spanning tree
and
2) turn on spanning tree port fast for all ports

Don't ask me why, but it works for U3 with x86_64 and i386 initrd 
images.

I will look into a similar feature in the hp procurve switches.
If you have a different type of switch I would suggest doing 
something similar to test.
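
One plausible explanation, offered as an assumption rather than anything confirmed in this report: under classic spanning tree (IEEE 802.1D), a port whose link has just come up spends one forward-delay interval in the listening state and another in the learning state before it forwards traffic, and the default forward delay is 15 seconds. An edge-port setting such as port fast skips those states. The sketch below only does that arithmetic; the function names are hypothetical.

```python
def seconds_until_forwarding(forward_delay=15, port_fast=False):
    """Time a freshly linked port waits before forwarding frames.

    Classic 802.1D: listening + learning, one forward_delay each.
    With port fast (edge port), forwarding starts immediately.
    """
    if port_fast:
        return 0
    return 2 * forward_delay


def request_is_dropped(sent_after_link_up, forward_delay=15, port_fast=False):
    """True if a packet sent this many seconds after link-up arrives
    while the switch port is still not forwarding."""
    return sent_after_link_up < seconds_until_forwarding(forward_delay, port_fast)
```

With the defaults, a DHCP request sent a few seconds after link-up falls inside the 30 second window, which would be consistent with the installer giving up before the port starts forwarding.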

Comment 15 Joshua Weage 2004-09-19 17:39:46 UTC

Spanning tree port fast should be disabled on Cisco switches as well
to make network kickstarting work.  If it isn't disabled, some of the
DHCP packets are blocked.

However, this didn't fix the e1000 & anaconda problem in RH 9 - RHEL U2.

Comment 16 Samuel Flory 2004-09-24 22:19:08 UTC

The current u3 appears to work for me if my cisco switch is configured
correctly.

Comment 17 Jeremy Katz 2004-10-05 15:37:43 UTC

Sounds like the current crop of issues should be resolved.  If anyone
continues to see problems like this in Fedora Core 3 test3 or RHEL3 U4
or later, please file a new report so that we can investigate and
track further things down.

Comment 18 Need Real Name 2004-11-08 17:58:12 UTC

There is a long DHCP request still occurring with the tg3 driver and
HP ProCurve switches.  I tried applying the settings I described
above for the Dell switches, but it doesn't seem to
be working.  I thought there were some sample initrds for x86_64 to
check out as well?

Comment 19 John Flanagan 2004-12-21 14:49:31 UTC

An advisory has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2004-518.html

Comment 20 root 2005-03-23 04:59:46 UTC

This bug still appears to exist in RHEL4
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151872)

Comment 21 Brian Baker 2005-03-29 21:02:41 UTC

Is there a way to integrate this anaconda package into an Update 2 install to 
avoid this problem?
