Bug 110036
| Field | Value |
|---|---|
| Summary | kickstart fails to get kickstartfile when using e1000 network driver |
| Product | [Fedora] Fedora |
| Component | anaconda |
| Version | 1 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Guenther Seybold <guenther.seybold> |
| Assignee | Jeremy Katz <katzj> |
| CC | brian.b, cdmaest, herrold, sflory, tao |
| Doc Type | Bug Fix |
| Last Closed | 2004-10-05 15:37:43 UTC |

Description
Guenther Seybold
2003-11-14 11:02:51 UTC
Created attachment 95966
fix the bug.
I can confirm that this bug exists in anaconda-9.1.2 (RHEL WS3 u2) and that the above patch does work around the problem. The root of the problem appears to be a long delay between the time the ethernet interface (BCM 5704 NetXtreme) is brought up and the time packets can actually be sent over the network. Testing reveals that anaconda brings up the interface twice: once to fetch DHCP data to initialize the interface, and again to mount the NFS directory holding the kickstart file. The DHCP part worked, but the initial NFS mount failed. After applying the patch, anaconda finally succeeded in reading the kickstart file. The following message was shown on the diagnostic screen:

* pmap_getmaps success on 6. attempt ...

Can you try using the initrd located at http://people.redhat.com/~katzj/initrd-link-2-i386.img with RHEL3 U2 and see if it helps? (It is only really useful if you're doing a pxeboot; if you boot another way, you'll need to make a new boot.iso.) It takes a different approach toward working around the problem.

Using the bcm5700 driver, all I get is the DHCP packets, then it falls back into interactive setup. "Failed to mount nfs source" (no other messages).

Using your unaltered initrd (with the tg3 driver), the DHCP part is a little different. All I get are a couple of broadcast DHCP Discover packets from the client, followed by a pair of DHCP Offer packets from the server. There is no DHCP Request from the client (which should have followed the offer); instead, the client prompts the user to "Configure TCP/IP". A second attempt at setting up the interface via DHCP succeeds, but then, as before, it drops into interactive setup mode.

Created attachment 101735 [details]
Patch file for pump library
I believe I have found a much simpler solution to the problem. It works for
both NFS and HTTP kickstarts (both of which suffer the same symptoms), and only
requires changing a single line of code.
As I stated before, the loader brings up the interface twice before grabbing
the kickstart file. This is unnecessary. I have traced the redundancy to the
pump library in pumpSetupInterface(), which starts off by disabling the
interface, then it sets the interface IP address, and brings it right back up
again. AFAIK, it is not necessary to disable the interface before setting or
changing its IP address. So I simply commented out the call to
pumpDisableInterface. Then I rebuilt both pump and anaconda.
The result works great. There is no additional delay after fetching the DHCP
configuration; instead, it reads the kickstart file immediately. No retries
needed. Works with both NFS and HTTP kickstarts (and I would assume FTP as
well). Should work on any other system with an ethernet card that takes a long
time to initialize.
So is there any reason that pumpDisableInterface should be left in there? (I
noticed it's used in several other places as well.)
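
To make the reasoning in the comment above concrete, here is a toy simulation in Python. It is not the pump or anaconda code: `SimulatedNic`, `attempts_to_fetch_kickstart`, and both timing constants are hypothetical, and the numbers are arbitrary values picked for illustration. The model assumes only what the thread describes: a link that needs some time after each "up" before it passes traffic, and a loader that retries the kickstart fetch at a fixed interval.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values, chosen only for illustration (not measured on real hardware).
LINK_SETTLE_SECONDS = 30.0    # time a freshly raised link needs before it passes traffic
RETRY_INTERVAL_SECONDS = 6.0  # pause between kickstart fetch attempts


@dataclass
class SimulatedNic:
    """Toy NIC whose link must settle after every 'up' before packets get through."""
    now: float = 0.0                  # simulated clock, in seconds
    up_since: Optional[float] = None  # when the link was last raised; None while down

    def bring_up(self) -> None:
        # Raising an interface that is already up does not restart the settle timer.
        if self.up_since is None:
            self.up_since = self.now

    def bring_down(self) -> None:
        self.up_since = None

    def can_send(self) -> bool:
        return (self.up_since is not None
                and self.now - self.up_since >= LINK_SETTLE_SECONDS)

    def wait(self, seconds: float) -> None:
        self.now += seconds


def attempts_to_fetch_kickstart(disable_before_configure: bool,
                                max_attempts: int = 10) -> int:
    """Return how many fetch attempts are needed before the link passes traffic."""
    nic = SimulatedNic()

    # Step 1: raise the link and wait until DHCP traffic can get through.
    nic.bring_up()
    while not nic.can_send():
        nic.wait(1.0)

    # Step 2: apply the lease. The behaviour described in the comment takes the
    # interface down first and raises it again; the patched behaviour leaves it up.
    if disable_before_configure:
        nic.bring_down()
    nic.bring_up()

    # Step 3: try to fetch the kickstart file, retrying at a fixed interval.
    for attempt in range(1, max_attempts + 1):
        if nic.can_send():
            return attempt
        nic.wait(RETRY_INTERVAL_SECONDS)
    raise TimeoutError("link never became usable within max_attempts")


if __name__ == "__main__":
    print("down/up before configuring:", attempts_to_fetch_kickstart(True))
    print("configure in place:        ", attempts_to_fetch_kickstart(False))
```

In this model, cycling the interface restarts the settle timer, so the fetch only succeeds after several retries; configuring the address in place lets the first attempt through. Whether a real driver or switch behaves this way depends on the hardware involved.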
I had the same problems with an HP ProLiant DL360 G3 on an HP ProCurve 2648 switch. I've seen it with Dell PowerEdge servers and workstations using Dell PowerConnect switches as well. I tried the initrd posted in this thread, and it fixes the issue with the ProLiant systems. I will test on the Dell ones.

OK, it was working, then it failed. It seems like a weird timeout problem with the network, as people have mentioned previously. I did the following:

    default bare-rhel3ws
    label bare-rhel3ws
      kernel vmlinuz
      append ip=dhcp gateway=192.168.1.254 ksdevice=eth0 ks=http://192.168.1.254/kickstart/rhel3ws/bare.cfg initrd=initrd-link-2-i386.img nofb text utf8 ramdisk_size=100000 root=/dev/ram devfs=nomount

So far it has worked on two consecutive installs. Will try more next week.

I can verify that this is still a problem with anaconda-9.1.2-2.RHEL (as shipped by Whitebox Linux, but it should be the same as RHEL U2). I have applied the pump patch and am rebuilding the boot CD to see if that fixes the problem.

The patch posted here for the pump library does seem to fix the problem I had kickstarting with an e1000 interface.

I'm seeing the same issue on a Tyan 2735, but the updated initrd doesn't seem to fix it. Nor does RHEL AS U3.

This sounds a bit like 131475.

Sometimes it works on U2 for i386 with the provided initrd. I found that powering off the box, ensuring the switch (HP ProCurve or Dell PowerConnect) doesn't have the MAC address in its table, and then powering up to install results in a successful install.

For RHEL3 U3, I have an updated initrd available at http://people.redhat.com/~katzj/u3-test.img that might fix things. For Fedora, expect that the fix will percolate out in the week after FC3 test2 is released. Any confirmation of this helping would be appreciated.

New update for the Dell PowerConnect issue using U3 initrd images:
1) turn off spanning tree, and
2) turn on spanning tree port fast for all ports.
Don't ask me why, but it works for U3 with x86_64 and i386 initrd images. I will look into a similar feature in the HP ProCurve switches. If you have a different type of switch, I would suggest doing something similar to test.

Spanning tree port fast should be disabled on Cisco switches as well to make network kickstarting work. If it isn't disabled, some of the DHCP packets are blocked. However, this didn't fix the e1000 & anaconda problem in RH 9 - RHEL U2. The current U3 appears to work for me if my Cisco switch is configured correctly.

Sounds like the current crop of issues should be resolved. If anyone continues to see problems like this in Fedora Core 3 test3 or RHEL3 U4 or later, please file a new report so that we can investigate and track things down further.

There is still a long DHCP request occurring with the tg3 driver and HP ProCurve switches. I tried setting the switches to the settings I described above for the Dell switches, but it doesn't seem to be working. I thought there were some sample initrds for x86_64 to check out as well?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHEA-2004-518.html

This bug still appears to exist in RHEL4 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151872).

Is there a way to integrate this anaconda package into an Update 2 install to avoid this problem?