From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
When we tried to install Fedora Core 1 via kickstart, we found problems with 2 systems. Both systems use Intel Gigabit network adapters, so we conclude that the e1000 driver exposes a problem in anaconda: during the initial phase of kickstart, anaconda has insufficient network error recovery. Anaconda uses UDP-based network services, and since UDP is an unreliable transport by definition, anaconda is responsible for error recovery.

The enclosed patch fixes the problem, and the activated logMessage calls document it. Please see the following excerpt from anaconda.log (on a system with e1000):

...
* probing buses
* finished bus probing
* modules to insert e1000 aic79xx
* loaded e1000 from /modules/modules.cgz
* loaded aic79xx from /modules/modules.cgz
* inserted /tmp/e1000.o
* inserted /tmp/aic79xx.o
* load module set done
* getting kickstart file
* sending dhcp request through device eth0
* waiting for link...
* 0 seconds.
* doing kickstart... setting it up
* url is 192.67.55.2:/fedora-1/192.67.55.58-kickstart
* file location: nfs://192.67.55.2:/fedora-1/192.67.55.58-kickstart
* calling nfsmount(192.67.55.2:/fedora-1, /tmp/mnt, &flags, &extra_opts, &mount_opt, 0)
* pmap_getmaps failed, retrying
* pmap_getmaps success on 2. attempt
...
* calling mount(192.67.55.2:/fedora-1, /tmp/mnt, nfs, c0ed0001, 0x80b4420)
* setting up kickstart
* kickstartFromNfs
...

Without the patch, pmap_getmaps fails, is never retried, and the kickstart file is not found. With Red Hat 8.0, the problem was not observed. With Red Hat 9, the problem was reported under bug ID 103952, and also under bug ID 104345. The patches supplied with those two bug reports have evidently been partially implemented for Fedora 1, but a significant part of the problem still exists. For Fedora 1, the enclosed patch is needed to fix this problem.
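The retry behaviour visible in the log excerpt above can be sketched roughly as follows. This is a minimal Python illustration, not the attached patch (which changes anaconda's loader and is not reproduced here). The names `call_with_retry` and `make_flaky_getmaps`, the retry count, and the delay are all hypothetical choices for the example; only the two log message formats are taken from the excerpt.

```python
import time


def call_with_retry(call, attempts=10, delay=1.0, log=print, sleep=time.sleep):
    """Call `call()` until it succeeds or `attempts` tries are used up.

    Returns (result, attempt_number). Re-raises the last OSError if every
    attempt fails. The default count and delay are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = call()
        except OSError:
            if attempt == attempts:
                raise
            log("pmap_getmaps failed, retrying")
            sleep(delay)
        else:
            log("pmap_getmaps success on %d. attempt" % attempt)
            return result, attempt


def make_flaky_getmaps(failures):
    """Hypothetical stand-in for a UDP portmapper query that times out
    `failures` times (link not yet passing traffic) and then succeeds."""
    state = {"left": failures}

    def getmaps():
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("RPC: timed out")
        return ["portmapper", "mountd", "nfs"]

    return getmaps


if __name__ == "__main__":
    # One simulated timeout followed by success, as in the excerpt above.
    call_with_retry(make_flaky_getmaps(1), sleep=lambda seconds: None)
```

Because UDP gives no delivery guarantee, the caller has to bound the number of attempts itself; the sketch re-raises after the last failure so that the installer could still report that the kickstart file was not found.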
Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Try to kickstart via an Intel gigabit ethernet adapter
2. Use the NFS method to access the kickstart file
3.

Actual Results: kickstart file not found

Expected Results: kickstart file should be accessed via NFS.

Additional info:
Created attachment 95966 [details] fix the bug.
I can confirm that this bug exists in anaconda-9.1.2 (RHEL WS3 u2) and that the above patch does work around the problem. The root of the problem appears to be a long delay between the time the ethernet interface is brought up (BCM 5704 NetXtreme) and when packets can actually be sent over the network. Testing reveals that anaconda brings up the interface twice: once to fetch DHCP data to initialize the interface, and again to mount the NFS directory holding the kickstart file. The DHCP part worked, but the initial NFS mount failed. After applying the patch, anaconda finally succeeded in reading the kickstart file. The following message was shown on the diagnostic screen: * pmap_getmaps success on 6. attempt ...
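One way to cope with a delay like this is to wait until the link is usable before sending the first request. The sketch below polls a caller-supplied check against a deadline. The names `wait_until_ready`, `carrier_is_up`, and `FakeClock`, along with the timeout and poll interval, are hypothetical, and this is not necessarily how anaconda handles it. Reading `/sys/class/net/<ifname>/carrier` assumes a kernel with sysfs, and a raised carrier only shows that the link is up, not that the switch is already forwarding packets.

```python
import time


def wait_until_ready(is_ready, timeout=30.0, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `is_ready()` until it returns True or `timeout` seconds pass.

    Returns the number of seconds waited on success, or None on timeout.
    """
    start = clock()
    while True:
        if is_ready():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


def carrier_is_up(ifname):
    """Best-effort link check via sysfs (an assumption: needs a kernel
    that exposes /sys/class/net). Any read error counts as 'not up'."""
    try:
        with open("/sys/class/net/%s/carrier" % ifname) as f:
            return f.read().strip() == "1"
    except OSError:
        return False


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    fake = FakeClock()
    # Pretend the link starts passing traffic 4 simulated seconds after 'up'.
    waited = wait_until_ready(lambda: fake.time() >= 4.0,
                              clock=fake.time, sleep=fake.sleep)
    print("link usable after %.1f simulated seconds" % waited)
```

On a real system the check would be something like `lambda: carrier_is_up("eth0")`, with the default clock and sleep.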
Can you try using the initrd located at http://people.redhat.com/~katzj/initrd-link-2-i386.img with RHEL3 U2 and see if it helps? (It is only really useful as-is if you're doing a pxeboot; if you boot another way, you'll need to make a new boot.iso.) It takes a different approach toward working around the problem.
Using the bcm5700 driver, all I get is the DHCP packets, then it falls back into interactive setup. "Failed to mount nfs source" (no other messages). Using your unaltered initrd (with the tg3 driver), the DHCP part is a little different. All I get are a couple of broadcast DHCP Discover packets from the client, followed by a pair of DHCP Offer packets from the server. There is no DHCP Request from the client (which should have followed the offer); instead, the client prompts the user to "Configure TCP/IP". A second attempt at setting up the interface via DHCP succeeds, but then as before it drops into interactive setup mode.
Created attachment 101735 [details]
Patch file for pump library

I believe I have found a much simpler solution to the problem. It works for both NFS and HTTP kickstarts (both of which suffer the same symptoms), and only requires changing a single line of code.

As I stated before, the loader brings up the interface twice before grabbing the kickstart file. This is unnecessary. I have traced the redundancy to the pump library, in pumpSetupInterface(), which starts off by disabling the interface, then sets the interface IP address, and brings it right back up again. AFAIK, it is not necessary to disable the interface before setting or changing its IP address. So I simply commented out the call to pumpDisableInterface. Then I rebuilt both pump and anaconda.

The result works great. There is no additional delay after fetching the DHCP configuration; instead, it reads the kickstart file immediately. No retries needed. It works with both NFS and HTTP kickstarts (and I would assume FTP as well), and it should work on any other system with an ethernet card that takes a long time to initialize.

So is there any reason that pumpDisableInterface should be left in there? (I noticed it's used in several other places as well.)
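To illustrate why the extra down/up cycle matters, here is a toy Python model of an interface whose link needs some time after every "up" before packets get through. Nothing in it is pump's code: `ToyNic`, `configure`, `first_request_succeeds`, the 5-second link delay, and the address are all invented for the example.

```python
class ToyNic:
    """Toy model of a NIC whose link needs `link_delay` seconds after
    each 'up' before packets actually get through."""

    def __init__(self, link_delay):
        self.link_delay = link_delay
        self.now = 0.0          # simulated clock, in seconds
        self.up_since = None    # time of the most recent 'up', or None
        self.address = None

    def advance(self, seconds):
        self.now += seconds

    def bring_up(self):
        # Bringing up an interface that is already up does not reset the link.
        if self.up_since is None:
            self.up_since = self.now

    def bring_down(self):
        self.up_since = None

    def set_address(self, address):
        self.address = address

    def can_send(self):
        return (self.up_since is not None
                and self.now - self.up_since >= self.link_delay)


def configure(nic, address, disable_first):
    """Apply a DHCP lease, optionally cycling the interface first, which
    is what the comment above describes pumpSetupInterface() as doing."""
    if disable_first:
        nic.bring_down()
    nic.set_address(address)
    nic.bring_up()


def first_request_succeeds(disable_first, link_delay=5.0):
    nic = ToyNic(link_delay)
    nic.bring_up()            # interface comes up for the DHCP exchange
    nic.advance(link_delay)   # DHCP completes once the link is ready
    configure(nic, "192.0.2.10", disable_first)
    return nic.can_send()     # immediate NFS/HTTP request, no retries


if __name__ == "__main__":
    print("with down/up cycle:   ", first_request_succeeds(True))
    print("without down/up cycle:", first_request_succeeds(False))
```

In this model the down/up cycle restarts the link delay, so a request sent immediately afterwards is lost, while skipping the cycle leaves the link usable. Whether real hardware behaves this way depends on the driver and the switch.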
I had the same problems with an HP ProLiant DL360 G3 on an HP ProCurve 2648 switch. I've seen it with Dell PowerEdge servers and workstations using Dell PowerConnect switches as well. The initrd posted in this thread fixes the issue on the ProLiant systems. I will test on the Dell ones.
OK, it was working, then it failed. It seems like a weird timeout problem with the network, as people have mentioned previously. I did the following:

default bare-rhel3ws
label bare-rhel3ws
kernel vmlinuz
append ip=dhcp gateway=192.168.1.254 ksdevice=eth0 ks=http://192.168.1.254/kickstart/rhel3ws/bare.cfg initrd=initrd-link-2-i386.img nofb text utf8 ramdisk_size=100000 root=/dev/ram devfs=nomount

So far it is working on two consecutive installs. Will try more next week.
I can verify that this is still a problem with anaconda-9.1.2-2.RHEL (as shipped by Whitebox Linux, but it should be the same as RHEL U2). I have applied the pump patch and am rebuilding the boot cd to see if that fixes the problem.
The patch posted here for the pump library does seem to fix the problem I had kickstarting with an e1000 interface.
I'm seeing the same issue on a Tyan 2735, but the updated initrd doesn't seem to fix it. Nor does RHEL AS U3.
This sounds a bit like bug 131475.
Sometimes it works on U2 for i386 with the provided initrd. I found that powering off the box, making sure the switch (HP ProCurve or Dell PowerConnect) no longer has the machine's MAC address in its table, and then powering up to install results in a successful install.
For RHEL3 U3, I have an updated initrd available at http://people.redhat.com/~katzj/u3-test.img that might fix things. For Fedora, expect that the fix will percolate out in the week after FC3 test2 is released. Any confirmation of this helping would be appreciated.
New update for the Dell PowerConnect issue using U3 initrd images:

1) turn off spanning tree, and
2) turn on spanning tree port fast for all ports

Don't ask me why, but it works for U3 with x86_64 and i386 initrd images. I will look into a similar feature in the HP ProCurve switches. If you have a different type of switch, I would suggest doing something similar to test.
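A likely explanation for why port fast helps, offered as background rather than something established in this thread: with classic spanning tree (IEEE 802.1D), a port that has just come up passes through the listening and learning states, each lasting one forward-delay interval (15 seconds by default), before it forwards traffic. Port fast skips that wait. The sketch below does the arithmetic in Python. The function names and the client's retry count and timeout are hypothetical, and are not pump's actual values.

```python
FORWARD_DELAY_DEFAULT = 15  # seconds; IEEE 802.1D default forward delay


def seconds_until_forwarding(port_fast, forward_delay=FORWARD_DELAY_DEFAULT):
    """Time a freshly linked switch port waits before forwarding frames."""
    if port_fast:
        return 0
    return 2 * forward_delay  # listening state + learning state


def client_gives_up_first(port_fast, tries, timeout_per_try):
    """True if a client that sends `tries` requests, one every
    `timeout_per_try` seconds starting at link-up, sends its last
    request before the port starts forwarding."""
    last_send = (tries - 1) * timeout_per_try
    return last_send < seconds_until_forwarding(port_fast)


if __name__ == "__main__":
    # Hypothetical client: 3 requests, 5 seconds apart.
    for port_fast in (False, True):
        print("port fast %-5s -> forwarding after %2d s, client gives up first: %s"
              % (port_fast,
                 seconds_until_forwarding(port_fast),
                 client_gives_up_first(port_fast, tries=3, timeout_per_try=5)))
```

With the default timers the port is silent for 30 seconds, so a client that stops retrying sooner than that never gets a reply unless port fast is on or the client waits longer.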
Spanning tree port fast should be disabled on Cisco switches as well to make network kickstarting work. If it isn't disabled, some of the DHCP packets are blocked. However, this didn't fix the e1000 & anaconda problem in RH 9 - RHEL U2.
The current u3 appears to work for me if my cisco switch is configured correctly.
Sounds like the current crop of issues should be resolved. If anyone continues to see problems like this in Fedora Core 3 test3 or RHEL3 U4 or later, please file a new report so that we can investigate and track further things down.
A long DHCP request is still occurring with the tg3 driver and HP ProCurve switches. I tried configuring the switches with the settings I described above for the Dell switches, but it doesn't seem to work. I thought there were also some sample initrds for x86_64 to check out?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2004-518.html
This bug still appears to exist in RHEL4 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151872)
Is there a way to integrate this anaconda package into an Update 2 install to avoid this problem?