Bug 90774
| Field | Value |
|---|---|
| Summary | kickstart install failing when using dhcp |
| Product | Red Hat Enterprise Linux 3 |
| Component | pump |
| Version | 3.0 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | high |
| Reporter | Michael Young <m.a.young> |
| Assignee | Jeremy Katz <katzj> |
| QA Contact | Mike McLean <mikem> |
| CC | barber, herrold, jim, mkomarinski, tcallawa |
| Fixed In Version | U5/U1 |
| Doc Type | Bug Fix |
| Last Closed | 2005-09-21 20:38:47 UTC |
Description
Michael Young
2003-05-13 17:00:52 UTC
Created attachment 91649: anaconda.log
Created attachment 91650: syslog
It looks like there is more than one NIC in the machine. Do you have the appropriate configuration for ksdevice, etc.? Are there any errors on the NFS server? Perhaps the machine doesn't have permission to access the file? We do installs like this all the time in our automated testing w/o problems, so I'd guess it's a configuration issue.

The ksdevice setting is right, because the logs show pump doing some negotiation; it fails in a different way if you try to use the wrong network card. It is not permissions on the NFS server, because it still fails if you use HTTP to get the kickstart file (and the web server logs show no sign of a request ever reaching it). However, you can mount NFS or use HTTP to get the files once a shell prompt is available, or if you manually enter the network configuration at the boot prompt. Moreover, the reverse name lookup fails immediately before the NFS mount, which again points to a networking issue within anaconda. nslookups do work once the shell prompt is available.

I have a report from an end user that this also occurs on IBM hardware with two NICs (with the pesky Broadcom NIC variant on board). Adding a note and CC so I can track it down.

What brand/model of hardware is this? Is it the only one you have like that?

The machine I was using was a Viglen SX235. We tested another similar machine, but it doesn't show the problem, maybe because it is located at a different point on the network and has to go through more switches/routers to talk to the DHCP servers.

I've seen a lot of problems like this (intermittent or partial functioning of NICs) caused by the auto-negotiation attempts between the NIC and the switch port. It seems to be worse with 10/100/1000 NICs. Try locking the port to the appropriate speed and enabling 'fast port spanning' or something like that (sorry, the network guys did it for me). The DHCP request tends to time out before the port has decided what speed to run at.
I could have believed the problem was something along those lines, except that the logs I attached show that there is a DHCP reply, albeit 4 seconds after the request.

I have not seen any problems such as you are describing. If there is a problem with how we handle DHCP, this would most likely be a pump issue.

We are also seeing this when attempting an install of RHL 9 via NFS/DHCP/Kickstart. The machine has 2 NICs; both NICs were tried.

    # lspci|grep Ether
    03:04.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
    04:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)

On the VC we see the message "pump told us No DHCP offer received". The DHCP server is RHL 8.0 dhcp-3.0pl1-26. Logs on the DHCP server show:

    DHCPDISCOVER from 00:30:48:24:aa:6b via eth0
    DHCPOFFER on 10.0.0.199 to 00:30:48:24:aa:6b via eth0

and nothing further. The log entries seem to occur *after* the timeout message from the client.

I did some further tests, including hacking the install script to retry the NFS mount at 5-second intervals. The NFS mount worked on the 3rd or 4th attempt, so there is probably some timeout issue here. Also, minor changes in the logging seemed to cause DHCP to fail altogether, even though the changes were nowhere near the DHCP code.

Seeing the same problem on two IBMs, one with the Broadcom, one with the e1000. Both seem to send out pump requests before the interface is negotiated, and both fail to read the ks.cfg file via NFS, but both will mount via NFS manually. While I do not have access to the DHCP server, the NFS server shows no attempts to mount the share.

What machine is acting as the DHCP server?

In my case, probably ISC DHCP 2.0pl5 running on Solaris 8.

This is what I got from our IT staff: we are running Lucent's dhcpd (Version: 5.3 Build 8 - Lucent DHCP Server, Copyright (c) 2000-2003 Lucent Technologies Inc.) on a box running Solaris 8.

I am experiencing this problem as well.
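The retry workaround reported above (re-running the NFS mount every 5 seconds, which succeeded on the 3rd or 4th attempt) amounts to a simple retry loop. Below is a minimal Python sketch of that idea under stated assumptions, not anaconda's actual code: `retry` and `flaky_mount` are hypothetical names, and the sketch assumes a failed mount or fetch surfaces as an `OSError`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 5, interval: float = 5.0) -> T:
    """Call `operation` until it succeeds, sleeping `interval` seconds between tries.

    Hypothetical helper illustrating the "retry the NFS mount every 5 seconds"
    hack from this bug; assumes failures are reported as OSError. Re-raises the
    last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as err:
            last_error = err
            if attempt < attempts:
                # Wait before the next attempt (the hack in the thread used 5 seconds).
                time.sleep(interval)
    assert last_error is not None
    raise last_error


if __name__ == "__main__":
    # Hypothetical stand-in for a mount that fails twice before the network is ready.
    demo_state = {"calls": 0}

    def flaky_mount() -> str:
        demo_state["calls"] += 1
        if demo_state["calls"] < 3:
            raise OSError("mount failed: network not ready")
        return "mounted"

    print(retry(flaky_mount, attempts=5, interval=0))  # prints "mounted"
```

A fixed interval keeps the sketch close to the reported hack; note that it only masks the timing problem described in this bug rather than fixing it.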
I have confirmed the problem exists for RH9 and RHEL 3.0 beta 1 & 2. RH8 did not have this issue. To do a kickstart, I boot from CD 1 of the OS distribution and enter the following at the splash screen:

    linux ks=http://172.16.22.93/ks/hostname.cfg

After entering this, the system boots into an interactive install instead. I am using dual on-board Intel 82546EB 10/100/1000 NICs and have observed the same behavior on 3 like-configured systems. I have verified that the (Solaris 9 based) DHCP server does receive the DHCP request and sends back an offer, but the web server logs do not show any hits, and the system then falls back to an interactive install.

The only way I have been able to get kickstart to work is to disable the DHCP server when doing a kickstart. When DHCP times out, the installer lets me enter the IP information manually. After I enter it, I get an error saying the appropriate startup files can't be found at the HTTP location I provided on the command line. However, if I have it retry, it succeeds and the kickstart works just fine. There seems to be a timing issue between when the NIC comes up after IP address assignment and when the installer tries to talk to the kickstart server. Again, I do not have this problem with RH8 when using the same hardware, network, and DHCP server configurations. RH9 and RHEL3 beta 1 & 2 (mis)behave nearly identically with regard to this problem.

I can confirm the same problem with a Dell 2650. The system has 2 on-board Broadcom 1000/100/10 NICs; I disabled one of them. The system is connected to a 100 Mbps switch, and I am using PXE boot. The system can get an IP address and download pxelinux.0. The bootloader is able to download vmlinuz and initrd.img and then passes control to the downloaded kernel. Anaconda then tries to get an IP address again through DHCP (using pump) and fails, so the kickstart installation stops. I have this problem with AS 2.1 update 2 and RHEL 3 RC1.0.
RHEL3 U5 and RHEL4 U1 both have the complete set of fixes for all known problems in this area. If you are still experiencing problems with one of these releases, please file a SEPARATE bug so that we can look into your specific case, as opposed to the confusion that results from 10 people seeing the same symptom from different root causes.