Bug 90774

Summary: kickstart install failing when using dhcp
Product: Red Hat Enterprise Linux 3
Reporter: Michael Young <m.a.young>
Component: pump
Assignee: Jeremy Katz <katzj>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike McLean <mikem>
Severity: medium
Priority: high
Version: 3.0
CC: barber, herrold, jim, mkomarinski, tcallawa
Hardware: i386   
OS: Linux   
Fixed In Version: U5/U1
Doc Type: Bug Fix
Last Closed: 2005-09-21 20:38:47 UTC
Attachments:
  anaconda.log (flags: none)
  syslog (flags: none)

Description Michael Young 2003-05-13 17:00:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.2.1) Gecko/20030313

Description of problem:
We have been having trouble doing a kickstart install on a particular machine
(booting from either floppies or the first CD-ROM, and installing via nfs or
http). The problem occurs when we use dhcp to set the IP address
automatically: the system then can't find the kickstart file on the remote server.

I eventually got the install to work by manually specifying the network
configuration on the boot command line and in the kickstart file, though even then
the nfs filesystem took 2 attempts to mount: a manual configuration box appeared,
but pressing OK without changing any settings got the install working.

I will attach the anaconda.log and syslog files from /tmp, copied via the shell
prompt on F2 after the system fell back to a manual install from the CD; no
additional network setup was required to copy them.

My current guess is that we are installing on a relatively fast machine, and the
kickstart install tries to use the network before the network has had time to
finish configuring itself.

Comment 1 Michael Young 2003-05-13 17:01:57 UTC
Created attachment 91649
anaconda.log

Comment 2 Michael Young 2003-05-13 17:02:32 UTC
Created attachment 91650
syslog

Comment 3 Michael Fulbright 2003-05-15 18:50:17 UTC
It looks like there is more than one NIC in the machine. Do you have the
appropriate configuration for ksdevice, etc.?

Are there any errors on the NFS server? Perhaps the machine doesn't have
permission to access the file?

We do installs like this all the time in our automated testing without problems,
so I'd guess it is a configuration issue.

Comment 4 Michael Young 2003-05-16 09:59:48 UTC
The ksdevice setting is right, because the logs show pump doing some
negotiation; it fails in a different way if you try to use the wrong network card.

It is not permissions on the nfs server, because it still fails if you use http
to get the kickstart file (and the web server logs show no sign of a request
ever reaching it). However, you can mount nfs or use http to get the files once a
shell prompt is available, or if you manually enter the network configuration at
the boot prompt.

Moreover, the reverse name lookup fails immediately before the nfs mount, which
again points to some networking issue within anaconda. Lookups with nslookup do
work once the shell prompt is available.

Comment 5 R P Herrold 2003-05-17 04:01:17 UTC
I have a report from an end user that this also occurs on IBM hardware with two
NICs (with the pesky on-board Broadcom NIC variant). Adding a note and CC so I
can track it down.

Comment 6 Jim Wildman 2003-05-17 16:16:10 UTC
What brand/model of hardware is this?  Is it the only one you have like that?

Comment 7 Michael Young 2003-05-21 10:08:00 UTC
The machine I was using was a Viglen SX235. We tested another similar machine,
but it doesn't show the problem, maybe because it is located at a different
point on the network and has to go through more switches/routers to talk to the
dhcp servers.

Comment 8 Jim Wildman 2003-05-21 13:27:22 UTC
I've seen a lot of problems like this (intermittent or partial functioning of
NICs) caused by the auto-negotiation attempts between the NIC and the switch
port. It seems to be worse with the 10/100/1000 NICs. Try locking the port to
the appropriate speed and enabling 'fast port spanning' or something like that
(sorry, the network guys did it for me). The dhcp request tends to time out
before the port decides what speed to run at.
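
The explanation above, that the DHCP request can go out before the switch port has finished negotiating, suggests waiting for link before starting DHCP. The Python sketch below (anaconda itself is written in Python) polls the link state until it comes up or a timeout expires. It assumes a Linux kernel that exposes /sys/class/net/<iface>/carrier, a 2.6-era sysfs interface that the 2.4 kernels shipped with RHEL 3 do not have; the helper name wait_for_carrier and its parameters are hypothetical and are not taken from pump or anaconda.

```python
import os
import time


def wait_for_carrier(iface, timeout=30.0, interval=0.5, sysfs_root="/sys/class/net"):
    """Poll the (assumed) sysfs carrier flag until the link reports up.

    Returns True once <sysfs_root>/<iface>/carrier reads "1", or False if
    `timeout` seconds pass first. Hypothetical helper, not part of pump.
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                if f.read().strip() == "1":
                    return True  # link is up: safe to start DHCP
        except OSError:
            # Missing interface, or interface administratively down
            # (reading carrier then fails): treat as "no link yet".
            pass
        if time.monotonic() >= deadline:
            return False  # give up; caller can fall back to manual config
        time.sleep(interval)
```

A caller would start the DHCP client only once wait_for_carrier("eth0") returns True and fall back to manual network configuration otherwise. Polling the link state, rather than simply lengthening the DHCP timeout, keeps installs fast on ports that negotiate quickly.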

Comment 9 Michael Young 2003-05-21 14:23:18 UTC
I could have believed the problem was something along those lines except that
the logs I attached show that there is a dhcp reply, albeit 4 seconds after the
request.

Comment 10 Michael Fulbright 2003-05-27 18:47:55 UTC
I have not seen any problems such as you are describing. If there is a problem
with how we handle DHCP, it is most likely a pump issue.

Comment 11 e 2003-06-13 05:18:44 UTC
We are also seeing this attempting install of RHL 9 via NFS/DHCP/Kickstart.

The machine has 2 NICs. We tried both.
# lspci|grep Ether
03:04.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
04:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)

On the VC we see the message "pump told us No DHCP offer received".

DHCP server is RHL 8.0 dhcp-3.0pl1-26.
Logs on DHCP server show:
DHCPDISCOVER from 00:30:48:24:aa:6b via eth0
DHCPOFFER on 10.0.0.199 to 00:30:48:24:aa:6b via eth0

and nothing further. The log entries seem to occur *after* the timeout message
from the client.

Comment 12 Michael Young 2003-07-07 09:47:31 UTC
I did some further tests, including hacking the install script to retry the nfs
mounts at 5-second intervals. The result was that the nfs mount worked on the
3rd or 4th attempt, so there is probably some timeout issue here. Also, minor
changes in the logging seemed to make dhcp fail altogether, even though the
changes were nowhere near the dhcp code.
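
The retry workaround described in this comment can be sketched as a small Python helper (anaconda itself is written in Python). The name retry_with_interval and its parameters are hypothetical, and this is not the actual change made to the install script; it only shows the shape of "try the mount, wait 5 seconds, try again".

```python
import time


def retry_with_interval(action, attempts=5, interval=5.0, sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` runs out.

    `action` signals failure by raising an exception; if every attempt
    fails, the last exception is re-raised. `interval` is the pause
    between attempts (5 seconds, matching the test described above).
    `sleep` is injectable so the pauses can be skipped in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:  # a real installer would catch narrower errors
            last_error = err
            if attempt < attempts:
                sleep(interval)  # no pause after the final failure
    raise last_error
```

For example, the action could wrap a subprocess.run([...], check=True) call that runs mount, since check=True raises CalledProcessError on a non-zero exit status. A retry loop like this only hides the timing problem; it does not explain why the first attempts fail.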

Comment 13 Mark Komarinski 2003-07-14 14:38:28 UTC
Seeing the same problem on two IBMs, one with the Broadcom NIC and one with the
e1000. Both seem to send out pump requests before link negotiation on the
interface has finished; both fail to read the ks.cfg file via NFS, but both can
mount via NFS manually. I do not have access to the DHCP server, but the NFS
server shows no attempts to mount the share.

Comment 14 Eido Inoue 2003-08-20 20:03:14 UTC
What machine is acting as the dhcp server?

Comment 15 Michael Young 2003-08-20 20:54:16 UTC
In my case, probably ISC DHCP 2.0pl5 running on Solaris 8.

Comment 16 Mark Komarinski 2003-08-21 14:03:58 UTC
This is what I got from our IT staff:

We are running Lucent's dhcpd
    Version: 5.3 Build 8 - Lucent DHCP Server
    Copyright (c) 2000-2003 Lucent Technologies Inc.
on a box running Solaris 8.

Comment 17 Michael R. Barber 2003-08-28 16:18:34 UTC
I am experiencing this problem as well. I have confirmed that the problem exists
for RH9 and RHEL 3.0 beta 1 & 2. RH8 did not have this issue.

To do a kickstart, I am booting from CD 1 of the OS distribution, and at the 
splash screen entering:

     linux ks=http://172.16.22.93/ks/hostname.cfg

After entering this, the system boots into an interactive install.

I am using dual on-board Intel 82546EB 10/100/1000 NICs and have observed the 
same behavior on 3 like-configured systems.

I have verified that the (Solaris 9 based) DHCP server does receive the DHCP
request and sends back an offer, but the web server logs do not show any
hits.  The system then goes into an interactive install.

The only way I have been able to get kickstart to work is to disable the DHCP
server when doing a kickstart. When the DHCP request times out, the installer
lets me enter the IP information manually. After I enter it, I get an error
saying that it cannot find the appropriate startup files at the http location I
had provided on the command line. However, if I have it retry, it succeeds and
the kickstart works just fine.

There seems to be a timing issue between when the NIC comes up after IP address 
assignment and when it tries to talk to the kickstart server.

Again, I do not have this problem with RH8 when using the same hardware, 
network, and DHCP server configurations.  RH9, RHEL3 beta 1 & 2 (mis)behave 
nearly identically with regard to this problem.


Comment 18 Aleksandr Brezhnev 2003-10-21 17:02:09 UTC
I can confirm the same problem with Dell 2650.
The system has 2 on-board Broadcom 1000/100/10 NICs. I disabled one of them. 
The system is connected to a 100Mbps switch.

I am using PXE boot. The system can get an IP address and can download
pxelinux.0. The bootloader is able to download vmlinuz and initrd.img and
then passes control to the downloaded kernel.
Anaconda then tries to get an IP address again through DHCP (using pump)
and fails. The kickstart installation stops.

I have this problem with AS 2.1 update 2 and RHEL 3 RC1.0.

Comment 21 Jeremy Katz 2005-09-21 20:38:47 UTC
RHEL3 U5 and RHEL4 U1 both have the complete set of fixes for all known problems
in this area. If you are still experiencing problems with one of these
releases, please file a SEPARATE bug so that we can look into your specific case,
rather than deal with the confusion that results from 10 people seeing the same
symptom from different root causes.