Bug 165587

Summary: kickstart fails to fetch ks.cfg from a network location without DHCP
Product: Red Hat Enterprise Linux 3
Reporter: Noam Meltzer <tsnoam>
Component: pump
Assignee: Jeremy Katz <katzj>
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Version: 3.0
CC: pat.lampert
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-03-19 20:14:48 UTC

Description Noam Meltzer 2005-08-10 16:42:54 UTC
Description of problem:

On an IBM BladeCenter, I'm trying to perform a kickstart install without
DHCP, using the following setup:

0. I'm using RHEL3 update5 - AS
1. I have an FTP server on the same subnet and inside the same BladeCenter. The
kickstart data is available to the anonymous user in the root (/) directory.
2. I have Spanning Tree enabled on the network switch module.
3. I'm using a bootable CDROM burned from the ISO available on the first CD.
4. When I get to the syslinux prompt, I type the following line:
    linux ks=ftp://<serverIP>/ ip=<clientIP> netmask=<netMask> ksdevice=eth0
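
For readers unfamiliar with the boot line above, here is a minimal Python
sketch of how it breaks down into key=value options. This is only an
illustration, not the installer's actual parser; the function name
parse_boot_args and the 192.0.2.x addresses are placeholders standing in for
<serverIP>, <clientIP> and <netMask>.

```python
def parse_boot_args(cmdline):
    """Split a boot command line into bare words and key=value options.

    Illustrative only: the real loader has its own argument handling.
    """
    flags, options = [], {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            # "ks=ftp://host/" -> key "ks", value "ftp://host/"
            options[key] = value
        else:
            # a bare word such as the boot label "linux"
            flags.append(token)
    return flags, options


# Placeholder addresses; substitute the real server/client values.
flags, options = parse_boot_args(
    "linux ks=ftp://192.0.2.10/ ip=192.0.2.20 netmask=255.255.255.0 ksdevice=eth0"
)
print(flags, options)
```

The point is that ip= and netmask= give the installer a static address, so no
DHCP exchange takes place before the ks= location is contacted.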

What happens is the following:
Linux boots up and the installation starts, loading drivers for the system
(SCSI/HDs/network).
Then the installation tries to fetch the ks.cfg file from the server, but it
reports that the fetch failed.

Running tcpdump on the server shows what happens on the wire:
the installation sends a SYN packet to the server on the FTP TCP port.
the server replies with a SYN-ACK packet.
after a short while (around 120 ms, if I recall correctly) the installation
sends a RST packet.
(and at this point the installation reports that it failed)

After some debugging I have found the following:
1. If I use DHCP to get the IP (just for testing, I can't use DHCP in
production), there is no problem.
2. If I do not use DHCP but disable Spanning Tree (which I need to be
working) on the switch, the installation has no problem.

This leads me to the following conclusion:
When the driver is first loaded by the OS (installation) and "plumbed", there is
a short period during which the OS "recognizes" the network around it and the
switch "recognizes" the network adapter and its MAC address.
The problem is that this "short" period is too long when using Spanning Tree:
the installation tries to fetch the ks.cfg file before the period is over, and
fails.
Getting the IP by DHCP works around the problem because the client's first
DHCP discover fails, but it retries and then succeeds. By the time the
installation tries to fetch ks.cfg, the network is really up.
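
The reasoning above amounts to "do not start the fetch until the server is
actually reachable". A minimal Python sketch of that idea follows. It is not
code from the installer: the function name wait_for_tcp and the default
deadline, timeout and pause values are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_tcp(host, port, deadline_s=60.0, attempt_timeout_s=2.0, pause_s=1.0):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no attempt succeeds before roughly deadline_s seconds.
    All default values are illustrative assumptions.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            # A completed handshake means the switch port is passing traffic.
            with socket.create_connection((host, port), timeout=attempt_timeout_s):
                return True
        except OSError:
            # Refused, unreachable or timed out: give up only at the deadline.
            if time.monotonic() + pause_s >= end:
                return False
            time.sleep(pause_s)
```

Calling wait_for_tcp("<serverIP>", 21) before the fetch would return right away
on a port that is already forwarding, and would keep probing (up to the
deadline) while the switch is still holding the port back.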




Suggested Resolution:

1. Add a "sleep" of around 5-10 seconds between bringing up the network and
trying to fetch the ks.cfg.
2. Add a retry feature to the installer for the ks.cfg stage.
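
As a rough illustration of suggestion 2, here is a minimal Python sketch of a
retry wrapper. It is not the installer's code: fetch_with_retry, its
parameters and the caller-supplied fetch callable are all hypothetical, and
the attempt count and delay are assumed values.

```python
import time


def fetch_with_retry(fetch, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are used up.

    fetch is any zero-argument callable that returns the ks.cfg contents
    or raises OSError on a network failure (hypothetical interface).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as err:
            last_error = err
            # Pause only when another attempt will follow.
            if attempt < attempts:
                sleep(delay_s)
    raise last_error
```

Unlike an unconditional sleep, this only costs time when the first attempt
fails, so an install on a network that is already up is not slowed down.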




Version-Release number of selected component (if applicable):
RHEL3 update5 AS

Comment 1 Jeremy Katz 2005-08-15 18:30:33 UTC
If you have spanning tree enabled, then you also need to enable portfast.
Otherwise, there is no way to programmatically tell that the switch is just
dropping all traffic.

Doing a "sleep" like this would penalize all users who have their network set up
properly with about a minute of additional time per install.

Comment 2 Noam Meltzer 2005-08-16 06:51:14 UTC
Hello,
I disagree with the claim that this isn't a bug.
I tried the 32-bit version in my environment and there I had no problem, it
worked flawlessly.
Then I noticed that on the 32-bit version, the loader process *does wait* until
the link is truly up.

So I believe that this is a bug in the x86_64 version and should be fixed.

Regarding the claim that this will penalize other users by about a minute of
additional time, this is wrong - a "sleep" of 5-10 sec. is not 60 sec.
Besides, even if it were, it is a very small price to pay so that problematic
users (like me) are able to kickstart their Linux.
Additionally, we're talking about the installation procedure, which is a
one-time procedure - unlike the bootup of an installed OS, for example.

About the "portfast" - I will check it.

Noam

Comment 3 Dan Timmons 2005-09-29 14:55:48 UTC
Also set trunk off.  I had spanning tree with portfast and still had problems
timing out.  Turned off trunking for the port and the problems went away.

Hope this helps,
Dan

Comment 4 Jeremy Katz 2005-10-03 18:00:33 UTC
Yeah, trunking can also cause problems and, per the manufacturer, shouldn't be
enabled for ports which are connected to machines (as opposed to routers)

Comment 5 Noam Meltzer 2005-10-04 10:18:08 UTC
Hi,
I do not agree with the claim that trunking shouldn't be enabled for machines.
In some configurations it is very much needed (for example, when you want to
double your bandwidth).

Anyhow,
please don't forget that this error does not exist on the 32-bit x86 version of
RHEL3 but does exist on the 64-bit version. Thus, it *is* a bug and must be fixed.

Noam

Comment 6 Red Hat Bugzilla 2007-02-05 18:59:02 UTC
REOPENED status has been deprecated. ASSIGNED with keyword of Reopened is preferred.

Comment 7 Pat Lampert 2009-02-23 21:52:19 UTC
HP would like to submit a suggestion that Red Hat consider enhancing the network capabilities of kickstart. Increasingly, major customers of HP and Red Hat are setting up systems with multiple NICs and often use aggregate NIC bonding pairs. The use of aggregate NIC bonding requires that the ports the NICs are connected to be trunked.

Because the simple network interface in kickstart is not able to implement the final network configuration desired by customers, it becomes inconvenient to use kickstart: the switch needs to be manually reconfigured after each installation.