Bug 136482

Summary: DHCP timeouts during Kickstart
Product: Red Hat Enterprise Linux 3
Reporter: Matt Walburn <matt>
Component: anaconda
Assignee: Jeremy Katz <katzj>
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Version: 3.0
CC: jon.stanley, nobody+pnasrat
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-04-24 18:19:15 UTC

Description Matt Walburn 2004-10-20 13:16:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20040913 Firefox/0.10.1

Description of problem:
As of RHEL 3 Update 2 I cannot Kickstart via DHCP.

This problem is due to the way network interfaces are being brought
up. Prior to RHEL 3 Update 2, anaconda would not bring the network
interface down and then back up in order to initiate a DHCP request;
it would simply do a "hot-reconfiguration" of the Kickstart interface.
In other words, the interface didn't lose its link with the switch to
get the address.

Now, when kickstart requests a DHCP address it completely downs the
interface and then brings it back up. This is a problem for us because
the DHCP timeout is shorter than the time it takes our switch ports
(Cisco 2900) to go into a forwarding state. As a result, our servers
are never able to get their DHCP lease.
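
The race described above can be put into numbers with a rough sketch. The 15-second forward delay is the usual classic spanning tree default and is an assumption here, as are the function names and the example DHCP timeouts; none of these values were measured on the reporter's network.

```python
# Illustrative timing model; every number here is an assumption,
# not a measurement from this report.

# Classic 802.1D spanning tree: after link-up a port spends one
# forward-delay interval listening and one learning before forwarding.
FORWARD_DELAY_S = 15  # common default, configurable on the switch


def seconds_until_forwarding(link_was_reset: bool) -> int:
    """Delay before the switch port passes traffic again."""
    # A port whose link never dropped is already forwarding.
    return 2 * FORWARD_DELAY_S if link_was_reset else 0


def dhcp_can_succeed(dhcp_timeout_s: int, link_was_reset: bool) -> bool:
    """True if the client is still trying when the port starts forwarding."""
    return dhcp_timeout_s > seconds_until_forwarding(link_was_reset)


if __name__ == "__main__":
    # Hypothetical 20-second DHCP timeout, with and without a link reset.
    for reset in (False, True):
        print(reset, dhcp_can_succeed(dhcp_timeout_s=20, link_was_reset=reset))
```

Under these assumptions, a client that gives up before roughly 30 seconds can only get a lease if the link is never dropped, which matches the difference reported between the older and newer releases.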

I'm not sure if this problem is in anaconda, dhclient, initscripts, or
the tg3 driver itself.

This problem is in RHEL 3 Updates 2 & 3, as well as RHEL 4 Beta 1.

Version-Release number of selected component (if applicable):
RHEL 3 Updates 2 & 3 

How reproducible:
Always

Steps to Reproduce:
1. Request a new DHCP address via Anaconda/Kickstart

Actual Results:  The interface is disabled entirely, then re-enabled,
which causes the switchport to be reset every time.

Expected Results:  The interface should not be completely turned off
then on to get a DHCP address.

Additional info:

This is a new behavior for Red Hat Linux. In previous releases (RHEL 3
Update 1 and before) it could get a DHCP address without resetting the
interface.

Comment 1 Jeremy Katz 2004-10-20 13:42:51 UTC
Update 2 actually didn't change the behavior at all, but some drivers
changed and seem to exacerbate the problem.  Update 3 adds some
fixes and Update 4 (beta to be released soon) adds another set.

Comment 2 Matt Walburn 2004-10-20 13:52:46 UTC
I know that when I use the boot.iso from the initial release of RHEL 3
and Update 1 that I don't have this problem. I never lose the link
between my NIC and the switch during DHCP requests. However, on
Updates 2 and 3, I do. This problem persists on RHEL 4 Beta 1.

Comment 4 Jim Wildman 2004-11-02 18:53:20 UTC
I observed the same symptoms with U2, U3, and RH4 Beta 1 on a new HP 
DL585.  If I used a static ip and was willing to cycle through 
the "Can't find server" message a few times (1-3), it would go ahead 
and install.  I don't have access to the switch to tell what it was 
seeing.  

lspci yields:
02:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 
Gigabit Ethernet (rev 10)
02:06.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 
Gigabit Ethernet (rev 10)

Comment 6 Marc Tamsky 2005-02-16 13:56:31 UTC
This is the same bug as Bug#15896 which was marked WONTFIX many years ago.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=15896

http://lists.us.dell.com/pipermail/linux-poweredge/2004-March/037152.html
has more info -- it's related to spanning tree convergence time, which
exceeds the dhcp retry timeout period for the second dhcp sequence --
the one where anaconda is just about to mount the nfs install media.

Comment 7 Jon Stanley 2006-03-15 17:34:44 UTC
I need to disagree with the WONTFIX of the other bug.  I have this problem, and
it's very pervasive on Cisco hardware.

The workaround for this on the network side is to turn on 'spanning-tree
portfast' on an IOS based switch.  However, this is not viable in all network
topologies or with all network administration practices.

The purpose of portfast is to cause a port to go into the STP forwarding state
immediately when link comes up, rather than listening for BPDUs and then
deciding to forward.  With portfast turned on, the port forwards before a loop
in the network can be detected (for instance, if someone hooks a switch up to
the port with two uplinks into the layer 2 infrastructure, you have a loop).
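
As a minimal sketch of what portfast changes, the port's behavior after link-up can be modeled as a list of states. The 15-second forward delay is the usual classic spanning tree default and is assumed here, and the function names are hypothetical; this is not a model of any particular IOS release.

```python
from typing import List, Tuple

FORWARD_DELAY_S = 15  # assumed 802.1D default; configurable on real switches


def link_up_sequence(portfast: bool) -> List[Tuple[str, int]]:
    """(state, seconds spent in it) for an access port after link-up."""
    if portfast:
        # Listening and learning are skipped entirely.
        return [("forwarding", 0)]
    return [
        ("listening", FORWARD_DELAY_S),  # watch for BPDUs, forward nothing
        ("learning", FORWARD_DELAY_S),   # build the MAC table, still no forwarding
        ("forwarding", 0),
    ]


def blackout_seconds(portfast: bool) -> int:
    """Total time after link-up during which frames are dropped."""
    return sum(seconds for _, seconds in link_up_sequence(portfast))


if __name__ == "__main__":
    print("without portfast:", blackout_seconds(False), "s")
    print("with portfast:", blackout_seconds(True), "s")
```

The states that portfast skips are the ones in which the switch would otherwise notice a loop before forwarding, which is why the workaround is not acceptable everywhere.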

Comment 8 Jeremy Katz 2006-04-24 18:19:15 UTC
Mass-closing lots of old bugs which are in MODIFIED (and thus presumed to be
fixed).  If any of these are still a problem, please reopen or file a new bug
against the release which they're occurring in so they can be properly tracked.