Bug 145386

Summary:	Unable to run kickstart install of AS 2.1 U6 on a Dell 2850
Product:	Red Hat Enterprise Linux 2.1	Reporter:	Kyle Powell <kpowell>
Component:	kernel	Assignee:	John W. Linville <linville>
Status:	CLOSED WONTFIX	QA Contact:	Mike McLean <mikem>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	2.1	CC:	katzj, linville, tao
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-05-04 18:38:43 UTC	Type:	---

Description Kyle Powell 2005-01-17 21:57:44 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Gecko/20040924

Description of problem:
On a Dell 2850, if I attempt a kickstart install (i.e. "linux text
ks=nfs:xxx.xxx.xxx.xxx:/somedir/ks.cfg") the installer hangs when it
attempts to obtain a dhcp lease. I can actually see the link light go
out when the dhcp attempt is made. The installer eventually times out
when it is unable to obtain a dhcp lease. If I've booted from cdrom,
anaconda then loads from the cd. Once I can get to a shell prompt, I
can see the e1000 module is still loaded, but I'm not able to pass any
network traffic and the link light is still out. If I `ifconfig eth0
down` then `rmmod e1000` then `modprobe e1000`, the link light comes
back on. Then if I `ifconfig eth0 up` and assign it an ip address, I
am able to ping the NFS/DHCP server.
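
For anyone reproducing this, the manual recovery sequence described above can be sketched as a small shell script. This is only a sketch: the `IFACE`, `MODULE`, and `DRY_RUN` variables and the `run`/`reload_nic` helpers are illustrative names, not part of the installer, and it assumes `ifconfig`, `rmmod`, and `modprobe` are available at the shell prompt, as they were here. It defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the manual recovery sequence from the description above.
# IFACE, MODULE, and DRY_RUN are illustrative names, not installer settings.
IFACE="${IFACE:-eth0}"
MODULE="${MODULE:-e1000}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take the interface down, reload the driver module, bring the interface up.
# Each step runs only if the previous one succeeded.
reload_nic() {
    run ifconfig "$IFACE" down &&
    run rmmod "$MODULE" &&
    run modprobe "$MODULE" &&
    run ifconfig "$IFACE" up
}

reload_nic
```

Setting `DRY_RUN=0` runs the commands for real. An IP address still has to be assigned to the interface afterwards before the NFS/DHCP server can be reached.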

If I do not specify a kickstart install, I am able to perform a
network installation via nfs (using a bootnet floppy and a driver disk).

I have tested the same kickstart file on an IBM x345 (which also has
an e1000) using the same boot image and everything works fine.

I should also mention that we are able to successfully kickstart a
2850 with U5, once we work around the megaraid/megaraid2 issue with
the PERC4/Di.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
Attempt to install AS 2.1 U6 on a Dell 2850 using a kickstart file on
an NFS share.

Actual Results:  Link light on eth0 goes out after the installer
attempts to obtain a dhcp lease. Eventually the installer errors out
and reports that it cannot mount the nfs share to access the ks.cfg file.

Additional info:

Comment 1 Jeremy Katz 2005-01-18 02:22:00 UTC

There weren't any changes to anaconda other than moving the ncr driver
from boot.img -> drvblock.img.  Does the U6 _kernel_ alone work fine?

John -- any changes you can think of that might have some sort of
effect like this in the e1000 driver?

Comment 2 Kyle Powell 2005-01-18 14:51:05 UTC

Yes, the kernel alone works fine. No problems on boxes that have been
updated to .e57 or with nfs installs using .e57-BOOT, as long as I
don't specify a kickstart file. The other change affecting the 2850
between U5 and U6 was the change from megaraid to megaraid2. U5's
pcitable listed the PERC4/Di as megaraid, even though the megaraid
module didn't support it; megaraid2 was required. That's been fixed in
U6. Not sure how that could be causing this problem though. I just
thought I'd mention the other U6 change I was aware of.

I also want to reiterate that I'm not pointing the finger at the e1000
(or the megaraid2) module. A kickstart install of an IBM x345 works
fine, and the x345 has an Intel GigE card too. The e1000 module also
works just fine in the 2850 during a non-kickstart install from an NFS
share.

Comment 3 Kyle Powell 2005-01-18 19:34:57 UTC

More info:

I was able to duplicate the problem on a Dell 1850 as well. That's no
surprise since they have the same motherboard. The pci id for the
Intel GigE card is 8086:1076. We also disabled the RAID support on the
PERC so it would use the mptfusion module instead of megaraid2. We
encountered the same issue with the mptfusion module loaded as with
the megaraid2 module loaded. What other info can I provide to help
determine if this is an installer issue or a module issue?

Comment 10 Adrian Miranda 2005-01-19 18:37:15 UTC

I apologize in advance for mentioning a non-redhat problem, but it
sounds like it is the same problem you are running into, and I have a
few bits of evidence to add.

I see the same problem on a 2850 with CentOS 3.4 (which I assume
means it will also happen with RHEL 3 update 4, but I don't have an
extra RHEL license to try right now).  What's interesting is that the
DHCP request apparently succeeds, only then do the link lights go out.
I can see this by switching to virtual terminal 4 - it claims it got
a DHCP response.  If I set a static address, everything seems to work
fine.  The problem happens whenever I try to use DHCP, whether I'm
trying to do a kickstart or a manual network install.  It happens
whether I start with a floppy or boot.iso.  It happens on either
ethernet port on the 2850.

Once the system is installed, I can use DHCP just fine.  I guess DHCP
must be handled differently by the installation software (anaconda?)

CentOS 3 update 3 does work with DHCP on the 2850, but the onboard
RAID controller isn't recognized; I haven't had a chance to work around that.

Again, I'm sorry for reporting a non redhat problem, but it sounds
like it affects redhat as well, so I thought it might help shed a
little light on what is happening.

Comment 13 John W. Linville 2005-01-21 18:49:28 UTC

Is there any perceptible difference between the older e1000 driver and
the current one in the amount of time it takes for the DHCP to complete?

I seem to remember that anaconda sometimes doesn't like it if a driver
takes too long to come up. Any chance that something like that is in
play?

Comment 14 Jeremy Katz 2005-01-21 18:57:23 UTC

The timeouts come into play more with RHEL 3 than RHEL 2.1.  RHEL 2.1
has an entirely different order of loading and bringing up network
interfaces, etc.  It _could_ be causing problems, but only if it's
taking more than 30 seconds to bring up the link.

Comment 15 Kyle Powell 2005-01-21 19:47:12 UTC

No difference between the failures of the two module versions that I
can perceive. Can I get the installer to spit out more debug info somehow?

Comment 16 John W. Linville 2005-01-21 19:55:46 UTC

Kyle, actually in comment 13 I meant to ask if there was a difference
between the _working_ e1000 driver and the _broken_ one w.r.t. DHCP
completion time.

Comment 18 John W. Linville 2005-04-05 14:14:19 UTC

As we approach U7...is this still an issue?

Comment 19 John W. Linville 2005-05-04 18:38:43 UTC

Closed due to lack of response.  Please reopen if the requested information 
becomes available.