From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20040924

Description of problem:
On a Dell 2850, if I attempt a kickstart install (i.e. "linux text ks=nfs:xxx.xxx.xxx.xxx:/somedir/ks.cfg") the installer hangs when it attempts to obtain a dhcp lease. I can actually see the link light go out when the dhcp attempt is made. The installer eventually times out when it is unable to obtain a dhcp lease. If I've booted from cdrom, anaconda then loads from the cd.

Once I can get to a shell prompt, I can see the e1000 module is still loaded, but I'm not able to pass any network traffic and the link light is still out. If I `ifconfig eth0 down` then `rmmod e1000` then `modprobe e1000`, the link light comes back on. Then if I `ifconfig eth0 up` and assign it an ip address, I am able to ping the NFS/DHCP server.

If I do not specify a kickstart install, I am able to perform a network installation via nfs (using a bootnet floppy and a driver disk). I have tested the same kickstart file on an IBM x345 (which also has an e1000) using the same boot image and everything works fine.

I should also mention that we are able to successfully kickstart a 2850 with U5, once we work around the megaraid/megaraid2 issue with the PERC4/Di.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
Attempt to install AS 2.1 U6 on a Dell 2850 using a kickstart file on an NFS share.

Actual Results:
Link light on eth0 goes out after the installer attempts to obtain a dhcp lease. Eventually the installer errors out and reports that it cannot mount the nfs share to access the ks.cfg file.

Additional info:
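For anyone trying to reproduce the manual recovery from the shell prompt, the sequence described in this report (`ifconfig eth0 down`, `rmmod e1000`, `modprobe e1000`, then bringing eth0 up with an address) can be sketched as a small script. This is only an illustration, not something taken from the installer: the `run` and `reload_nic` helpers, the `DRY_RUN` switch, and the `192.0.2.10` address are assumptions made up for the example. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual NIC recovery sequence described in the report.
# IFACE and MODULE default to the names used in the report. ADDR is a
# hypothetical placeholder (documentation-range address); substitute a
# real address for your network. DRY_RUN=1 (the default) prints the
# commands instead of executing them.

IFACE="${IFACE:-eth0}"
MODULE="${MODULE:-e1000}"
ADDR="${ADDR:-192.0.2.10}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take the interface down, reload the driver, bring the interface back up.
# Stops at the first step that fails.
reload_nic() {
    run ifconfig "$IFACE" down &&
    run rmmod "$MODULE" &&
    run modprobe "$MODULE" &&
    run ifconfig "$IFACE" "$ADDR" up
}

reload_nic
```

Running it with `DRY_RUN=0` as root would actually execute the four commands; after that, a ping to the NFS/DHCP server is the check used in the report.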
There weren't any changes to anaconda other than moving the ncr driver from boot.img -> drvblock.img. Does the U6 _kernel_ alone work fine? John -- any changes you can think of that might have some sort of effect like this in the e1000 driver?
Yes, the kernel alone works fine. No problems on boxes that have been updated to .e57 or with nfs installs using .e57-BOOT, as long as I don't specify a kickstart file. The other change affecting the 2850 between U5 and U6 was the change from megaraid to megaraid2. U5's pcitable listed the PERC4/Di as megaraid, even though the megaraid module didn't support it; megaraid2 was required. That's been fixed in U6. Not sure how that could be causing this problem though. I just thought I'd mention the other U6 change I was aware of. I also want to reiterate that I'm not pointing the finger at the e1000 (or the megaraid2) module. A kickstart install of an IBM x345 works fine, and the x345 has an Intel GigE card too. The e1000 module also works just fine in the 2850 during a non-kickstart install from an NFS share.
More info: I was able to duplicate the problem on a Dell 1850 as well. That's no surprise since they have the same motherboard. The pci id for the Intel GigE card is 8086:1076. We also disabled the RAID support on the PERC so it would use the mptfusion module instead of megaraid2. We encountered the same issue with the mptfusion module loaded as with the megaraid2 module loaded. What other info can I provide to help determine if this is an installer issue or a module issue?
I apologize in advance for mentioning a non-Red Hat problem, but it sounds like the same problem you are running into, and I have a few bits of evidence to add. I see the same problem on a 2850 with CentOS 3.4 (which I assume means it will also happen with RHEL 3 update 4, but I don't have an extra RHEL license to try right now). What's interesting is that the DHCP request apparently succeeds, and only then do the link lights go out. I can see this by switching to virtual terminal 4, where it claims it got a DHCP response. If I set a static address, everything seems to work fine. The problem happens whenever I try to use DHCP, whether I'm trying to do a kickstart or a manual network install. It happens whether I start with a floppy or boot.iso. It happens on either ethernet port on the 2850. Once the system is installed, I can use DHCP just fine. I guess DHCP must be handled differently by the installation software (anaconda?). CentOS 3 update 3 does work with DHCP on the 2850, but the onboard RAID controller isn't recognized, and I haven't had a chance to get around that. Again, I'm sorry for reporting a non-Red Hat problem, but it sounds like it affects Red Hat as well, so I thought it might help shed a little light on what is happening.
Is there any perceptible difference between the older e1000 driver and the current one in the amount of time it takes for DHCP to complete? I seem to remember that anaconda sometimes doesn't like it if a driver takes too long to come up. Any chance that something like that is in play?
The timeouts come into play more with RHEL 3 than RHEL 2.1. RHEL 2.1 has an entirely different order of loading and bringing up network interfaces, etc. It _could_ be causing problems, but only if it's taking more than 30 seconds to bring up the link.
No difference between the failures of the two module versions that I can perceive. Can I get the installer to spit out more debug info somehow?
Kyle, actually in comment 13 I meant to ask if there was a difference between the _working_ e1000 driver and the _broken_ one w.r.t. DHCP completion time.
As we approach U7...is this still an issue?
Closed due to lack of response. Please reopen if the requested information becomes available.