anaconda-10.1.1.46-1

When using a tg3 card, under some circumstances (see bug 186634), we need to use "nicdelay=..." to avoid the link not being ready when we probe it. But ksdevice=link doesn't use nicdelay at all, which causes the link detection to fail. Patch attached.
Created attachment 136870 [details] anaconda-rhel4-wait-for-ksdevice-link.patch
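To illustrate the idea of waiting for the link before giving up on ksdevice=link, here is a minimal sketch of a bounded polling loop. It is not the attached patch and not anaconda code: wait_for_link, link_probe_fn, and fake_probe are hypothetical names, and the probe is passed in as a callback so the loop can be exercised without real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical probe type: returns nonzero if the interface has link. */
typedef int (*link_probe_fn)(const char *ifname, void *ctx);

/*
 * Probe the interface up to max_tries times, sleeping delay_secs
 * between attempts. Returns 1 as soon as link is seen, 0 otherwise.
 */
int wait_for_link(const char *ifname, int max_tries, unsigned delay_secs,
                  link_probe_fn probe, void *ctx)
{
    int i;

    assert(ifname != NULL && probe != NULL);

    for (i = 0; i < max_tries; i++) {
        if (probe(ifname, ctx))
            return 1;
        /* No point sleeping after the final attempt. */
        if (delay_secs > 0 && i + 1 < max_tries)
            sleep(delay_secs);
    }
    return 0;
}

/*
 * Stand-in for a real probe (assumption, for illustration only):
 * ctx points to an int counting how many probes report "no link"
 * before the link comes up.
 */
int fake_probe(const char *ifname, void *ctx)
{
    int *remaining = ctx;

    (void)ifname;
    if (*remaining <= 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

With a real probe, a caller could pass the nicdelay value as the overall budget, for example by choosing max_tries and delay_secs so that their product matches it. How the actual loader spends that delay is not shown here.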
Rather than this hack, why not increase the timeout? That way it will just work for everyone.

I don't see how anyone could identify this problem and discover the nicdelay option without spending a full day troubleshooting. Since the PXE DHCP works and the Anaconda DHCP does not, you'll be led down a totally wrong path. Altering the network to sniff it by adding a hub hides the problem.

It is not clearly stated anywhere that someone encountering this problem should try the nicdelay option. The advisory (https://rhn.redhat.com/errata/RHEA-2006-0443.html) mentions the existence of nicdelay, but not what it does or when you would need it.

Also, I tried nicdelay and it does not even work for me. It does cause anaconda to sleep for that length of time at some point during the DHCP attempt, but DHCP still fails.

I solved this problem by changing net.c's doDhcp to the following. It increases the default retries from 5 to 10, and default timeout per try from 30 to 90 seconds.

    char * doDhcp(char * ifname, struct networkDeviceConfig *dev,
                  char * dhcpclass) {
        struct pumpOverrideInfo override;

        memset(&override, 0, sizeof(override));
        override.numRetries = 10;
        override.timeout = 90;

        setupWireless(dev);
        logMessage("running dhcp for %s", ifname);
        return pumpDhcpClassRun(ifname, 0, 0, NULL,
                                dhcpclass ? dhcpclass : "anaconda",
                                &dev->dev, &override);
    }
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patched anaconda based on comment #2, but modified. Instead of 90 seconds, I'm using 45 seconds. Set the number of retries to 10. The patch originally attached to this bug was already in anaconda, so I'm not sure what release the patch was intended for. Setting bug to modified.
Created attachment 157495 [details] anaconda-RHEL4-pump-wait-45secs.patch
Hi,

(In reply to comment #3)
> Customer is using an IBM x336 x86_64 machine with 2 onboard nics. He is
> attempting install using RHEL 4 U4.
>
> The customer passes a "ksdevice=link" option. The link is not detected and
> they get instead an interactive prompt.
>
> Passing ksdevice=eth0 works as expected. The anaconda logs both detect
> failures in detecting link

Note that since ksdevice=eth0 seems to work whereas ksdevice=link does not, this is not really a nicdelay problem or a problem with getting the dhcp address.

From loader2/net.c::chooseNetworkInterface():
--------------------------------------------------
        if (loaderData->netDev && (loaderData->netDev_set == 1)) {
            if (!strcmp(loaderData->netDev, devs[i]->device)) {
                foundDev = 1;
            } else {
...
...
    if ((loaderData->netDev && (loaderData->netDev_set) == 1) &&
        !strcmp(loaderData->netDev, "link")) {
        logMessage("looking for first netDev with link");
...
...
--------------------------------------------------

ie: if eth0 is specified, it has had no problem initializing, getting the dhcp address and continuing with the install. Whereas with ksdevice=link, anaconda seems to have trouble detecting the link on the NIC ...

> --8<--
> 16:18:51 INFO : looking for first netDev with link
> 16:18:56 WARNING : wanted netdev with link, but none present. prompting
> --8<--

Once the NIC has been manually selected, the install continues with its dhcp request as expected.

>
> and
> --8<--
> 16:18:56 WARNING : wanted netdev with link, but none present. prompting
> 16:19:01 INFO : sending dhcp request through device eth0
> --8<--

This seems to be a problem with RHEL 5 too. The BZ tracking the RHEL 5 issue is (bug #223435).

I have an issue tracker ticket open that describes this problem. I shall attach that to this BZ shortly.

- steve
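The difference between the two code paths quoted above can be sketched as follows. This is a simplified illustration under stated assumptions, not the loader2/net.c implementation: choose_device, has_link_fn, and the two stub probes are hypothetical names, and a return value of -1 stands in for "fall back to the interactive prompt".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe type: returns nonzero if the interface has link. */
typedef int (*has_link_fn)(const char *ifname);

/*
 * Pick the install device from ksdevice.
 *   ksdevice == "link": first device for which has_link() is true.
 *   otherwise:          the device whose name matches ksdevice.
 * Returns the index into devs, or -1 if nothing was selected.
 */
int choose_device(const char *const *devs, size_t ndevs,
                  const char *ksdevice, has_link_fn has_link)
{
    size_t i;

    assert(devs != NULL && has_link != NULL);

    if (ksdevice == NULL)
        return -1;

    if (strcmp(ksdevice, "link") == 0) {
        for (i = 0; i < ndevs; i++)
            if (has_link(devs[i]))
                return (int)i;
        return -1; /* "wanted netdev with link, but none present" */
    }

    /* A named device never consults the link probe at all. */
    for (i = 0; i < ndevs; i++)
        if (strcmp(ksdevice, devs[i]) == 0)
            return (int)i;
    return -1;
}

/* Stub probes for illustration only. */
int no_link_detected(const char *ifname)
{
    (void)ifname;
    return 0;
}

int only_eth2_has_link(const char *ifname)
{
    return strcmp(ifname, "eth2") == 0;
}
```

In this sketch, a probe that wrongly reports "no link" breaks only the ksdevice=link path, which matches the symptom in the logs: ksdevice=eth0 proceeds to DHCP while ksdevice=link prompts.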
Engineering:

We have a report of this issue for a tg3 on a BL860c (an ia64 system; I'll attach the sysreport of the system shortly). This has been reported for both RHEL 4 and RHEL 5.

Here is the description as provided by the customer:

------------------------------------------------------------------------
Description of problem:
Testing a kickstart install on a BL860c, I found that even with specifying 'ksdevice=link' on the kernel command line, I still was prompted to select a specific device to use for the install. I tried specifying 'ksdevice=eth0' and it worked, as does explicitly selecting eth0 when prompted. However, this is a big problem for us, as we use kickstart to preinstall RHEL4.5 in our factory, and any interaction will require a manual work-around, which simply isn't feasible.

How reproducible:
100%

Steps to Reproduce:
1) Do a kickstart install on a BL860c; make sure eth0 is plugged in.
2) Specify 'ksdevice=link' on the kernel command line.
3) Watch the prompt
4) repeat, with 'ksdevice=eth0'
5) Notice how it works fine.

My conclusion from this is that the driver in the initrd is failing to detect a link. If I check after the install finishes, I see the driver is correctly detecting the link at that time:

    # ethtool eth0 | grep Link
    Link detected: yes
    #

I don't know if this is related to IT #106442, but the results look a bit similar.

Actual results:
Even with ksdevice=link, user is prompted to select a NIC to install through.

Expected results:
ksdevice=link should automatically detect the active device.
------------------------------------------------------------------------

I have been trying to track the cause of this in the RHEL 4 version of the issue tracker ticket. I'll attach the debug patch and the results that I got from it for the RHEL 4 version.
Basically, what I have managed to figure out so far is that although the NICs have been properly detected with their drivers loaded, neither ethtool nor mii-tool can detect the link status, nor do the ioctl()s return with an error.

Please let me know if you need additional information.

- steve

This event sent from IssueTracker by sfernand
issue 121391
Created attachment 158863 [details] sysreport of the affected system
Created attachment 158864 [details] Patch used to generate a debug initrd for RHEL 4.5 and the resulting log
ksdevice=link is failing in RHEL4-U6-re20070731.nightly. The system tries to get a link from eth0 instead of eth2, which has the cable plugged in. Specifying ksdevice=eth2 allows the system to install correctly. Using the initrd.img provided by pjones fixes the problem.
A fix for this issue should have been included in the packages contained in the RHEL4.6 Beta released on RHN (also available at partners.redhat.com). Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0816.html