Bug 207546 - ksdevice=link doesn't use nicdelay
ksdevice=link doesn't use nicdelay
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: anaconda
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: David Cantrell
Keywords: OtherQA, Regression, TestBlocker
Depends On:
Blocks:
 
Reported: 2006-09-21 11:54 EDT by Bastien Nocera
Modified: 2010-10-22 02:07 EDT
CC List: 6 users

See Also:
Fixed In Version: RHBA-2007-0816
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-15 11:35:17 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
anaconda-rhel4-wait-for-ksdevice-link.patch (782 bytes, patch)
2006-09-21 11:54 EDT, Bastien Nocera
anaconda-RHEL4-pump-wait-45secs.patch (923 bytes, patch)
2007-06-20 17:02 EDT, David Cantrell
sysreport of the affected system (382.26 KB, application/x-bzip2)
2007-07-10 12:03 EDT, Steve
Patch used to generate a debug initrd for RHEL 4.5 and the resulting log (6.17 KB, application/x-gzip)
2007-07-10 12:08 EDT, Steve

Description Bastien Nocera 2006-09-21 11:54:30 EDT
anaconda-10.1.1.46-1

When using a tg3 card, under some circumstances (see bug 186634), we need to use
"nicdelay=..." so that the link is not probed before it is ready.

But ksdevice=link doesn't use nicdelay at all, which causes the link detection
to fail.

Patch attached.
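For illustration only, here is a minimal C sketch of the idea behind honoring a nicdelay-style wait in the link check: probe the link several times, sleeping between probes, instead of probing once. It is not the attached patch and not anaconda code; the names wait_for_link, link_probe_fn, sleep_ms and simulated_probe are hypothetical, and the probe is passed in as a function pointer so the sketch runs without real hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical probe type: returns 1 if the link is up, 0 if it is down,
 * -1 if the query failed. A real loader would wrap an ethtool/MII query. */
typedef int (*link_probe_fn)(const char *ifname, void *ctx);

/* Sleep for ms milliseconds; 0 means "don't sleep". */
static void sleep_ms(unsigned int ms)
{
    struct timespec ts;

    if (ms == 0)
        return;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    /* On EINTR, nanosleep() stores the remaining time in ts; resume it. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Probe the link up to max_polls times, sleeping interval_ms between
 * probes. Returns true as soon as a probe reports link up. Errors (-1)
 * are treated like "no link yet", so a slow NIC gets the full budget. */
static bool wait_for_link(const char *ifname, int max_polls,
                          unsigned int interval_ms,
                          link_probe_fn probe, void *ctx)
{
    for (int i = 0; i < max_polls; i++) {
        if (probe(ifname, ctx) == 1)
            return true;
        if (i + 1 < max_polls)      /* no sleep after the last probe */
            sleep_ms(interval_ms);
    }
    return false;
}

/* Stand-in probe for illustration: *ctx counts how many more probes the
 * simulated NIC needs before its link comes up (like a slow tg3). */
static int simulated_probe(const char *ifname, void *ctx)
{
    int *probes_left = ctx;

    (void)ifname;
    if (*probes_left > 0) {
        (*probes_left)--;
        return 0;
    }
    return 1;
}
```

With a 1000 ms interval, a delay of N seconds corresponds to max_polls = N + 1 (one immediate probe plus one probe after each second of sleep); that mapping from nicdelay to a poll count is an assumption of this sketch, not something anaconda is known to do.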
Comment 1 Bastien Nocera 2006-09-21 11:54:34 EDT
Created attachment 136870
anaconda-rhel4-wait-for-ksdevice-link.patch
Comment 2 Chuck Berg 2006-12-06 17:03:22 EST
Rather than this hack, why not increase the timeout? That way it will just work
for everyone. I don't see how anyone could identify this problem and discover
the nicdelay option without spending a full day troubleshooting. Since the PXE
DHCP works and the Anaconda DHCP does not, you'll be led down a totally wrong
path. Altering the network to sniff it by adding a hub hides the problem. It is
not clearly stated anywhere that someone encountering this problem should try
the nicdelay option. The advisory
(https://rhn.redhat.com/errata/RHEA-2006-0443.html) mentions the existence of
nicdelay, but not what it does or when you would need it.

Also, I tried nicdelay and it does not even work for me. It does cause anaconda
to sleep for that length of time at some point during the DHCP attempt, but DHCP
still fails.

I solved this problem by changing net.c's doDhcp to the following. It increases
the default retries from 5 to 10, and the default timeout per try from 30 to 90 seconds.

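char * doDhcp(char * ifname,
              struct networkDeviceConfig *dev, char * dhcpclass) {

    /* Override pump's defaults: 10 tries instead of 5, and a
     * 90-second timeout per try instead of 30. */
    struct pumpOverrideInfo override;
    memset(&override, 0, sizeof(override));
    override.numRetries = 10;
    override.timeout = 90;

    setupWireless(dev);
    logMessage("running dhcp for %s", ifname);
    /* Run DHCP on ifname, using "anaconda" as the DHCP class unless a
     * class was supplied, and pass in the overrides set above. */
    return pumpDhcpClassRun(ifname, 0, 0, NULL,
                            dhcpclass ? dhcpclass : "anaconda",
                            &dev->dev, &override);
}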
Comment 9 RHEL Product and Program Management 2007-05-09 05:34:07 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 13 David Cantrell 2007-06-20 16:58:32 EDT
Patched anaconda based on comment #2, but modified.  Instead of 90 seconds, I'm
using 45 seconds.  Set the number of retries to 10.  The patch originally
attached to this bug was already in anaconda, so I'm not sure what release the
patch was intended for.

Setting bug to modified.
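As a rough worked example of what these settings mean for an install that never gets a lease, the helper below computes the longest time DHCP could keep trying. It assumes, without confirmation from pump's source, that every attempt waits its full per-try timeout and that there is no extra back-off between attempts; worst_case_dhcp_wait is a hypothetical name.

```c
/* Hypothetical helper, not part of anaconda or pump.
 * ASSUMPTION: every DHCP attempt waits its full per-try timeout and there
 * is no extra back-off, so the worst case is retries * timeout. */
static unsigned int worst_case_dhcp_wait(unsigned int retries,
                                         unsigned int timeout_s)
{
    return retries * timeout_s;
}
```

Under that assumption, the old defaults give 5 x 30 = 150 seconds, the values from comment #2 give 10 x 90 = 900 seconds (15 minutes), and the values chosen in comment #13 give 10 x 45 = 450 seconds (7.5 minutes).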
Comment 14 David Cantrell 2007-06-20 17:02:21 EDT
Created attachment 157495
anaconda-RHEL4-pump-wait-45secs.patch
Comment 15 Steve 2007-07-10 11:56:28 EDT
Hi,

(In reply to comment #3)
> Customer is using an IBM x336 x86_64 machine with 2 onboard nics. He is
> attempting install using RHEL 4 U4.
>
> The customer passes a "ksdevice=link" option. The link is not detected and
> they get an interactive prompt instead.
>
> Passing ksdevice=eth0 works as expected. Both anaconda logs show failures
> in detecting the link:

Note that since ksdevice=eth0 seems to work whereas ksdevice=link does
not, this is not really a nicdelay problem or a problem with getting the
DHCP address.

From loader2/net.c::chooseNetworkInterface():
--------------------------------------------------
        if (loaderData->netDev && (loaderData->netDev_set == 1)) {
            if (!strcmp(loaderData->netDev, devs[i]->device)) {
                foundDev = 1;
            } else {
...
...
    if ((loaderData->netDev && (loaderData->netDev_set) == 1) &&
        !strcmp(loaderData->netDev, "link")) {
        logMessage("looking for first netDev with link");
...
...
--------------------------------------------------

i.e., if eth0 is specified, it initializes without problems, gets the DHCP
address and continues with the install. With ksdevice=link, however,
anaconda seems to have trouble detecting the link on the NIC:

> --8<--
> 16:18:51 INFO    : looking for first netDev with link
> 16:18:56 WARNING : wanted netdev with link, but none present.  prompting
> --8<--

Once the NIC has been manually selected, the install continues with its dhcp
request as expected.

>
> and
> --8<--
> 16:18:56 WARNING : wanted netdev with link, but none present.  prompting
> 16:19:01 INFO    : sending dhcp request through device eth0
> --8<--
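The two branches in the chooseNetworkInterface() excerpt above can be modeled with a small, self-contained C sketch: an explicit ksdevice name is matched by name only, while ksdevice=link scans the detected NICs in order for one that reports a link and otherwise falls back to prompting (returned here as -1). This is a simplified model, not the real loader code; choose_device, link_probe_fn and single_link_probe are hypothetical names, and the "no link check for an explicit name" behavior is inferred from the excerpt rather than confirmed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical probe type: 1 = link up, 0 = link down, -1 = query failed. */
typedef int (*link_probe_fn)(const char *ifname, void *ctx);

/* Simplified model of resolving ksdevice= against the detected NICs.
 * Returns the index of the chosen device, or -1 where the loader would
 * fall back to prompting the user. */
static int choose_device(const char *ksdevice, const char *const devs[],
                         size_t ndevs, link_probe_fn probe, void *ctx)
{
    if (ksdevice == NULL)
        return -1;                      /* nothing requested: prompt */

    if (strcmp(ksdevice, "link") == 0) {
        /* ksdevice=link: take the first NIC, in order, that reports a link. */
        for (size_t i = 0; i < ndevs; i++)
            if (probe(devs[i], ctx) == 1)
                return (int)i;
        return -1;                      /* no NIC reported a link: prompt */
    }

    /* Explicit name (e.g. ksdevice=eth0): matched by name, no link check. */
    for (size_t i = 0; i < ndevs; i++)
        if (strcmp(ksdevice, devs[i]) == 0)
            return (int)i;
    return -1;                          /* unknown name: prompt */
}

/* Stand-in probe for illustration: ctx names the only NIC with a cable. */
static int single_link_probe(const char *ifname, void *ctx)
{
    const char *linked = ctx;

    return (linked != NULL && strcmp(ifname, linked) == 0) ? 1 : 0;
}
```

For example, with eth0, eth1 and eth2 detected and only eth2 cabled (the setup in comment #20), ksdevice=link should select index 2. If the probe never reports a link, which is what the logs above show, the result is -1 and the user is prompted, while ksdevice=eth0 still succeeds because it never asks the probe.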

This seems to be a problem with RHEL 5 too; the BZ tracking the RHEL 5
issue is bug #223435.

I have an Issue Tracker ticket open that describes this problem. I shall
attach it to this BZ shortly.

- steve

Comment 16 Issue Tracker 2007-07-10 12:01:39 EDT
Engineering: We have a report of this issue for a tg3 on a BL860c (an ia64
system; I'll attach the sysreport of the system shortly).

This has been reported for both RHEL 4 as well as RHEL 5. Here is the
description as provided by the customer:

------------------------------------------------------------------------
Description of problem:
Testing a kickstart install on a BL860c, I found that even when specifying
'ksdevice=link' on the kernel command line, I was still prompted to
select a specific device to use for the install. I tried specifying
'ksdevice=eth0' and it worked, as did explicitly selecting eth0 when
prompted.

However, this is a big problem for us, as we use kickstart to preinstall
RHEL4.5 in our factory, and any interaction will require a manual
work-around, which simply isn't feasible.

How reproducible:
100%
Steps to Reproduce:
1) Do a kickstart install on a BL860c; make sure eth0 is plugged in.  
2) Specify 'ksdevice=link' on the kernel command line.
3) Watch the prompt
4) repeat, with 'ksdevice=eth0'
5) Notice how it works fine.

My conclusion from this is that the driver in the initrd is failing to
detect a link. If I check after the install finishes, I see the driver is
correctly detecting the link at that time:

# ethtool eth0 | grep Link
       Link detected: yes
#

I don't know if this is related to IT #106442, but the results look a bit
similar.

Actual results:

Even with ksdevice=link, user is prompted to select a NIC to install
through.

Expected results:

ksdevice=link should automatically detect the active device.
------------------------------------------------------------------------

I have been trying to track the cause of this in the RHEL 4 version of the
issue tracker ticket. I'll attach the debug patch and the results that I
got from it for the RHEL 4 version.

Basically, what I have managed to figure out so far is that although the
NICs are properly detected and their drivers loaded, neither ethtool nor
mii-tool can detect the link status, yet the ioctl()s do not return an error.

Please let me know if you need additional information.

- steve



This event sent from IssueTracker by sfernand 
 issue 121391
Comment 17 Steve 2007-07-10 12:03:38 EDT
Created attachment 158863
sysreport of the affected system
Comment 18 Steve 2007-07-10 12:08:00 EDT
Created attachment 158864
Patch used to generate a debug initrd for RHEL 4.5 and the resulting log
Comment 20 Bill Peck 2007-08-01 14:58:10 EDT
ksdevice=link is failing in RHEL4-U6-re20070731.nightly: the system tries to get
a link from eth0 instead of eth2, which has the cable plugged in.

Specifying ksdevice=eth2 allows the system to install correctly.

Using the initrd.img provided by pjones fixes the problem.

Comment 27 John Poelstra 2007-08-29 00:27:09 EDT
A fix for this issue should have been included in the packages contained in the
RHEL4.6 Beta released on RHN (also available at partners.redhat.com).  

Requested action: Please verify that your issue is fixed to ensure that it is
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.
Comment 30 errata-xmlrpc 2007-11-15 11:35:17 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0816.html
