Bug 429968 - Anaconda stage 1 installer does NOT work with network installs on ports above eth10
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: anaconda
Version: 5.2
Hardware: ia64 Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: David Cantrell
QA Contact: Alexander Todorov
Keywords: Reopened
Duplicates: 320841
Reported: 2008-01-23 19:45 EST by Bill Hayes
Modified: 2013-08-02 17:52 EDT (History)
Fixed In Version: RHBA-2008-0397
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:32:44 EDT

Attachments:
logfile from install showing errors on eth12 and eth13 (17.25 KB, text/plain), 2008-02-21 12:11 EST, Doug Chapman
26 Ethernet port server - anconda.log file (28.75 KB, text/plain), 2008-02-26 13:07 EST, Bill Hayes
anaconda.log (61.88 KB, text/plain), 2008-03-05 17:31 EST, Bill Hayes

Description Bill Hayes 2008-01-23 19:45:04 EST
Description of problem:

On systems with more than 12 Ethernet ports, the Anaconda stage 1 installer does
NOT work with network installs on Ethernet ports numbered above eth10.
Generally eth0 thru eth9 will work fine, but eth10 and above will not work. The
problem does not show up on systems with 8 or fewer Ethernet ports.  The usual
symptom is that you can't bring up ports above eth10 from the "Configure TCP/IP"
screen within the stage 1 installer.  Sometimes you can bring up the port by trying
"Configure TCP/IP" a second time.  In other hardware configurations, retrying will
not bring up the port.

I have seen this failure occur on Ethernet hardware that uses the e1000, tg3 and
s2io drivers.  This appears to be an Anaconda problem and not related to
specific Ethernet hardware or a specific Linux driver.

How reproducible:

I have seen this problem on very large servers (e.g. 256 CPUs), but I have also
seen it on a very small blade server with 16 Ethernet ports in the system.

We have reported this problem twice before, but in general we believed that the
problem was hardware related.  See
https://bugzilla.redhat.com/show_bug.cgi?id=320841 and
https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=108235&header_entry=1.
My recent testing for RH IT 108235 showed that the problem
actually started at eth11 and was not directly related to the s2io driver.

On a system with the following Ethernet configuration: 

	eth0 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth1 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth2 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth3 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth4 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth5 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth6 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth7 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth8 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth9 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth10 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth11 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth12 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth13 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth14 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth15 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth16 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth17 - Intel Corporation 82571EB Gigabit Ethernet Controller
	eth18 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth19 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth20 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth21 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth22 - S2io Inc. Xframe 10 Gigabit Ethernet PCI-X
	eth23 - S2io Inc. Xframe II 10Gbps Ethernet
	eth24 - S2io Inc. Xframe II 10Gbps Ethernet
	eth25 - S2io Inc. Xframe 10 Gigabit Ethernet PCI-X

Ports eth0 thru eth10 would come up fine.  Ports eth11 thru eth25 would not come
up at all.

We would like to be able to support 32 Ethernet ports and have installs from any
of those ports work.

Version-Release number of selected component (if applicable):

We have seen this problem since RHEL 5.0.  The problem does not show up on RHEL
4.  I have also seen this problem on Fedora 8.

Steps to Reproduce:
1.  Configure a system with 12 or more Ethernet ports
2.  Get into the stage 1 Anaconda installer
3.  Try to bring up each of the Ethernet ports with the "Configure TCP/IP"
screen.  Use 'Back' to go back to select the next Ethernet port.

Actual results:

Not all of the Ethernet ports will work within the stage 1 Anaconda installer.

Expected results:

All of the Ethernet ports will work within the stage 1 Anaconda installer.
Comment 3 David Cantrell 2008-02-14 16:32:10 EST
Are we sure this isn't a dupe of 303681?  I committed a patch on February 5th to
fix that:

commit 0dcf8192c048324b718c3b0c2d212d1dfa584ac4
Author: David Cantrell <dcantrell@redhat.com>
Date:   Tue Feb 5 12:15:36 2008 -1000

    Use libnl to read MAC and IP addresses (#303681).
    
    This patches reduces nl.c in libisys to just what we need to talk
    to libnl.  libnl provides the netlink cache for interfaces and should
    allow us to see all NICs in the system and gather the MAC and IP
    addresses for each.

Can someone try a current RHEL 5.2 nightly on a system with more than 10 NICs?
Comment 4 David Cantrell 2008-02-14 16:44:02 EST

*** This bug has been marked as a duplicate of 303681 ***
Comment 5 Don Domingo 2008-02-14 19:49:08 EST
Ronald, as per previous comment, looks like this issue is resolved. clearing
requires_release_notes flag.

please reset requires_release_notes flag if this issue is unresolved and needs
to be documented (please include workarounds, if any).
Comment 6 Doug Chapman 2008-02-21 11:59:09 EST
I am re-opening this bug as it is _not_ the same as the issue it was closed as a
dup of.  More info to follow shortly....
Comment 7 Doug Chapman 2008-02-21 12:09:35 EST
The problem appears to be with "high numbered tg3 devices".  I.e. when a tg3
device has an eth number of > ~10 anaconda stage1 fails to be able to configure
it to be used to install over.

There has been much confusion on this issue.  I am NOT saying that you cannot
set up the device to be configured for the installed OS.  I am saying that in
stage 1, if you want to do an http/nfs/ftp etc. install over this device, THAT is
what fails (sorry for being blunt, but there has been much confusion here).

What happens is you are prompted for the list of available network devices as
expected; in this case the tg3 devices are eth12 and eth13.  When I select
either of them, then on the next page choose to configure with dhcp and click OK,
it tries to get a dhcp address and then just goes back to that same page (and yes,
our network is configured to provide dhcp to these devices).

If I select one of the other devices it works OK.

If we removed some of the other devices so the same tg3 cards are still there
but they have lower ethX numbers they work OK.

Looking at the anaconda logfile I find this:
16:53:23 ERROR   : nic_by_name: no interface named eth13 found
16:53:23 CRITICAL: dhcp_nic: net_get_by_name(eth13) failed
16:53:23 DEBUG   : dhcp: DHCP configuration failed
16:53:28 DEBUG   : waiting for link eth13...
16:53:28 DEBUG   :    0 seconds.
16:53:28 DEBUG   : sleep (nicdelay) for 0 secs first
16:53:28 DEBUG   : continuing...


the device however does appear to be there.  This is from dropping to a shell in
anaconda stage2:

sh-3.2# ifconfig eth13
eth13     Link encap:Ethernet  HWaddr 00:17:A4:99:8F:CA  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:75 



and from dmesg all appears to be OK:
<6>eth13: Tigon3 [partno(BCM95700A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 1
 10/100/1000Base-T Ethernet 00:17:a4:99:8f:ca
<6>eth13: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
<6>eth13: dma_rwctrl[769f0000] dma_mask[64-bit]




I will attach the full anaconda log as an attachment.
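
The mismatch described above (the installer log says eth13 does not exist, while the kernel clearly has it) can be checked mechanically. The following is only a sketch, not part of anaconda or libdhcp: it assumes the log wording shown in the excerpt above, and the function names are made up for illustration. `socket.if_nameindex()` is the standard Python call for listing the kernel's interfaces on Unix.

```python
import re
import socket

# Wording copied from the log excerpt in this comment; other anaconda
# versions may phrase the error differently (assumption).
MISSING_RE = re.compile(r"nic_by_name: no interface named (\S+) found")


def interfaces_reported_missing(log_text):
    """Return interface names the log says were not found, in first-seen order."""
    seen = []
    for name in MISSING_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def missing_but_present(log_text, kernel_names):
    """Return interfaces the log calls missing although the kernel lists them."""
    kernel = set(kernel_names)
    return [n for n in interfaces_reported_missing(log_text) if n in kernel]


def kernel_interface_names():
    """Names of the interfaces known to the running kernel (Unix only)."""
    return [name for _index, name in socket.if_nameindex()]


SAMPLE_LOG = """\
16:53:23 ERROR   : nic_by_name: no interface named eth13 found
16:53:23 CRITICAL: dhcp_nic: net_get_by_name(eth13) failed
16:53:23 DEBUG   : dhcp: DHCP configuration failed
"""
```

On the affected machine, `missing_but_present(open("anaconda.log").read(), kernel_interface_names())` would be expected to list eth13, which points at the installer's own interface lookup rather than at the driver. The log path here is a placeholder.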

Comment 8 Doug Chapman 2008-02-21 12:11:15 EST
Created attachment 295540 [details]
logfile from install showing errors on eth12 and eth13
Comment 9 David Cantrell 2008-02-21 15:32:55 EST
Doug,

I think I have found the problem you are hitting.  I need some time to work up
some patches (for anaconda and libdhcp), but I'll post here when I have more info.

Yes, this issue has been confusing to me.  Mostly because I have a lot of
networking related bug reports for RHEL5 right now and all of the bug reports
are valid, but the reporters are also hitting other networking bugs I already
know about...hence, thinking some are dupes and some aren't.

This particular failure is happening in nic_get_links() in nic.c in libdhcp,
which is how we are caching the netlink device information.  More information
when I have more to tell.

Thanks.
Comment 10 Bill Hayes 2008-02-26 13:05:40 EST
David,

I have set up another server with 26 Ethernet ports.  All the ports are Intel,
but some ports use the e1000e driver and some use the e1000 driver.  eth25,
eth22, eth19, eth17, eth13, eth12, eth11 and eth10 failed to come up.  eth9,
eth0 and eth4 would come up.  eth0-7 are e1000e and eth8-25 are e1000.

The failing ports in the anaconda.log behave the same as what Doug reported: 

17:25:49 INFO    : going to pick interface
17:27:57 INFO    : going to do getNetConfig
17:27:57 INFO    : eth25 is not a wireless adapter
17:27:58 DEBUG   : waiting for link eth25...
17:27:58 DEBUG   :    0 seconds.
17:27:58 DEBUG   : sleep (nicdelay) for 0 secs first
17:27:58 DEBUG   : continuing...
requesting dhcp timeout 45
17:27:58 ERROR   : nic_by_name: no interface named eth25 found
17:27:58 CRITICAL: dhcp_nic: net_get_by_name(eth25) failed
17:27:58 DEBUG   : dhcp: DHCP configuration failed
17:28:20 DEBUG   : waiting for link eth25...
17:28:20 DEBUG   :    0 seconds.
17:28:20 DEBUG   : sleep (nicdelay) for 0 secs first
17:28:20 DEBUG   : continuing...
requesting dhcp timeout 45
17:28:20 ERROR   : nic_by_name: no interface named eth25 found
17:28:20 CRITICAL: dhcp_nic: net_get_by_name(eth25) failed
17:28:20 DEBUG   : dhcp: DHCP configuration failed
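
One detail in the excerpt above may be worth measuring: each attempt requests a 45 second DHCP timeout, yet the failure is logged in the same second the attempt starts. That would be consistent with the name lookup failing before any DHCP traffic is sent, although this is an inference from the timestamps only. A rough sketch for extracting those durations follows; the function name is hypothetical, and the parser assumes the line layout shown above and does not handle a log that crosses midnight.

```python
import re
from datetime import datetime

# "HH:MM:SS LEVEL   : message" as it appears in the excerpt above.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \w+\s*: (.*)$")
WAIT_RE = re.compile(r"waiting for link (\S+)\.\.\.")


def failed_attempts(log_text):
    """Return (interface, seconds) for each attempt ending in a DHCP failure."""
    results = []
    current = None  # (interface, start time) of the attempt in progress
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. "requesting dhcp timeout 45" carries no timestamp
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        message = m.group(2)
        wait = WAIT_RE.search(message)
        if wait:
            current = (wait.group(1), stamp)
        elif "DHCP configuration failed" in message and current:
            iface, started = current
            results.append((iface, int((stamp - started).total_seconds())))
            current = None
    return results


SAMPLE_LOG = """\
17:27:58 DEBUG   : waiting for link eth25...
17:27:58 DEBUG   :    0 seconds.
17:27:58 DEBUG   : sleep (nicdelay) for 0 secs first
17:27:58 DEBUG   : continuing...
requesting dhcp timeout 45
17:27:58 ERROR   : nic_by_name: no interface named eth25 found
17:27:58 CRITICAL: dhcp_nic: net_get_by_name(eth25) failed
17:27:58 DEBUG   : dhcp: DHCP configuration failed
17:28:20 DEBUG   : waiting for link eth25...
17:28:20 DEBUG   :    0 seconds.
17:28:20 DEBUG   : sleep (nicdelay) for 0 secs first
17:28:20 DEBUG   : continuing...
requesting dhcp timeout 45
17:28:20 ERROR   : nic_by_name: no interface named eth25 found
17:28:20 CRITICAL: dhcp_nic: net_get_by_name(eth25) failed
17:28:20 DEBUG   : dhcp: DHCP configuration failed
"""

for iface, seconds in failed_attempts(SAMPLE_LOG):
    print(iface, "failed after", seconds, "s")
```

The log timestamps have one second resolution, so a reported duration of 0 only means the failure came less than a second after the attempt began.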

I will attach the anaconda.log file next.

Bill

	eth0 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth1 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth2 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth3 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth4 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth5 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth6 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth7 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth8 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth9 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth10 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth11 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth12 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth13 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth14 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth15 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth16 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth17 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth18 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth19 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth20 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth21 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth22 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth23 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth24 - Intel Corporation 82546GB Gigabit Ethernet Controller
	eth25 - Intel Corporation 82546GB Gigabit Ethernet Controller
Comment 11 Bill Hayes 2008-02-26 13:07:36 EST
Created attachment 295966 [details]
26 Ethernet port server - anconda.log file
Comment 12 David Cantrell 2008-02-28 14:26:25 EST
This should be fixed in anaconda-11.1.2.105-1 and later.
Comment 13 Bill Hayes 2008-03-05 17:30:51 EST
I tested the first drop of RHEL 5.2.  I had anaconda-11.1.2.105-1.  I was able
to bring up all 26 Ethernet ports on the same system that I used in comment #10.
 I will attach the anaconda.log file from this test.

billh@lart:~$ grep -i dhcprequest anaconda.log 
21:54:44 INFO    : DHCPREQUEST on eth0 to 255.255.255.255 port 67
21:55:47 INFO    : DHCPREQUEST on eth1 to 255.255.255.255 port 67
21:56:22 INFO    : DHCPREQUEST on eth2 to 255.255.255.255 port 67
21:57:04 INFO    : DHCPREQUEST on eth3 to 255.255.255.255 port 67
21:57:36 INFO    : DHCPREQUEST on eth4 to 255.255.255.255 port 67
21:58:11 INFO    : DHCPREQUEST on eth5 to 255.255.255.255 port 67
21:58:48 INFO    : DHCPREQUEST on eth6 to 255.255.255.255 port 67
21:59:27 INFO    : DHCPREQUEST on eth7 to 255.255.255.255 port 67
22:00:02 INFO    : DHCPREQUEST on eth8 to 255.255.255.255 port 67
22:00:36 INFO    : DHCPREQUEST on eth9 to 255.255.255.255 port 67
22:01:10 INFO    : DHCPREQUEST on eth10 to 255.255.255.255 port 67
22:01:40 INFO    : DHCPREQUEST on eth11 to 255.255.255.255 port 67
22:02:23 INFO    : DHCPREQUEST on eth12 to 255.255.255.255 port 67
22:02:59 INFO    : DHCPREQUEST on eth13 to 255.255.255.255 port 67
22:03:34 INFO    : DHCPREQUEST on eth14 to 255.255.255.255 port 67
22:04:08 INFO    : DHCPREQUEST on eth15 to 255.255.255.255 port 67
22:04:45 INFO    : DHCPREQUEST on eth16 to 255.255.255.255 port 67
22:05:40 INFO    : DHCPREQUEST on eth17 to 255.255.255.255 port 67
22:06:15 INFO    : DHCPREQUEST on eth18 to 255.255.255.255 port 67
22:06:58 INFO    : DHCPREQUEST on eth19 to 255.255.255.255 port 67
22:07:31 INFO    : DHCPREQUEST on eth20 to 255.255.255.255 port 67
22:08:14 INFO    : DHCPREQUEST on eth21 to 255.255.255.255 port 67
22:08:52 INFO    : DHCPREQUEST on eth22 to 255.255.255.255 port 67
22:09:28 INFO    : DHCPREQUEST on eth23 to 255.255.255.255 port 67
22:10:03 INFO    : DHCPREQUEST on eth24 to 255.255.255.255 port 67
22:10:39 INFO    : DHCPREQUEST on eth25 to 255.255.255.255 port 67
billh@lart:~$ 
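
The grep output above is checked by eye. As a sketch of the same check done programmatically, the snippet below reports which of eth0 through eth(N-1) never logged a DHCPREQUEST. The function name and the `port_count` parameter are illustrative assumptions, and the pattern relies on the log wording shown above.

```python
import re

# Matches lines such as "... DHCPREQUEST on eth7 to 255.255.255.255 port 67".
REQUEST_RE = re.compile(r"DHCPREQUEST on (eth\d+) to")


def interfaces_without_request(log_text, port_count):
    """Return the ethN names in range(port_count) that never logged a DHCPREQUEST."""
    requested = set(REQUEST_RE.findall(log_text))
    expected = ["eth%d" % i for i in range(port_count)]
    return [name for name in expected if name not in requested]


# Three lines taken from the output above; eth2 is deliberately left out
# to show what a gap looks like.
SAMPLE_LOG = """\
21:54:44 INFO    : DHCPREQUEST on eth0 to 255.255.255.255 port 67
21:55:47 INFO    : DHCPREQUEST on eth1 to 255.255.255.255 port 67
21:57:04 INFO    : DHCPREQUEST on eth3 to 255.255.255.255 port 67
"""

print(interfaces_without_request(SAMPLE_LOG, 4))
```

Run against the full log with `port_count=26`, an empty result would match the grep output above. A DHCPREQUEST line only shows that the request was sent, so it does not by itself confirm that a lease was obtained.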
Comment 14 Bill Hayes 2008-03-05 17:31:56 EST
Created attachment 296953 [details]
anaconda.log
Comment 15 Bill Hayes 2008-03-05 17:59:39 EST
In case my #13 update is unclear, the first drop of RHEL 5.2 fixes this problem.
I was able to bring up all 26 Ethernet ports and get a DHCP address for each
port.

I will try this on other systems also.

Comment 16 Bill Hayes 2008-03-05 20:03:33 EST
I tried a smaller server with 14 Ethernet ports and all the ports came up just
fine and got DHCP addresses.

	eth0 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth1 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth2 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth3 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth4 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth5 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth6 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth7 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
	eth8 - Digital Equipment Corporation DECchip 21142/43
	eth9 - Digital Equipment Corporation DECchip 21142/43
	eth10 - Digital Equipment Corporation DECchip 21142/43
	eth11 - Digital Equipment Corporation DECchip 21142/43
	eth12 - Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
	eth13 - Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
Comment 17 Bill Hayes 2008-03-06 16:10:34 EST
David,

I just tested a blade with 16 Ethernet ports and it was also fine.  All of these
hardware configurations would have failed on RHEL 5 or RHEL 5.1.  Thanks for the
fix.

Bill

	eth0 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth1 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth2 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth3 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth4 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth5 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth6 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth7 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth8 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth9 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth10 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth11 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
	eth12 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
	eth13 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
	eth14 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
	eth15 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
Comment 18 David Cantrell 2008-03-06 16:18:47 EST
Bill,

Thanks for all the feedback and thanks for testing this out.  Glad to hear the fixes are working.
Comment 22 Andrius Benokraitis 2008-03-07 12:55:05 EST
*** Bug 320841 has been marked as a duplicate of this bug. ***
Comment 26 errata-xmlrpc 2008-05-21 11:32:44 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0397.html
