Description of problem:

On systems with more than 12 Ethernet ports, the Anaconda stage 1 installer does NOT work with network installs on high-numbered Ethernet ports. Generally eth0 through eth9 work fine, but ports from about eth10 upward do not. The problem does not show up on systems with 8 or fewer Ethernet ports. The symptom is that you can't bring up the high-numbered ports from the "Configure TCP/IP" screen within the stage 1 installer. Sometimes you can bring up the port by trying "Configure TCP/IP" a second time. In other hardware configurations retrying will not bring up the port. I have seen this failure on Ethernet hardware that uses the e1000, tg3 and s2io drivers. This appears to be an Anaconda problem and not related to specific Ethernet hardware or a specific Linux driver.

How reproducible:

I have seen this problem on very large servers (like 256 CPUs), but I have also seen it on a very small blade server with 16 Ethernet ports in the system. We have reported this problem twice before, but in general we believed that the problem was hardware related. See:

https://bugzilla.redhat.com/show_bug.cgi?id=320841
https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=108235&header_entry=1

My recent testing for the RH IT 108235 problem showed that the problem actually started at eth11 and was not directly related to the s2io driver.
On a system with the following Ethernet configuration:

eth0 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth1 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth2 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth3 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth4 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth5 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth6 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth7 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth8 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth9 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth10 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth11 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth12 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth13 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth14 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth15 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth16 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth17 - Intel Corporation 82571EB Gigabit Ethernet Controller
eth18 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth19 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth20 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth21 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth22 - S2io Inc. Xframe 10 Gigabit Ethernet PCI-X
eth23 - S2io Inc. Xframe II 10Gbps Ethernet
eth24 - S2io Inc. Xframe II 10Gbps Ethernet
eth25 - S2io Inc. Xframe 10 Gigabit Ethernet PCI-X

Ports eth0 thru eth10 would come up fine. Ports eth11 thru eth25 would not come up at all. We would like to be able to support 32 Ethernet ports and have install from any of those ports work.

Version-Release number of selected component (if applicable):

We have seen this problem since RHEL 5.0. The problem does not show up on RHEL 4. I have also seen this problem on Fedora 8.
Steps to Reproduce:
1. Configure a system with 12 or more Ethernet ports
2. Get into the stage 1 Anaconda installer
3. Try to bring up each of the Ethernet ports with the "Configure TCP/IP" screen. Use 'Back' to go back to select the next Ethernet port.

Actual results:
Not all of the Ethernet ports will work within the stage 1 Anaconda installer.

Expected results:
All of the Ethernet ports will work within the stage 1 Anaconda installer.
See Issue Tracker: https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=161661
Are we sure this isn't a dupe of 303681? I committed a patch on February 5th to fix that:

commit 0dcf8192c048324b718c3b0c2d212d1dfa584ac4
Author: David Cantrell <dcantrell>
Date:   Tue Feb 5 12:15:36 2008 -1000

    Use libnl to read MAC and IP addresses (#303681).

    This patches reduces nl.c in libisys to just what we need to talk to
    libnl. libnl provides the netlink cache for interfaces and should
    allow us to see all NICs in the system and gather the MAC and IP
    addresses for each.

Can someone try a current RHEL 5.2 nightly on a system with more than 10 NICs?
*** This bug has been marked as a duplicate of 303681 ***
Ronald, as per the previous comment, it looks like this issue is resolved. Clearing the requires_release_notes flag. Please reset the requires_release_notes flag if this issue is unresolved and needs to be documented (please include workarounds, if any).
I am re-opening this bug as it is _not_ the same as the issue it was closed as a dup of. More info to follow shortly....
The problem appears to be with "high numbered tg3 devices", i.e. when a tg3 device has an eth number above roughly 10, anaconda stage 1 cannot configure it as the device to install over.

There has been much confusion on this issue. I am NOT saying that you cannot set up the device to be configured for the installed OS. I am saying that in stage 1, if you want to do an http/nfs/ftp etc. install over this device, THAT is what fails (sorry for being blunt, but there has been much confusion here).

What happens is that you are prompted with the list of available network devices as expected; in this case the tg3 devices are eth12 and eth13. When I select either of them, choose to configure with DHCP on the next page, and click OK, it tries to get a DHCP address and then just goes back to that same page (and yes, our network is configured to provide DHCP to these devices). If I select one of the other devices it works OK. If we remove some of the other devices, so that the same tg3 cards are still there but have lower ethX numbers, they work OK.

Looking at the anaconda logfile I find this:

16:53:23 ERROR   : nic_by_name: no interface named eth13 found
16:53:23 CRITICAL: dhcp_nic: net_get_by_name(eth13) failed
16:53:23 DEBUG   : dhcp: DHCP configuration failed
16:53:28 DEBUG   : waiting for link eth13...
16:53:28 DEBUG   : 0 seconds.
16:53:28 DEBUG   : sleep (nicdelay) for 0 secs first
16:53:28 DEBUG   : continuing...

The device, however, does appear to be there.
This is from dropping to a shell in anaconda stage2:

sh-3.2# ifconfig eth13
eth13     Link encap:Ethernet  HWaddr 00:17:A4:99:8F:CA
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:75

and from dmesg all appears to be OK:

<6>eth13: Tigon3 [partno(BCM95700A6) rev 2100 PHY(5704)] (PCIX:66MHz:64-bit) 1 10/100/1000Base-T Ethernet 00:17:a4:99:8f:ca
<6>eth13: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
<6>eth13: dma_rwctrl[769f0000] dma_mask[64-bit]

I will attach the full anaconda log as an attachment.
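For anyone wanting to repeat this cross-check on a larger log, here is a minimal sketch in Python. It is not part of anaconda or libdhcp; the function names and the idea of passing in a hand-collected set of interface names (for example from ifconfig in the stage2 shell) are assumptions made for illustration. It pulls out the interfaces the loader says it could not find and flags the ones that are nevertheless known to exist.

```python
import re

# Matches the loader error quoted in the comment above.
MISSING_RE = re.compile(r"nic_by_name: no interface named (\S+) found")

# Sample taken from the log excerpt in this bug.
SAMPLE_LOG = """\
16:53:23 ERROR   : nic_by_name: no interface named eth13 found
16:53:23 CRITICAL: dhcp_nic: net_get_by_name(eth13) failed
16:53:23 DEBUG   : dhcp: DHCP configuration failed
"""


def reported_missing(log_text):
    """Interfaces the log says were not found, de-duplicated, in first-seen order."""
    return list(dict.fromkeys(MISSING_RE.findall(log_text)))


def contradictions(log_text, present):
    """Interfaces reported missing by the loader that are in the `present` set.

    `present` is a set of names collected separately (hypothetical input,
    e.g. typed in from ifconfig output on the same machine).
    """
    return [name for name in reported_missing(log_text) if name in present]


if __name__ == "__main__":
    seen_by_ifconfig = {"eth12", "eth13"}  # assumed, from the shell check above
    for name in contradictions(SAMPLE_LOG, seen_by_ifconfig):
        print(f"{name}: loader says missing, but the device exists")
```

Any name this prints is a case where the kernel has the device but the installer's own interface lookup does not, which is the mismatch described in this comment.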
Created attachment 295540 [details] logfile from install showing errors on eth12 and eth13
Doug, I think I have found the problem you are hitting. I need some time to work up some patches (for anaconda and libdhcp), but I'll post here when I have more info. Yes, this issue has been confusing to me. Mostly because I have a lot of networking related bug reports for RHEL5 right now and all of the bug reports are valid, but the reporters are also hitting other networking bugs I already know about...hence, thinking some are dupes and some aren't. This particular failure is happening in nic_get_links() in nic.c in libdhcp, which is how we are caching the netlink device information. More information when I have more to tell. Thanks.
David, I have setup another server with 26 Ethernet ports. All the ports are Intel, but some ports use the e1000e and some ports use the e1000 driver. eth25, eth22, eth19, eth17, eth13, eth12, eth11 and eth10 failed to come up. eth9, eth0 and eth4 would come up. eth0-7 are e1000e and eth8-25 are e1000. The failing ports in the anaconda.log behave the same as what Doug reported: 17:25:49 INFO : going to pick interface 17:27:57 INFO : going to do getNetConfig 17:27:57 INFO : eth25 is not a wireless adapter 17:27:58 DEBUG : waiting for link eth25... 17:27:58 DEBUG : 0 seconds. 17:27:58 DEBUG : sleep (nicdelay) for 0 secs first 17:27:58 DEBUG : continuing... requesting dhcp timeout 45 17:27:58 ERROR : nic_by_name: no interface named eth25 found 17:27:58 CRITICAL: dhcp_nic: net_get_by_name(eth25) failed 17:27:58 DEBUG : dhcp: DHCP configuration failed 17:28:20 DEBUG : waiting for link eth25... 17:28:20 DEBUG : 0 seconds. 17:28:20 DEBUG : sleep (nicdelay) for 0 secs first 17:28:20 DEBUG : continuing... requesting dhcp timeout 45 17:28:20 ERROR : nic_by_name: no interface named eth25 found 17:28:20 CRITICAL: dhcp_nic: net_get_by_name(eth25) failed 17:28:20 DEBUG : dhcp: DHCP configuration failed I will attach the anaconda.log file next. 
Bill

eth0 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth1 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth2 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth3 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth4 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth5 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth6 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth7 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth8 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth9 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth10 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth11 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth12 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth13 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth14 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth15 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth16 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth17 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth18 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth19 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth20 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth21 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth22 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth23 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth24 - Intel Corporation 82546GB Gigabit Ethernet Controller
eth25 - Intel Corporation 82546GB Gigabit Ethernet Controller
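To make the pattern in these results easier to see, here is a small Python sketch that tabulates the ports reported above as failing or working against the driver split given in the same comment (eth0-7 on e1000e, eth8-25 on e1000). Only the ports actually named in the comment are included, so this says nothing about the untested ones; the helper names are made up for illustration.

```python
# Port numbers exactly as reported in the comment above.
FAILED = [25, 22, 19, 17, 13, 12, 11, 10]
WORKED = [9, 0, 4]


def driver_for(index):
    """Driver per the comment: eth0-7 use e1000e, eth8-25 use e1000."""
    return "e1000e" if index <= 7 else "e1000"


def summarize(failed, worked):
    """Collect the boundary between working and failing ports, and the drivers on each side."""
    return {
        "highest_working": max(worked),
        "lowest_failing": min(failed),
        "drivers_working": sorted({driver_for(i) for i in worked}),
        "drivers_failing": sorted({driver_for(i) for i in failed}),
    }


if __name__ == "__main__":
    for key, value in summarize(FAILED, WORKED).items():
        print(f"{key}: {value}")
```

On the reported data the highest working port is eth9 and the lowest failing port is eth10, while the e1000 driver shows up on both sides (eth9 works, eth10 and up fail). That fits the interface number, rather than the driver, being what separates the two groups.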
Created attachment 295966 [details] 26 Ethernet port server - anaconda.log file
This should be fixed in anaconda-11.1.2.105-1 and later.
I tested the first drop of RHEL 5.2. I had anaconda-11.1.2.105-1. I was able to bring up all 26 Ethernet ports on the same system that I used in comment #10. I will attach the anaconda.log file from this test.

billh@lart:~$ grep -i dhcprequest anaconda.log
21:54:44 INFO    : DHCPREQUEST on eth0 to 255.255.255.255 port 67
21:55:47 INFO    : DHCPREQUEST on eth1 to 255.255.255.255 port 67
21:56:22 INFO    : DHCPREQUEST on eth2 to 255.255.255.255 port 67
21:57:04 INFO    : DHCPREQUEST on eth3 to 255.255.255.255 port 67
21:57:36 INFO    : DHCPREQUEST on eth4 to 255.255.255.255 port 67
21:58:11 INFO    : DHCPREQUEST on eth5 to 255.255.255.255 port 67
21:58:48 INFO    : DHCPREQUEST on eth6 to 255.255.255.255 port 67
21:59:27 INFO    : DHCPREQUEST on eth7 to 255.255.255.255 port 67
22:00:02 INFO    : DHCPREQUEST on eth8 to 255.255.255.255 port 67
22:00:36 INFO    : DHCPREQUEST on eth9 to 255.255.255.255 port 67
22:01:10 INFO    : DHCPREQUEST on eth10 to 255.255.255.255 port 67
22:01:40 INFO    : DHCPREQUEST on eth11 to 255.255.255.255 port 67
22:02:23 INFO    : DHCPREQUEST on eth12 to 255.255.255.255 port 67
22:02:59 INFO    : DHCPREQUEST on eth13 to 255.255.255.255 port 67
22:03:34 INFO    : DHCPREQUEST on eth14 to 255.255.255.255 port 67
22:04:08 INFO    : DHCPREQUEST on eth15 to 255.255.255.255 port 67
22:04:45 INFO    : DHCPREQUEST on eth16 to 255.255.255.255 port 67
22:05:40 INFO    : DHCPREQUEST on eth17 to 255.255.255.255 port 67
22:06:15 INFO    : DHCPREQUEST on eth18 to 255.255.255.255 port 67
22:06:58 INFO    : DHCPREQUEST on eth19 to 255.255.255.255 port 67
22:07:31 INFO    : DHCPREQUEST on eth20 to 255.255.255.255 port 67
22:08:14 INFO    : DHCPREQUEST on eth21 to 255.255.255.255 port 67
22:08:52 INFO    : DHCPREQUEST on eth22 to 255.255.255.255 port 67
22:09:28 INFO    : DHCPREQUEST on eth23 to 255.255.255.255 port 67
22:10:03 INFO    : DHCPREQUEST on eth24 to 255.255.255.255 port 67
22:10:39 INFO    : DHCPREQUEST on eth25 to 255.255.255.255 port 67
billh@lart:~$
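For systems with many ports, eyeballing the grep output gets tedious. Here is a rough Python sketch of the same check: it looks for a DHCPREQUEST line per port and lists any ethN that never sent one. The function names and the assumption that ports are numbered eth0 through eth(N-1) are mine, not anything anaconda provides.

```python
import re

# Same match as `grep -i dhcprequest`, capturing the interface name.
DHCPREQUEST_RE = re.compile(r"DHCPREQUEST on (eth\d+)\b", re.IGNORECASE)


def ports_with_dhcprequest(log_text):
    """Set of interface names that have at least one DHCPREQUEST line."""
    return set(DHCPREQUEST_RE.findall(log_text))


def missing_ports(log_text, port_count):
    """Ports eth0..eth(port_count-1) with no DHCPREQUEST line, in numeric order."""
    seen = ports_with_dhcprequest(log_text)
    return [f"eth{i}" for i in range(port_count) if f"eth{i}" not in seen]


if __name__ == "__main__":
    # Path is an assumption; point it at a saved copy of the installer log.
    with open("anaconda.log") as handle:
        gaps = missing_ports(handle.read(), 26)
    print("all ports sent a DHCPREQUEST" if not gaps else f"no DHCPREQUEST on: {gaps}")
```

Note that a DHCPREQUEST line only shows the request went out on that port; whether an address was actually obtained still has to be confirmed separately.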
Created attachment 296953 [details] anaconda.log
In case my #13 update is unclear, the first drop of RHEL 5.2 fixes this problem. I was able to bring up all 26 Ethernet ports and get a DHCP address for each port. I will try this on other systems also.
I tried a smaller server with 14 Ethernet ports and all the ports came up just fine and got DHCP addresses.

eth0 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth1 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth2 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth3 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth4 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth5 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth6 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth7 - Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
eth8 - Digital Equipment Corporation DECchip 21142/43
eth9 - Digital Equipment Corporation DECchip 21142/43
eth10 - Digital Equipment Corporation DECchip 21142/43
eth11 - Digital Equipment Corporation DECchip 21142/43
eth12 - Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
eth13 - Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
David, I just tested a blade with 16 Ethernet ports and it was also fine. All of these hardware configurations would have failed on RHEL 5 or RHEL 5.1. Thanks for the fix.

Bill

eth0 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth1 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth2 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth3 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth4 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth5 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth6 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth7 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth8 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth9 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth10 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth11 - Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter
eth12 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
eth13 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
eth14 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
eth15 - Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
Bill, Thanks for all the feedback and thanks for testing this out. Glad to hear the fixes are working.
*** Bug 320841 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0397.html