Description of problem:
We are using Tyan Tempest i5000PX (S5380) server mainboards with RHEL 4. The onboard NICs are recognized by the e1000 driver, but only work at speeds up to 100 MBit. If the NICs are connected to a gigabit switch, the network link switches to 1000 MBit full duplex (according to the kernel log), but the connection is not usable. Neither incoming nor outgoing connections are possible. We upgraded to the latest RHEL kernel, 2.6.9-42.0.8.ELsmp, but the problem persists. Knoppix/Debian kernels (2.6.18) do not show this behaviour; the NICs work properly at all speeds (10/100/1000).

Version-Release number of selected component (if applicable):
See attached lspci output.

How reproducible:
Easily.

Steps to Reproduce:
1. Connect NIC to gigabit switch
2. Observe complete non-connectivity

Actual results:
No connectivity.

Expected results:
Connectivity at gigabit speed.

Additional info:
Created attachment 148108 [details] lspci output plain, verbose and numerical
Can we see the output of ethtool and mii-tool on the NICs in question?
(In reply to comment #2)
> Can we see the output of ethtool and mii-tool on the NICs in question?

With working 100MBit link:

[root@aleph oracle_tables]# mii-tool -v
eth0: negotiated 100baseTx-FD flow-control, link ok
  product info: vendor 00:50:43, model 10 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
eth1: no link
  product info: vendor 00:50:43, model 10 rev 2
  basic mode:   autonegotiation enabled
  basic status: no link
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

[root@aleph oracle_tables]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes
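For comparing the 100 MBit and gigabit cases side by side, the negotiated speed can be pulled out of the ethtool output with a small helper. This is only a sketch: the function name link_speed is made up for illustration, and it assumes the "Speed:" line has the same layout as in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ethtool): print the value of the
# "Speed:" line from `ethtool <iface>` output supplied on stdin.
# Reading stdin means it also works on output saved to a file.
link_speed() {
    awk -F': *' '/^[ \t]*Speed:/ { print $2; exit }'
}

# Usage on a live system (eth0 is the interface from this report):
#   ethtool eth0 | link_speed
```

Running it once on the 100 MBit switch and once on the gigabit switch gives two values that can be pasted into the report next to the kernel log messages.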
Created attachment 149285 [details]
es2kick.patch

There is an upstream patch that might address this issue, but I'm not certain. Here's the description:

commit bb8e3311ef9de8e72f45f910e4a977c313c7009c
Author: Jeff Garzik <jeff>
Date:   Fri Dec 15 11:06:17 2006 -0500

    e1000: workaround for the ESB2 NIC RX unit issue

    In rare occasions, ESB2 systems would end up started without the RX
    unit being turned on. Add a check that runs post-init to work around
    this issue.

    Originally from Jesse Brandeburg <jesse.brandeburg>, rewritten to use
    feature flags by me.

    Signed-off-by: Jeff Garzik <jeff>

Can you tell me if you narrowed this problem down to one that is related to RX, TX, or both? For example, can you receive frames by running tcpdump/wireshark on this interface? What about generating some traffic with 'arping' and checking whether or not it came out onto the wire? This might help me since I don't have that specific hardware around.
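As a complement to tcpdump and arping, the kernel's per-interface packet counters can hint at whether RX, TX, or both are stalled. The sketch below is an assumption about how one might check this, not something taken from the patch: the helper name rx_tx_packets is hypothetical, and it reads text in the /proc/net/dev layout from stdin. Note that a rising TX counter only shows the driver queued the frame, not that it reached the wire, so a capture on the peer is still needed.

```shell
#!/bin/sh
# Hypothetical helper: print "<rx_packets> <tx_packets>" for one interface,
# given /proc/net/dev-style text on stdin.
# The first colon is replaced by a space because some kernels print
# "eth0:1234" with no space after the interface name.
rx_tx_packets() {
    iface=$1
    sed 's/:/ /' | awk -v ifc="$iface" '$1 == ifc { print $3, $11 }'
}

# Usage on a live system (the target address is a placeholder):
#   rx_tx_packets eth0 < /proc/net/dev     # counters before
#   arping -I eth0 -c 3 <neighbour-ip>     # generate some traffic
#   rx_tx_packets eth0 < /proc/net/dev     # counters after
```

If the second number grows but the first does not, that would point towards the RX side, which is the case the ESB2 workaround targets.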
Created attachment 149828 [details] Ethereal screenshot
Both send and receive seem to work, but the TCP connection gets out of step after a few packets. See the attached Ethereal screenshot.
Do you have firewalls running on either the .45 or the .53 host? Can you turn them off for the purposes of testing? The screenshot you are providing suggests that you have an iptables rule running that is misbehaving and dropping some TCP frames that it shouldn't be.
There is no firewall running on the servers in question. There are no iptables rules set. We can send you a full tcpdump log file, but there is little more to see than in the already posted screenshot: TCP connections do not work once the server is connected to a gigabit switch. I'd like to repeat: simply plugging the server into a 100 MBit switch solves the problem completely. With a Debian/Knoppix kernel, gigabit connections work with no problems at all.
Magnus,

Thanks for the information. I don't see any patches that immediately address this issue, but I will keep looking. As a data point, could you disable TSO on the Tyan system and see if that helps? You can do this with ethtool:

ethtool -K ethX tso [on|off]

Thanks.
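After toggling TSO it is worth confirming that the setting actually took effect, which `ethtool -k` (lowercase) reports. The helper below is a sketch; the name tso_state is made up, and it assumes the offload line is labelled either "tcp segmentation offload" or "tcp-segmentation-offload", since the label differs between ethtool versions.

```shell
#!/bin/sh
# Hypothetical helper: print the TSO state ("on" or "off") from
# `ethtool -k <iface>` output supplied on stdin.
# Accepts both the spaced and the hyphenated spelling of the label.
tso_state() {
    awk -F': *' '/^[ \t]*tcp[- ]segmentation[- ]offload:/ { print $2; exit }'
}

# Usage on a live system (eth0 stands in for ethX above):
#   ethtool -K eth0 tso off
#   ethtool -k eth0 | tso_state
```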
Hello, we tried to disable TSO as suggested and tried a few other ethtool switches for good measure. The issue remains the same. Yours, Magnus Pfeffer
Andy, as the server is supposed to enter production use in May, we decided to buy an additional PCIe Gigabit LAN card. Can you suggest a maker/model that would definitely work with RHEL AS 4.0? The hardware compatibility lists we found only listed complete systems. Thanks, Magnus
Hello, using the latest test kernel from http://people.redhat.com/linville/kernels/rhel4/ solved the problem. Yours, Magnus
That is excellent news. There are no patches in Linville's latest test kernels that won't appear in the next update, so this should be resolved in 4.5. If you would like to test kernels to be sure, you can grab them here: http://people.redhat.com/jbaron/rhel4/
Have the new kernels for RHEL 4.5 resolved this issue?
Kernel 2.6.9-55.ELsmp fixed the issue. Please close the bug. Thanks for the support.