Bug 228825 - e1000 driver does not work properly with Tyan Tempest i5000PX (S5380) onboard nic
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Andy Gospodarek
QA Contact: Brian Brock
Reported: 2007-02-15 08:06 EST by Magnus Pfeffer
Modified: 2014-06-29 18:58 EDT
Fixed In Version: 2.6.9-55.ELsmp
Doc Type: Bug Fix
Last Closed: 2007-06-27 17:15:22 EDT

Attachments
lspci output plain, verbose and numerical (16.80 KB, application/octet-stream)
2007-02-15 08:06 EST, Magnus Pfeffer
es2kick.patch (1.99 KB, patch)
2007-03-05 15:18 EST, Andy Gospodarek
Ethereal screenshot (121.23 KB, image/gif)
2007-03-12 11:40 EDT, Magnus Pfeffer

Description Magnus Pfeffer 2007-02-15 08:06:19 EST
Description of problem:

We are using Tyan Tempest i5000PX (S5380) server mainboards with RHEL 4.

The onboard NICs are recognized by the e1000 driver, but only work with speeds
up to 100 MBit. If the NICs are connected to a gigabit switch, the network link
switches to 1000 MBit full duplex (according to the kernel log), but the
connection is not usable. Neither incoming nor outgoing connections are possible.

We upgraded to the latest RHEL kernel 2.6.9-42.0.8.ELsmp but the problem persists.

Knoppix/Debian kernels (2.6.18) do not show this behaviour; the NICs work
properly at all speeds (10/100/1000).

Version-Release number of selected component (if applicable):

See attached lspci output.


How reproducible:
Easily.

Steps to Reproduce:
1. Connect NIC to gigabit switch
2. Observe complete non-connectivity
  
Actual results:
No connectivity.

Expected results:
Connectivity at gigabit speed.

Additional info:
Comment 1 Magnus Pfeffer 2007-02-15 08:06:19 EST
Created attachment 148108
lspci output plain, verbose and numerical
Comment 2 John W. Linville 2007-02-28 14:53:34 EST
Can we see the output of ethtool and mii-tool on the NICs in question?
Comment 3 Magnus Pfeffer 2007-03-01 11:37:05 EST
(In reply to comment #2)
> Can we see the output of ethtool and mii-tool on the NICs in question?

With working 100MBit link:

[root@aleph oracle_tables]# mii-tool -v
eth0: negotiated 100baseTx-FD flow-control, link ok
  product info: vendor 00:50:43, model 10 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
eth1: no link
  product info: vendor 00:50:43, model 10 rev 2
  basic mode:   autonegotiation enabled
  basic status: no link
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

[root@aleph oracle_tables]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes
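
In the output above the NIC advertises 1000baseT/Full but has negotiated
100Mb/s, which is expected here because the link partner only offers 100 MBit
modes. When the same check is repeated against the gigabit switch, the
comparison can be done mechanically. A minimal sketch, assuming Python 3 is
available; the helper names are illustrative and not part of ethtool, and the
sample is an abridged copy of the output above:

```python
import re

# Abridged copy of the ethtool output shown above.
SAMPLE = """\
Settings for eth0:
        Supported ports: [ TP ]
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Link detected: yes
"""


def advertised_speeds(text):
    """Return advertised link speeds in Mb/s (hypothetical helper)."""
    speeds = []
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Advertised link modes:"):
            in_block = True
            stripped = stripped.split(":", 1)[1]
        elif in_block and ":" in stripped:
            in_block = False  # the next "Key: value" line ends the block
        if in_block:
            speeds += [int(m) for m in re.findall(r"(\d+)base", stripped)]
    return speeds


def negotiated_speed(text):
    """Return the negotiated speed in Mb/s, or None if it is not reported."""
    match = re.search(r"Speed:\s*(\d+)Mb/s", text)
    return int(match.group(1)) if match else None


def below_advertised_max(text):
    """True if the link came up slower than the fastest advertised mode."""
    speeds = advertised_speeds(text)
    current = negotiated_speed(text)
    return bool(speeds) and current is not None and current < max(speeds)
```

On a live system the text would come from running `ethtool eth0`. Note that a
link negotiated at the full advertised speed only shows that negotiation
succeeded; it says nothing about whether traffic actually passes.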

Comment 4 Andy Gospodarek 2007-03-05 15:18:12 EST
Created attachment 149285
es2kick.patch

There is an upstream patch that might address this issue, but I'm not certain. 
Here's the description:

commit bb8e3311ef9de8e72f45f910e4a977c313c7009c
Author: Jeff Garzik <jeff@garzik.org>
Date:	Fri Dec 15 11:06:17 2006 -0500

    e1000: workaround for the ESB2 NIC RX unit issue

    In rare occasions, ESB2 systems would end up started without the RX
    unit being turned on. Add a check that runs post-init to work around
    this issue.

    Originally from Jesse Brandeburg <jesse.brandeburg@intel.com>,
    rewritten to use feature flags by me.

    Signed-off-by: Jeff Garzik <jeff@garzik.org>

Can you tell me if you narrowed this problem down to one that is related to RX,
TX, or both?  For example, can you receive frames by running tcpdump/wireshark
on this interface?  What about generating some traffic with 'arping' and
checking whether or not it came out onto the wire?  This might help me since I
don't have that specific hardware around.
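
The RX/TX question above can also be approached from the kernel's per-interface
packet counters. A minimal sketch, assuming Python 3 and the usual
/proc/net/dev layout (two header lines, then sixteen counters per interface);
the function names are hypothetical and the counter values in the two
snapshots are made up for illustration:

```python
# Two made-up /proc/net/dev snapshots; the counter values are illustrative only.
HEADER = (
    "Inter-|   Receive                          |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast"
    "|bytes packets errs drop fifo colls carrier compressed\n"
)
BEFORE = HEADER + "  eth0: 5000 50 0 0 0 0 0 0 7000 70 0 0 0 0 0 0\n"
AFTER = HEADER + "  eth0: 5000 50 0 0 0 0 0 0 7300 75 0 0 0 0 0 0\n"


def parse_proc_net_dev(text):
    """Map interface name to (rx_packets, tx_packets)."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Index 1 is RX packets, index 9 is TX packets (0-based).
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_deltas(before_text, after_text, iface):
    """Return (rx_delta, tx_delta) for iface between two snapshots."""
    before = parse_proc_net_dev(before_text)[iface]
    after = parse_proc_net_dev(after_text)[iface]
    return (after[0] - before[0], after[1] - before[1])
```

On a live system the two snapshots would be read from /proc/net/dev before and
after a test such as the arping run mentioned above. A TX delta with no RX
delta would suggest looking at the receive side first, but the counters alone
do not show whether frames reached the wire.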
Comment 5 Magnus Pfeffer 2007-03-12 11:40:39 EDT
Created attachment 149828
Ethereal screenshot
Comment 6 Magnus Pfeffer 2007-03-12 11:56:56 EDT
Both send and receive seem to work, but the TCP connection gets out of step
after a few packets. See the attached Ethereal screenshot.
Comment 7 Neil Horman 2007-03-12 13:24:41 EDT
Do you have firewalls running on either the .45 or the .53 host?  Can you turn
them off for the purposes of testing?  The screenshot you provided suggests
that you have an iptables rule that is misbehaving and dropping some TCP
frames that it shouldn't be.
Comment 8 Magnus Pfeffer 2007-03-15 03:22:56 EDT
There is no firewall running on the servers in question, and no iptables rules
are set. We can send you a full tcpdump log file, but there is little more to
see than in the screenshot already posted: TCP connections do not work once the
server is connected to a gigabit switch.

I'd like to repeat: simply plugging the server into a 100 MBit switch solves the
problem completely. With a Debian/Knoppix kernel, gigabit connections work with
no problems at all.

Comment 9 Andy Gospodarek 2007-03-15 09:46:42 EDT
Magnus,

Thanks for the information.  I don't see any patches that immediately address
this issue, but I will keep looking.  

As a data point, could you disable TSO (TCP segmentation offload) on the Tyan
system and see if that helps?  You can do this with ethtool:

ethtool -K ethX tso [on|off]

Thanks.
Comment 10 Magnus Pfeffer 2007-03-23 04:52:20 EDT
Hello,

we tried to disable TSO as suggested and tried a few other ethtool switches for
good measure. The issue remains the same.

Yours,

Magnus Pfeffer
Comment 11 Magnus Pfeffer 2007-03-26 03:18:16 EDT
Andy,

as the server is supposed to enter production use in May, we decided to buy an
additional PCIe Gigabit LAN card. Can you suggest a maker/model that would
definitely work with RHEL AS 4.0? The hardware compatibility lists we found only
listed complete systems.

Thanks,

Magnus
Comment 12 Magnus Pfeffer 2007-03-26 08:13:39 EDT
Hello,

using the latest test kernel from
http://people.redhat.com/linville/kernels/rhel4/ solved the problem. 

Yours,

Magnus
Comment 13 Andy Gospodarek 2007-03-26 09:43:39 EDT
That is excellent news.  Every patch in Linville's latest test kernels will also
appear in the next update, so this should be resolved in 4.5.  If you would like
to test kernels to be sure, you can grab them here:

http://people.redhat.com/jbaron/rhel4/
Comment 14 Andy Gospodarek 2007-05-03 10:58:28 EDT
Have the new kernels for RHEL 4.5 resolved this issue?

Comment 15 Magnus Pfeffer 2007-06-26 12:09:46 EDT
Kernel 2.6.9-55.ELsmp fixed the issue. Please close the bug.

Thanks for the support.
