Bug 462373

Summary: Infamous r8169 (PCI/PCCard) link autoneg issue: Chicken-egg
Product: Red Hat Enterprise Linux 5
Reporter: Bryan J Smith <brsmith>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DEFERRED
QA Contact: Martin Jenner <mjenner>
Severity: low
Priority: low
Version: 5.2
CC: awaizman, fche, ivecera, james
Target Milestone: rc   
Target Release: ---   
Hardware: i386   
OS: Linux   
URL: ftp://66.104.77.130/cn/nic/r8169-6.007.00.tar.bz2
Doc Type: Bug Fix
Last Closed: 2008-10-29 22:04:55 UTC

Description Bryan J Smith 2008-09-15 18:14:21 UTC
Description of problem:

Various auto-negotiation (autoneg) issues continue to plague newer Realtek RTL8110S-32, RTL8110SB(L), RTL8169SB(L), RTL8169SC(L) and RTL8169 PCI and PCCard (aka CardBus) products (likely PCIe and ExpressCard as well, but only PCI/PCCard was tested here).  A known "workaround" is to disable autoneg at driver load.

The downloadable Realtek r8169 driver version 6.007.00-NAPI (PCI/PCCard) release from 2008-08-07 includes a "deprecated" option "autoneg".  When "autoneg=0" is set in an options line in /etc/modprobe.conf (e.g., "options r8169 autoneg=0"), it _prevents_ the infamous up/down/up/down and lack of link status, and works (as well as offers full GbE support).
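A minimal sketch, in Python, of checking whether a modprobe.conf-style file already carries the options line described above. The function name `module_options` and the sample text are illustrative assumptions; only the `options r8169 autoneg=0` line itself comes from this report, and the `autoneg` parameter exists only in the Realtek-provided driver.

```python
def module_options(conf_text, module):
    """Collect key=value options for `module` from modprobe.conf-style text."""
    opts = {}
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        # An options line looks like: options <module> key=value [key=value ...]
        if len(parts) >= 3 and parts[0] == "options" and parts[1] == module:
            for token in parts[2:]:
                key, sep, value = token.partition("=")
                opts[key] = value if sep else None
    return opts


# Hypothetical file content for illustration only.
sample = """\
alias eth2 r8169
options r8169 autoneg=0   # Realtek 6.007.00-NAPI driver only
"""

print(module_options(sample, "r8169"))
```

With the sample text, the result maps `autoneg` to the string `0`; a file without an r8169 options line yields an empty dict.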

The "r8169" driver included in the Red Hat Enterprise Linux (RHEL) 5.2 kernel release 2.6.18-92.1.1.el5 (r8169 driver version 2.2LK-NAPI) and earlier (tested 2.6.18-53.1.21.el5) does not offer this "deprecated" option.  Attempts to disable autoneg post-insert/load have no effect (e.g., "ethtool -s eth2 autoneg off" returns an error about eth2 not existing, even if "r8169" is loaded).
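The post-load attempt above can be sketched in Python as follows. This is an illustration under assumptions, not part of any Red Hat tooling: the helper names are hypothetical, it assumes a modern Python 3 and a sysfs mounted at /sys, and it checks that the interface exists before running the same "ethtool -s <iface> autoneg off" command quoted in this report.

```python
import os
import subprocess


def ethtool_autoneg_cmd(iface, enable):
    """Build the argv for `ethtool -s <iface> autoneg on|off`."""
    return ["ethtool", "-s", iface, "autoneg", "on" if enable else "off"]


def iface_exists(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel has registered a network device named iface."""
    return os.path.isdir(os.path.join(sysfs_root, iface))


def set_autoneg(iface, enable):
    """Try to change autoneg; return (success, message)."""
    if not iface_exists(iface):
        # Matches the symptom in this report: module loaded, device missing.
        return False, f"{iface}: no such interface"
    result = subprocess.run(
        ethtool_autoneg_cmd(iface, enable), capture_output=True, text=True
    )
    return result.returncode == 0, result.stderr.strip()


# Only prints the command line; nothing is executed here.
print(" ".join(ethtool_autoneg_cmd("eth2", False)))
```

Calling `set_autoneg("eth2", False)` on a system where eth2 was never registered returns a failure without invoking ethtool, which separates "device missing" from "ethtool rejected the setting".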

Other attempts to add to ifup and other scripts also have no effect -- again, chicken-egg.  E.g., one Ubuntu user is able to add this to their interface script:  
http://adam.rosi-kessel.org/weblog/2008/06/21/a-much-simpler-fix-for-the-r8169-link-down-problem  

This may _not_ work in all cases, and some users have noted it has not worked for Ubuntu.  I.e., in my case, the second the kernel module loads, it _refuses_ to take any autoneg changes via ethtool.  So it's unlikely that any network-script modification will address it.

ADDITIONAL NOTE:  Once the device enters the up/down nonsense, the system will have all sorts of issues (loops?).  I.e., in the case of this PCCard, removing the card often doesn't work, or can actually hang the system or make it unstable.  Correspondingly, the "r8169" driver will not unload ... _ever_.  Attempting to manually use PCCard CTL functions does not make any difference.

Version-Release number of selected component (if applicable):
2.6.18-92.1.1.el5 (r8169 2.2LK-NAPI)
2.6.18-53.1.21.el5

How reproducible:
Always with:  
- 2.6.18-53.1.21.el5 with included r8169
- 2.6.18-92.1.1.el5 with updated r8169 from Realtek (no options)

Using the updated r8169 (6.007.00-NAPI) and the following option in /etc/modprobe.conf prior to load:  
  options r8169 autoneg=0

Works without issue, including -- in my case -- full PCCard CTL remove/insert, power-down, etc...

Steps to Reproduce:
1.  Insert PCCard (see following info) 
2.  LEDs flicker too fast for eye (rarely catch)
3.  /var/log/messages shows up/down, up/down (see following)

Actual results:

  kernel: pccard: CardBus card inserted into slot 0
  kernel: r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
  kernel: PCI: Enabling device 0000:16:00.0 (0000 -> 0003)
  kernel: ACPI: PCI Interrupt 0000:16:00.0[A] -> GSI 16 (level, low) -> IRQ 201
  kernel: eth2: RTL8169 at 0xf8e78000, 00:13:3b:02:b4:50, IRQ 201
  kernel: r8169: eth2: link up
  kernel: r8169: eth2: link down
  kernel: r8169: eth2: link up
  kernel: r8169: eth2: link down
  (several hundreds per second)

Expected results:

Output in /var/log/messages with updated r8169 (6.007.00-NAPI) and deprecated /etc/modprobe.conf option (e.g., "options r8169 autoneg=0"):  

  kernel: pccard: CardBus card inserted into slot 0
  kernel: r8169 Gigabit Ethernet driver 6.007.00-NAPI loaded
  kernel: PCI: Enabling device 0000:16:00.0 (0000 -> 0003)
  kernel: ACPI: PCI Interrupt 0000:16:00.0[A] -> GSI 16 (level, low) -> IRQ 201
  kernel: r8169 0000:16:00.0: no MSI. Back to INTx.
  kernel: r8169: This product is covered by one or more of the following patents: US5,307,459, US5,434,872, US 5,732,094, US6,570,884, US6,115,776, and US6,327,625.
  kernel: eth2: RTL8169S/8110S at 0xf8bac000, 00:13:3b:02:b4:50, IRQ 201
  kernel: r8169: eth2: link up
  kernel: r8169: eth2: link up
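
The difference between the two log excerpts above (repeated "link down" events versus none) can be checked mechanically. The following Python sketch is an illustration only: the function names and the threshold of 2 are assumptions, and the regular expression matches just the "r8169: ethN: link up/down" message format shown in this report.

```python
import re
from collections import Counter

# Matches kernel lines such as "kernel: r8169: eth2: link down".
LINK_RE = re.compile(r"r8169: (\w+): link (up|down)")


def count_link_events(lines):
    """Count (interface, state) link messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def is_flapping(counts, iface, threshold=2):
    """Treat `threshold` or more link-down events as flapping (arbitrary cutoff)."""
    return counts[(iface, "down")] >= threshold


# Lines copied from the "Actual results" excerpt in this report.
actual = [
    "kernel: r8169: eth2: link up",
    "kernel: r8169: eth2: link down",
    "kernel: r8169: eth2: link up",
    "kernel: r8169: eth2: link down",
]
print(is_flapping(count_link_events(actual), "eth2"))
```

In practice the lines would come from /var/log/messages, e.g. by passing an open file object to `count_link_events`.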

Additional info:

Have noted the following, recent BZ entries for this driver in RHEL 5:  
  BZ452761  r8169 driver broken in 2.6.18-92+ kernels (I still have issue in 2.6.18-53)
  BZ453563  RTL8111/8168B network card does not work (some have related issues?)

PERSONAL VIEWPOINT:  Despite the "deprecated" nature of kernel module options that are redundant with ethtool, and the preference for ethtool, the "autoneg" option should probably be made available in the r8169 driver for now (with the "deprecated" notice, of course), because of these link issues.  This is likely an upstream consideration rather than a Red Hat one, but I wanted to raise it.

Card info ...

# lspci -vvv
...
16:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (8000ns min, 16000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 201
        Region 0: I/O ports at 9000 [size=256]
        Region 1: Memory at e6000000 (32-bit, non-prefetchable) [size=512]
        [virtual] Expansion ROM at e0000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

# pccardctl ls -vvv
Socket 0 Bridge:        [yenta_cardbus]         (bus ID: 0000:15:00.0)
        Configuration:  state: on       ready: yes
                        Voltage: 3.3V Vcc: 3.3V Vpp: 3.3V
                        Available IRQs: 3, 4, 5, 6, 7, 10, 11
                        Available ioports:      0x00000100 - 0x000003af
                                                0x000003e0 - 0x000004ff
                                                0x00000820 - 0x000008ff
                                                0x00000a00 - 0x00000aff
                                                0x00000c00 - 0x00000cf7
                                                0x00009000 - 0x0000cfff
                        Available iomem:        0x000c0000 - 0x000fffff
                                                0x60000000 - 0x60ffffff
                                                0xa0000000 - 0xa0ffffff
                                                0xe0000000 - 0xe3ffffff
                                                0xe4300000 - 0xe7ffffff
  CardBus card -- see "lspci" for more information

OTHER NOTE (Switch):

PCCard was plugged into a Vitesse VSC7385 switch.  With the integrated drivers, the switch reported a 100Mbps connection.  With the Realtek upgraded driver and autoneg=0, the switch reported a 1000Mbps (1Gbps) connection.

Comment 1 Bryan J Smith 2008-09-15 20:18:41 UTC
Additional issue: the PCCard is unreliable (lost packets).  Used different combinations of Speed, Duplex, etc... in driver and ethtool -- none address the issue.

So this card may have other issues altogether.  Trying to hunt down a Windows XP notebook to verify if card is defective or not with vendor included drivers.  If so, will attempt to procure another card.

Comment 2 Ivan Vecera 2008-10-09 12:37:29 UTC
Please try testing kernel available at:
http://people.redhat.com/dzickus/el5/

There is an upgraded r8169 driver in version 2.6.18-115 and above. This version practically corresponds to the latest upstream driver.

Comment 3 Bryan J Smith 2008-10-29 22:04:55 UTC
No change with the 5.3 Beta -120.el5 kernel.  It still seems to be a PHY issue.  E.g., the port says "MII" (not "TP") and no lights come on.  Various "ethtool -s" settings have no effect.

Again, I need to take the time to test the PCCard unit in a Windows system.
Just don't seem to have an older CardBus system with Windows on it to
test it in.  ;)

I'm still wondering if this hardware unit is buggy, or has support issues with the PHY with even the RealTek provided/updated r8169 driver.  It certainly gets "hot enough" to cause the serial number/barcode ink to be rubbed off.

Being that many other fixes and upstream backports have been made in -120.el5, I'm reducing the Priority/Severity to "low" and marking it CLOSED/DEFERRED for now, awaiting confirmation that my unit is not faulty (which I have not been able to verify under Windows with the included drivers as of yet).