Bug 521132 - r8169 driver fails during gigabit transfer mode
Summary: r8169 driver fails during gigabit transfer mode
Keywords:
Status: CLOSED DUPLICATE of bug 514589
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ivan Vecera
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-09-03 19:09 UTC by Brent Holden
Modified: 2009-11-24 19:01 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-11-24 19:01:49 UTC
Target Upstream Version:
Embargoed:


Attachments
testing results (11.47 KB, text/plain)
2009-09-03 19:09 UTC, Brent Holden
kernel 2.6.18-175 stress testing (100.59 KB, application/octet-stream)
2009-11-24 14:43 UTC, Brent Holden

Description Brent Holden 2009-09-03 19:09:03 UTC
Created attachment 359728 [details]
testing results

Description of problem:

The r8169 driver randomly stops working during gigabit transfers.  With the link at 1000/full, the machine transfers a couple of files (tested over HTTP) and then stops; after that it is no longer reachable on the network.  Reloading the module (modprobe -r r8169 && modprobe r8169) restores networking.

If the card is put into 100/full mode manually it works as expected.
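
A minimal sketch of the two manual actions described above (reloading the module and forcing 100/full), assuming the interface is eth0. The run helper and the DRY_RUN variable are hypothetical additions for illustration; they only print each command instead of executing it, so the sequence can be reviewed first.

```shell
#!/bin/sh
# Sketch only: run and DRY_RUN are illustrative helpers, not part of the report.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unload and reload the r8169 module to recover a hung interface.
reload_r8169() {
    run modprobe -r r8169 && run modprobe r8169
}

# Force 100 Mb/s full duplex on the given interface (default: eth0).
force_100_full() {
    run ethtool -s "${1:-eth0}" speed 100 duplex full autoneg off
}
```

For example, DRY_RUN=1 reload_r8169 prints the two modprobe commands without touching the module. Both real actions need root and will interrupt networking on that interface.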


Version-Release number of selected component (if applicable):

[root@polaris ~]# uname -a
Linux polaris.XXXXXX.net 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

/sbin/lspci:
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

/var/log/messages:
eth0: RTL8168c/8111c at 0xffffc20000020000, 00:1c:c0:db:16:7b, XID 3c4000c0 IRQ 58

[root@polaris ~]# ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: 
bus-info: 0000:01:00.0

[root@polaris ~]# modinfo r8169
filename:       /lib/modules/2.6.18-164.el5/kernel/drivers/net/r8169.ko
version:        2.3LK-NAPI
license:        GPL
description:    RealTek RTL-8169 Gigabit Ethernet driver
author:         Realtek and the Linux r8169 crew <netdev.org>
srcversion:     93E4A706F00E0BF1581C38D
alias:          pci:v00000001d00008168sv*sd00002410bc*sc*i*
alias:          pci:v00001737d00001032sv*sd00000024bc*sc*i*
alias:          pci:v000016ECd00000116sv*sd*bc*sc*i*
alias:          pci:v00001259d0000C107sv*sd*bc*sc*i*
alias:          pci:v00001186d00004300sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008169sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008167sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008136sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008129sv*sd*bc*sc*i*
depends:        mii
vermagic:       2.6.18-164.el5 SMP mod_unload gcc-4.1
parm:           rx_copybreak:Copy breakpoint for copy-only-tiny-frames (int)
parm:           use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm:           debug:Debug verbosity level (0=none, ..., 16=all) (int)
module_sig:	883f3504a8b7cc4bd273d74512bb1124e7d0a089244d8d7722c097c58eed463224543a32891309e377def9c7cce9f7b1d2b20218b3fa738895ecd


How reproducible:


Steps to Reproduce:
1. Set up a server running RHEL with a NIC supported by r8169 (8168B).
2. Set up httpd to serve files.
3. The server should autonegotiate to gigabit; if it does not, set the speed manually with ethtool.
4. Use wget to transfer files to another server on the same switch.
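
Before starting the transfers in step 4, it may help to confirm that the link really negotiated gigabit. A small sketch, assuming the interface is eth0 and that ethtool prints a "Speed:" line for it; link_speed is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: link_speed is an illustrative helper, not part of the report.

# Print the negotiated link speed of an interface, e.g. "1000Mb/s".
link_speed() {
    ethtool "$1" | awk -F': ' '/Speed:/ { print $2 }'
}
```

Usage: link_speed eth0 should report 1000Mb/s for the failing configuration and 100Mb/s for the one that works.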

  
Actual results:

See attached file for test results


Expected results:

Networking on the server should keep working after several file transfers.

Comment 1 Brent Holden 2009-09-08 21:37:48 UTC
Some updates:

After reading: <http://patchwork.kernel.org/patch/13610/>

I found that when the RHEL5 2.6.18-164 kernel is booted with pci=nomsi, the system boots and transfers files normally.
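
A sketch for checking that the workaround is actually in effect, assuming the interface is eth0 and that MSI interrupts are labelled with a string containing "MSI" in /proc/interrupts (the exact label varies by kernel version). has_nomsi and uses_msi are hypothetical helper names; the optional file arguments exist only so the functions can be tried against saved copies.

```shell
#!/bin/sh
# Sketch only: has_nomsi and uses_msi are illustrative helpers.

# Succeed if the kernel command line contains pci=nomsi.
has_nomsi() {
    grep -qw 'pci=nomsi' "${1:-/proc/cmdline}"
}

# Succeed if the interrupt line for the given device mentions MSI.
uses_msi() {
    grep -w "$1" "${2:-/proc/interrupts}" | grep -q 'MSI'
}
```

After booting with pci=nomsi, has_nomsi should succeed and uses_msi eth0 should fail.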

Comment 2 Ivan Vecera 2009-10-13 16:18:34 UTC
There is one fix regarding MSI that is not present in 2.6.18-164. Could you please try a test kernel available at:
http://people.redhat.com/dzickus/el5/169.el5/

Thanks, Ivan.

Comment 3 Ivan Vecera 2009-11-05 11:46:56 UTC
Brent, any results?

Comment 4 Ivan Vecera 2009-11-24 09:30:33 UTC
Brent, could you please report whether the proposed kernels for 5.5 solve the issue?
The latest are at:
http://people.redhat.com/dzickus/el5/175.el5/

Comment 5 Brent Holden 2009-11-24 14:42:30 UTC
Ivan,

I stress tested the 5.5 kernel and it performed perfectly.  Here's the kernel information:

[root@polaris ~]# uname -a
Linux polaris.descension.net 2.6.18-175.el5 #1 SMP Fri Nov 20 19:32:16 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

From a second server, I ran the following command to simulate 1000 downloads of an 87 MB file from the server running the new kernel:
for i in `seq 1 10000`; do wget -nv --delete-after --append-output=polaris-175_stress-test.log http://polaris/rhel5/images/stage2.img; done

It performed well and I did not have a single failure, where before with the RHEL 5.4 kernel it would fail after just a few downloads of the same file.  I have attached the log to this bug report.
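
A variant of the loop above that also counts failed downloads, as a sketch rather than the command that was actually run. stress_download is a hypothetical function name, and the --tries=1 and --timeout=30 options are assumptions added so that a hung interface shows up as a failure quickly instead of stalling in retries.

```shell
#!/bin/sh
# Sketch only: stress_download is an illustrative helper, not the original test.

# Download URL COUNT times, append wget output to LOG, and report failures.
stress_download() {
    url=$1
    count=$2
    log=${3:-stress-test.log}
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        wget -nv --tries=1 --timeout=30 --delete-after \
            --append-output="$log" "$url" || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures of $count downloads failed"
    # Return success only if every download succeeded.
    [ "$failures" -eq 0 ]
}
```

Usage: stress_download http://polaris/rhel5/images/stage2.img 1000 polaris-175_stress-test.log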

Thanks!

Comment 6 Brent Holden 2009-11-24 14:43:30 UTC
Created attachment 373444 [details]
kernel 2.6.18-175 stress testing

Comment 7 Ivan Vecera 2009-11-24 19:01:49 UTC
This upstream commit solves this issue:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f11a377b3f4e897d11f0e8d1fc688667e2f19708

The same commit was also used to fix bug #514589, so I am closing this one as a duplicate.

*** This bug has been marked as a duplicate of bug 514589 ***