Created attachment 359728
testing results

Description of problem:
The r8169 driver stops working randomly during gigabit transfers. After the card is put into 1000/full, the machine can transfer a couple of files (tested over HTTP). It then stops, seemingly after a couple of transfers, and the machine is no longer reachable on the network. The corrective action is to reinsert the module (modprobe -r r8169 && modprobe r8169) to get networking functioning again. If the card is manually put into 100/full mode, it works as expected.

Version-Release number of selected component (if applicable):
[root@polaris ~]# uname -a
Linux polaris.XXXXXX.net 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

/sbin/lspci:
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

/var/log/messages:
eth0: RTL8168c/8111c at 0xffffc20000020000, 00:1c:c0:db:16:7b, XID 3c4000c0 IRQ 58

[root@polaris ~]# ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version:
bus-info: 0000:01:00.0

[root@polaris ~]# modinfo r8169
filename: /lib/modules/2.6.18-164.el5/kernel/drivers/net/r8169.ko
version: 2.3LK-NAPI
license: GPL
description: RealTek RTL-8169 Gigabit Ethernet driver
author: Realtek and the Linux r8169 crew <netdev.org>
srcversion: 93E4A706F00E0BF1581C38D
alias: pci:v00000001d00008168sv*sd00002410bc*sc*i*
alias: pci:v00001737d00001032sv*sd00000024bc*sc*i*
alias: pci:v000016ECd00000116sv*sd*bc*sc*i*
alias: pci:v00001259d0000C107sv*sd*bc*sc*i*
alias: pci:v00001186d00004300sv*sd*bc*sc*i*
alias: pci:v000010ECd00008169sv*sd*bc*sc*i*
alias: pci:v000010ECd00008168sv*sd*bc*sc*i*
alias: pci:v000010ECd00008167sv*sd*bc*sc*i*
alias: pci:v000010ECd00008136sv*sd*bc*sc*i*
alias: pci:v000010ECd00008129sv*sd*bc*sc*i*
depends: mii
vermagic: 2.6.18-164.el5 SMP mod_unload gcc-4.1
parm: rx_copybreak:Copy breakpoint for copy-only-tiny-frames (int)
parm: use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm: debug:Debug verbosity level (0=none, ..., 16=all) (int)
module_sig: 883f3504a8b7cc4bd273d74512bb1124e7d0a089244d8d7722c097c58eed463224543a32891309e377def9c7cce9f7b1d2b20218b3fa738895ecd

How reproducible:

Steps to Reproduce:
1. Set up a server running RHEL with a NIC supported by r8169 (8168B).
2. Set up httpd to serve files.
3. The server should autonegotiate to gigabit, but this can also be tested by setting the speed manually with ethtool.
4. Transfer files with wget to another server on the same switch.

Actual results:
See the attached file for test results.

Expected results:
Networking on the server should still be working after several file transfers.
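For step 3, it can help to confirm which speed and duplex the link actually negotiated before starting the transfers. The following is a minimal sketch, not part of the original report: the helper name link_mode is hypothetical, and it assumes the usual "Speed:" and "Duplex:" lines printed by ethtool for an interface.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool <iface>` output on stdin and
# print the negotiated speed and duplex as "<speed> <duplex>".
link_mode() {
    awk -F': *' '
        /Speed:/  { speed = $2 }    # e.g. "Speed: 1000Mb/s"
        /Duplex:/ { duplex = $2 }   # e.g. "Duplex: Full"
        END { print speed, duplex }
    '
}
```

Typical use would be `ethtool eth0 | link_mode`. To reproduce the working 100/full case described above, something like `ethtool -s eth0 speed 100 duplex full autoneg off` should force the link down from gigabit.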
Some updates: after reading <http://patchwork.kernel.org/patch/13610/>, I found that when the RHEL5 2.6.18-164 kernel is booted with pci=nomsi, the system boots and transfers files normally.
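To check whether the pci=nomsi workaround took effect, the interrupt type assigned to the NIC can be read from /proc/interrupts. This is a rough sketch under assumptions: the helper name irq_mode is hypothetical, and it assumes that MSI interrupts are labelled with a string containing "PCI-MSI" in that file, which may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/interrupts-style text on stdin and
# report how the given interface's interrupt is delivered.
#   msi    - the interface's line mentions PCI-MSI
#   legacy - the interface is listed without PCI-MSI (e.g. IO-APIC)
#   absent - no line mentions the interface
irq_mode() {
    line=$(grep -w -- "$1" | head -n 1)
    if [ -z "$line" ]; then
        echo absent
    elif printf '%s\n' "$line" | grep -q 'PCI-MSI'; then
        echo msi
    else
        echo legacy
    fi
}
```

Typical use would be `irq_mode eth0 < /proc/interrupts`. With pci=nomsi on the kernel command line, the expected answer is "legacy"; without it, an affected system would presumably report "msi".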
There is one fix regarding MSI that is not present in 2.6.18-164. Could you please try a test kernel available at: http://people.redhat.com/dzickus/el5/169.el5/ Thanks, Ivan.
Brent, any results?
Brent, could you please report whether the proposed kernels for 5.5 solve the issue? The latest are at: http://people.redhat.com/dzickus/el5/175.el5/
Ivan, I stress tested the 5.5 kernel and it performed perfectly. Here's the kernel information:

[root@polaris ~]# uname -a
Linux polaris.descension.net 2.6.18-175.el5 #1 SMP Fri Nov 20 19:32:16 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

From a second server I executed the following command to simulate 1000 downloads of an 87 MB file from the server running the new kernel:

for i in `seq 1 10000`; do wget -nv --delete-after --append-output=polaris-175_stress-test.log http://polaris/rhel5/images/stage2.img; done

It performed well and I did not have a single failure, whereas with the RHEL 5.4 kernel it would fail after just a few downloads of the same file. I have attached the log to this bug report.

Thanks!
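The loop above relies on reading the log afterwards to spot failures. A variant that counts failures as it goes might look like the sketch below. The helper name run_stress is hypothetical and not from the original test; it simply runs whatever command it is given a fixed number of times and discards that command's output.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times, print a summary of
# successes and failures, and return non-zero if any run failed.
# Usage: run_stress COUNT COMMAND [ARGS...]
run_stress() {
    count=$1
    shift
    ok=0
    fail=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # Output of the command is discarded; only its exit status counts.
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok fail=$fail"
    [ "$fail" -eq 0 ]
}
```

For the test described here, the call could be `run_stress 1000 wget -nv --delete-after http://polaris/rhel5/images/stage2.img`. Adding wget's `--timeout` and `--tries` options may be worthwhile, since a hung NIC could otherwise stall the loop instead of registering as a failure.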
Created attachment 373444
kernel 2.6.18-175 stress testing
This upstream commit solves this issue:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f11a377b3f4e897d11f0e8d1fc688667e2f19708

The same commit was also used to resolve bug #514589, so I am closing this one as a duplicate.

*** This bug has been marked as a duplicate of bug 514589 ***