Bug 514589
Field | Value
---|---
Summary | r8169 stopping all activity until the link is reset
Product | Red Hat Enterprise Linux 5
Component | kernel
Version | 5.4
Hardware | All
OS | Linux
Status | CLOSED ERRATA
Severity | medium
Priority | urgent
Target Milestone | rc
Keywords | ZStream
Doc Type | Bug Fix
Reporter | Simon Matter <simon.matter>
Assignee | Ivan Vecera <ivecera>
QA Contact | Red Hat Kernel QE team <kernel-qe>
CC | bholden, dhoward, dzickus, jpirko, jplans, lancelassetter, lemenkov, mhlavink, parasitaliya, qcai
Last Closed | 2010-03-30 06:53:44 UTC
Bug Blocks | 529366, 533192

Description
Simon Matter
2009-07-29 18:39:28 UTC
Created attachment 355605
avoid dead link on r8169
The patched kernel works, but I cannot yet confirm that the bug is gone because it doesn't happen very often in my case.

I'd like to confirm that I didn't see any error again with this patch after moving some TB of data through it on my test box. Also, another computer which had shown errors almost daily has not shown any errors since the new kernel was installed 4 days ago. That was a real show stopper for us on all RTL8168 NICs, which are widely used on Atom-based systems these days.

Packages are located at: http://people.redhat.com/ivecera/rhel-5-ivtest/

Simon, could you please test them?

Any chance you could post an i686 build there? The boxes in question are Atom N270 based and we run them on 32bit (I'm not even sure they could run x86_64).

Regards,
Simon

No problem Simon, I will post it ASAP.

Simon, i686 packages are also there. Could you please test them?

Ivan, it doesn't seem to work. I tried to make the link stop by sending a large amount of data through it. While speed is usually as expected, the transfer stops after some activity and resumes later. The time needed to transfer 10G of data is ~3 times higher than it should be. I have tested with 2.6.18-162.el5, 2.6.18-162.el5.ivtest.1 and 2.6.18-160 and they all show the same issue, while 2.6.18-160.invoca1.el5 performs fine.

Actual results: (running 2.6.18-162.el5.ivtest.1)

    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 677.99 seconds, 15.8 MB/s
    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 512.643 seconds, 20.9 MB/s
    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 478.893 seconds, 22.4 MB/s

Expected results: (running 2.6.18-160.invoca1.el5)

    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 146.771 seconds, 73.2 MB/s
    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 146.289 seconds, 73.4 MB/s
    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 143.643 seconds, 74.8 MB/s

I know the bug I posted was not about performance but about dead link. But they seem related because with my patch in 2.6.18-160.invoca1.el5 speed and "dead link" are fine.

I can confirm that both issues are related.
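
Editorial note: the gap between the two kernels in the dd runs above can be quantified with a short script. This is only a sketch; the function and variable names are my own, and the input is limited to the six summary lines quoted in this report. By my arithmetic the mean transfer time on the affected kernel comes out at roughly 3.8 times that of the patched kernel, in line with the "~3 times" estimate given earlier.

```python
import re
from statistics import mean

# Summary lines copied verbatim from the dd runs quoted in this report.
SLOW_KERNEL = [  # 2.6.18-162.el5.ivtest.1
    "10737418240 bytes (11 GB) copied, 677.99 seconds, 15.8 MB/s",
    "10737418240 bytes (11 GB) copied, 512.643 seconds, 20.9 MB/s",
    "10737418240 bytes (11 GB) copied, 478.893 seconds, 22.4 MB/s",
]
FAST_KERNEL = [  # 2.6.18-160.invoca1.el5
    "10737418240 bytes (11 GB) copied, 146.771 seconds, 73.2 MB/s",
    "10737418240 bytes (11 GB) copied, 146.289 seconds, 73.4 MB/s",
    "10737418240 bytes (11 GB) copied, 143.643 seconds, 74.8 MB/s",
]

# Captures the byte count and the elapsed time of one dd summary line.
DD_SUMMARY = re.compile(r"^(\d+) bytes .* copied, ([\d.]+) seconds")


def parse_dd_summary(line):
    """Return (bytes_copied, seconds) from one dd summary line."""
    match = DD_SUMMARY.match(line.strip())
    if match is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    return int(match.group(1)), float(match.group(2))


def mean_seconds(lines):
    """Average elapsed time over several dd runs."""
    return mean(parse_dd_summary(line)[1] for line in lines)


def throughput_mb_s(line):
    """Throughput in MB/s, using 10**6 bytes per MB as dd does here."""
    nbytes, seconds = parse_dd_summary(line)
    return nbytes / seconds / 1e6


if __name__ == "__main__":
    slow = mean_seconds(SLOW_KERNEL)
    fast = mean_seconds(FAST_KERNEL)
    print(f"mean time, affected kernel: {slow:.1f} s")
    print(f"mean time, patched kernel:  {fast:.1f} s")
    print(f"slowdown factor: {slow / fast:.2f}")
```

Recomputing the throughput from bytes and seconds, rather than trusting the printed MB/s figure, makes it easy to compare runs of different sizes.
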
Just got a call from one of our users where I installed the 2.6.18-162.el5.ivtest.1 kernel and the following logs showed up today:

    Aug 17 08:08:13 dhcp-1-149 kernel: NETDEV WATCHDOG: eth0: transmit timed out
    Aug 17 08:08:13 dhcp-1-149 kernel: r8169: eth0: link up
    Aug 17 08:11:26 dhcp-1-149 dhclient: DHCPREQUEST on eth0 to 192.168.1.10 port 67
    Aug 17 08:18:31 dhcp-1-149 kernel: NETDEV WATCHDOG: eth0: transmit timed out
    Aug 17 08:18:31 dhcp-1-149 kernel: r8169: eth0: link up
    Aug 17 13:17:18 dhcp-1-149 dhclient: DHCPREQUEST on eth0 to 192.168.1.10 port 67
    Aug 17 15:03:01 dhcp-1-149 kernel: NETDEV WATCHDOG: eth0: transmit timed out
    Aug 17 15:03:01 dhcp-1-149 kernel: r8169: eth0: link up

BTW, my description of the "dead link" is not always correct. Sometimes the link somehow slows down but doesn't go dead. Maybe that's what happened in my tests shown above. Note that exactly the same thing happens with unpatched 2.6.18-162.el5.

Simon, there are new packages (2.6.18-164...) at: http://people.redhat.com/ivecera/rhel-5-ivtest/

Could you please test them?

Hi Ivan, 2.6.18-164.el5.ivtest.1 works fine. It shows exactly the same behavior as my own patched 2.6.18-160.invoca1.el5.

    [root@client140 ~]# uname -a
    Linux client140.bi.corp.invoca.ch 2.6.18-164.el5.ivtest.1 #1 SMP Mon Aug 24 11:18:49 EDT 2009 i686 i686 i386 GNU/Linux
    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 146.228 seconds, 73.4 MB/s
    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 149.719 seconds, 71.7 MB/s
    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 148.641 seconds, 72.2 MB/s

I hope this patch will make it into 5.4 as well as current 5.3.

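
Editorial note: the "NETDEV WATCHDOG ... transmit timed out" messages in the log excerpt quoted earlier in this thread are the visible symptom of the stall, so counting them per interface is a quick way to check whether a given kernel is affected. The following is a minimal sketch, assuming the classic syslog line layout seen in that excerpt; the regular expression and function names are my own and are not part of any tool mentioned in this report.

```python
import re
from collections import Counter

# Log lines copied from the excerpt quoted in this report.
LOG = """\
Aug 17 08:08:13 dhcp-1-149 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 17 08:08:13 dhcp-1-149 kernel: r8169: eth0: link up
Aug 17 08:11:26 dhcp-1-149 dhclient: DHCPREQUEST on eth0 to 192.168.1.10 port 67
Aug 17 08:18:31 dhcp-1-149 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 17 08:18:31 dhcp-1-149 kernel: r8169: eth0: link up
Aug 17 13:17:18 dhcp-1-149 dhclient: DHCPREQUEST on eth0 to 192.168.1.10 port 67
Aug 17 15:03:01 dhcp-1-149 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 17 15:03:01 dhcp-1-149 kernel: r8169: eth0: link up
"""

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> kernel: NETDEV WATCHDOG: <iface>: ..."
WATCHDOG = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"NETDEV WATCHDOG: (?P<iface>\S+): transmit timed out"
)


def watchdog_timeouts(text):
    """Return a list of (timestamp, interface) for each transmit timeout."""
    events = []
    for line in text.splitlines():
        match = WATCHDOG.match(line)
        if match:
            events.append((match.group("stamp"), match.group("iface")))
    return events


def count_by_interface(events):
    """Count transmit timeouts per network interface."""
    return Counter(iface for _, iface in events)


if __name__ == "__main__":
    events = watchdog_timeouts(LOG)
    for stamp, iface in events:
        print(f"{stamp}  {iface}")
    print(dict(count_by_interface(events)))
```

To scan a real system log instead, pass the file contents to `watchdog_timeouts`; the log file location varies by distribution and is not specified in this report.
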
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in kernel-2.6.18-169.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.

kernel-2.6.18-169.el5 performs well in my tests:

    [root@client140 ~]# dd if=/dev/zero bs=1024k count=10240 > /dev/tcp/delta64/7777
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 148.176 seconds, 72.5 MB/s

*** Bug 521132 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

I can confirm this bug on Fedora 18 testing. I fixed it by adding the following to the GRUB bootloader: clocksource=acpi_pm

For me it seems to have been an AMD PowerNow and timing issue with power management and Linux.

Lance
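
Editorial note: for anyone trying the `clocksource=acpi_pm` workaround from the last comment, it helps to confirm after a reboot that the parameter actually reached the kernel and which clocksource is active. The sketch below reads `/proc/cmdline` and the sysfs file `/sys/devices/system/clocksource/clocksource0/current_clocksource`. The helper names are my own, and whether this workaround applies to a given machine is not established by this report.

```python
from pathlib import Path

CMDLINE = Path("/proc/cmdline")
CURRENT = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def boot_param(cmdline, name):
    """Return the value of name=value on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            # Keep the last occurrence if repeated (a choice made for this sketch).
            value = val
    return value


def read_text_or_none(path):
    """Read a small text file, returning None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_or_none(CMDLINE) or ""
    print("requested clocksource:", boot_param(cmdline, "clocksource"))
    print("active clocksource:   ", read_text_or_none(CURRENT))
```

If the two values differ, the kernel may have rejected or replaced the requested clocksource, and the kernel log is the place to look next.
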