Bug 719449 - RTL8168c/8111c NIC (r8169 kernel driver) requires much shorter MTU after upgrade to F15
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-06 21:46 UTC by Jonathan Kamens
Modified: 2012-09-04 13:44 UTC
CC List: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-09-04 13:44:56 UTC
Type: ---
Embargoed:


Attachments
RX FIFO overflow fixes (4.85 KB, patch)
2011-12-05 06:39 UTC, Francois Romieu
Indexes races (2.64 KB, patch)
2011-12-05 06:40 UTC, Francois Romieu

Description Jonathan Kamens 2011-07-06 21:46:03 UTC
I have an RTL8168c/8111c on-board NIC on my motherboard. From dmesg on boot:

[    8.831748] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    8.832161] r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    8.832492] r8169 0000:02:00.0: setting latency timer to 64
[    8.832541] r8169 0000:02:00.0: irq 44 for MSI/MSI-X
[    8.832682] r8169 0000:02:00.0: eth0: RTL8168c/8111c at 0xffffc90012660000, 90:e6:ba:ba:47:2e, XID 1c4000c0 IRQ 44

Immediately after I upgraded from F14 to F15, my network performance went totally to hell. After various troubleshooting attempts, I discovered that if I reduced the MTU from 1500, the default, to 1000, my network performance went back up to an acceptable level.

I then booted back into the F14 kernel to confirm that this was a kernel issue. Various other things didn't work in that kernel because of the kernel changes in F15, but it was good enough to be able to boot, and when booted into the F14 kernel, my network performance was good even with my MTU set to 1500.

Note: There's no switch involved -- I'm plugged directly into my cable modem. I tested with a different ethernet cable and it didn't make any difference. I tested plugging my Windows 7 laptop directly into the cable modem, and it worked just fine. So it seems fairly clear that there's something up with the new kernel.

Comment 1 Jonathan Kamens 2011-07-06 22:08:11 UTC
I actually have two NICs on my PC, since it serves as the router / firewall for my home network. I just swapped the configs and cables on the two NICs, and lo and behold, when the other NIC is plugged into the cable modem, the MTU problem goes away, thus further confirming that there is a problem in the kernel with the particular NIC mentioned above.

Comment 2 Raymond Rodgers 2011-09-28 15:20:54 UTC
I just built a brand new system using the Asus Sabertooth 990FX motherboard which includes the Realtek 8111E as the on-board NIC and I've been getting terrible performance out of it under a fresh install of Fedora 15. My typical download speed under Windows and on Fedora 15 on my old system (different NIC) would be in excess of 500KB/s, often reaching 1-2MB/second. With the MTU set to automatic on this new system with the 8111E, I was lucky to get 100KB/second, would frequently see my connection to my router drop altogether, and even had problems just maintaining a connection to my IM providers. I just adjusted my MTU to 1000 per Jonathan's comments, and my performance improved significantly!

Comment 3 Raymond Rodgers 2011-10-06 16:50:50 UTC
(In reply to comment #2)
> I just built a brand new system using the Asus Sabertooth 990FX motherboard
> which includes the Realtek 8111E as the on-board NIC and I've been getting
> terrible performance out of it under a fresh install of Fedora 15. My typical
> download speed under Windows and on Fedora 15 on my old system (different NIC)
> would be in excess of 500KB/s, often reaching 1-2MB/second. With the MTU set to
> automatic on this new system with the 8111E, I was lucky to get 100KB/second,
> would frequently see my connection to my router drop altogether, and even had
> problems just maintaining a connection to my IM providers. I just adjusted my
> MTU to 1000 per Jonathan's comments, and my performance improved significantly!

To clarify this comment, with the default MTU, not only would my transfer rate be significantly less than 100KB/second but my connection even to my own router would be disrupted and I wouldn't even be able to ping it. Since setting the MTU to 1000, my connection has at least remained stable, but my transfer rate is somewhat variable: sometimes I can get as much as 200KB/second, while other times I crawl along at 6-15 KB/second, which makes even updating Fedora painful. It averages out to be better than the performance of the default setting, but I shouldn't be seeing anything less, on average, than 400-500 KB/second with typical performance in the 1-2 MB/second range.

Comment 4 Raymond Rodgers 2011-10-07 02:17:34 UTC
I downloaded the Fedora 16 beta live image and tested it from a flash drive, and the issue has been resolved in F16: I managed to get 1-3 MB/second data transfers and the connection remained stable. Please backport the F16 driver to F15!

Comment 5 John W. Linville 2011-10-17 15:13:56 UTC
What kernel version is giving the performance problems on F15?

Comment 6 Jonathan Kamens 2011-10-20 01:06:47 UTC
This is the version I encountered the problem in:

Jul 06 09:47:24 Installed: kernel-2.6.38.8-32.fc15.x86_64
Jul 06 10:00:23 Installed: kernel-devel-2.6.38.8-32.fc15.x86_64
Jul 06 10:10:08 Updated: kernel-headers-2.6.38.8-32.fc15.x86_64
Jul 06 10:49:27 Updated: kernel-doc-2.6.38.8-32.fc15.noarch

Comment 7 Raymond Rodgers 2011-10-20 13:12:44 UTC
I'm also seeing it in 2.6.40.4-5.fc15.x86_64 and 2.6.40.6-0.fc15.x86_64.

Comment 8 Francois Romieu 2011-11-19 11:32:45 UTC
(In reply to comment #7)
> I'm also seeing it in 2.6.40.4-5.fc15.x86_64 and 2.6.40.6-0.fc15.x86_64.

I wonder what the specific version of your chipset is. An 8168evl (as opposed
to a plain 8168e) cannot work with this kernel version. The firmware
version may make a difference too (it is not strictly required... until it is).

Can you grep dmesg for the r8169 'XID' line and send the output of 'ethtool -i'
for your NIC?
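The XID line requested here has the same shape as the r8169 probe lines quoted elsewhere in this report. A minimal Python sketch that extracts the chip name and XID from such a line; the regular expression is an assumption inferred from those sample lines (not a documented r8169 output format), and `parse_xid_line` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Assumed shape of the r8169 probe line, inferred from the samples in this report:
# "r8169 <pci-addr>: <iface>: <chip> at 0x<addr>, <mac>, XID <xid> IRQ <irq>"
XID_LINE = re.compile(
    r"r8169 (?P<pci>[0-9a-f:.]+): (?P<iface>\S+): (?P<chip>RTL\S+) at 0x[0-9a-f]+, "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), XID (?P<xid>[0-9a-f]{8}) IRQ (?P<irq>\d+)"
)


def parse_xid_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of an r8169 XID line, or None for any other line."""
    match = XID_LINE.search(line)
    return match.groupdict() if match else None
```

Feeding each line of `dmesg` output through `parse_xid_line` and keeping the non-None results gives the chip name (e.g. 8168c vs 8168evl) and XID that distinguish the two cases discussed in this bug.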

Thanks.

-- 
Ueimor

Comment 9 Raymond Rodgers 2011-11-28 01:47:03 UTC
From Fedora 16 where everything seems to be fine:

dmesg grep:
r8169 0000:09:00.0: eth0: RTL8168evl/8111evl at 0xffffc9001278c000, 14:da:e9:21:2a:a4, XID 0c900800 IRQ 88

ethtool:
driver: r8169
version: 2.3LK-NAPI
firmware-version: rtl_nic/rtl8168e-3.fw
bus-info: 0000:09:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
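The `ethtool -i` output above is a list of `key: value` lines. A minimal Python sketch (the `parse_ethtool_info` name is hypothetical) that turns it into a dictionary, splitting only on the first colon so that values such as the `bus-info` PCI address keep their own colons:

```python
from typing import Dict


def parse_ethtool_info(text: str) -> Dict[str, str]:
    """Turn `ethtool -i` style "key: value" lines into a dictionary."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only: bus-info values like 0000:09:00.0 contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

With this, `info["firmware-version"]` is the firmware field mentioned in comment 8 and `info["driver"]` confirms the NIC is bound to r8169.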

Comment 10 Francois Romieu 2011-12-01 08:59:00 UTC
(In reply to comment #9)
> From Fedora 16 where everything seems to be fine:
> 
> dmesg grep:
> r8169 0000:09:00.0: eth0: RTL8168evl/8111evl at 0xffffc9001278c000,
> 14:da:e9:21:2a:a4, XID 0c900800 IRQ 88

8168evl support was added between kernel v3.0 and v3.1, and the F15 kernel
is based on 3.0-stable, so everything behaves as expected.

A backport would not be hard but I have no free time for it. Using a F16
kernel is the easier option imho.

Jonathan's bug is a different story: he owns an 8168c. The version of the
last known working F14 kernel would be really welcome, since his bug qualifies
as a regression.

-- 
Ueimor

Comment 11 Josh Boyer 2011-12-01 12:47:10 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > From Fedora 16 where everything seems to be fine:
> > 
> > dmesg grep:
> > r8169 0000:09:00.0: eth0: RTL8168evl/8111evl at 0xffffc9001278c000,
> > 14:da:e9:21:2a:a4, XID 0c900800 IRQ 88
> 
> 8168evl support was included in kernel version between v3.0 and v3.1 and
> F15 kernel is 3.0-stable based. So everything behaves as expected.
> 
> A backport would not be hard but I have no free time for it. Using a F16
> kernel is the easier option imho.

F15 is now at 2.6.41.x, which is based on the 3.1-stable series.  If whatever support you are discussing was indeed included between 3.0 and 3.1, the latest F15 update kernel should already have it.
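The exchange above hinges on Fedora's version numbering. A minimal Python sketch of that reasoning, assuming Fedora 15 labelled upstream 3.0 as 2.6.40.x and 3.1 as 2.6.41.x, with the fourth field tracking the upstream stable release; `upstream_version` and `has_8168evl_support` are hypothetical helpers, and the 3.1 cutoff is taken from comment 10.

```python
import re
from typing import Tuple

# Assumption: Fedora 15's compatibility renumbering, 2.6.40.x -> 3.0.x and 2.6.41.x -> 3.1.x.
FEDORA_RENUMBERED = {40: (3, 0), 41: (3, 1)}


def upstream_version(fedora_release: str) -> Tuple[int, ...]:
    """Map a Fedora kernel package version to the upstream version it is based on."""
    # Keep only the leading dotted-number part, e.g. "2.6.40.6" from "2.6.40.6-0.fc15.x86_64".
    match = re.match(r"(\d+(?:\.\d+)*)", fedora_release)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {fedora_release!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    if parts[:2] == (2, 6) and len(parts) >= 3 and parts[2] in FEDORA_RENUMBERED:
        return FEDORA_RENUMBERED[parts[2]] + parts[3:]
    return parts


def has_8168evl_support(fedora_release: str) -> bool:
    # Per comment 10, 8168evl support landed between v3.0 and v3.1, so it is present from 3.1 on.
    return upstream_version(fedora_release) >= (3, 1)
```

Under these assumptions, the 2.6.40.x kernels from comment 7 map to 3.0.x and lack 8168evl support, while any 2.6.41.x update maps to 3.1.x and should include it, which is Josh's point.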

FYI.

Comment 12 Francois Romieu 2011-12-05 06:39:38 UTC
Created attachment 540648 [details]
RX FIFO overflow fixes

Comment 13 Francois Romieu 2011-12-05 06:40:23 UTC
Created attachment 540649 [details]
Indexes races

Comment 14 Francois Romieu 2011-12-05 06:41:40 UTC
Jonathan, can you try the attached patches against a recent -rc kernel?

They should help with your 8168c.

Thanks.

-- 
Ueimor

Comment 15 Jonathan Kamens 2011-12-05 15:48:25 UTC
Build me an x86_64 kernel RPM and I'll try it. I don't have time to build the kernel myself.

Note that I'm on F16 now, so it'll have to be an F16-based RPM.

Note that it's rather a time-consuming pain for me to do this, even independent of the kernel, since I have to reconfigure my system to use the broken NIC (which I'm totally not using right now -- I replaced it with a PCI card that doesn't have the problem), and since I have to deal with RCN on the phone to convince them that I'm not trying to run multiple PCs on my home network (since I'm only allowed one and my MAC will change when I try the NIC).

So please be Pretty Darn Sure that this is going to fix the problem before asking me to expend a lot of time and hassle testing it out. Thanks.

Comment 16 Alec Leamas 2011-12-08 09:17:42 UTC
Possibly OT, but an 8168d is showing similar symptoms after upgrading directly from F14 to F16: sluggish performance, with download speeds around 10 kB/s where they should be (and used to be) 1-3 MB/s. Upload speeds are OK.

As with Jonathan, adjusting the MTU to 1000 makes the speed acceptable (~400 kB/s), but still slower than it used to be and should be. My interface is bridged, so I need to adjust the MTU on both the enslaved interface and the bridge.
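A minimal Python sketch that builds the iproute2 commands for lowering the MTU on both a bridge port and its bridge, as this comment describes. The interface names `eth0` and `br0` and the helper `mtu_commands` are hypothetical, and the 68-1500 range check is an assumption for plain Ethernet without jumbo frames.

```python
from typing import List


def mtu_commands(port: str, bridge: str, mtu: int) -> List[List[str]]:
    """Build `ip link set` argv lists that give a bridge port and its bridge the same MTU."""
    # 68 is the IPv4 minimum MTU; 1500 is the standard Ethernet payload limit without jumbo frames.
    if not 68 <= mtu <= 1500:
        raise ValueError(f"MTU {mtu} outside the 68-1500 range assumed here")
    # Lower the enslaved port first, then the bridge itself.
    return [
        ["ip", "link", "set", "dev", port, "mtu", str(mtu)],
        ["ip", "link", "set", "dev", bridge, "mtu", str(mtu)],
    ]
```

Each argv list can be passed to `subprocess.run(argv, check=True)` as root; `ip link set dev IFACE mtu N` is standard iproute2 syntax.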

I tried applying the patches to the current F16 kernel 3.1.4-1, but there is no change: download speed is still ~10 kB/s with MTU=1500 and ~350 kB/s with MTU=1000.

Rebuilt SRPM: 
    ftp://mumin.dnsalias.net/pub/kernel-3.1.4-1.rhbz719449.fc16.src.rpm

dmesg:
[   14.057641] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[   14.057693] r8169 0000:04:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[   14.057750] r8169 0000:04:00.0: setting latency timer to 64
[   14.057821] r8169 0000:04:00.0: irq 58 for MSI/MSI-X
[   14.058335] r8169 0000:04:00.0: eth0: RTL8168d/8111d at 0xf8238000, bc:ae:c5:b5:1e:56, XID 083000c0 IRQ 58

Is it meaningful to try the rawhide 3.2.0 kernel in this context?

Comment 17 Alec Leamas 2011-12-19 11:41:32 UTC
Updated to 3.1.5-6.fc16.i686.PAE, no change.

However, doing a complete power cycle (not just a reset) makes my speeds perfectly OK: download and upload speeds are both 50-60 Mbit/s, which is what's expected. This is with MTU=1000.

Comment 18 Josh Boyer 2012-06-06 15:54:28 UTC
Is this still a problem with the 3.3 kernel in F16 at the moment?

