Bug 1671958 - Slow RX speed with RTL8168
Summary: Slow RX speed with RTL8168
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-02 10:56 UTC by Tonino
Modified: 2019-12-04 06:46 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-17 20:01:43 UTC
Type: Bug
Embargoed:


Attachments
Boot log (349.01 KB, text/plain)
2019-02-05 06:44 UTC, Tonino
Boot log with 4.18 (345.37 KB, text/plain)
2019-02-05 11:47 UTC, Tonino
screenshot showing differences in register dumps using "meld" (90.38 KB, image/png)
2019-02-05 19:23 UTC, Steve
9999-patch-rtl_hw_start_8168h_1.patch applies to 4.20.6 (830 bytes, patch)
2019-02-06 11:33 UTC, Steve
lspci of affected system running a99790bf5c7f + comment 69 patch (31.70 KB, text/plain)
2019-04-02 04:09 UTC, Alex Williamson
lspci with comment 74 patch (31.79 KB, text/plain)
2019-04-04 22:53 UTC, Alex Williamson

Description Tonino 2019-02-02 10:56:21 UTC
1. Please describe the problem:
   The RX speed of the GbE is limited to about 15Mb/s, while the TX is up to ~1Gb/s. On Windows10 everything works as expected, so no hardware issue.

2. What is the Version-Release number of the kernel:
   It's been like this since I received this PC, a couple of months ago, and no kernel update made any difference. I'm currently running 4.20.4-200.fc29.x86_64

       product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
       vendor: Realtek Semiconductor Co., Ltd.
       physical id: 0
       bus info: pci@0000:01:00.0
       logical name: enp2s0
       version: 15
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress msix bus_master cap_list ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=r8169 duplex=full firmware=rtl8168h-2_0.0.2 02/26/15 ip=10.1.1.21 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
       resources: irq:23 ioport:e000(size=256) memory:a1104000-a1104fff memory:a1100000-a1103fff

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
   It never worked better than this.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
   It's always there.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
   The install failed due to mismatched keys.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
   No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
   There's nothing in the log. It's a performance issue.

Comment 1 Steve 2019-02-04 04:01:37 UTC
>    The RX speed of the GbE is limited to about 15Mb/s, while the TX is up to
> ~1Gb/s. On Windows10 everything works as expected, so no hardware issue.

How are you measuring those speeds?

Comment 2 Steve 2019-02-04 05:29:49 UTC
> On Windows10 everything works as expected, so no hardware issue.

OK, but could you confirm that the problem persists with a new Cat 5e or Cat 6 cable?

And for completeness, what hardware is on the other end?

Comment 3 Tonino 2019-02-04 08:59:36 UTC
I played with cables for a while when I noticed the problem, since it was my first thought, but it seems that cables have nothing to do with it. On Windows10 I can see ~1Gb/s both ways, whatever cable I use, probably also because they are very short (0.5m).
On the other side there's a hub for fiber broadband.

Comment 4 Heiner Kallweit 2019-02-04 11:46:22 UTC
Some more info would be needed:
- full dmesg log
- iperf results
- statistics (to check dropped and / or missed rx packets)
  - from tool ip
  - adapter statistics via "ethtool -S <if>"

And we need the info whether it's a regression: Please test previous kernel versions (especially 4.18).

Comment 5 Steve 2019-02-04 14:02:13 UTC
(In reply to Tonino from comment #3)
> I played with cables for a while when I noticed the problem, since it was my
> first thought, but it seems that cables have nothing to do with it. On
> Windows10 I can see ~1Gb/s both ways, whatever cable I use, probably also
> because they are very short (0.5m).
> On the other side there's a hub for fiber broadband.

OK. Heiner is asking for several reports. Could you start by attaching the output from:

$ journalctl -b --no-hostname > journalctl-1.log

For the others, you need to generate some network traffic. How are you measuring your networking speeds?

Comment 6 Steve 2019-02-04 15:18:14 UTC
(In reply to Heiner Kallweit from comment #4)
...
> - iperf results
...

I haven't used "iperf" before. Could you suggest a configuration for using it?

1. For a first test, would it be sufficient to run the client and the server on the same machine?
2. Are there any specific command-line options that you recommend?

Tonino: You may need to install the "iperf" package on one or two machines. Do you have a second machine on your local network that you can use as an "iperf" server?

Comment 7 Heiner Kallweit 2019-02-04 16:34:45 UTC
It's best to use iperf3; it's available for Windows as well. The iperf3 client and server should be on different machines.
There aren't many command-line options. On one machine start the server, and on the other machine start the client, for either UDP or TCP.
Then swap the roles to test both directions.

Comment 8 Steve 2019-02-04 17:18:47 UTC
(In reply to Heiner Kallweit from comment #7)
> It's best to use iperf3; it's available for Windows as well. The iperf3 client
> and server should be on different machines.
> There aren't many command-line options. On one machine start the server, and
> on the other machine start the client, for either UDP or TCP.
> Then swap the roles to test both directions.

Thanks.

$ rpm -q iperf3
iperf3-3.6-3.fc29.x86_64

Tonino: For a practice run with one machine, start iperf3 as a server in a terminal window:
$ iperf3 -s

In a separate terminal window, run iperf3 as a client:
$ iperf3 -c localhost

For a test run with two machines, you may need to open the firewall for port 5201 on both machines.

The port number can be changed with the "-p" option.

Documentation: "man iperf3"
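
For repeated runs, the two-machine test described above could be scripted. The following is a minimal sketch in Python, assuming the client is started with the iperf3 "-J" (JSON output) option and that a TCP report carries "end" -> "sum_sent" / "sum_received" -> "bits_per_second" fields. The helper name summarize_iperf3 and the SAMPLE report are hypothetical; the sample is heavily trimmed and its numbers are illustrative, not measurements from this bug.

```python
import json


def summarize_iperf3(json_text):
    """Return (sent_mbps, received_mbps) from an `iperf3 -J` TCP report."""
    end = json.loads(json_text)["end"]
    sent = end["sum_sent"]["bits_per_second"] / 1e6
    received = end["sum_received"]["bits_per_second"] / 1e6
    return sent, received


# Hypothetical, trimmed report; a real one contains many more fields.
SAMPLE = (
    '{"end": {"sum_sent": {"bits_per_second": 934000000.0},'
    ' "sum_received": {"bits_per_second": 929000000.0}}}'
)

sent_mbps, received_mbps = summarize_iperf3(SAMPLE)
print(f"sender {sent_mbps:.0f} Mbit/s, receiver {received_mbps:.0f} Mbit/s")
```

In practice the JSON text would come from something like "iperf3 -J -c <server>" for the TX direction and "iperf3 -J -R -c <server>" for the RX direction.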

Comment 9 Tonino 2019-02-05 06:32:56 UTC
Here is what I managed to do so far:

-------------------------------------------------------------------------- Before RX iperf3:
        RX packets 10782745  bytes 11307125256 (10.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5125807  bytes 3754328043 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
NIC statistics:
     tx_packets: 5125824
     rx_packets: 10782764
     tx_errors: 0
     rx_errors: 0
     rx_missed: 42749
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 10544459
     broadcast: 91164
     multicast: 147141
     tx_aborted: 0
     tx_underrun: 0
------------------------------------------------------------------------- iperf3 -s:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  7.13 MBytes  59.8 Mbits/sec                  
[  5]   1.00-2.00   sec  7.31 MBytes  61.3 Mbits/sec                  
[  5]   2.00-3.00   sec  7.10 MBytes  59.6 Mbits/sec                  
[  5]   3.00-4.00   sec  7.18 MBytes  60.2 Mbits/sec                  
[  5]   4.00-5.00   sec  7.07 MBytes  59.3 Mbits/sec                  
[  5]   5.00-6.00   sec  7.04 MBytes  59.0 Mbits/sec                  
[  5]   6.00-7.00   sec  7.44 MBytes  62.4 Mbits/sec                  
[  5]   7.00-8.00   sec  6.85 MBytes  57.4 Mbits/sec                  
[  5]   8.00-9.00   sec  6.99 MBytes  58.7 Mbits/sec                  
[  5]   9.00-10.00  sec  7.32 MBytes  61.4 Mbits/sec                  
[  5]  10.00-10.03  sec   221 KBytes  53.8 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.03  sec  71.6 MBytes  59.9 Mbits/sec                  receiver
-------------------------------------------------------------------------- After RX iperf3:
        RX packets 10835315  bytes 11385792077 (10.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5132635  bytes 3754797072 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
NIC statistics:
     tx_packets: 5132637
     rx_packets: 10835404
     tx_errors: 0
     rx_errors: 0
     rx_missed: 44549
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 10597044
     broadcast: 91170
     multicast: 147190
     tx_aborted: 0
     tx_underrun: 0
------------------------------------------------------------------------- iperf3 -c:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   107 MBytes   895 Mbits/sec    0    293 KBytes       
[  5]   1.00-2.00   sec   113 MBytes   944 Mbits/sec    0    307 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   940 Mbits/sec    0    324 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   937 Mbits/sec    0    341 KBytes       
[  5]   4.00-5.00   sec   112 MBytes   937 Mbits/sec    0    359 KBytes       
[  5]   5.00-6.00   sec   112 MBytes   937 Mbits/sec    0    379 KBytes       
[  5]   6.00-7.00   sec   110 MBytes   925 Mbits/sec    0    397 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   939 Mbits/sec    0    397 KBytes       
[  5]   8.00-9.00   sec   112 MBytes   943 Mbits/sec    0    430 KBytes       
[  5]   9.00-10.00  sec   113 MBytes   944 Mbits/sec    0    430 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   934 Mbits/sec    0             sender
[  5]   0.00-10.03  sec  1.09 GBytes   929 Mbits/sec                  receiver
-------------------------------------------------------------------------- After TX iperf3:
        RX packets 10996783  bytes 11396486000 (10.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5937806  bytes 4973690140 (4.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
NIC statistics:
     tx_packets: 5937812
     rx_packets: 10996788
     tx_errors: 0
     rx_errors: 0
     rx_missed: 44549
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 10758398
     broadcast: 91178
     multicast: 147212
     tx_aborted: 0
     tx_underrun: 0

I'll reboot it to give you a log with less junk. If you can point me to an older kernel I can get with dnf without having to mess around, it would be helpful.
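
The counters above are cumulative, so the interesting figure is how much each one changed during a test. A small sketch in Python that computes those deltas, using only the rx_packets, rx_missed and rx_errors values reported in this comment (the helper name counter_delta is hypothetical):

```python
def counter_delta(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# "ethtool -S" values copied from the three snapshots in this comment.
before_rx = {"rx_packets": 10782764, "rx_missed": 42749, "rx_errors": 0}
after_rx = {"rx_packets": 10835404, "rx_missed": 44549, "rx_errors": 0}
after_tx = {"rx_packets": 10996788, "rx_missed": 44549, "rx_errors": 0}

rx_test = counter_delta(before_rx, after_rx)
tx_test = counter_delta(after_rx, after_tx)

print("during RX test:", rx_test)
print("during TX test:", tx_test)
```

With these numbers, rx_missed grows only during the RX test and stays unchanged during the TX test.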

Comment 10 Tonino 2019-02-05 06:44:53 UTC
Created attachment 1527030 [details]
Boot log

Comment 11 Tonino 2019-02-05 06:48:16 UTC
I forgot to mention that the test above was performed using a second fc29 machine (it's not the practice run on loopback, which gives 20Gb/s, just for info)

Comment 12 Steve 2019-02-05 10:34:01 UTC
> If you can point me to an older kernel I can get with dnf without having to mess around, it would be helpful.

kernel 4.18.16-300 is in the "fedora" repo:

# dnf -q repoquery kernel --repo=fedora
kernel-0:4.18.16-300.fc29.x86_64

Install with:

# dnf --setopt=installonly_limit=0 install kernel-4.18.16-300.fc29.x86_64

(The installonly_limit=0 option stops dnf from removing older kernels. See also: /etc/dnf/dnf.conf.)

NB: kernel 4.18.16-300 will be listed first in the grub2 menu.

Tested in an F29 VM.

Comment 13 Steve 2019-02-05 11:15:05 UTC
The iperf3 "-R" option might simplify running tests:

$ man iperf3
...
CLIENT SPECIFIC OPTIONS
...
       -R, --reverse
              reverse the direction of a test, so that the server sends data to the client
...

$ iperf3 -R -c localhost
Connecting to host localhost, port 5201
Reverse mode, remote host localhost is sending
...

Comment 14 Tonino 2019-02-05 11:44:30 UTC
It seems that 4.18 is much better indeed (sorry for the reverse order of the test)

-------------------------------------------------------------------------- Before TX iperf3:
        RX packets 4587  bytes 3298785 (3.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1761  bytes 308183 (300.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
NIC statistics:
     tx_packets: 1799
     rx_packets: 4635
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 4568
     broadcast: 31
     multicast: 36
     tx_aborted: 0
     tx_underrun: 0
------------------------------------------------------------------------- iperf3 -c:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   952 Mbits/sec    0    267 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0    280 KBytes       
[  5]   2.00-3.00   sec   113 MBytes   945 Mbits/sec    0    280 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   943 Mbits/sec    0    293 KBytes       
[  5]   4.00-5.00   sec   112 MBytes   937 Mbits/sec    0    307 KBytes       
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec    0    307 KBytes       
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    0    324 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec    0    324 KBytes       
[  5]   8.00-9.00   sec   112 MBytes   940 Mbits/sec    2    273 KBytes       
[  5]   9.00-10.00  sec   112 MBytes   940 Mbits/sec   10    257 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec   12             sender
[  5]   0.00-10.04  sec  1.10 GBytes   938 Mbits/sec                  receiver
-------------------------------------------------------------------------- After TX iperf3:
        RX packets 174810  bytes 16952113 (16.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 815821  bytes 1230948463 (1.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
NIC statistics:
     tx_packets: 815812
     rx_packets: 174794
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 174697
     broadcast: 39
     multicast: 58
     tx_aborted: 0
     tx_underrun: 0
------------------------------------------------------------------------- iperf3 -s:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  86.4 MBytes   725 Mbits/sec                  
[  5]   1.00-2.00   sec  89.6 MBytes   751 Mbits/sec                  
[  5]   2.00-3.00   sec  89.6 MBytes   751 Mbits/sec                  
[  5]   3.00-4.00   sec  89.6 MBytes   751 Mbits/sec                  
[  5]   4.00-5.00   sec  89.6 MBytes   751 Mbits/sec                  
[  5]   5.00-6.00   sec  89.6 MBytes   751 Mbits/sec                  
[  5]   6.00-7.00   sec  89.6 MBytes   751 Mbits/sec                  
[  5]   7.00-8.00   sec  89.5 MBytes   751 Mbits/sec                  
[  5]   8.00-9.00   sec  89.6 MBytes   751 Mbits/sec                  
[  5]   9.00-10.00  sec  89.6 MBytes   751 Mbits/sec                  
[  5]  10.00-10.03  sec  3.13 MBytes   751 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.03  sec   896 MBytes   749 Mbits/sec                  receiver
-------------------------------------------------------------------------- After RX iperf3:
        RX packets 823757  bytes 999154432 (952.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 877419  bytes 1235024236 (1.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
NIC statistics:
     tx_packets: 877422
     rx_packets: 823760
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 823632
     broadcast: 43
     multicast: 85
     tx_aborted: 0
     tx_underrun: 0

It's not "perfect", but it's definitely much better than with kernel 4.20.

Comment 15 Tonino 2019-02-05 11:47:51 UTC
Created attachment 1527124 [details]
Boot log with 4.18

Comment 16 Heiner Kallweit 2019-02-05 16:47:14 UTC
OK, thanks. Could you please compare register dumps (ethtool -d <if>) from 4.18 and 4.20?
Then you could try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the end of rtl_hw_start_8168h_1() disabled.
If this doesn't show an improvement, then a bisect is needed. Can you do this (it requires a little git experience, and you have to build kernels)?

Comment 17 Tonino 2019-02-05 17:35:36 UTC
-----------------------------------------------------------------------4.18:
Unknown RealTek chip (TxConfig: 0x57100f80)
Offset		Values
------		------
0x0000:		84 39 be 95 00 b2 00 00 40 01 40 10 80 00 80 00 
0x0010:		00 c0 1d 70 02 00 00 00 a9 0c 46 00 00 00 00 00 
0x0020:		00 c0 14 54 02 00 00 00 00 00 00 00 00 00 00 00 
0x0030:		00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00 
0x0040:		80 0f 10 57 0e cf 02 00 00 00 00 00 00 00 00 00 
0x0050:		10 00 cf 3c 60 11 02 01 00 00 00 00 00 00 00 00 
0x0060:		00 00 00 00 ec 10 23 01 2c f0 00 80 93 00 c0 f0 
0x0070:		00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00 
0x0080:		8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0090:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00a0:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00b0:		7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00 
0x00c0:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00d0:		21 00 04 32 0e 00 00 00 00 00 00 40 c5 4a fd 00 
0x00e0:		e1 20 51 51 00 f0 71 55 02 00 00 00 27 00 00 00 
0x00f0:		3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 
-----------------------------------------------------------------------4.20:
Unknown RealTek chip (TxConfig: 0x57100f80)
Offset		Values
------		------
0x0000:		84 39 be 95 00 b2 00 00 40 01 40 10 80 00 80 00 
0x0010:		00 90 27 70 02 00 00 00 a9 0c 46 00 00 00 00 00 
0x0020:		00 e0 7b 54 02 00 00 00 00 00 00 00 00 00 00 00 
0x0030:		00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00 
0x0040:		80 0f 10 57 0e cf 02 00 00 00 00 00 00 00 00 00 
0x0050:		10 00 cf bc 60 11 03 01 00 00 00 00 00 00 00 00 
0x0060:		00 00 00 00 ec 10 23 01 2c f0 00 80 93 00 c0 f0 
0x0070:		00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 dd d1 
0x0080:		8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0090:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00a0:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00b0:		7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00 
0x00c0:		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00d0:		21 00 04 32 0e 00 00 00 00 00 00 40 c5 4a fd 00 
0x00e0:		e1 20 51 51 00 f0 7b 54 02 00 00 00 27 00 00 00 
0x00f0:		3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 

I can see differences.

Concerning the possibility of compiling kernels, I'm afraid the hardware is not good enough to make it a practical option; and the last time I compiled a Linux kernel was probably version 2.0 ...

Comment 18 Steve 2019-02-05 19:23:35 UTC
Created attachment 1527266 [details]
screenshot showing differences in register dumps using "meld"

This screenshot shows the differences in the register dumps in Comment 17. Generated with:

$ meld regs-4.18.txt regs-4.20.txt
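
As a text-only alternative to the screenshot, the two dumps can be compared byte by byte. The following is a minimal sketch in Python, assuming the "ethtool -d" layout shown in Comment 17 (an offset, a colon, then sixteen hex bytes per row). The helper names parse_dump and diff_dumps are hypothetical, and only the 0x0050 row from each dump is included here; other rows in Comment 17 differ as well. In my reading of the r8169 source, offsets 0x53 and 0x56 appear to be the Config2 and Config5 registers, whose bit 7 (ClkReqEn) and bit 0 (ASPM_en) are the ones set by rtl_hw_aspm_clkreq_enable(); treat that mapping as an assumption to verify against the driver.

```python
def parse_dump(text):
    """Map register offset -> byte value for rows like '0x0050: 10 00 ...'."""
    regs = {}
    for line in text.splitlines():
        head, sep, tail = line.partition(":")
        if not sep or not head.strip().startswith("0x"):
            continue  # skip the chip banner and the header/separator rows
        base = int(head.strip(), 16)
        for index, byte in enumerate(tail.split()):
            regs[base + index] = int(byte, 16)
    return regs


def diff_dumps(old, new):
    """Return (offset, old_value, new_value) for every byte that differs."""
    return [(off, old[off], new[off])
            for off in sorted(old) if off in new and old[off] != new[off]]


# Row 0x0050 copied from the 4.18 and 4.20 dumps in Comment 17.
DUMP_418 = "0x0050:\t\t10 00 cf 3c 60 11 02 01 00 00 00 00 00 00 00 00"
DUMP_420 = "0x0050:\t\t10 00 cf bc 60 11 03 01 00 00 00 00 00 00 00 00"

changes = diff_dumps(parse_dump(DUMP_418), parse_dump(DUMP_420))
for off, old_value, new_value in changes:
    print(f"0x{off:02x}: {old_value:02x} -> {new_value:02x} "
          f"(changed bits 0x{old_value ^ new_value:02x})")
```

Feeding the complete dumps to parse_dump works the same way, since non-register lines are skipped.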

Comment 19 Steve 2019-02-05 19:55:29 UTC
(In reply to Tonino from comment #17)
...
> Concerning the possibility of compiling kernels, I'm afraid the hardware is not
> good enough to make it a practical option; and the last time I compiled a
> Linux kernel was probably version 2.0 ...

Rebuilding a Fedora kernel isn't too difficult, but it does require a machine with good cooling, since CPU usage will go to 100% at times. The output is a set of rpm packages that can be installed with dnf. Here is a guide:

Building the Fedora Kernel
by Laura Abbott
September 6, 2016 
https://fedoramagazine.org/building-fedora-kernel/

(See the comments at the end for some corrections to the procedure.)

Comment 20 Steve 2019-02-05 19:57:38 UTC
(In reply to Heiner Kallweit from comment #16)
...
> Then you could try building a kernel with the call to
> rtl_hw_aspm_clkreq_enable(tp, true) at the end of rtl_hw_start_8168h_1()
> being disabled.
...

Could you attach a patch, so it is completely clear what needs to be changed?

Comment 21 Heiner Kallweit 2019-02-06 06:55:08 UTC
The following patch is against linux-next, so it may not apply cleanly. But you get the idea.

---
 drivers/net/ethernet/realtek/r8169.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..6ef89f518 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5334,7 +5334,7 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
 	r8168_mac_ocp_write(tp, 0xc094, 0x0000);
 	r8168_mac_ocp_write(tp, 0xc09e, 0x0000);
 
-	rtl_hw_aspm_clkreq_enable(tp, true);
+	// rtl_hw_aspm_clkreq_enable(tp, true);
 }
 
 static void rtl_hw_start_8168ep(struct rtl8169_private *tp)
-- 
2.20.1

Comment 22 Tonino 2019-02-06 07:18:42 UTC
(In reply to Steve from comment #19)
> (In reply to Tonino from comment #17)
> ...
> > Concerning the possibility of compiling kernels, I'm afraid the hardware is not
> > good enough to make it a practical option; and the last time I compiled a
> > Linux kernel was probably version 2.0 ...
> 
> Rebuilding a Fedora kernel isn't too difficult, but it does require a
> machine with good cooling, since CPU usage will go to 100% at times. The
> output is a set of rpm packages that can be installed with dnf. Here is a
> guide:
> 
> Building the Fedora Kernel
> by Laura Abbott
> September 6, 2016 
> https://fedoramagazine.org/building-fedora-kernel/
> 
> (See the comments at the end for some corrections to the procedure.)

I can give it a try, but the CPU and the cooling are indeed the problem; it's a low-power, passively cooled mini PC, and the other one is an old laptop.
So far I haven't even managed to clone the repo; for some reason it goes to 80KB/s until it hangs and dies at around 60% of the objects.

I'll see what I can do.

Comment 23 Steve 2019-02-06 09:34:07 UTC
(In reply to Heiner Kallweit from comment #21)
> The following patch is against linux-next, so it may not apply cleanly. But
> you get the idea.
...

Thanks.

(In reply to Tonino from comment #22)
...
> I can give it a try, but the CPU and the cooling are indeed the problem;
> it's a low-power, passively cooled mini PC, and the other one is an old laptop.
> So far I haven't even managed to clone the repo; for some reason it goes to
> 80KB/s until it hangs and dies at around 60% of the objects.
> 
> I'll see what I can do.

OK. I tried the same and the clone slowed at about 60%, but eventually completed.

$ git checkout -b test1 origin/f29
Branch 'test1' set up to track remote branch 'f29' from 'origin'.
Switched to a new branch 'test1'

$ git log -1 kernel.spec
commit c28a1e954f2d1783aaeca35aab0f052052807c8f (HEAD -> test1, origin/f29)
Author: Justin M. Forbes <jforbes>
Date:   Thu Jan 31 08:57:35 2019 -0600

    Linux v4.20.6

Comment 24 Steve 2019-02-06 11:33:02 UTC
Created attachment 1527473 [details]
9999-patch-rtl_hw_start_8168h_1.patch applies to 4.20.6

Save 9999-patch-rtl_hw_start_8168h_1.patch to the directory with kernel.spec.

# Tell git about the patch file:
$ git add 9999-patch-rtl_hw_start_8168h_1.patch

# Add the patch to kernel.spec:
$ ./scripts/newpatch.sh 9999-patch-rtl_hw_start_8168h_1.patch

# Commit the patch file and the modified kernel.spec:
$ git commit -a -m 'add 9999-patch-rtl_hw_start_8168h_1.patch'

NB: The newpatch.sh script changes "buildid" in kernel.spec, so the change described in the guide (Comment 19) is not needed:

$ git show kernel.spec
...
-# define buildid .local
+%define buildid .9999_patch_rtl_hw_start_8168h_1.patch
...

Comment 25 Steve 2019-02-06 13:16:37 UTC
Heiner: Do we need kernel-debuginfo packages?

Comment 26 Steve 2019-02-06 14:44:30 UTC
I successfully completed a build with:

$ fedpkg -v local --arch x86_64 --with baseonly --without debuginfo

Output is:

$ ls x86_64/
kernel-4.20.6-200.9999_patch_rtl_hw_start_8168h_1.patch.fc29.x86_64.rpm
kernel-core-4.20.6-200.9999_patch_rtl_hw_start_8168h_1.patch.fc29.x86_64.rpm
kernel-devel-4.20.6-200.9999_patch_rtl_hw_start_8168h_1.patch.fc29.x86_64.rpm
kernel-modules-4.20.6-200.9999_patch_rtl_hw_start_8168h_1.patch.fc29.x86_64.rpm
kernel-modules-extra-4.20.6-200.9999_patch_rtl_hw_start_8168h_1.patch.fc29.x86_64.rpm

The build log is:

$ ls .build*.log 
.build-4.20.6-200.9999_patch_rtl_hw_start_8168h_1.patch.fc29.log

Build tip: Run "top" in a separate terminal window. Despite the "-v" option (for verbose), there are times when the build does not show any output.

Documentation: "man fedpkg"

kernel.spec has the "--with" and "--without" options. For example:

# Only build the base kernel (--with baseonly):
%define with_baseonly  %{?_with_baseonly:     1} %{?!_with_baseonly:     0}

Comment 27 Tonino 2019-02-06 15:56:55 UTC
I'm already on my third attempt at following the tutorial you suggested yesterday; the others failed for lack of space.

If this attempt also fails, I'll try your way.

Comment 28 Heiner Kallweit 2019-02-06 17:02:45 UTC
(In reply to Steve from comment #25)
> Heiner: Do we need kernel-debuginfo packages?

No. Just a test whether the change fixes the issue would be sufficient.

Comment 29 Tonino 2019-02-06 21:44:49 UTC
Ok, I finally managed to have a new kernel compiled and packaged: the behaviour is now the same as with kernel 4.18

Comment 30 Heiner Kallweit 2019-02-07 06:45:10 UTC
(In reply to Tonino from comment #29)
> Ok, I finally managed to have a new kernel compiled and packaged: the
> behaviour is now the same as with kernel 4.18

Does "same behavior as with 4.18" mean that the patch from comment 21 fixes the issue for you?
Then indeed we have some incompatibility between board chipset and network chip. I have to think about how we can deal with this w/o having to disable these power-saving features for all users (and thus reducing battery lifetime on notebooks for users).

Comment 31 Tonino 2019-02-07 07:09:59 UTC
I would say that the issue is partially fixed, since RX speed is still 25% slower than TX, while on Windows it's fine (I didn't manage to measure it with iperf because of the stubborn firewall, but the download speed gets to ~950Mb/s even without Ookla).

Anyway, now that you mention it, it's not the first power management issue I have with this device; I was going to file another bug for alsa getting stuck after the screen goes off. Maybe the two things are connected.

Comment 32 Tonino 2019-02-07 07:11:27 UTC
**with Ookla**

Comment 33 Heiner Kallweit 2019-02-07 08:41:02 UTC
(In reply to Tonino from comment #31)
> I would say that the issue is partially fixed, since RX speed is still 25%
> slower than TX, while on Windows it's fine (I didn't manage to measure it
> with iperf because of the stubborn firewall, but the download speed gets to
> ~950Mb/s even without Ookla).

With the fix applied, when you say that RX is 25% slower than TX: is RX at the same level as with 4.18, or slower than with 4.18?

You could try to play with interrupt coalescing and see whether this has an impact on RX rate. See "ethtool -c" for showing coalesce settings and "ethtool -C" for changing them.

Comment 34 Tonino 2019-02-07 08:50:36 UTC
It's the same as 4.18 (~750 Mb/s, as the measurement above).

I can give it a try later today

Comment 35 Steve 2019-02-07 10:12:53 UTC
(In reply to Tonino from comment #31)
> I would say that the issue is partially fixed, since RX speed is still 25%
> slower than TX, while on Windows it's fine (I didn't manage to measure it
> with iperf because of the stubborn firewall, but the download speed gets to
> ~950Mb/s even without Ookla).
...

What version of Windows are you running?

By "Ookla", do you mean:
https://www.speedtest.net/

Comment 36 Steve 2019-02-07 10:15:27 UTC
(In reply to Heiner Kallweit from comment #33)
...
> You could try to play with interrupt coalescing and see whether this has an
> impact on RX rate. See "ethtool -c" for showing coalesce settings and
> "ethtool -C" for changing them.

"ethtool -C" has a lot of options. Could you suggest some specific tests?

Comment 37 Heiner Kallweit 2019-02-07 10:18:14 UTC
(In reply to Steve from comment #36)
> (In reply to Heiner Kallweit from comment #33)
> ...
> > You could try to play with interrupt coalescing and see whether this has an
> > impact on RX rate. See "ethtool -c" for showing coalesce settings and
> > "ethtool -C" for changing them.
> 
> "ethtool -C" has a lot of options. Could you suggest some specific tests?

When looking at the settings with "ethtool -c" you see what is set and can be changed. Here we are talking about an RX issue, so the relevant settings are rx-frames and rx-usecs.
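
One way to organize such tests is to prepare a small grid of rx-usecs and rx-frames values and run an RX iperf3 test after each change. The following is a minimal sketch in Python that only builds the "ethtool -C" command lines; it does not run them. The helper name coalesce_commands and the candidate values are hypothetical, and whether the driver accepts a given value is an assumption to check against the current "ethtool -c" output. The interface name enp2s0 is taken from the bug description.

```python
from itertools import product


def coalesce_commands(iface, usecs_values, frames_values):
    """Build one 'ethtool -C' argument list per (rx-usecs, rx-frames) pair."""
    return [
        ["ethtool", "-C", iface, "rx-usecs", str(usecs), "rx-frames", str(frames)]
        for usecs, frames in product(usecs_values, frames_values)
    ]


# Hypothetical candidate values; adjust to what "ethtool -c" reports.
commands = coalesce_commands("enp2s0", [0, 50, 100], [0, 16])
for command in commands:
    print(" ".join(command))
```

Each printed command would be run as root and followed by an RX measurement such as "iperf3 -R -c <server>", so that every setting gets its own result.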

Comment 38 Tonino 2019-02-07 10:24:59 UTC
> What version of Windows are you running?
10
> 
> By "Ookla", do you mean:
> https://www.speedtest.net/
yes. The results are unstable, for obvious reasons, but still they get close to 1Gb/s. 

To try the coalesce settings I'll need to do it with iperf, to have more control.

Comment 39 Tonino 2019-02-07 10:33:41 UTC
(In reply to Heiner Kallweit from comment #30)

> Then indeed we have some incompatibility between board chipset and network
> chip. I have to think about how we can deal with this w/o having to disable
> these power-saving features for all users (and thus reducing battery
> lifetime on notebooks for users).

Let me know if any additional info about the hardware can help.

Comment 40 Tonino 2019-02-07 11:14:04 UTC
I tried to play around with the coalesce settings, but, when the change has any effect, the speed gets worse, not better.

Looking in more detail, there is a small difference compared with 4.18. I can't say whether it's relevant, but I prefer to report it:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  85.3 MBytes   715 Mbits/sec                  
[  5]   1.00-2.00   sec  88.4 MBytes   742 Mbits/sec                  
[  5]   2.00-3.00   sec  88.5 MBytes   742 Mbits/sec                  
[  5]   3.00-4.00   sec  88.5 MBytes   742 Mbits/sec                  
[  5]   4.00-5.00   sec  88.5 MBytes   742 Mbits/sec                  
[  5]   5.00-6.00   sec  88.5 MBytes   742 Mbits/sec                  
[  5]   6.00-7.00   sec  88.5 MBytes   742 Mbits/sec                  
[  5]   7.00-8.00   sec  88.5 MBytes   742 Mbits/sec                  
[  5]   8.00-9.00   sec  88.5 MBytes   742 Mbits/sec                  
[  5]   9.00-10.00  sec  88.5 MBytes   742 Mbits/sec                  
[  5]  10.00-10.04  sec  3.21 MBytes   740 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec   885 MBytes   739 Mbits/sec                  receiver
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  85.4 MBytes   716 Mbits/sec                  
[  5]   1.00-2.00   sec  88.6 MBytes   743 Mbits/sec                  
[  5]   2.00-3.00   sec  88.6 MBytes   743 Mbits/sec                  
[  5]   3.00-4.00   sec  88.6 MBytes   744 Mbits/sec                  
[  5]   4.00-5.00   sec  88.8 MBytes   745 Mbits/sec                  
[  5]   5.00-6.00   sec  88.8 MBytes   745 Mbits/sec                  
[  5]   6.00-7.00   sec  88.8 MBytes   745 Mbits/sec                  
[  5]   7.00-8.00   sec  88.8 MBytes   745 Mbits/sec                  
[  5]   8.00-9.00   sec  88.8 MBytes   745 Mbits/sec                  
[  5]   9.00-10.00  sec  88.8 MBytes   745 Mbits/sec                  
[  5]  10.00-10.04  sec  3.24 MBytes   741 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec   887 MBytes   742 Mbits/sec                  receiver

The average is about 1% slower than 4.18, but the feature that caught my attention is the larger variance. I tried several times, and it never goes exactly as the above example with 4.18.
It may mean nothing, or there may be some other difference between now and 4.18.
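
For anyone who wants to put numbers on the "average" and "variance" remarks above, here is a minimal sketch that extracts the per-interval bitrates from pasted iperf3 text and computes a mean and a population standard deviation. The function names and the 0.5 s cutoff for dropping the short tail interval are assumptions made for illustration; they are not part of iperf3.

```python
import re
import statistics

# Matches iperf3 per-interval lines such as:
#   [  5]   1.00-2.00   sec  88.4 MBytes   742 Mbits/sec
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Mbits/sec"
)


def interval_bitrates(report, min_len=0.5):
    """Return per-interval bitrates in Mbit/s.

    Summary lines (tagged "sender"/"receiver") and intervals shorter than
    min_len seconds (an assumed cutoff) are skipped.
    """
    rates = []
    for line in report.splitlines():
        if "sender" in line or "receiver" in line:
            continue
        m = INTERVAL_RE.search(line)
        if m is None:
            continue
        start, end, rate = map(float, m.groups())
        if end - start >= min_len:
            rates.append(rate)
    return rates


def summarize(report):
    """Return (mean, population standard deviation) of the interval bitrates."""
    rates = interval_bitrates(report)
    return statistics.mean(rates), statistics.pstdev(rates)
```

Running summarize() on each of the two pasted reports gives a like-for-like comparison that ignores the 0.04 s tail interval and the summary line.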

Comment 41 Steve 2019-02-07 11:40:16 UTC
(In reply to Tonino from comment #38)
> > What version of Windows are you running?
> 10
...

This might be sufficient for a one-off iperf3 test:

Turn Windows Defender Firewall on or off
https://support.microsoft.com/en-us/help/4028544/windows-10-turn-windows-defender-firewall-on-or-off

Comment 42 Tonino 2019-02-07 11:51:02 UTC
I tried something else: 
iperf3 -R -c to my fc29 laptop from windows -> 750Mb/s, therefore it's probably a limitation on the old laptop side.

I gave a try to a public iperf server, and I got:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   107 MBytes   899 Mbits/sec                  
[  5]   1.00-2.00   sec   110 MBytes   925 Mbits/sec                  
[  5]   2.00-3.00   sec   110 MBytes   925 Mbits/sec                  
[  5]   3.00-4.00   sec   110 MBytes   920 Mbits/sec                  
[  5]   4.00-5.00   sec   111 MBytes   930 Mbits/sec                  
[  5]   5.00-6.00   sec   110 MBytes   925 Mbits/sec                  
[  5]   6.00-7.00   sec   110 MBytes   925 Mbits/sec                  
[  5]   7.00-8.00   sec   110 MBytes   925 Mbits/sec                  
[  5]   8.00-9.00   sec   110 MBytes   925 Mbits/sec                  
[  5]   9.00-10.00  sec   110 MBytes   925 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.08 GBytes   931 Mbits/sec  1775             sender
[  5]   0.00-10.00  sec  1.07 GBytes   922 Mbits/sec                  receiver

So, the residual problem is on the old laptop TX side, which I don't really care to investigate.

At this point I would say that this issue is solved with the proposed fix. Let me know if I can help to define how to propagate it to official builds without impairing other functionalities (as Heiner was mentioning).

Comment 43 Steve 2019-02-07 12:25:40 UTC
(In reply to Tonino from comment #42)
> I tried something else: 
> iperf3 -R -c to my fc29 laptop from windows -> 750Mb/s, therefore it's
> probably a limitation on the old laptop side.
> 
> I gave a try to a public iperf server, and I got:
...

Thanks for running that test. For the record, could you post the link speeds reported by ethtool for the minipc and for the old laptop?

# ethtool enp2s0 # minipc
# ethtool enXXXX # old laptop

Comment 44 Tonino 2019-02-07 12:47:23 UTC
----------------------------------------------------------------------------minipc
Settings for enp2s0:
	Supported ports: [ TP AUI BNC MII FIBRE ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Half 1000baseT/Full 
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
Cannot get wake-on-lan settings: Operation not permitted
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: yes
----------------------------------------------------------------------------------old laptop
Settings for enp2s0f0:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Half 1000baseT/Full 
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: on
Cannot get wake-on-lan settings: Operation not permitted
	Current message level: 0x000000ff (255)
			       drv probe link timer ifdown ifup rx_err tx_err
	Link detected: yes
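
When comparing several machines, the two fields that matter here (negotiated speed and duplex) can be pulled out of the ethtool text with a small helper. This is only a sketch that parses output pasted as a string; the function name is an assumption for illustration.

```python
import re


def link_settings(ethtool_output):
    """Return (speed, duplex) as strings from `ethtool <iface>` text.

    Either value is None if the corresponding line is not present.
    """
    speed = re.search(r"^\s*Speed:\s*(\S+)", ethtool_output, re.MULTILINE)
    duplex = re.search(r"^\s*Duplex:\s*(\S+)", ethtool_output, re.MULTILINE)
    return (
        speed.group(1) if speed else None,
        duplex.group(1) if duplex else None,
    )
```

For both outputs above this yields 1000Mb/s and Full, so the negotiated link speed does not explain the difference between the two machines.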

Comment 45 Steve 2019-02-07 13:11:21 UTC
(In reply to Tonino from comment #44)

Thanks for posting the ethtool output. Do you have a Cat 5 cable on the old laptop? (I found some of my cables were Cat 5, so I did an inventory and replaced all of them with Cat 5e cables.)

----------------------------------------------------------------------------------old laptop
Settings for enp2s0f0:
...
	Speed: 1000Mb/s
...
	Port: Twisted Pair
...

Comment 46 Tonino 2019-02-07 18:34:19 UTC
No, the cable is 5e.

Comment 47 Heiner Kallweit 2019-02-08 19:39:24 UTC
(In reply to Steve from comment #24)
> Created attachment 1527473 [details]
> 9999-patch-rtl_hw_start_8168h_1.patch applies to 4.20.6
> 
> Save 9999-patch-rtl_hw_start_8168h_1.patch to the directory with kernel.spec.
> 
> # Tell git about the patch file:
> $ git add 9999-patch-rtl_hw_start_8168h_1.patch
> 
> # Add the patch to kernel.spec:
> $ ./scripts/newpatch.sh 9999-patch-rtl_hw_start_8168h_1.patch
> 
> # Commit the patch file and the modified kernel.spec:
> $ git commit -a -m 'add 9999-patch-rtl_hw_start_8168h_1.patch'
> 
> NB: The newpatch.sh script changes "buildid" in kernel.spec, so the change
> described in the guide (Comment 19) is not needed:
> 
> $ git show kernel.spec
> ...
> -# define buildid .local
> +%define buildid .9999_patch_rtl_hw_start_8168h_1.patch
> ...

The r8168 vendor driver has another undocumented setting that I added to the mainline driver with the following patch.
Could you please test with this patch instead of 9999-patch-rtl_hw_start_8168h_1.patch. Thanks a lot.


---
 drivers/net/ethernet/realtek/r8169.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..4854f6d29 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4830,11 +4830,13 @@ static void rtl_pcie_state_l2l3_disable(struct rtl8169_private *tp)
 static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
 {
 	if (enable) {
+		RTL_W8(tp, 0xf1, RTL_R8(tp, 0xf1) | BIT(7));
 		RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
 	} else {
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
 		RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
+		RTL_W8(tp, 0xf1, RTL_R8(tp, 0xf1) & ~BIT(7));
 	}
 
 	udelay(10);
-- 
2.20.1

Comment 48 Tonino 2019-02-09 08:18:47 UTC
Unfortunately with this patch it's even worse than the original performance: an average of 11Mb/s, with some peaks of 100Mb/s

Comment 49 Tonino 2019-02-09 08:22:15 UTC
In case it's relevant, I restarted the whole procedure and it downloaded v4.20.7 instead of 4.20.6; the above result is with 4.20.7 patched

Comment 50 Heiner Kallweit 2019-02-09 11:24:42 UTC
Thanks, at least we tried it. Another test:
If you set the kernel parameter pcie_aspm.policy=performance (w/o any of the discussed patches), does that help?
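
To confirm that the parameter actually took effect after reboot, the active policy can be read back from sysfs. The sketch below assumes the usual format of /sys/module/pcie_aspm/parameters/policy, where all policies are listed and the active one is shown in square brackets; the function names are made up for illustration.

```python
from pathlib import Path

# Assumed location of the ASPM policy knob on kernels built with PCIe ASPM support.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text):
    """Return the policy name enclosed in square brackets, or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_active_policy(path=POLICY_PATH):
    """Read the sysfs file and return the active policy, or None if it is unavailable."""
    try:
        return active_policy(path.read_text())
    except OSError:
        return None
```

If read_active_policy() returns something other than "performance" after booting with the parameter, the setting was not applied (for example because the firmware does not let the OS control ASPM).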

Comment 51 Heiner Kallweit 2019-02-09 12:02:08 UTC
Also interesting would be the "lspci -vv" output for the network chip in the failing scenario.

Comment 52 Tonino 2019-02-09 15:02:43 UTC
I tried with the official 4.20.6 and the suggested kernel parameter set: the effect is the same as the first patch.

Comment 53 Heiner Kallweit 2019-02-09 15:17:14 UTC
(In reply to Tonino from comment #52)
> I tried with the official 4.20.6 and the suggested kernel parameter set: the
> effect is the same as the first patch.

OK, so that means the kernel parameter fixes the issue as well?

Comment 54 Tonino 2019-02-09 16:49:35 UTC
Yes

Comment 55 Heiner Kallweit 2019-02-09 19:45:07 UTC
(In reply to Tonino from comment #54)
> Yes

Thanks for the confirmation. So you at least have a workaround independent of kernel-patching.
Still helpful would be the output from what I mentioned in comment 51.

Comment 56 Tonino 2019-02-10 07:07:14 UTC
You said "in the failing scenario" ...

Here it is:

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 23
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at a1104000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at a1100000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169

Comment 57 Tonino 2019-02-10 07:09:03 UTC
Or better:

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 23
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at a1104000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at a1100000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Capabilities: [178 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                           T_CommonMode=0us LTR1.2_Threshold=163840ns
                L1SubCtl2: T_PwrOn=150us
        Kernel driver in use: r8169
        Kernel modules: r8169

Comment 58 Heiner Kallweit 2019-02-10 10:10:42 UTC
Thanks for the lspci output. The ASPM L1.1 and L1.2 substates are active and we may have the issue that clocks aren't back in time after wakeup from L1.2, thus causing missed packets.
The root cause could be in the BIOS, the host chipset, the network chip, a wrong network chip setting, ...
I asked Realtek whether there's any known issue with ASPM on this particular chip version (RTL8168h).
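
For other reporters who want to check the same thing on their systems, here is a minimal sketch that pulls the two relevant facts out of one device's "lspci -vv" text: the ASPM state from the LnkCtl line and the enabled entries from L1SubCtl1. It only parses pasted text, and the function name and the returned dictionary keys are assumptions for illustration.

```python
import re


def aspm_summary(lspci_vv):
    """Summarize ASPM state from the `lspci -vv` text of a single device.

    Returns a dict with:
      "aspm": the text between "ASPM" and ";" on the LnkCtl line, or None
      "l1_substates_enabled": names on the L1SubCtl1 line that end in "+"
    """
    link = re.search(r"LnkCtl:\s+ASPM ([^;]+);", lspci_vv)
    sub = re.search(r"L1SubCtl1:\s+(.*)", lspci_vv)
    enabled = []
    if sub:
        enabled = [tok[:-1] for tok in sub.group(1).split() if tok.endswith("+")]
    return {
        "aspm": link.group(1).strip() if link else None,
        "l1_substates_enabled": enabled,
    }
```

Applied to the output in comment 57, this reports "L1 Enabled" together with all four L1 substate entries, which matches the reading above.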

Comment 59 Tonino 2019-02-10 10:53:03 UTC
I don't know if this helps in any way, but here is the info about the pci bridge:

00:13.0 PCI bridge: Intel Corporation Device 31d8 (rev f3) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 122
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: a1200000-a12fffff [size=1M]
        Prefetchable memory behind bridge: None
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #3, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #2, PowerLimit 10.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet- LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                         AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR+, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00218  Data: 0000
        Capabilities: [90] Subsystem: Intel Corporation Device 7270
        Capabilities: [a0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v0] Null
        Capabilities: [140 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [150 v0] Null
        Capabilities: [200 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                           T_CommonMode=40us LTR1.2_Threshold=163840ns
                L1SubCtl2: T_PwrOn=60us
        Kernel driver in use: pcieport

00:13.1 PCI bridge: Intel Corporation Device 31d9 (rev f3) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 123
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: a1100000-a11fffff [size=1M]
        Prefetchable memory behind bridge: None
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet- LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                         AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR+, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00238  Data: 0000
        Capabilities: [90] Subsystem: Intel Corporation Device 7270
        Capabilities: [a0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v0] Null
        Capabilities: [140 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [150 v0] Null
        Capabilities: [200 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                           T_CommonMode=150us LTR1.2_Threshold=163840ns
                L1SubCtl2: T_PwrOn=150us
        Kernel driver in use: pcieport

Comment 60 Heiner Kallweit 2019-02-10 11:32:35 UTC
I'd like to check whether the issue may be with the ASPM clock power management. I'd appreciate it if you could test the following patch (w/o all the other test patches).
And again the "lspci -vv" output for the network chip would be helpful. Thanks.

---
 drivers/net/ethernet/realtek/r8169.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..2237ff442 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -30,6 +30,7 @@
 #include <linux/prefetch.h>
 #include <linux/ipv6.h>
 #include <net/ip6_checksum.h>
+#include <linux/pci-aspm.h>
 
 #define MODULENAME "r8169"
 
@@ -5334,6 +5335,7 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
 	r8168_mac_ocp_write(tp, 0xc094, 0x0000);
 	r8168_mac_ocp_write(tp, 0xc09e, 0x0000);
 
+	pci_disable_link_state(tp->pci_dev, PCIE_LINK_STATE_CLKPM);
 	rtl_hw_aspm_clkreq_enable(tp, true);
 }
 
-- 
2.20.1

Comment 61 Tonino 2019-02-10 15:11:37 UTC
20Mb/s with this.

And:
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 23
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at a1104000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at a1100000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Capabilities: [178 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                           T_CommonMode=0us LTR1.2_Threshold=163840ns
                L1SubCtl2: T_PwrOn=150us
        Kernel driver in use: r8169
        Kernel modules: r8169
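
To see what a test patch changed, two "lspci -vv" dumps of the same device can be compared token by token. The sketch below assumes both dumps have the same number of lines and the same token layout, which holds for two runs against the same device; the function name is made up for illustration.

```python
def flag_changes(before, after):
    """Return (old_token, new_token) pairs that differ between two dumps.

    Assumes both texts have identical line and token layout, so lines and
    whitespace-separated tokens can be compared position by position.
    """
    changes = []
    for old_line, new_line in zip(before.splitlines(), after.splitlines()):
        if old_line == new_line:
            continue
        for old_tok, new_tok in zip(old_line.split(), new_line.split()):
            if old_tok != new_tok:
                changes.append((old_tok, new_tok))
    return changes
```

Comparing the LnkCtl lines of comment 57 with the ones above shows at least ClockPM+ changing to ClockPM-, so the patch did take effect on the link even though throughput did not recover.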

Comment 62 Steve 2019-02-10 15:28:38 UTC
(In reply to Tonino from comment #57)
> Or better:
...

Could you include the command-line when posting command output? That provides important context. If you need to remove some of the output for brevity, insert an ellipsis where the output has been removed. NB: Some terminal programs have "select all" and "copy" commands that make it easy to copy text from the terminal window into a text editor.

Comment 63 Tonino 2019-02-10 15:40:25 UTC
It's lspci -vv, as requested; I just ran it a second time as root, to avoid
        Capabilities: <access denied>

Comment 64 Daniele Viganò 2019-02-20 15:00:13 UTC
I was experiencing the same issue on a Dell Optiplex 3060 MFF running Fedora 29 with 4.20.7-200.fc29.x86_64: I was getting max speed of 12MB/s on Ethernet.

The 'pcie_aspm.policy=performance' boot parameter did not fix the issue.

I resolved the issue setting ASPM to 'off' in the BIOS (I was lucky enough to get a specific setting for this in BIOS). Now I'm able to sustain a 100+ MB/s flow rate.

$ dmesg | grep -i ASPM

[    0.489792] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    0.613349] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.613747] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration

$ lspci -v

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: Dell Device 085c
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at 3000 [size=256]
	Memory at d1104000 (64-bit, non-prefetchable) [size=4K]
	Memory at d1100000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Capabilities: [178] L1 PM Substates
	Kernel driver in use: r8169
	Kernel modules: r8169

Comment 65 Daniele Viganò 2019-02-20 15:02:36 UTC
$ lspci -vv

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: Dell Device 085c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at 3000 [size=256]
	Region 2: Memory at d1104000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at d1100000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
	Capabilities: [170 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [178 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
			   T_CommonMode=0us LTR1.2_Threshold=81920ns
		L1SubCtl2: T_PwrOn=150us
	Kernel driver in use: r8169
	Kernel modules: r8169
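
For comparing dumps like the one above, the two ASPM-relevant fields are the LnkCap line (what the link supports) and the LnkCtl line (what is currently enabled). A minimal sketch, assuming the `lspci -vv` text for a single device has been saved to a string; the function name `aspm_summary` is made up for illustration:

```python
import re

def aspm_summary(lspci_vv: str) -> dict:
    """Extract the supported and currently enabled ASPM states from
    `lspci -vv` text for one device. Missing fields are returned as None."""
    # LnkCap: "... ASPM L0s L1, Exit Latency ..." -> supported states
    cap = re.search(r"LnkCap:.*?ASPM ([^,]+),", lspci_vv)
    # LnkCtl: "ASPM Disabled; RCB ..." or "ASPM L1 Enabled; RCB ..."
    ctl = re.search(r"LnkCtl:\s*ASPM ([^;]+);", lspci_vv)
    return {
        "supported": cap.group(1).strip() if cap else None,
        "enabled": ctl.group(1).strip() if ctl else None,
    }

# Two lines copied from the dump above (ASPM switched off in the BIOS).
sample = (
    "\t\tLnkCap:\tPort #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, "
    "Exit Latency L0s unlimited, L1 <64us\n"
    "\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- CommClk+\n"
)
print(aspm_summary(sample))
```

On this system the link advertises L0s and L1, but nothing is enabled, which matches the BIOS setting described in comment 64.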

Comment 66 Steve 2019-02-20 17:05:36 UTC
(In reply to Daniele Viganò from comment #65)
> $ lspci -vv
> 
> 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
...

Could you post the output for:

$ dmesg | grep XID
$ lspci -s 01:00.0 -nv

Comment 67 Daniele Viganò 2019-02-21 11:17:12 UTC
(In reply to Steve from comment #66)
> 
> Could you post the output for:
> 
> $ dmesg | grep XID

[    6.044987] r8169 0000:01:00.0 eth0: RTL8168h/8111h, d8:9e:f3:9c:21:54, XID 54100800, IRQ 127

> $ lspci -s 01:00.0 -nv

01:00.0 0200: 10ec:8168 (rev 15)
	Subsystem: 1028:085c
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at 3000 [size=256]
	Memory at d1104000 (64-bit, non-prefetchable) [size=4K]
	Memory at d1100000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Capabilities: [178] L1 PM Substates
	Kernel driver in use: r8169
	Kernel modules: r8169


These outputs were generated with 'ASPM' set to 'Off'. If you want outputs with 'ASPM' set to 'Auto' or 'L0', I can provide them, but not right now since I'm working remotely.

Comment 68 Steve 2019-02-22 02:48:49 UTC
(In reply to Daniele Viganò from comment #67)
...
> [    6.044987] r8169 0000:01:00.0 eth0: RTL8168h/8111h, d8:9e:f3:9c:21:54, XID 54100800, IRQ 127
...

Thanks for posting that and the lspci output. That confirms that you have the same device as Tonino.
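
The device comparison rests on the chip name and XID printed in the r8169 probe line. A minimal sketch of pulling those fields out of a dmesg line so that two reports can be compared field by field; the regular expression is written against the single line quoted above and is an assumption about the format, not something taken from the driver:

```python
import re

# Pattern fitted to the probe line quoted in this bug (assumed format).
XID_LINE = re.compile(
    r"r8169 (?P<pci>[0-9a-f:.]+) \S+: (?P<chip>[^,]+), "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), XID (?P<xid>[0-9a-f]+)"
)

def parse_xid(line: str):
    """Return PCI address, chip name, MAC and XID, or None if no match."""
    m = XID_LINE.search(line)
    return m.groupdict() if m else None

line = ("[    6.044987] r8169 0000:01:00.0 eth0: RTL8168h/8111h, "
        "d8:9e:f3:9c:21:54, XID 54100800, IRQ 127")
info = parse_xid(line)
print(info["chip"], info["xid"])
```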

This appears to be the same bug:

Bug 1679140 - Bad r8169 performance on some networks after suspend 

See, in particular, what Heiner says in Bug 1679140, Comment 11.

Comment 69 Heiner Kallweit 2019-03-02 19:49:00 UTC
Here comes an experimental patch. I'd be curious whether it fixes the issue. It would be helpful if an affected user could test it and also post the output of lspci -vv with this patch applied.
I'd like to see whether only L1.2 is really disabled.


---
 drivers/net/ethernet/realtek/r8169.c | 3 +++
 drivers/pci/pcie/aspm.c              | 2 ++
 include/linux/pci-aspm.h             | 7 ++++---
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index c29dde064..761097710 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -29,6 +29,7 @@
 #include <linux/firmware.h>
 #include <linux/prefetch.h>
 #include <linux/ipv6.h>
+#include <linux/pci-aspm.h>
 #include <net/ip6_checksum.h>
 
 #define MODULENAME "r8169"
@@ -7350,6 +7351,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		return rc;
 
+	pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2);
+
 	/* enable device (incl. PCI PM wakeup and hotplug setup) */
 	rc = pcim_enable_device(pdev);
 	if (rc < 0) {
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 727e3c1ef..dbf2a3530 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1081,6 +1081,8 @@ static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 		link->aspm_disable |= ASPM_STATE_L0S;
 	if (state & PCIE_LINK_STATE_L1)
 		link->aspm_disable |= ASPM_STATE_L1;
+	if (state & PCIE_LINK_STATE_L1_2)
+		link->aspm_disable |= ASPM_STATE_L1_2_MASK;
 	pcie_config_aspm_link(link, policy_to_aspm_state(link));
 
 	if (state & PCIE_LINK_STATE_CLKPM) {
diff --git a/include/linux/pci-aspm.h b/include/linux/pci-aspm.h
index df28af5ce..8bffef5e0 100644
--- a/include/linux/pci-aspm.h
+++ b/include/linux/pci-aspm.h
@@ -19,9 +19,10 @@
 
 #include <linux/pci.h>
 
-#define PCIE_LINK_STATE_L0S	1
-#define PCIE_LINK_STATE_L1	2
-#define PCIE_LINK_STATE_CLKPM	4
+#define PCIE_LINK_STATE_L0S	BIT(0)
+#define PCIE_LINK_STATE_L1	BIT(1)
+#define PCIE_LINK_STATE_CLKPM	BIT(2)
+#define PCIE_LINK_STATE_L1_2	BIT(3)
 
 #ifdef CONFIG_PCIEASPM
 void pci_disable_link_state(struct pci_dev *pdev, int state);
-- 
2.21.0
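
For readers less familiar with the flag handling, the patch above can be modelled outside the kernel. This is a toy Python model, not kernel code: the PCIE_LINK_STATE_* values follow the patch, while the ASPM_STATE_* values are placeholders chosen for illustration and are not the real constants from aspm.c.

```python
def BIT(n: int) -> int:
    """Same meaning as the kernel's BIT() macro: a value with only bit n set."""
    return 1 << n

# Public flags as defined by the patch (BIT(0..2) equal the old 1, 2, 4).
PCIE_LINK_STATE_L0S = BIT(0)
PCIE_LINK_STATE_L1 = BIT(1)
PCIE_LINK_STATE_CLKPM = BIT(2)
PCIE_LINK_STATE_L1_2 = BIT(3)   # new flag introduced by the patch

# Internal state bits: PLACEHOLDER values for illustration only.
ASPM_STATE_L0S = 0x1
ASPM_STATE_L1 = 0x2
ASPM_STATE_L1_2_MASK = 0x4

def aspm_disable_mask(state: int, current: int = 0) -> int:
    """Mirror the if-chain in __pci_disable_link_state(): each requested
    public flag ORs the matching internal bits into the disable mask."""
    if state & PCIE_LINK_STATE_L0S:
        current |= ASPM_STATE_L0S
    if state & PCIE_LINK_STATE_L1:
        current |= ASPM_STATE_L1
    if state & PCIE_LINK_STATE_L1_2:
        current |= ASPM_STATE_L1_2_MASK
    return current

# The r8169 hunk requests only L1.2, so L0s and L1 stay untouched.
print(hex(aspm_disable_mask(PCIE_LINK_STATE_L1_2)))
```

The switch from literal 1/2/4 to BIT(0)/BIT(1)/BIT(2) does not change the existing values; it only makes room for the new BIT(3) flag.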

Comment 70 Filip Bartmann 2019-03-06 11:03:14 UTC
I have this problem too, and dmesg additionally shows this:
---------------------------------------------------------------
[ 954.310026] NETDEV WATCHDOG: ens5 (r8169): transmit queue 0 timed out
[ 954.310073] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x1f3/0x200
[ 954.310075] Modules linked in: ccm rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc arc4 rtl8192se rtl_pci rtlwifi mac80211 iTCO_wdt iTCO_vendor_support coretemp joydev btusb uvcvideo cfg80211 wmi_bmof hp_wmi btrtl sparse_keymap btbcm btintel videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 bluetooth videobuf2_common videodev media snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel lpc_ich ecdh_generic rfkill snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore dm_crypt i915 kvmgt mdev vfio kvm irqbypass i2c_algo_bit drm_kms_helper serio_raw drm r8169 realtek wmi video
[ 954.310134] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.13-200.fc29.x86_64 #1
[ 954.310135] Hardware name: Hewlett-Packard HP 620/1526, BIOS 68PVI Ver. F.02 06/17/2010
[ 954.310138] RIP: 0010:dev_watchdog+0x1f3/0x200
[ 954.310141] Code: 00 48 63 4d e0 eb 93 4c 89 e7 c6 05 b0 8c b2 00 01 e8 b1 85 fc ff 89 d9 4c 89 e6 48 c7 c7 a8 7b 18 b5 48 89 c2 e8 d7 60 8a ff <0f> 0b eb c0 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56
[ 954.310142] RSP: 0018:ffff8ed6bba03e80 EFLAGS: 00010282
[ 954.310144] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 954.310146] RDX: 0000000000040400 RSI: 00000000000000f6 RDI: 0000000000000300
[ 954.310147] RBP: ffff8ed6b9204480 R08: 000000000000004c R09: 0000000000000003
[ 954.310148] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed6b9204000
[ 954.310149] R13: 0000000000000000 R14: ffff8ed6bba03ed0 R15: 0000000000000000
[ 954.310152] FS: 0000000000000000(0000) GS:ffff8ed6bba00000(0000) knlGS:0000000000000000
[ 954.310153] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 954.310154] CR2: 00001c53996b3000 CR3: 000000008186c000 CR4: 00000000000406f0
[ 954.310156] Call Trace:
[ 954.310159]
[ 954.310165] ? pfifo_fast_dequeue+0x160/0x160
[ 954.310170] call_timer_fn+0x2b/0x130
[ 954.310174] run_timer_softirq+0x3ad/0x3e0
[ 954.310181] __do_softirq+0xe3/0x30a
[ 954.310186] irq_exit+0x100/0x110
[ 954.310189] do_IRQ+0x85/0xd0
[ 954.310193] common_interrupt+0xf/0xf
[ 954.310194]
[ 954.310198] RIP: 0010:cpuidle_enter_state+0xb6/0x330
[ 954.310201] Code: 90 31 ff e8 cc c0 96 ff 80 7c 24 0b 00 74 17 9c 58 66 66 90 66 90 f6 c4 02 0f 85 4c 02 00 00 31 ff e8 fe 80 9c ff fb 66 66 90 <66> 66 90 85 ed 0f 88 1a 02 00 00 48 b8 ff ff ff ff f3 01 00 00 48
[ 954.310202] RSP: 0018:ffffffffb5203e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffcf
[ 954.310204] RAX: ffff8ed6bba20f00 RBX: 000000de314ee569 RCX: 000000de314ee569
[ 954.310206] RDX: 000000de314ee569 RSI: 000000de314ee50a RDI: 0000000000000000
[ 954.310207] RBP: 0000000000000002 R08: fffffffffff0bde3 R09: 00000000000207c0
[ 954.310208] R10: 0000017a2254ae95 R11: ffff8ed6bba1fd84 R12: ffffffffb52d4c38
[ 954.310209] R13: ffff8ed6b43f8c00 R14: 0000000000000002 R15: 000000000000001f
[ 954.310214] do_idle+0x226/0x260
[ 954.310217] cpu_startup_entry+0x19/0x20
[ 954.310222] start_kernel+0x508/0x528
[ 954.310227] secondary_startup_64+0xa4/0xb0
[ 954.310230] ---[ end trace 86c11ee05bf41ffc ]---
---------------------------------------------------------------
85:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 02)
        Subsystem: Hewlett-Packard Company Device 1526
        Physical Slot: 5
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at 2000 [size=256]
        Region 2: Memory at d0410000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at d0400000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at d0700000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169

Comment 71 Heiner Kallweit 2019-03-09 10:47:18 UTC
(In reply to Filip Bartmann from comment #70)
> I have this problem too and in dmesg I have in addition this:
> ---------------------------------------------------------------

I doubt it's the same issue: you have a 9-year-old system (without ASPM L1 sub-states) with a Fast Ethernet adapter. Also, the timeout is very generic; it could be anything.
Please create a separate ticket with the following info:
- exact issue description and measurements (e.g. with iperf3), information about potentially missed rx packets (ethtool -S)
- full dmesg output
- lspci -vv output (as root)
- last working and first failing kernel version
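
For the missed-rx-packet item in the list above, a minimal sketch that turns saved `ethtool -S <interface>` output into a dictionary so counters can be compared before and after an iperf3 run. The sample text and its numbers are invented for illustration, and the `rx_missed` counter name is an assumption about what the driver reports:

```python
def parse_ethtool_stats(text: str) -> dict:
    """Parse 'name: value' lines from `ethtool -S` output into integers.
    The 'NIC statistics:' header has no value and is skipped."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats

def counter_delta(before: dict, after: dict, name: str) -> int:
    """Difference of one counter between two snapshots (0 if absent)."""
    return after.get(name, 0) - before.get(name, 0)

# Made-up snapshots, only to show the shape of the data.
before = parse_ethtool_stats("NIC statistics:\n     rx_packets: 1000\n     rx_missed: 3\n")
after = parse_ethtool_stats("NIC statistics:\n     rx_packets: 9000\n     rx_missed: 60\n")
print(counter_delta(before, after, "rx_missed"))
```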

Comment 72 Alex Williamson 2019-04-02 04:09:52 UTC
Created attachment 1550866 [details]
lspci of affected system running a99790bf5c7f + comment 69 patch

(In reply to Heiner Kallweit from comment #69)
> Here comes an experimental patch. I'd be curious whether it fixes the issue.
> Helpful would be if an affected user could test and also post the result of
> lspi -vv with this patch.
> I'd like to see whether really L1.2 is disabled only.

I also experience this regression and bisected it to:

commit a99790bf5c7f3d68d8b01e015d3212a98ee7bd57 (HEAD, refs/bisect/bad)
Author: Kai-Heng Feng <kai.heng.feng>
Date:   Thu Jun 21 16:30:39 2018 +0800

    r8169: Reinstate ASPM Support

NFS is effectively unusable in kernels that include this patch: a simple 'ls' on an NFS mount takes literally minutes.  The patch in comment 69 does not change the behavior.  Is laptop power savings worth broken NICs elsewhere?

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: B85M-G

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 0908
        Release Date: 05/15/2014

CPU: Intel(R) Core(TM) i5-4430 CPU @ 3.00GHz (Haswell)

Comment 73 Heiner Kallweit 2019-04-02 05:17:36 UTC
Patch in comment 69 is for systems with L1.1/L1.2 ASPM states only. These are not supported on your system.
Only certain combinations of network chip versions and boards are affected. What should also help in your case: set the boot command-line parameter pcie_aspm.policy=performance
I'm currently working on an acceptable (from network subsystem maintainer perspective) way to start with disabled ASPM and allow switching ASPM on/off at runtime.
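
To confirm that a pcie_aspm.policy boot parameter actually took effect, the active policy can be read back from sysfs, where the selected entry is shown in square brackets. A minimal sketch, assuming the file /sys/module/pcie_aspm/parameters/policy exists on the running kernel (it will not if ASPM support is compiled out); the helper name `active_policy` is made up:

```python
import re
from pathlib import Path

POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")

def active_policy(text: str):
    """Return the bracketed entry, e.g. 'performance' from
    'default [performance] powersave powersupersave'."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None

# Works on a literal string, so it can be tried without touching sysfs.
print(active_policy("default [performance] powersave powersupersave\n"))

# On a live system, read the real file if it is present and readable.
try:
    if POLICY_FILE.exists():
        print(active_policy(POLICY_FILE.read_text()))
except OSError:
    pass
```

Note that, as the dmesg output in comment 64 shows, the kernel may leave the BIOS configuration in place when the FADT declares ASPM unsupported, so the policy value alone does not prove what the link is doing; the LnkCtl line in lspci -vv is the final word.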

Comment 74 Heiner Kallweit 2019-04-04 19:35:01 UTC
(In reply to Alex Williamson from comment #72)
> Created attachment 1550866 [details]
> lspci of affected system running a99790bf5c7f + comment 69 patch
> 
> (In reply to Heiner Kallweit from comment #69)
> > Here comes an experimental patch. I'd be curious whether it fixes the issue.
> > Helpful would be if an affected user could test and also post the result of
> > lspi -vv with this patch.
> > I'd like to see whether really L1.2 is disabled only.
> 
> I also experience this regression and bisected it to:
> 
> commit a99790bf5c7f3d68d8b01e015d3212a98ee7bd57 (HEAD, refs/bisect/bad)
> Author: Kai-Heng Feng <kai.heng.feng>
> Date:   Thu Jun 21 16:30:39 2018 +0800
> 
>     r8169: Reinstate ASPM Support
> 
> NFS is effectively unusable in kernels including this patch, a simple 'ls'
> on an NFS mount takes literally minutes.  The patch in comment 69 does not
> change the behavior.  Is laptop power savings worth broken NICs elsewhere?

Could you please test the following and provide the "lspci -vv" output with this patch?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 88eb9e05d..c9b5a75e3 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -28,6 +28,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/firmware.h>
 #include <linux/prefetch.h>
+#include <linux/pci-aspm.h>
 #include <linux/ipv6.h>
 #include <net/ip6_checksum.h>
 
@@ -7352,6 +7353,11 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		return rc;
 
+	/* Disable ASPM completely as that cause random device stop working
+	 * problems as well as full system hangs for some PCIe devices users.
+	 */
+	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
+
 	/* enable device (incl. PCI PM wakeup and hotplug setup) */
 	rc = pcim_enable_device(pdev);
 	if (rc < 0) {
-- 
2.21.0

Comment 75 Alex Williamson 2019-04-04 22:51:35 UTC
(In reply to Heiner Kallweit from comment #74)
> 
> Could you please test the following and provide the "lspci -vv" output with
> this patch?

This resolves the issue, I'll attach the lspci, but here's a diff of what changes:

--- lspci.without	2019-04-04 16:43:17.974921695 -0600
+++ lspci.with	2019-04-04 16:43:17.966921654 -0600
@@ -317,10 +317,10 @@
 		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
 			MaxPayload 128 bytes, MaxReadReq 128 bytes
-		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
+		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
 		LnkCap:	Port #3, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
 			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
-		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
+		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (ok)
 			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
@@ -534,7 +534,7 @@
 		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
 		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
-		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
+		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
 			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
 		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
@@ -556,7 +556,7 @@
 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
-		CESta:	RxErr- BadTLP+ BadDLLP+ Rollover- Timeout- AdvNonFatalErr-
+		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-

The first chunk is the downstream root port while the last two are the r8169 endpoint.  ASPM is disabled on both ends of the link and we seem to clear up a corrected error from AER.  Maybe such errors could be used to detect and disable ASPM on uncooperative systems?
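
As a rough illustration of the detection idea, the flag lines in lspci -vv can be scanned for bits marked with '+'. A minimal sketch, with the two CESta lines copied from the diff above; the helper name `set_flags` is made up, and this only reads a saved dump rather than querying AER directly:

```python
import re

def set_flags(line: str) -> list:
    """Return the names of flags marked '+' on an lspci status line
    such as CESta, UESta or DevSta."""
    _, _, rest = line.partition(":")
    return re.findall(r"(\w+)\+", rest)

without_patch = "CESta:\tRxErr- BadTLP+ BadDLLP+ Rollover- Timeout- AdvNonFatalErr-"
with_patch = "CESta:\tRxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-"

print(set_flags(without_patch))
print(set_flags(with_patch))
```

With ASPM L1 enabled, the endpoint reports BadTLP and BadDLLP as corrected errors; with the comment 74 patch applied, no corrected-error bits are set.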

Comment 76 Alex Williamson 2019-04-04 22:53:02 UTC
Created attachment 1552131 [details]
lspci with comment 74 patch

Comment 77 Heiner Kallweit 2019-04-05 05:05:15 UTC
(In reply to Alex Williamson from comment #75)
> The first chunk is the downstream root port while the last two are the r8169
> endpoint.  ASPM is disabled on both ends of the link and we seem to clear up
> a corrected error from AER.  Maybe such errors could be used to detect and
> disable ASPM on uncooperative systems?

Thanks for testing! Unfortunately, ASPM incompatibilities show quite different symptoms, starting with decreased performance due to missed rx packets. I haven't seen anybody else report errors that trigger AER.

Comment 78 Justin M. Forbes 2019-08-20 17:39:01 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 5.2.9-100.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.

If you experience different issues, please open a new bug report for those.

Comment 79 Justin M. Forbes 2019-09-17 20:01:43 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

