Bug 1671691 - Acer Nitro 5: RTL8111/8168/8411 fails to connect at reboot
Summary: Acer Nitro 5: RTL8111/8168/8411 fails to connect at reboot
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-01 10:26 UTC by Turgut Kalfaoglu
Modified: 2020-04-08 04:15 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-09-17 20:12:03 UTC
Type: Bug
Embargoed:
turgut: needinfo-


Attachments
dmesg for a cold start (93.38 KB, text/plain)
2019-02-01 10:26 UTC, Turgut Kalfaoglu
dmesg for a reboot (88.79 KB, text/plain)
2019-02-01 10:27 UTC, Turgut Kalfaoglu
dmesg output from a reboot where ethernet does not work (2.60 MB, text/plain)
2019-04-16 23:38 UTC, Jason C.
dmesg output from a cold boot where ethernet works (1.15 MB, text/plain)
2019-04-16 23:38 UTC, Jason C.

Description Turgut Kalfaoglu 2019-02-01 10:26:32 UTC
Created attachment 1525791
dmesg for a cold start

1. Please describe the problem:
An Acer Nitro AN515-42 laptop (BIOS 1.11) does not connect over ethernet after a reboot. After the machine is powered off and turned back on, ethernet works flawlessly.

2. What is the Version-Release number of the kernel:
4.20.4-200.fc29.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
New machine; unable to tell. However, to get it to boot at all without CPU lockups, these kernel parameters are currently being used:
rcu_nocbs=0-7 ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2 vt.handoff=1 pti=off pcie_aspm.policy=performance

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Restart computer, NetworkManager only connects to wifi.
Shutdown machine, NetworkManager connects to both ethernet and wifi.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Did not try. Was very difficult to get THIS kernel to boot to begin with. See above kernel parameters.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I have attached two dmesg logs: one for a reboot and one for a cold start.

Comment 1 Turgut Kalfaoglu 2019-02-01 10:27:07 UTC
Created attachment 1525792
dmesg for a reboot

Comment 2 Turgut Kalfaoglu 2019-02-01 10:30:11 UTC
Hardware info:

# dmesg |grep XID
[    3.696682] r8169 0000:02:00.1 eth0: RTL8411, 98:28:a6:28:f8:6a, XID 5c800800, IRQ 54
[  560.935157] r8169 0000:02:00.1 eth0: RTL8411, 98:28:a6:28:f8:6a, XID 5c800880, IRQ 54

# dmesg |grep "attached PHY"
[   26.900765] Generic PHY r8169-201:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-201:00, irq=IGNORE)
[  560.982478] Generic PHY r8169-201:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-201:00, irq=IGNORE)
 
# more /sys/class/net/enp2s0f1/phydev/phy_id
0x001cc800

Comment 3 Heiner Kallweit 2019-02-01 10:38:50 UTC
There's nothing special in the logs. Other needed information:

- chip register diff (diff of "ethtool -d enp2s0f1") between working and non-working version
- last working and first failing kernel version
- if it's a regression, a bisect would be much appreciated

What you could test to rule out an ASPM issue:
set kernel parameter pcie_aspm.policy=performance

Comment 4 Heiner Kallweit 2019-02-01 10:42:37 UTC
I just saw in your description that the ASPM parameter is already set, and it obviously doesn't solve the issue.
It would really help if you checked older kernel versions. If it's not a regression, the cause could be anything in an underlying layer; it's not necessarily a driver issue.
Alternatively you could also test the r8168 vendor driver.

Comment 5 Turgut Kalfaoglu 2019-02-01 11:03:30 UTC
By the way, if I reboot the machine into Windows 10 (dual boot) and then back into Linux, the problem also goes away, as if I had done a power cycle.

Comment 6 Turgut Kalfaoglu 2019-02-01 11:09:44 UTC
Diff of "ethtool -d enp2s0f1" output between the working and non-working states:

# diff working.txt notworking.txt 
4c4
< 0x08: Multicast Address Filter     0x00400440 0x00800080
---
> 0x08: Multicast Address Filter     0x00000040 0x00000080
15,16c15,16
< 0x3E: Interrupt Status                            0x0000
<       
---
> 0x3E: Interrupt Status                            0x0040
>       RxFIFO 
18c18
< 0x44: Rx Configuration                        0x0002cf0e
---
> 0x44: Rx Configuration                        0x0000cf0e

Comment 7 Heiner Kallweit 2019-02-01 11:48:49 UTC
It is a little surprising that bit 17 in "Rx Configuration" is set. Is this the working config? And is it reproducible?
When did you run the ethtool command? After the r8169 driver was loaded?
I'd expect rtl_init_rxcfg() to clear this bit in the driver's probe function.
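
For readers who want to check the "bit 17" observation, the following minimal sketch XORs the two "Rx Configuration" values from the diff in comment 6 and lists the bit positions that differ. The helper name differing_bits is hypothetical and used only for illustration; it is not part of ethtool or the r8169 driver.

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the positions (0 = least significant bit) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values copied from the "ethtool -d" diff in comment 6.
working = 0x0002CF0E      # Rx Configuration (0x44), cold start
not_working = 0x0000CF0E  # Rx Configuration (0x44), after reboot

print("Rx Configuration differs at bits:", differing_bits(working, not_working))
```

The same helper can be applied to any other register pair in the diff, for example the "Interrupt Status" values 0x0000 and 0x0040.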

Comment 8 Heiner Kallweit 2019-02-01 12:36:35 UTC
Forget my last question; the driver must of course be loaded before ethtool can be run.
Most likely this undocumented bit 17 is set internally by the chip.

Because you need some special parameters to run 4.20 on your quite new system, I tend to think that some system incompatibility also causes the network issue.
You could try the latest linux-next kernel; maybe your platform is better supported by now.

Comment 9 Laura Abbott 2019-04-09 20:47:04 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
 
Fedora XX has now been rebased to 5.0.6. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.
 
If you experience different issues, please open a new bug report for those.

Comment 10 Jason C. 2019-04-16 23:38:20 UTC
Created attachment 1555703
dmesg output from a reboot where ethernet does not work

Comment 11 Jason C. 2019-04-16 23:38:49 UTC
Created attachment 1555704
dmesg output from a cold boot where ethernet works

Comment 12 Jason C. 2019-04-16 23:39:15 UTC
This appears to be the same bug that I'm experiencing.  I don't need any special setup to run stock Fedora 29 on my system.

1. Please describe the problem:
Brand new System76 Darter Pro laptop fails to connect to the network via ethernet after a reboot.  If the system is powered off, both the power cable and ethernet cord are removed, and then the system is turned back on it will work again.  Note that rebooting from a kernel version not affected by this into an affected kernel version will work.

2. What is the Version-Release number of the kernel:
kernel-5.0.7-200.fc29.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
I tried the following kernels:
BAD:
kernel-5.1.0-0.rc5.git0.1.fc31.x86_64
kernel-5.0.7-200
kernel-5.0.4-200
kernel-4.20.16-200
kernel-4.20.8-200
kernel-4.20.3-200

WORKS:
kernel-4.19.15-300
kernel-4.18.16-300

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Boot into an affected kernel (any in the 4.20 series or greater).  Reboot.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Yes, tried kernel-5.1.0-0.rc5.git0.1.fc31.x86_64

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
Attached a dmesg from the rawhide kernel.  One for a good boot and one for a bad reboot.

Setting the kernel parameter pcie_aspm.policy=performance did not help.

$ grep XID dmesg_rawhide_good.txt
Apr 16 18:19:52 kernel: r8169 0000:39:00.1 eth0: RTL8411, 80:fa:5b:68:aa:84, XID 5c8, IRQ 130

ethtool -d enp57s0f1 diff from a good and bad boot:
$ diff rawhide_good.txt rawhide_bad.txt
4,6c4,6
< 0x08: Multicast Address Filter     0x00400040 0x08800080
< 0x10: Dump Tally Counter Command   0x64eb0000 0x00000008
< 0x20: Tx Normal Priority Ring Addr 0x556bc000 0x00000008
---
> 0x08: Multicast Address Filter     0x00400040 0x08000080
> 0x10: Dump Tally Counter Command   0x66f88000 0x00000008
> 0x20: Tx Normal Priority Ring Addr 0x57be4000 0x00000008
18c18
< 0x44: Rx Configuration                        0x0002cf0e
---
> 0x44: Rx Configuration                        0x0000cf0e
57c57
< 0xE4: Rx Ring Addr                 0x4fc45000 0x00000008
---
> 0xE4: Rx Ring Addr                 0x5067c000 0x00000008
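
As a rough sketch of how such register dumps could be compared programmatically, the snippet below parses lines in the "offset: name values" layout shown above and reports, per register, which bits changed. The function names parse_registers and changed_registers are hypothetical. The sample text contains only two registers copied from this diff; the tally counter and ring address lines are left out on the assumption that they are memory addresses that would change from boot to boot anyway.

```python
def parse_registers(dump: str) -> dict[str, tuple[str, list[int]]]:
    """Map a register offset such as '0x44' to (name, list of hex values)."""
    registers = {}
    for line in dump.splitlines():
        tokens = line.split()
        # Register lines start with an offset token such as "0x44:".
        if not tokens or not tokens[0].startswith("0x") or not tokens[0].endswith(":"):
            continue
        offset = tokens[0].rstrip(":")
        values = []
        # Trailing tokens that look like hex numbers are the register values.
        while len(tokens) > 1 and tokens[-1].startswith("0x"):
            values.insert(0, int(tokens.pop(), 16))
        registers[offset] = (" ".join(tokens[1:]), values)
    return registers


def changed_registers(good: str, bad: str) -> dict[str, tuple[str, list[int], list[int]]]:
    """Return the registers whose values differ between the two dumps."""
    good_regs, bad_regs = parse_registers(good), parse_registers(bad)
    changes = {}
    for offset, (name, good_values) in good_regs.items():
        if offset in bad_regs and bad_regs[offset][1] != good_values:
            changes[offset] = (name, good_values, bad_regs[offset][1])
    return changes


# Excerpts copied from the diff above (good = cold boot, bad = reboot).
GOOD = """\
0x08: Multicast Address Filter     0x00400040 0x08800080
0x44: Rx Configuration                        0x0002cf0e
"""
BAD = """\
0x08: Multicast Address Filter     0x00400040 0x08000080
0x44: Rx Configuration                        0x0000cf0e
"""

for offset, (name, good_values, bad_values) in changed_registers(GOOD, BAD).items():
    masks = [f"0x{g ^ b:08x}" for g, b in zip(good_values, bad_values)]
    print(offset, name, "changed bits:", masks)
```

In practice the two inputs would be the full saved outputs of "ethtool -d" from a good and a bad boot, read from files rather than embedded as strings.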


# cat /sys/class/net/enp57s0f1/phydev/phy_id
0x001cc800


It looks like the problem was introduced between 4.19 and 4.20.  I don't think a bisect will help for that since it's an upstream kernel version change.  I did look at the Fedora patches between kernel-4.20.3-200 and kernel-4.19.15-300 and nothing stuck out to me.
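
The 4.19/4.20 boundary can be read off the tested kernels mechanically. The sketch below sorts the package names listed under BAD and WORKS above by their upstream version and prints the newest working and oldest failing kernel. The helper version_key is hypothetical and compares only the three-part upstream version, ignoring the Fedora release suffix.

```python
import re


def version_key(name: str) -> tuple[int, ...]:
    """Extract the upstream version, e.g. 'kernel-4.20.3-200' -> (4, 20, 3)."""
    match = re.match(r"kernel-(\d+)\.(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized kernel package name: {name}")
    return tuple(int(part) for part in match.groups())


# Package names copied from the BAD and WORKS lists in this comment.
bad = [
    "kernel-5.1.0-0.rc5.git0.1.fc31.x86_64",
    "kernel-5.0.7-200",
    "kernel-5.0.4-200",
    "kernel-4.20.16-200",
    "kernel-4.20.8-200",
    "kernel-4.20.3-200",
]
works = ["kernel-4.19.15-300", "kernel-4.18.16-300"]

newest_good = max(works, key=version_key)
oldest_bad = min(bad, key=version_key)
print("newest working kernel:", newest_good)
print("oldest failing kernel:", oldest_bad)
```

This only narrows the range down to the tested builds; it does not say which change between them is responsible.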

Comment 13 Heiner Kallweit 2019-04-17 05:35:31 UTC
I can't reproduce the issue with a chip version from the same family (RTL8168g). We can't exclude a platform issue yet. It would be helpful if you bisect between 4.19 and 4.20 (upstream kernels).

Comment 14 Jason C. 2019-04-20 02:21:16 UTC
I've been trying to do the bisection, but I've failed at it a few times now.  I do get to a bad commit, but merging it into 4.19 doesn't break things and reverting it from v4.20 doesn't fix things.

The issue with the bisection is that with some sets of commits there is a bug that causes my machine to fail to reboot.  If I hold down the power button to shut down the system, that has the same effect as gracefully shutting down the system and then unplugging the power cable.  So, I'm unable to test for the issue.  I didn't realize that until I went through a bisection.  So, I think I'm going to have to figure out how to avoid the bug that is causing my system to not reboot while doing the bisection.

I'm not experienced with performing bisections.  This is my first try.  I'm wondering whether I should first perform a bisection to find the commit that is causing my system to not reboot, and then revert that commit if it falls within the set of commits being bisected for this ticket.  Is that a reasonable thing to do?  Is there another approach that you think might work?

Comment 15 Justin M. Forbes 2019-08-20 17:45:37 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 5.2.9-100.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.

If you experience different issues, please open a new bug report for those.

Comment 16 Justin M. Forbes 2019-09-17 20:12:03 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

