Bug 1914631
Summary: | RTL8111/8168/8411 ethernet controller doesn't connect on fresh install | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | monsterjamp23 |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 33 | CC: | acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, hkallweit1, itamar, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, mhjacks, negativo17, ptalbert, steved |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-11-30 17:42:37 UTC | Type: | Bug |
Description
monsterjamp23
2021-01-10 12:40:09 UTC
Well, you provide no details at all about the issue you're facing. First, attach a full dmesg. This would also give an idea which chip version you have. Do you have the Realtek NIC firmware installed on the system? What's the exact issue? Driver not loaded properly? Driver loaded and network brought up, but no link detected?

What you refer to at the end is that RTL8125B support was added in kernel version 5.9. This should be completely unrelated, unless you have an RTL8125B.

My bad, I wasn't sure what info would be relevant. The issue is that when booting, NetworkManager-wait-online.service stalls the boot process and ultimately ends up failing. When logged in it constantly tries to connect to the network via ethernet but stalls at "Setting network address", then disconnects and tries to connect again endlessly. From what I read this seems to be caused by the r8169 driver causing issues for RTL8168 devices. I believe blacklisting the r8169 driver solves the issue, but I also installed the latest r8168 driver from Realtek's website. I believe the issue is that the driver is not being loaded properly. This issue is present during a fresh install and in the live usb installation.

I don't have Realtek NIC firmware installed. And yes you're right, the RTL8125B is unrelated. I was wondering if maybe it was related in some way since the user on the thread seems to have a very similar issue.

Also worth mentioning is that this is for Fedora 33 (KDE spin), which comes with kernel 5.8. Since fixing the issue and updating the system I am now on kernel 5.9.
Full dmesg (after fix): https://paste.centos.org/view/54a6d98f

Here's a snippet from journalctl at startup before I fixed the issue:
> Jan 10 03:56:07 localhost.localdomain abrt-notification[1343]: System encountered a non-fatal error in pfifo_fast_enqueue()
> Jan 10 03:56:08 localhost.localdomain abrt-dump-journal-oops[918]: Reported 1 kernel oopses to Abrt
> Jan 10 03:56:10 localhost.localdomain kernel: r8169 0000:04:00.0 enp4s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
> Jan 10 03:56:12 localhost.localdomain systemd[1]: NetworkManager-dispatcher.service: Succeeded.
> Jan 10 03:56:12 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/syst>
> Jan 10 03:56:16 localhost.localdomain kernel: r8169 0000:04:00.0 enp4s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
> Jan 10 03:56:22 localhost.localdomain kernel: r8169 0000:04:00.0 enp4s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
> Jan 10 03:56:27 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
> Jan 10 03:56:27 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
> Jan 10 03:56:27 localhost.localdomain systemd[1]: Failed to start Network Manager Wait Online.
> Jan 10 03:56:27 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-wait-online comm="systemd" exe="/usr/lib/sy>
> Jan 10 03:56:27 localhost.localdomain systemd[1]: Reached target Network is Online.
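For anyone triaging a similar journal, the two signals in the excerpt above (repeated r8169 `rtl_rxtx_empty_cond` lines and the NetworkManager-wait-online failure) can be picked out with a short script. This is only a sketch: the function name `summarize_journal` and the regular expression are illustrative assumptions modelled on the lines quoted in this report, not on a documented log format.

```python
import re

# Matches the r8169 kernel lines seen in the journal excerpt, e.g.
# "kernel: r8169 0000:04:00.0 enp4s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100)."
# The pattern is an assumption based on those lines only.
R8169_COND = re.compile(r"kernel: r8169 (\S+) (\S+): (rtl_\w+_cond) == (\d+)")


def summarize_journal(text):
    """Count r8169 condition messages per interface and note a wait-online failure."""
    counts = {}
    for match in R8169_COND.finditer(text):
        iface = match.group(2)
        counts[iface] = counts.get(iface, 0) + 1
    wait_online_failed = "NetworkManager-wait-online.service: Failed" in text
    return counts, wait_online_failed


# Three lines copied from the excerpt above, without the "> " quoting.
sample = """\
Jan 10 03:56:10 localhost.localdomain kernel: r8169 0000:04:00.0 enp4s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
Jan 10 03:56:16 localhost.localdomain kernel: r8169 0000:04:00.0 enp4s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
Jan 10 03:56:27 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
"""

print(summarize_journal(sample))
```

The same function can be fed saved `journalctl` output from a file. It only reports what appears in the text and says nothing about the cause.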
I believe this is the loop of attempting to connect, disconnecting, and then trying to connect again:
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <warn> [1610272851.7216] dhcp4 (enp4s0): request timed out
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7219] dhcp4 (enp4s0): state changed unknown -> timeout
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7219] device (enp4s0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7223] manager: NetworkManager state is now DISCONNECTED
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <warn> [1610272851.7232] device (enp4s0): Activation: failed for connection 'Wired connection 1'
> Jan 10 04:00:51 localhost.localdomain systemd[1]: dbus-:1.10-org.kde.powerdevil.discretegpuhelper: Succeeded.
> Jan 10 04:00:51 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.10-org.kde.powerdevil.discretegpuhelper@0 comm="sys>
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7234] device (enp4s0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
> Jan 10 04:00:51 localhost.localdomain avahi-daemon[787]: Withdrawing address record for fe80::82ba:9636:f9a3:615c on enp4s0.
> Jan 10 04:00:51 localhost.localdomain avahi-daemon[787]: Leaving mDNS multicast group on interface enp4s0.IPv6 with address fe80::82ba:9636:f9a3:615c.
> Jan 10 04:00:51 localhost.localdomain avahi-daemon[787]: Interface enp4s0.IPv6 no longer relevant for mDNS.
> Jan 10 04:00:51 localhost.localdomain systemd[1]: dbus-:1.10-org.kde.powerdevil.backlighthelper: Succeeded.
> Jan 10 04:00:51 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.10-org.kde.powerdevil.backlighthelper@0 comm="syste>
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7366] dhcp4 (enp4s0): canceled DHCP transaction
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7366] dhcp4 (enp4s0): state changed timeout -> done
> Jan 10 04:00:51 localhost.localdomain kernel: r8169 0000:04:00.0 enp4s0: Link is Down
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7400] policy: auto-activating connection 'Wired connection 1' (2961c11f-dc05-30f4-8098-e4c2f013791e)
> Jan 10 04:00:51 localhost.localdomain akonadi_followupreminder_agent[1623]: "No such interface “org.freedesktop.DBus.Properties” on object at path /org/freedesktop/NetworkManager/ActiveConnection/1"
> Jan 10 04:00:51 localhost.localdomain akonadi_maildispatcher_agent[1630]: "No such interface “org.freedesktop.DBus.Properties” on object at path /org/freedesktop/NetworkManager/ActiveConnection/1"
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7411] device (enp4s0): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
> Jan 10 04:00:51 localhost.localdomain NetworkManager[895]: <info> [1610272851.7414] manager: startup complete

sudo lspci -nnvs 04:00.0 (after fix)
> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
> Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet [1458:e000]
> Flags: bus master, fast devsel, latency 0, IRQ 59
> I/O ports at f000 [size=256]
> Memory at fcb00000 (64-bit, non-prefetchable) [size=4K]
> Memory at 7ff0900000 (64-bit, prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [70] Express Endpoint, MSI 01
> Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
> Capabilities: [d0] Vital Product Data
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Virtual Channel
> Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
> Capabilities: [170] Latency Tolerance Reporting
> Kernel driver in use: r8168
> Kernel modules: r8169, r8168

Since the issue still exists on the live usb, I could provide additional information if needed by booting the installation media.

The RTL8168 family consists of about 50 chip versions. Rev 0c (RTL8168g) typically causes no problems and runs rock solid. However, for other chip versions we've seen that there are buggy BIOSes out there. What's needed is a full dmesg with the r8169 driver. NetworkManager in particular also sometimes has bugs, so for such tests it can make sense to configure the network manually. If you don't have the firmware installed, then install it, as it's meant to solve compatibility issues. Then provide the "ethtool -i <if>" output to verify that the firmware is loaded. You could also test with the latest linux-next; you can get it from kernel.org git.

It seems others with the same board don't face the problem, see e.g. here: http://linux-hardware.org/index.php?probe=c355efc65f&log=dmesg
One difference is that they use the boot command line parameter: amd_iommu=on
You could re-test with this parameter set.

This is not a DKMS issue anyway; the driver you are reporting the issue against is in the kernel. Reassigning component.
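When re-testing with a boot parameter such as amd_iommu=on, it helps to confirm that the running kernel actually received it. The sketch below checks a kernel command line string for a parameter. The helper name `has_kernel_param` is hypothetical, and it assumes simple space-separated tokens (quoted values containing spaces are not handled). On Linux the running kernel's command line is available in /proc/cmdline.

```python
def has_kernel_param(cmdline, name, value=None):
    """Check a kernel command line string for `name` or `name=value`.

    Assumes plain whitespace-separated tokens; quoted values are not handled.
    """
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and (value is None or (sep and val == value)):
            return True
    return False


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel was booted with.
        with open("/proc/cmdline") as f:
            print("amd_iommu=on set:", has_kernel_param(f.read(), "amd_iommu", "on"))
    except OSError:
        print("/proc/cmdline not readable on this system")
```

Matching whole tokens rather than substrings avoids a false positive when one parameter name is a prefix of another.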
I unblacklisted the r8169 driver, then restarted the system. I can confirm that the issue still exists on kernel 5.9. I tried using the boot command line parameter amd_iommu=on and it didn't make a difference. Here's the dmesg log where loading the r8169 driver is attempted but fails: https://paste.centos.org/view/5d700a3a

Also, by Realtek NIC software do you mean the drivers found here (https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software)? In that case that is what solves the issue. Here is ethtool after the proper drivers are installed:
> driver: r8168
> version: 8.048.03-NAPI
> firmware-version:
> expansion-rom-version:
> bus-info: 0000:04:00.0
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: yes
> supports-priv-flags: no

I will test with the latest kernel later.

Good news, r8169 driver seems to be able to handle r8168g in kernel 5.10.6.
$ ethtool -i enp4s0
> driver: r8169
> version: 5.10.6-200.fc33.x86_64
> firmware-version: rtl8168g-2_0.0.1 02/06/13
> expansion-rom-version:
> bus-info: 0000:04:00.0
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: yes
> supports-priv-flags: no
This kernel is in the updates channel so I guess this issue is mostly resolved. However it does pose a problem when installing Fedora 33 as I believe the installation media comes packaged with kernel 5.8.
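The firmware check requested earlier in the thread ("ethtool -i <if>") can be scripted by parsing the key/value output. The following is a minimal sketch: the helper names `parse_ethtool_info` and `firmware_loaded` are made up for illustration, and it assumes the "key: value" layout shown in this report. An empty firmware-version only means the driver reports none.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i <if>` style output into a dict of field -> value."""
    info = {}
    for raw in text.splitlines():
        # Tolerate the "> " quoting used in this report.
        line = raw.lstrip("> ").strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so values like 0000:04:00.0 stay intact.
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def firmware_loaded(info):
    """True when a non-empty firmware-version is reported."""
    return bool(info.get("firmware-version"))


# Values copied from the ethtool output in this report.
sample = """\
driver: r8169
version: 5.10.6-200.fc33.x86_64
firmware-version: rtl8168g-2_0.0.1 02/06/13
expansion-rom-version:
bus-info: 0000:04:00.0
"""

info = parse_ethtool_info(sample)
print(info["driver"], firmware_loaded(info))
```

In practice the text would come from running `ethtool -i enp4s0` and capturing its output. The interface name is the one used in this report and will differ on other systems.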
(In reply to monsterjamp23 from comment #6)
> I unblacklisted r8169 drivers then restarted the system. I can confirm that
> the issue still exists on kernel 5.9. I tried using the boot command line
> parameter: amd_iommu=on and it didn't make a difference. Here's the dmesg
> log where r8169 drivers are attempted to be loaded but fails:
> https://paste.centos.org/view/5d700a3a

OK, thanks for testing.

> Also by Realtek NIC software do you mean the drivers found here
> (https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software)?
> In that case that is what solves the issue.

I meant the firmware from linux-firmware package. Your next comment shows that it's installed.
firmware-version: rtl8168g-2_0.0.1 02/06/13

(In reply to monsterjamp23 from comment #7)
> Good news, r8169 driver seems to be able to handle r8168g in kernel 5.10.6.

Good to know, thanks for the feedback.

> $ ethtool -i enp4s0
> > driver: r8169
> > version: 5.10.6-200.fc33.x86_64
> > firmware-version: rtl8168g-2_0.0.1 02/06/13
> > expansion-rom-version:
> > bus-info: 0000:04:00.0
> > supports-statistics: yes
> > supports-test: no
> > supports-eeprom-access: no
> > supports-register-dump: yes
> > supports-priv-flags: no
>
> This kernel is in the updates channel so I guess this issue is mostly
> resolved. However it does pose a problem when installing Fedora 33 as I
> believe the installation media comes packaged with kernel 5.8.

Hard to tell what may have caused the issue based on diff between 5.9 and 5.10. Especially as I can't reproduce the issue on my system with the same chip version. Something board/BIOS-specific may be involved. r8168 vendor driver has quite some undocumented magic that may help to work around such board-specific issues.

This message is a reminder that Fedora 33 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 33 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.