Bug 2224859

Summary: kernel-6.4.4-200 loses network adapter
Product: [Fedora] Fedora
Reporter: Michael Riss <Michael.Riss>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED COMPLETED
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 38
CC: acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, hpa, jarodwilson, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, paul.0000.black, ptalbert, steved, toby
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: ---
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-04 00:39:59 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (description, flags):
- dmesg output of the first NIC loss (flags: none)
- dmesg output of the second NIC loss (flags: none)
- lspci topology (flags: none)
- lspci details (flags: none)

Description Michael Riss 2023-07-23 16:36:05 UTC
Today I updated to kernel-6.4.4-200, and I have already lost my network adapter twice.

Symptoms: 
- Network traffic through the ethernet adapter stops
- `ip a` still shows the network adapter as if there was no problem
- dmesg shows a stack trace (I will attach both)
- restarting with `nmcli networking off; nmcli networking on` does not work
- after a reboot everything works again as usual ... for a while

Hardware Info (if needed): It's an Acer Aspire VN7-593G. I will attach lspci output for the network adapter.

Is there anything else I can provide?

Reproducible: Sometimes

Actual Results:  
Network traffic sporadically stops.

Expected Results:  
Network traffic should work continuously.

Comment 1 Michael Riss 2023-07-23 16:37:43 UTC
Created attachment 1977146 [details]
dmesg output of the first NIC loss

Comment 2 Michael Riss 2023-07-23 16:38:33 UTC
Created attachment 1977147 [details]
dmesg output of the second NIC loss

Comment 3 Michael Riss 2023-07-23 16:39:33 UTC
Created attachment 1977148 [details]
lspci topology

Comment 4 Michael Riss 2023-07-23 16:42:03 UTC
Created attachment 1977149 [details]
lspci details

Comment 5 Paul Black 2023-07-25 06:43:27 UTC
I've had the same (twice now).

Looks like a similar ethernet device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller

Comment 6 Michael Riss 2023-07-25 19:44:09 UTC
Yes, same adapter.
I now lose the network about once an hour, but at irregular intervals. Sometimes the network is gone directly after system start; sometimes it works for 2-3h before the loss occurs.

Comment 7 Toby Ovod-Everett 2023-07-26 13:58:29 UTC
I encountered the same issue, and my initial search for an existing bug failed to find this report.

I believe 2225388 (https://bugzilla.redhat.com/show_bug.cgi?id=2225388) is the same issue.

Comment 8 Toby Ovod-Everett 2023-07-26 13:59:39 UTC
*** Bug 2225388 has been marked as a duplicate of this bug. ***

Comment 9 Michael Riss 2023-07-28 12:31:41 UTC
The problem unfortunately still persists with kernel-6.4.6-200.fc38.x86_64.

Comment 10 Michael Riss 2023-07-29 16:36:45 UTC
I may have good news: I did some searching on the Linux kernel Bugzilla, and this bug looks like what we have here: https://bugzilla.kernel.org/show_bug.cgi?id=217596

TL;DR: The active state power management (ASPM) of PCIe devices got an update with kernel 6.4, and this seems to negatively affect the r8169 driver.
They managed to fix the problem by reverting the changes in kernel version 6.4.7. Fortunately there is already a Fedora test kernel for kernel 6.4.7. I tested it this afternoon and haven't seen the problem since. Of course it's possible that the problem still shows up after 7-8h, but I want to share it with others also suffering from this in the hope that it might give them some relief as well.
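
For anyone who wants to see which PCIe ASPM policy their kernel is currently using, here is a minimal Python sketch. It assumes the kernel was built with ASPM support, so that the `pcie_aspm` policy parameter is exposed under /sys/module/pcie_aspm/parameters/policy, and that the file lists the available policies with the active one in square brackets. The function names are made up for illustration. It only reads the value and is not a fix or a workaround for this bug.

```python
from pathlib import Path

# Assumed sysfs location of the pcie_aspm "policy" parameter
# (present only when the kernel was built with PCIe ASPM support).
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def parse_aspm_policy(text):
    """Split a policy line such as '[default] performance powersave'.

    Returns (active, available): the bracketed entry and the list of
    all entries with the brackets removed.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read and parse the policy file; return (None, []) if it is unreadable."""
    try:
        return parse_aspm_policy(path.read_text())
    except OSError:
        return None, []


if __name__ == "__main__":
    active, available = read_aspm_policy()
    print("active ASPM policy:", active)
    print("available policies:", ", ".join(available))
```

If the file is missing, the script prints `None` for the active policy, which most likely just means ASPM support is not compiled in.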

I made the kernel upgrade to the testing kernel with:

dnf upgrade --enablerepo updates-testing kernel kernel-devel kernel-modules-extra

Good luck!

Comment 11 Michael Riss 2023-07-31 15:28:37 UTC
The network adapter has remained stable on my machine with the 6.4.7 kernel for the third day in a row. I think this is the fix.

Comment 12 Michael Riss 2023-08-04 00:39:59 UTC
Seeing that kernel 6.4.7 has reached the regular updates, I'm closing this bug as resolved.
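
To check quickly whether a machine is still running a kernel from before the fix, something like the following Python sketch could be used. The version bounds are an assumption drawn only from this report: failures were seen on 6.4.4 and 6.4.6, and 6.4.7 was stable. Whether 6.4.0 through 6.4.3 behave the same was not tested here, and the function names are hypothetical.

```python
import platform
import re

# Assumed bounds, taken from this bug report only:
# the 6.4 series up to (but not including) 6.4.7.
SERIES_START = (6, 4, 0)
FIXED_IN = (6, 4, 7)


def kernel_tuple(release):
    """Extract (major, minor, patch) from a string like '6.4.4-200.fc38.x86_64'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"cannot parse kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def in_affected_range(release):
    """True if the release falls in the assumed affected range."""
    return SERIES_START <= kernel_tuple(release) < FIXED_IN


if __name__ == "__main__":
    running = platform.release()  # kernel release string on Linux
    if in_affected_range(running):
        print(f"{running}: in the range reported as affected; consider updating")
    else:
        print(f"{running}: outside the range reported as affected")
```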

Comment 13 Toby Ovod-Everett 2023-08-07 13:22:03 UTC
I switched from running kernel 6.3.12 to 6.4.7 on Fri Aug 4 (I was off grid for 5 days and didn't want to switch immediately prior) and have not had a single issue during the past 57 hours, so I concur with Michael Riss that the bug is resolved.