Bug 1547277 - [Kernel 4.15.3] Ethernet fails to reconnect on resume from suspend w/error do_IRQ: 5.37 No irq handler for vector
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2018-02-20 22:54 UTC by steelstring94
Modified: 2018-10-25 09:10 UTC

Last Closed: 2018-08-29 15:16:21 UTC


Attachments
Contains dmesg output for bugged suspend/resume and for normal (14.02 KB, text/x-csrc)
2018-02-20 22:54 UTC, steelstring94

Description steelstring94 2018-02-20 22:54:24 UTC
Created attachment 1398458
Contains dmesg output for bugged suspend/resume and for normal

Description of problem: On kernel 4.15.3, Ethernet fails to reconnect after the computer is suspended and resumed.  This is confirmed to be a kernel issue, as booting with 4.14.18 solves the problem.  I am using Fedora KDE.


Version-Release number of selected component (if applicable): kernel-4.15.3-300.fc27.x86_64


How reproducible: Very; unsure whether the desktop environment affects it


Steps to Reproduce:
1. Boot with kernel 4.15.3 (Fedora KDE, unsure if desktop env. has effect)
2. Suspend the machine
3. Resume the machine

Actual results:  Ethernet fails to reconnect, and a notification appears with the following error message:  do_IRQ: 5.37 No irq handler for vector



Expected results: Ethernet connection resumes upon logging back in


Additional info:
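
One way to confirm the symptom after a resume is to filter the kernel log for the message quoted under "Actual results". The following is only a sketch: the function name find_irq_errors is made up for this example, and it assumes kernel messages are supplied on standard input (for instance from dmesg or journalctl -k).

```shell
#!/bin/sh
# find_irq_errors: hypothetical helper, not part of any existing tool.
# Reads kernel log lines on stdin and prints only those that carry the
# "No irq handler for vector" message reported in this bug.
find_irq_errors() {
    grep 'do_IRQ: .* No irq handler for vector'
}

# Example usage after resuming (commented out so the sketch has no side effects):
#   dmesg | find_irq_errors
#   journalctl -k -b | find_irq_errors
```

If nothing is printed, the message was not logged in the input that was checked.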

Comment 1 Laura Abbott 2018-02-20 23:00:39 UTC
This is best filed as a bug on bugzilla.kernel.org. Select Power Management and then hibernation/suspend as the component. If you can do a bisect to figure out which commit broke that would be helpful as well.

Comment 2 steelstring94 2018-02-20 23:29:33 UTC
(In reply to Laura Abbott from comment #1)
> This is best filed as a bug on bugzilla.kernel.org. Select Power Management
> and then hibernation/suspend as the component. If you can do a bisect to
> figure out which commit broke that would be helpful as well.

From https://www.kernel.org/doc/html/v4.10/admin-guide/reporting-bugs.html

"Please see https://www.kernel.org/ for a list of supported kernels. Any kernel marked with [EOL] is “end of life” and will not have any fixes backported to it.

If you’ve found a bug on a kernel version that isn’t listed on kernel.org, contact your Linux distribution or embedded vendor for support. Alternatively, you can attempt to run one of the supported stable or -rc kernels, and see if you can reproduce the bug on that. It’s preferable to reproduce the bug on the latest -rc kernel."

If you check kernel.org, you'll see 4.15.4 is listed, but not 4.15.3.  Additionally, on https://www.kernel.org/category/releases.html you can see the following quote:

"If you see anything at all after the dash, you are running a distribution kernel. Please use the support channels offered by your distribution vendor to obtain kernel support."

The "dash" referred to is the one in the output of uname -r.  For these reasons, I posted this bug here.  Posting to kernel.org was my initial thought.
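
The "dash" rule quoted from kernel.org can be checked mechanically. This is a minimal sketch that applies the quoted rule literally and nothing smarter; the function name is_distro_kernel is invented for illustration, and the input is assumed to be a release string as printed by uname -r.

```shell
#!/bin/sh
# is_distro_kernel: hypothetical helper.
# Succeeds (exit status 0) if the release string contains a dash followed
# by at least one character, which the kernel.org text quoted above treats
# as the mark of a distribution kernel.
is_distro_kernel() {
    case "$1" in
        *-?*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Check the running kernel.
release=$(uname -r)
if is_distro_kernel "$release"; then
    echo "$release: something follows the dash, so report to the distribution first"
else
    echo "$release: nothing after a dash, looks like an upstream kernel"
fi
```

With the version from this report, is_distro_kernel 4.15.3-300.fc27.x86_64 succeeds, while is_distro_kernel 4.15.4 does not.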

Comment 3 Laura Abbott 2018-02-21 01:07:23 UTC
Yes, you did the right thing by reporting it here first. Part of the reason for telling users to report bugs to the distribution (like Fedora) is so they can do first level triage. Fedora isn't carrying any patches that would affect an issue like this so the next course of action is to report the issue to the upstream bugzilla since they are the ones who can actually fix the bug.

Comment 4 steelstring94 2018-02-21 01:16:56 UTC
(In reply to Laura Abbott from comment #3)
> Yes, you did the right thing by reporting it here first. Part of the reason
> for telling users to report bugs to the distribution (like Fedora) is so
> they can do first level triage. Fedora isn't carrying any patches that would
> affect an issue like this so the next course of action is to report the
> issue to the upstream bugzilla since they are the ones who can actually fix
> the bug.

Understood.  Bug reported here:  https://bugzilla.kernel.org/show_bug.cgi?id=198855

Comment 5 samoht0 2018-02-21 17:56:34 UTC
Had exactly that behavior on F26 with 4.15.3 and the ath9k network driver yesterday. Error message:

do_IRQ: 1.41 No irq handler for vector

Strangely, resume from hibernation didn't cause the error or break the network today. I'm keeping an eye on that.

Comment 6 samoht0 2018-02-23 16:29:49 UTC
Seems to be random. Today with 4.15.4 the Ethernet driver (r8169 is the correct driver; I mixed it up with ath9k above) was broken after resume with

kernel: do_IRQ: 1.36 No irq handler for vector

Unloading and reloading the r8169 module with modprobe locks up the system. So hibernation became totally pointless with 4.15 here.

Comment 7 steelstring94 2018-02-23 16:31:51 UTC
samoht0, since this is apparently not the place for this, you should be making your posts here: https://bugzilla.kernel.org/show_bug.cgi?id=198855

Comment 8 steelstring94 2018-02-28 13:46:29 UTC
This is fixed in kernel 15.4.

Comment 9 steelstring94 2018-02-28 13:47:09 UTC
Sorry, 4.15.4.

Comment 10 steelstring94 2018-03-05 16:08:11 UTC
Re-opening this, as it has been happening again since 4.15.6, and it now still happens even when I switch back to 4.15.4.

Comment 11 samoht0 2018-04-13 11:20:27 UTC
As expected, totally ignored upstream. I've found some time to dig into it. It looks like this is the issue, and there's a fix candidate that helped the reporter:

https://lkml.org/lkml/2018/4/3/136

@Laura: Can you do a 4.15.16 scratch build with the patch?
(IIRC, you said some time ago that only patches with a signed tag can be imported into Fedora Git...)

Comment 12 Laura Abbott 2018-04-13 17:36:16 UTC
That's supposed to be a debug patch, not a fix, so I don't think it makes sense to pull it in. Honestly, it sounds like a timing issue. Apparently the reporter said it's working on Linus' tree. We're due to rebase to 4.16.x, probably next week. Can you test on that or on rawhide to see if it's working there?

Comment 13 samoht0 2018-04-15 07:34:03 UTC
Thanks for your reply, Laura.
I agree, retesting on 4.16 looks like the best way to go. Will report the behavior here.

Comment 14 steelstring94 2018-04-20 02:18:27 UTC
I just suspended and resumed and it's not happening now.  This is really strange behavior.  Nearest thing I can figure is the kernel update to 4.15.17.

Comment 15 samoht0 2018-04-21 13:59:40 UTC
Had three clean resumes from hibernation on 4.15.17, looks good so far.
But would like to keep the bug open for some more testing.

Comment 16 samoht0 2018-04-29 11:05:43 UTC
Never happened with 4.15.17/18 and 4.16.4, so I'm OK with closing this as fixed in current kernel package.

Comment 17 samoht0 2018-05-07 16:55:10 UTC
WTF, this is *back* with 4.16.7, while resuming ethernet worked in 4.15.17 to 4.16.4.

Comment 18 Dmitry Valter 2018-05-09 01:14:09 UTC
It is still broken in 4.16.4-200.fc27 and 4.16.5-200.fc27

Comment 19 steelstring94 2018-05-12 12:26:02 UTC
This is happening again, 4.16.7.  How does a problem keep going away and coming back like this?

Comment 20 Jack Sitnikov 2018-06-09 17:29:59 UTC
The same problem on FC28,
4.16.14-300.fc28.x86_64.
Unloading and reloading the network module (driver) helps.

Comment 21 appdevsw 2018-06-24 09:27:32 UTC
... and the name of the driver and command are: ?

Comment 22 appdevsw 2018-06-24 09:53:52 UTC
In my case:

modprobe -r r8169
modprobe    r8169

r8169 is the name of the module, which I found using
lshw -c network

look at driver=...
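
The two steps above (find the driver name, then reload the module) could be scripted roughly as follows. This is a sketch under assumptions: driver_names and reload_module are hypothetical helper names, the RUN variable is an invented dry-run switch, and the lshw output is assumed to list driver=NAME among space-separated fields on its configuration line.

```shell
#!/bin/sh
# driver_names: hypothetical helper.
# Reads `lshw -c network` output on stdin and prints the value of each
# driver=... field, one per line.
driver_names() {
    tr ' ' '\n' | sed -n 's/^driver=//p'
}

# reload_module: hypothetical helper that unloads and then reloads one module.
# Set RUN=echo to print the modprobe commands instead of executing them.
reload_module() {
    ${RUN:-} modprobe -r "$1" && ${RUN:-} modprobe "$1"
}

# Example (needs root; commented out so the sketch has no side effects):
#   lshw -c network | driver_names
#   reload_module r8169
```

Note that comment 6 reports a system lockup when reloading r8169 this way, so this is not a safe workaround on every machine.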

Comment 23 appdevsw 2018-06-29 04:59:27 UTC
My problem with the network started when I installed 4.17.2 kernel.
I tested it for a few hours and went back to 4.14.18, which I use every day.
And now I have this problem with 4.14.18 too.
So maybe it's not a problem of the 4.17 kernel, but of the installation process?

Comment 24 samoht0 2018-07-22 12:03:14 UTC
(In reply to appdevsw from comment #23)
> My problem with the network started when I installed 4.17.2 kernel.

Mmmh. Everyone else had it earlier, and I never saw it while running 4.17.x or 4.14.x. I doubt this is the same problem. The new irq handling code, which likely caused the reported issue, came in with 4.15. Maybe it's time to close this as current and open a new report (?)...

Or is there anybody facing the issue constantly with 4.15.x to 4.17.x?

Comment 25 Justin M. Forbes 2018-07-23 15:24:22 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 26 Justin M. Forbes 2018-08-29 15:16:21 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 27 Juergen Sievers 2018-10-23 14:47:50 UTC
Same problem back on F28  4.18.14-200.fc28.x86_64 :(

Comment 28 samoht0 2018-10-24 11:25:16 UTC
(In reply to Juergen Sievers from comment #27)
> Same problem back on F28  4.18.14-200.fc28.x86_64 :(

Not for me. As I guessed above, those who didn't face it on 4.15/4.16 likely have a similar issue, but not the same one. So, was it already present for you on 4.15/4.16?

Maybe opening another bug report is a preferable option.

Comment 29 Juergen Sievers 2018-10-24 18:06:16 UTC
I have now compiled and installed the module

Linux device driver released for RealTek RTL8168B/8111B, RTL8168C/8111C, RTL8168CP/8111CP, RTL8168D/8111D, RTL8168DP/8111DP, and RTL8168E/8111E Gigabit Ethernet controllers with PCI-Express interface.

<Requirements>

	- Kernel source tree (supported Linux kernel 2.6.x and 2.4.x)
	- For linux kernel 2.4.x, this driver supports 2.4.20 and latter.
	- Compiler/binutils for kernel compilation

The problem still exists.
The connection comes up and drops sporadically. After boot it takes over 15 minutes until the link comes up for the first time, and it drops again after a few minutes.

ifconfig always tells me the device is up, but the static IP is not assigned.


Okt 24 16:51:06 nadhh kernel: r8168: enp4s0: link down
Okt 24 16:51:06 nadhh NetworkManager[1485]: <info>  [1540392666.7274] device (enp4s0): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Okt 24 17:12:12 nadhh kernel: r8168: enp4s0: link up
Okt 24 17:12:12 nadhh NetworkManager[1485]: <info>  [1540393932.3696] device (enp4s0): carrier: link connected
Okt 24 17:12:12 nadhh NetworkManager[1485]: <info>  [1540393932.3709] device (enp4s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Okt 24 17:12:16 nadhh kernel: r8168: enp4s0: link down
Okt 24 17:12:16 nadhh NetworkManager[1485]: <info>  [1540393936.4880] device (enp4s0): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Okt 24 17:14:03 nadhh NetworkManager[1485]: <info>  [1540394043.9853] device (enp4s0): carrier: link connected
Okt 24 17:14:03 nadhh kernel: r8168: enp4s0: link up
Okt 24 17:14:03 nadhh NetworkManager[1485]: <info>  [1540394043.9868] device (enp4s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Okt 24 17:14:07 nadhh kernel: r8168: enp4s0: link down
Okt 24 17:14:07 nadhh NetworkManager[1485]: <info>  [1540394047.0794] device (enp4s0): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Okt 24 17:15:10 nadhh kernel: r8168: enp4s0: link up
Okt 24 17:15:10 nadhh NetworkManager[1485]: <info>  [1540394110.5466] device (enp4s0): carrier: link connected
Okt 24 17:15:10 nadhh NetworkManager[1485]: <info>  [1540394110.5478] device (enp4s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Okt 24 17:15:12 nadhh kernel: r8168: enp4s0: link down
Okt 24 17:15:12 nadhh NetworkManager[1485]: <info>  [1540394112.6161] device (enp4s0): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Okt 24 17:19:22 nadhh sudo[29396]:  juergen : TTY=pts/1 ; PWD=/home/juergen ; USER=root ; COMMAND=/sbin/brctl addif virbr0 enp4s0
Okt 24 17:19:22 nadhh kernel: virbr0: port 3(enp4s0) entered blocking state
Okt 24 17:19:22 nadhh kernel: virbr0: port 3(enp4s0) entered disabled state
Okt 24 17:19:22 nadhh kernel: device enp4s0 entered promiscuous mode
Okt 24 17:19:22 nadhh audit: ANOM_PROMISCUOUS dev=enp4s0 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3763] ifcfg-rh: add connection in-memory (7e1b1a77-210e-4e95-a13c-cd4b97105ede,"enp4s0")
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3777] device (enp4s0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3797] device (enp4s0): Activation: starting connection 'enp4s0' (7e1b1a77-210e-4e95-a13c-cd4b97105ede)
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3846] device (enp4s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3850] device (enp4s0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3852] device (enp4s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3853] device (virbr0): bridge port enp4s0 was attached
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3853] device (enp4s0): Activation: connection 'enp4s0' enslaved, continuing activation
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.3854] device (enp4s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Okt 24 17:19:22 nadhh nm-dispatcher[29400]: req:1 'pre-up' [enp4s0]: new request (1 scripts)
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.4804] device (enp4s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.4809] device (enp4s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Okt 24 17:19:22 nadhh NetworkManager[1485]: <info>  [1540394362.5255] device (enp4s0): Activation: successful, device activated.
Okt 24 17:19:22 nadhh nm-dispatcher[29400]: req:2 'up' [enp4s0]: new request (6 scripts)
Okt 24 17:19:22 nadhh nm-dispatcher[29400]: req:2 'up' [enp4s0]: start running ordered scripts...
Okt 24 17:24:17 nadhh kernel: r8168: enp4s0: link up
Okt 24 17:24:17 nadhh kernel: virbr0: port 3(enp4s0) entered blocking state
Okt 24 17:24:17 nadhh kernel: virbr0: port 3(enp4s0) entered listening state
Okt 24 17:24:17 nadhh NetworkManager[1485]: <info>  [1540394657.3626] device (enp4s0): carrier: link connected
Okt 24 17:24:18 nadhh kernel: r8168: enp4s0: link down
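
To put a number on how often the link flaps in a log like the one above, the kernel "link up" and "link down" lines can be counted. This is only a sketch: count_link_events is a hypothetical helper name, and it assumes journal lines in the format shown above are supplied on standard input.

```shell
#!/bin/sh
# count_link_events: hypothetical helper.
# Reads journal lines on stdin and prints how many kernel "link up" and
# "link down" messages were logged for the interface named in $1.
count_link_events() {
    awk -v ifc="$1" '
        index($0, "kernel:") && index($0, ifc ": link up")   { up++ }
        index($0, "kernel:") && index($0, ifc ": link down") { down++ }
        END { printf "up=%d down=%d\n", up, down }
    '
}

# Example (commented out; assumes the journal still holds the relevant boot):
#   journalctl -k | count_link_events enp4s0
```

For the excerpt above, counting only the kernel lines, this gives up=4 down=5 in roughly half an hour.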

Comment 30 samoht0 2018-10-25 09:10:12 UTC
This is definitely a different issue (unstable ethernet behavior).

