Bug 1562360 - strange messages after network connects/re-connects [NEEDINFO]
Summary: strange messages after network connects/re-connects
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-30 11:02 UTC by aaronsloman
Modified: 2018-08-29 15:22 UTC

Fixed In Version: kernel: 4.15.13-300.fc27.x86_64 #1 SMP Mon Mar 26 19:06:57 UTC 2018
Clone Of:
Environment:
Last Closed: 2018-08-29 15:22:35 UTC
Type: Bug
Embargoed:
jforbes: needinfo?


Attachments
tar file with dmesg about from pc and notebook showing warning messages (42.88 KB, text/tar.gz)
2018-04-01 15:06 UTC, aaronsloman

Description aaronsloman 2018-03-30 11:02:44 UTC
Description of problem:
Warning messages are sometimes displayed in xterm windows after the network starts or restarts
(examples below).

Version-Release number of selected component (if applicable):

Kernel 4.15.12-301.fc27.x86_64 (and several earlier kernels)
NetworkManager-1.8.6-1.fc27.x86_64

How reproducible:
It is intermittent: the messages are sometimes shown after suspend/resume or hibernate/resume, and sometimes not.
Likewise they sometimes appear after using nmcli to turn the network off and then on again, but not always.


Steps to Reproduce:
1. Stop NetworkManager
2. Restart NetworkManager
3. (alternatively start NM after reboot, or hibernate, or suspend)

Actual results on desktop PC with ethernet connection:

The following is (sometimes) displayed in all open terminal (xterm) windows:

"Dazed and confused" and "No irq handler" messages, with additional information.

Corresponding lines in dmesg output after hibernate and resume:
 [  157.744416] Restarting tasks ... done.
 [  157.745904] PM: hibernation exit
 [  157.751759] IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
 [  157.992101] IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
 [  160.605765] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
 [  160.605820] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
 [  436.393267] Uhhuh. NMI received for unknown reason 21 on CPU 0.
 [  436.393268] Do you have a strange power saving mode enabled?
 [  436.393269] Dazed and confused, but trying to continue

The above comes from a desktop PC running F27

I have similar results (intermittently) on a Clevo W515LU.

Turn off wifi:
    nmcli networking off

Turn it on:
    nmcli networking on

Sometimes produces:
    Uhhuh. NMI received for unknown reason 21 on CPU 0.
    Message from syslogd@stone at Mar 21 20:56:18 ...
    kernel:do_IRQ: 3.36 No irq handler for vector

sometimes also
     Do you have a strange power saving mode enabled?
     Dazed and confused, but trying to continue

Despite the messages, the commands work.

Expected results:
   Network restarts without such messages

Additional info:

I can't tell whether there's one bug that produces slightly different manifestations, or two different bugs. I also can't tell whether this is a NetworkManager bug or something deeper, e.g. a kernel bug.
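
To help compare the two kinds of message across machines and kernels, the relevant lines can be pulled out of dmesg or /var/log/messages text. What follows is only a minimal sketch: the function names are made up for illustration, and reading "3.36" as CPU number followed by vector number is an assumption about the kernel's message format, not something established in this report.

```shell
#!/bin/sh
# Summarise the two kinds of kernel message quoted in this report.
# Both functions read dmesg-style text on stdin; the names are illustrative.

# Print "reason=<hex> cpu=<n>" for every unknown-NMI line.
nmi_reasons() {
    sed -n 's/.*NMI received for unknown reason \([0-9a-f]*\) on CPU \([0-9]*\).*/reason=\1 cpu=\2/p'
}

# Print "cpu=<n> vector=<n>" for every "No irq handler" line.
# Assumption: the "3.36" in the message means <cpu>.<vector>.
unhandled_irqs() {
    sed -n 's/.*do_IRQ: \([0-9]*\)\.\([0-9]*\) No irq handler for vector.*/cpu=\1 vector=\2/p'
}
```

For example, `dmesg | nmi_reasons | sort | uniq -c` would tally how often each reason code occurs, and `dmesg | unhandled_irqs` would list any unhandled-vector messages.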

Comment 1 aaronsloman 2018-03-31 23:44:53 UTC
I don't know whether this is a coincidence, but less than 48 hours after I posted this bug report, I found that in the latest kernel 

  4.15.13-300.fc27.x86_64 #1 SMP Mon Mar 26 19:06:57 UTC 2018

just installed both on my Clevo notebook and my desktop PC, the bug reported here seems to have been fixed.

The messages are no longer posted in xterm windows, and the output of dmesg, on both machines, no longer includes:

 Uhhuh. NMI received for unknown reason 21 on CPU 0.
 Do you have a strange power saving mode enabled?
 Dazed and confused, but trying to continue

Neither is it produced by suspend+resume or use of nmcli to turn network off then on again.

Thanks very much!

Comment 2 aaronsloman 2018-04-01 15:06:20 UTC
Created attachment 1415852 [details]
tar file with dmesg about from pc and notebook showing warning messages

The tar file contains two files produced by dmesg, one on the PC 
dmesg-vig-4.15.13-300.txt

and one on the laptop
dmesg-stone-4.15.13-300.txt

In both of the files, search for XXX which indicates warning messages produced after hibernate resume, also displayed on all xterm windows. See comment added Apr 1 2018, for more information.

Comment 3 aaronsloman 2018-04-01 15:08:11 UTC
I wrote:
>   4.15.13-300.fc27.x86_64 #1 SMP Mon Mar 26 19:06:57 UTC 2018
> 
> just installed both on my Clevo notebook and my desktop PC, the bug reported
> here seems to have been fixed.
> 
> The messages are no longer posted in xterm windows, and the output of dmesg,
> on both machines, no longer includes:
> 
>  Uhhuh. NMI received for unknown reason 21 on CPU 0.
>  Do you have a strange power saving mode enabled?
>  Dazed and confused, but trying to continue
> 
> Neither is it produced by suspend+resume or use of nmcli to turn network off
> then on again.

Unfortunately, I wrote too soon, because I had not tried hibernate+resume. This morning I found problem messages on both notebook (Clevo W515LU) and desktop (Viglen) screens after hibernate+resume:

On the PC:  Viglen DQ67SW/DQ67SW, connected by ethernet cable

   Message from syslogd@vig at Apr  1 09:47:55 ...
    kernel:Uhhuh. NMI received for unknown reason 21 on CPU 0.
 
   Message from syslogd@vig at Apr  1 09:47:55 ...
    kernel:Do you have a strange power saving mode enabled?
 
   Message from syslogd@vig at Apr  1 09:47:55 ...
    kernel:Dazed and confused, but trying to continue

On the notebook (Clevo W515LU), connected by wifi, the above messages also appeared, but there was a different additional message after hibernate+resume:

   Message from syslogd@stone at Apr  1 09:54:54 ...
    kernel:do_IRQ: 3.36 No irq handler for vector

Are these two problems, one concerning ethernet and the other wifi, or two manifestations of the same problem?

I attach a tar file dmesg-output-pc-and-notebook.tar.gz

Comment 4 aaronsloman 2018-04-15 13:47:00 UTC
Hibernate+resume problem seems to be fixed on the PC:  Viglen DQ67SW/DQ67SW.
Using kernel:
4.15.15-300.fc27.x86_64 #1 SMP Mon Apr 2 23:14:02 UTC 2018

It seems that on that machine I no longer get unwanted messages either after suspend+resume or hibernate+resume.

However, using the same kernel on the Notebook computer (Clevo W515LU) I still get unwanted text displayed in Xterm windows after resume from Hibernate, also in /var/log/messages at Apr 15 14:31:15:

(Context before the messages included)
Apr 15 14:31:01 stone systemd[1]: Mounted FUSE Control File System.
Apr 15 14:31:02 stone dbus-daemon[1678]: [session uid=1001 pid=1678] Activating service name='org.gnome.GConf' requested by ':1.6' (uid=1001 pid=1813 comm="/usr/lib64/firefox/firefox ")
Apr 15 14:31:02 stone dbus-daemon[1678]: [session uid=1001 pid=1678] Successfully activated service 'org.gnome.GConf'
Apr 15 14:31:15 stone kernel: Uhhuh. NMI received for unknown reason 3c on CPU 0.
Apr 15 14:31:15 stone kernel: Do you have a strange power saving mode enabled?
Apr 15 14:31:15 stone kernel: Dazed and confused, but trying to continue

In case it is relevant:
Processor: Intel Celeron N3160 Processor (4 CPUs)
Processor Base Frequency 1.60 GHz Burst Frequency 2.24 GHz
Cache 2 MB L2

The desktop is an older machine with Intel Core i5, also with integrated graphics. It uses an ethernet connection.

Comment 5 aaronsloman 2018-04-17 22:08:27 UTC
(Addendum to comment #4)

I previously wrote:

> Hibernate+resume problem seems to be fixed on the PC:  Viglen DQ67SW/DQ67SW.
> Using kernel:
> 4.15.15-300.fc27.x86_64 #1 SMP Mon Apr 2 23:14:02 UTC 2018
> 
> It seems that on that machine I no longer get unwanted messages either after
> suspend+resume or hibernate+resume.
> 
> However, using the same kernel on the Notebook computer (Clevo W515LU) I
> still get unwanted text displayed in Xterm windows after resume from
> Hibernate, also in /var/log/messages at Apr 15 14:31:15:
====

I was wrong about the Viglen PC.
It still sometimes, but not always, produces these messages after hibernate + resume, e.g.

  Message from syslogd@vig at Apr 15 09:41:46 ...
   kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.

  Message from syslogd@vig at Apr 15 09:41:46 ...
   kernel:Do you have a strange power saving mode enabled?

  Message from syslogd@vig at Apr 15 09:41:46 ...
   kernel:Dazed and confused, but trying to continue

However, I don't seem to get these messages in ALL open xterm windows. I had them in small xterm windows used to launch firefox and thunderbird, but not in other xterm windows open across hibernate+resume.

Moreover, there was nothing of this sort in /var/log/messages, but it was recorded in dmesg output.

Here's an example, marked with XXX below, with some context before and after. It looks to me as if the message was generated during *hibernate* on the Viglen PC, not resume.

Still using 4.15.15-300.fc27.x86_64 #1 SMP Mon Apr 2 23:14:02 UTC 2018

after hibernate command:

 [  246.202900] PM: Creating hibernation image:
 [  246.406440] PM: Need to copy 319620 pages
 [  246.406443] PM: Normal pages needed: 319620 + 1024, available pages: 1755747
 [  246.203974] PM: Restoring platform NVS memory
 [  246.204490] Enabling non-boot CPUs ...
 [  246.204538] x86: Booting SMP configuration:
 [  246.204539] smpboot: Booting Node 0 Processor 1 APIC 0x2
 [  246.207786]  cache: parent cpu1 should not be sleeping
 [  246.207969] CPU1 is up
 [  246.207999] smpboot: Booting Node 0 Processor 2 APIC 0x4
 [  246.211195]  cache: parent cpu2 should not be sleeping
 [  246.211392] CPU2 is up
 [  246.211420] smpboot: Booting Node 0 Processor 3 APIC 0x6
 [  246.214628]  cache: parent cpu3 should not be sleeping
 [  246.214844] CPU3 is up
 [  246.218662] ACPI: Waking up from system sleep state S4
 [  246.278335] usb usb1: root hub lost power or was reset
 [  246.278474] usb usb2: root hub lost power or was reset
 [  246.278723] usb usb3: root hub lost power or was reset
 [  246.278724] usb usb4: root hub lost power or was reset
 [  246.281389] serial 00:04: activated
 [  246.282246] ehci-pci 0000:00:1a.0: cache line size of 64 is not supported
 [  246.282368] ehci-pci 0000:00:1d.0: cache line size of 64 is not supported
 [  246.297832] sd 2:0:0:0: [sdb] Starting disk
 [  246.301242] sd 0:0:0:0: [sda] Starting disk
 [  246.608576] usb 2-1: reset high-speed USB device number 2 using ehci-pci
 [  246.616604] usb 1-1: reset high-speed USB device number 2 using ehci-pci
 [  246.619722] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 [  246.619750] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 [  246.619776] ata6: SATA link down (SStatus 0 SControl 300)
 [  246.619797] ata5: SATA link down (SStatus 0 SControl 300)
 [  246.619819] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 [  246.620910] ata3.00: configured for UDMA/133
 [  246.621202] ata2.00: configured for UDMA/100
 [  246.622015] ata1.00: configured for UDMA/133
 [  246.838614] firewire_core 0000:04:03.0: rediscovered device fw0
 [  247.022604] usb 1-1.5: reset high-speed USB device number 4 using ehci-pci
 [  247.354601] usb 1-1.6: reset low-speed USB device number 5 using ehci-pci
 [  247.882599] usb 1-1.2: reset low-speed USB device number 3 using ehci-pci
 [  248.159056] PM: Basic memory bitmaps freed
 [  248.159058] OOM killer enabled.
 [  248.159060] Restarting tasks ... done.
 [  248.160856] PM: hibernation exit
 [  248.164147] IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
 [  248.406881] IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
 [  251.019566] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
 [  251.019612] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
XXX
 [  548.796268] Uhhuh. NMI received for unknown reason 31 on CPU 0.
 [  548.796268] Do you have a strange power saving mode enabled?
 [  548.796269] Dazed and confused, but trying to continue

[[Time gap during hibernate]]

 [ 9466.682876] usb 3-1: new high-speed USB device number 2 using xhci_hcd
 [ 9466.811706] usb 3-1: New USB device found, idVendor=03f0, idProduct=c211
 [ 9466.811713] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
 [ 9466.811717] usb 3-1: Product: Deskjet 2540 series
 [ 9466.811720] usb 3-1: Manufacturer: HP
 [ 9466.811723] usb 3-1: SerialNumber: CN51U573HY0604
 [ 9467.902180] usblp 3-1:1.1: usblp0: USB Bidirectional printer dev 2 if 1 alt 0 proto 2 vid 0x03F0 pid 0xC211
 [ 9467.902205] usbcore: registered new interface driver usblp
 [ 9473.130884] usblp0: removed
 [ 9473.136389] usblp 3-1:1.1: usblp0: USB Bidirectional printer dev 2 if 1 alt 0 proto 2 vid 0x03F0 pid 0xC211
 [ 9615.811865] usblp0: removed
 [16828.590124] usb 3-1: USB disconnect, device number 2
 [17084.855352] e1000e: eno1 NIC Link is Down
 [17085.049925] PM: hibernation entry
 [17085.050113] PM: Syncing filesystems ...
 [17085.182591] PM: done.

===
I hope that's of some use. Let me know if there's anything else I should do to provide evidence.

The similar (?) fault on the Clevo notebook remains unchanged.

It shows this sort of thing in all open xterm windows after hibernate+resume:

   Message from syslogd@stone at Apr 16 18:41:11 ...
   kernel:do_IRQ: 3.36 No irq handler for vector

Also using kernel:
    4.15.15-300.fc27.x86_64 #1 SMP Mon Apr 2 23:14:02 UTC 2018

Comment 6 aaronsloman 2018-04-20 21:37:14 UTC
Yet another wrinkle. I thought I should check whether the differences noted here between notebook and desktop machines were simply due to use of cable on the desktop machine and wifi on the notebook. So I tried connecting the notebook to the router via cable.

I was not able to get NetworkManager to detect the cable, so I rebooted with the cable plugged in from the start. After boot (to level 3) the cable was detected, and worked as expected.

I then tried hibernate+resume on the notebook. After resume it printed out this message:

  Message from syslogd@stone at Apr 20 21:54:40 ...
  kernel:do_IRQ: 3.36 No irq handler for vector

Moreover, after hibernate+resume the cable was no longer detected and I could reach the network *only* using wifi.

Moreover, if I again invoked hibernate+resume the longer message was printed out:

 stone125 %
 Message from syslogd@stone at Apr 20 21:57:20 ...
  kernel:Uhhuh. NMI received for unknown reason 3c on CPU 0.
 
 Message from syslogd@stone at Apr 20 21:57:20 ...
  kernel:Do you have a strange power saving mode enabled?
 
 Message from syslogd@stone at Apr 20 21:57:20 ...
  kernel:Dazed and confused, but trying to continue
 
And I was still unable to connect via cable, only using wifi.

So I rebooted and again found that I could connect via both cable and wifi.
I then tried suspend, using the lid.

After that wifi was still available, but not ethernet. Attempting to bring ethernet back using 'ifup' produced 
  "Connection activation failed: No suitable device found for this connection"

So on the Clevo W515LU ethernet is available only immediately after rebooting the machine. If either suspend or hibernate occurs, ethernet becomes unavailable and only wifi is available.
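
One way to confirm this after each resume would be to ask NetworkManager which state it reports for the wired device. The filter below is a rough sketch only: it assumes the terse output of `nmcli -t -f DEVICE,TYPE,STATE device status` (colon-separated fields), the function name is invented, and which state the Clevo's ethernet device actually ends up in has not been checked here.

```shell
#!/bin/sh
# Reads `nmcli -t -f DEVICE,TYPE,STATE device status` output on stdin and
# prints the names of ethernet devices that NetworkManager reports as
# "unmanaged" or "unavailable". The function name is illustrative.
unusable_ethernet() {
    awk -F: '$2 == "ethernet" && ($3 == "unmanaged" || $3 == "unavailable") { print $1 }'
}
```

Typical use would be `nmcli -t -f DEVICE,TYPE,STATE device status | unusable_ethernet`, run once after boot and again after suspend or hibernate; empty output means no wired device is in either state.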

On the desktop PC there's no wifi, and fortunately ethernet remains available after reboot, after hibernate, and after suspend, despite producing this:
    :Uhhuh. NMI received for unknown reason

The notebook has been upgraded to kernel 4.15.17-300, though that did not change anything I noticed.

The pc is still on 4.15.15-300 (both using fedora 27).

Summary: both hibernate and suspend reliably render the notebook incapable of using cable networking, until the next boot.

There's a deep flaw somewhere.

Comment 7 aaronsloman 2018-04-22 23:12:52 UTC
(Adding to comment #6)
> ....
> Summary: both hibernate and suspend reliably render the notebook incapable
> of using cable networking, until the next boot.
> ....

I have found an effective but clumsy workaround here that avoids the need to reboot:

https://askubuntu.com/questions/375077/network-devices-unmanaged-after-resume-from-hibernation-in-ubuntu-gnome-13-10

which includes:

> killall NetworkManager is a workaround you can use.

Also referenced here:
https://askubuntu.com/questions/348858/wifi-doesnt-work-after-hibernate-but-does-work-after-suspend?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

Based on the information there, I created a script, containing:

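  #!/bin/bash
  # Stop NetworkManager before suspending
  killall NetworkManager
  systemctl suspend
  # Without this pause the workaround did not work
  sleep 5
  # Bring NetworkManager back after resume
  systemctl start NetworkManager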

It did not work without the sleep command. I have not experimented with different sleep parameters.

After waking from suspend invoked by that command 'ifconfig' shows both wifi and ethernet connections available.

Summary: after hibernate or suspend ethernet does not work at all on the Clevo W515LU notebook computer (also other machines, judging by the number of requests for help found on the internet).

In such cases, this sequence brings up wifi and (if cable connected) ethernet:

   killall NetworkManager
   systemctl suspend
   # When suspended, press the on/off button to resume
   sleep 5
   systemctl start NetworkManager

It looks as if the code for suspend or hibernate, when run with NetworkManager active, should do the equivalent of 'killall NetworkManager' (before suspending or hibernating?), and restart NetworkManager after resuming.

Then users will not have to go through an additional suspend+resume to get NetworkManager working properly.

Should I put all this in a different bug report?

Is there a better fix, e.g. using /lib/systemd/system-sleep ?? I investigated, but could not get that option to work.
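
For reference, a hook of that kind might look roughly like the sketch below. It is untested on the Clevo and rests on several assumptions: that systemd-sleep runs executable files in /usr/lib/systemd/system-sleep/ (the same directory as /lib/systemd/system-sleep on Fedora) with "pre" or "post" as the first argument, that stopping and starting the service has the same effect as the killall-based script above, and that the file name is free to choose. The RUNNER variable is an addition for dry runs only.

```shell
#!/bin/sh
# Hypothetical hook, e.g. /usr/lib/systemd/system-sleep/50-networkmanager
# (the file name is an assumption; the file must be executable).
# systemd-sleep calls it with $1 = pre|post and $2 = suspend|hibernate|...

# Set RUNNER=echo to print the commands instead of running them.
RUNNER="${RUNNER:-}"

nm_sleep_hook() {
    case "$1" in
        pre)
            # Before sleeping: stop NetworkManager, like the killall step.
            $RUNNER systemctl stop NetworkManager.service
            ;;
        post)
            # After resume: same 5-second pause as the script, then restart.
            $RUNNER sleep 5
            $RUNNER systemctl start NetworkManager.service
            ;;
    esac
}

nm_sleep_hook "$@"
```

Running it by hand as `RUNNER=echo ./50-networkmanager post hibernate` only prints the commands, which makes it possible to check the logic before installing the file.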

Comment 8 Justin M. Forbes 2018-07-23 15:31:52 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 9 Justin M. Forbes 2018-08-29 15:22:35 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

