Bug 1658623 - system fails to boot with kernel 4.19.8-300.fc29.x86_64
Summary: system fails to boot with kernel 4.19.8-300.fc29.x86_64
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-12 14:39 UTC by Steven Usdansky
Modified: 2019-01-30 15:13 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-01-30 15:13:35 UTC
Type: Bug
Embargoed:


Attachments
Good (current) boot (205.44 KB, text/plain)
2018-12-12 14:39 UTC, Steven Usdansky
failed boot with kernel 4.19.8 (90.24 KB, text/plain)
2018-12-12 14:44 UTC, Steven Usdansky
Log of successful boot of Manjaro 4.19.6-1 kernel for comparison (199.76 KB, text/plain)
2018-12-12 15:00 UTC, Steven Usdansky
failed boot of kernel-4.19.8 20181213a (89.64 KB, text/plain)
2018-12-13 15:54 UTC, Steven Usdansky
failed boot of kernel-4.19.8 20181213b (88.74 KB, text/plain)
2018-12-13 15:54 UTC, Steven Usdansky
failed boot of kernel-4.19.8 screenshot (4.81 MB, image/jpeg)
2018-12-13 16:21 UTC, Steven Usdansky
Log of failed boot of kernel-4.19.8 single (156.88 KB, text/plain)
2018-12-13 17:49 UTC, Steven Usdansky
journalctl -xb output for kernel-4.19.0-1.fc30.x86_64 (157.21 KB, text/plain)
2018-12-13 21:56 UTC, Steven Usdansky
successful boot 4.19.2-300 requires nolapic (192.68 KB, text/plain)
2018-12-15 01:20 UTC, Steven Usdansky
lsusb -v output requested in comment #33 (13.35 KB, text/plain)
2018-12-16 13:54 UTC, Steven Usdansky
dmesg output on hang when enabling networking 4.19.9 (1.65 MB, image/jpeg)
2018-12-18 15:30 UTC, Steven Usdansky

Description Steven Usdansky 2018-12-12 14:39:18 UTC
Created attachment 1513690 [details]
Good (current) boot

Description of problem:
system fails to boot with kernel 4.19.8-300.fc29.x86_64

Version-Release number of selected component (if applicable):
4.19.8-300.fc29.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Select kernel 4.19.8-300.fc29.x86_64 from grub menu
2. Sit back and wait for reboot or lockup


Actual results:
System typically reboots prior to showing login screen, but last time it locked up prior to that 

Expected results:
Normal boot, as what happens with kernel 4.18.17-300.fc29.x86_64

Additional info:
Possibly the same bug as 1648366 I reported for Rawhide, but now that kernel 4.19 is in Fedora 29, I installed and tried to boot using the latest kernel

Comment 1 Steven Usdansky 2018-12-12 14:44:08 UTC
Created attachment 1513691 [details]
failed boot with kernel 4.19.8

Noticed ACPI Core Revision 20180810 in the boot log with kernel 4.19.8 vs. ACPI Core Revision 20180531 with kernel 4.18.17. Seems like this might be significant, but understanding boot logs is beyond my paygrade.
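
The ACPI revision comparison can be repeated against any pair of saved boot logs. A minimal sketch, assuming the logs were saved as plain text and contain the kernel's "ACPI: Core revision" line; the helper name `acpi_core_rev` and the file names in the usage comment are hypothetical:

```shell
# Print the ACPI core revision recorded in a saved boot log.
# Assumes the log contains a kernel line such as
#   "ACPI: Core revision 20180810"
acpi_core_rev() {
    sed -n 's/.*ACPI: Core revision \([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Example usage (hypothetical file names):
#   acpi_core_rev good-boot.log
#   acpi_core_rev failed-boot.log
```

A differing revision only shows that the ACPI code changed between the two kernels; on its own it does not show that ACPI is the cause.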

Comment 2 Steven Usdansky 2018-12-12 15:00:53 UTC
Created attachment 1513695 [details]
Log of successful boot of Manjaro 4.19.6-1 kernel for comparison

Comment 3 Steve 2018-12-13 02:07:05 UTC
Not sure if this is relevant, but could you confirm that you are booting Linux from an external USB drive?

boot-4.18.17-300.fc29.x86_64-1.log:
...
Dec 12 02:15:02 ... kernel: scsi 3:0:0:0: Direct-Access     TOSHIBA  External USB 3.0 5438 PQ: 0 ANSI: 6
...
Dec 12 02:15:05 ... systemd[1]: Mounting /sysroot...
Dec 12 02:15:05 ... kernel: EXT4-fs (sdb7): mounted filesystem with ordered data mode. Opts: (null)
...


For the record, this is your previous report:

Bug 1648366 - System locks up when booting any 4.19 or 4.20 kernel 


Tip: If you put the string "Bug nnnnnnn" in a Bugzilla comment, Bugzilla will automatically make a link to the bug report.

Comment 4 Steven Usdansky 2018-12-13 02:19:46 UTC
Yes, I am booting from an external USB drive. Have not been able to find the proper cables to hook up an internal HDD drive - one of the joys of an off-brand mini-PC.  The internal SSD is dedicated to Windows 10.

Comment 5 Steve 2018-12-13 03:02:25 UTC
(In reply to Steven Usdansky from comment #4)
> Yes, I am booting from an external USB drive. Have not been able to find the
> proper cables to hook up an internal HDD drive - one of the joys of an
> off-brand mini-PC.  The internal SSD is dedicated to Windows 10.

Thanks.

In comparing the two log files, I see that two vbox modules are loaded in the 4.18.17 log just after where the 4.19.8 log ends:

Dec 12 08:15:12 Arsenopyrite systemd-modules-load[509]: Inserted module 'vboxnetadp'
Dec 12 08:15:12 Arsenopyrite systemd-modules-load[509]: Inserted module 'vboxpci'

AFAICT, those are not Fedora kernel modules. Could you try disabling anything that is not part of the standard Fedora distro?

Comment 6 Steve 2018-12-13 03:15:47 UTC
For the record:

boot-4.19.8-300.fc29.x86_64-1.log:
...
Dec 12 02:13:11 Arsenopyrite kernel: vboxdrv: Successfully loaded version 5.2.22_RPMFusion (interface 0x00290001)
...

Comment 7 Steven Usdansky 2018-12-13 13:34:19 UTC
Blacklisting all VirtualBox modules and the Broadcom wl module does not seem to alter the behavior. Attempted multiple boots with kernel 4.19.8. Some resulted in system hangs; others in rebooting.

Comment 8 Steve 2018-12-13 15:17:24 UTC
(In reply to Steven Usdansky from comment #7)
> Blacklisting all VirtualBox modules and the Broadcom wl module does not seem
> to alter the behavior. Attempted multiple boots with kernel 4.19.8. Some
> resulted in system hangs; others in rebooting.

Thanks.

In the hang case:

1. What is on the display?
2. What is the disk light doing?
3. Does the caps-lock key toggle the caps-lock light on the keyboard?
4. Can you switch to a console (ctrl-alt-f2)?

In the reboot case:

1. Do you see any reboot messages on the display?

2. See if you can find the journalctl log. That may take some sleuthing:

Run "last | less" to get a list of timestamps.

To get a list of journalctl boot IDs and corresponding timestamps:

$ journalctl --list-boots

To view the journalctl log for a particular boot ID:

$ journalctl -b nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn # nnnn... is a boot ID from the list

NB: Your RTC is set to local time, so log files will show a time jump.
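
The lookup of a boot ID from the list can also be scripted. A minimal sketch, assuming the `journalctl --list-boots` output has the boot offset in the first column and the boot ID in the second; the helper name `boot_id_for_offset` is hypothetical:

```shell
# Pick the boot ID for a given offset out of `journalctl --list-boots`
# output read from stdin. Offset 0 is the current boot, -1 the one before it.
boot_id_for_offset() {
    awk -v off="$1" '$1 == off { print $2 }'
}

# Example usage on a live system:
#   id=$(journalctl --list-boots | boot_id_for_offset -1)
#   journalctl -b "$id" > previous-boot.log
# journalctl also accepts the offset directly:
#   journalctl -b -1
```

If a boot hung before the journal reached persistent storage, it may not appear in the list at all.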

Comment 9 Steve 2018-12-13 15:42:40 UTC
(In reply to Steven Usdansky from comment #4)
> ... one of the joys of an off-brand mini-PC.  ...

Normally, the DMI shows the make and model of the motherboard, but that is missing:

Dec 12 02:14:56 Arsenopyrite kernel: DMI: Default string Default string/SKYBAY, BIOS 5.11 08/09/2016

For completeness, could you post the make and model of your mini-PC or motherboard?

Comment 10 Steven Usdansky 2018-12-13 15:52:59 UTC
Hang case:
1. Display typically but not always hangs a few lines after starting LightDM, normal white text on black background. Had one hang at the starting LightDM message wherein background was green and flickering. 
2. Disk light is flashing. Have tried it many times over the past few weeks with different 4.19 and 4.20 kernels when trying to run Rawhide - disk light will flash and system will hang all night.
3. Will check caps-lock key and report
4. Cannot switch to a console

Reboot case:
1. System seems ready to display the LightDM login screen but reboots instead
2. Looking, looking but no log
   boot kernel-4.18.17 - ensure things are good
   boot kernel 4.19.8
   if reboot:
      boot kernel-4.18.17 and look for journalctl boot log from the 4.19.8 boot
      log does not exist 

Will post additional logs from this morning's failed boots

Comment 11 Steven Usdansky 2018-12-13 15:54:03 UTC
Created attachment 1514097 [details]
failed boot of kernel-4.19.8 20181213a

Comment 12 Steven Usdansky 2018-12-13 15:54:37 UTC
Created attachment 1514098 [details]
failed boot of kernel-4.19.8 20181213b

Comment 13 Steve 2018-12-13 16:21:04 UTC
(In reply to Steven Usdansky from comment #10)
> ... a few lines after starting LightDM, normal white text on black background.

Thanks. If those are boot messages, could you attach a photograph of the display? The log files for the failure cases don't record any messages about lightdm.

Comment 14 Steven Usdansky 2018-12-13 16:21:58 UTC
Created attachment 1514108 [details]
failed boot of kernel-4.19.8 screenshot

Screenshot of hung 4.19.8 boot on a Fedora 29 installation with no VirtualBox and no Broadcom

Comment 15 Steven Usdansky 2018-12-13 16:26:47 UTC
There was no boot log found with journalctl -b for the hang depicted in the screenshot

Comment 16 Steve 2018-12-13 16:42:15 UTC
(In reply to Steven Usdansky from comment #7)
> Blacklisting all VirtualBox modules ...

Please *remove* the VirtualBox package:

# dnf remove VirtualBox\*

$ fgrep -i vbox fail_01.log
Dec 13 00:34:01 Arsenopyrite systemd-modules-load[207]: Failed to find module 'vboxdrv'
Dec 13 00:34:01 Arsenopyrite systemd-modules-load[207]: Failed to find module 'vboxnetflt'
Dec 13 00:34:01 Arsenopyrite systemd-modules-load[207]: Failed to find module 'vboxnetadp'
Dec 13 00:34:01 Arsenopyrite systemd-modules-load[207]: Failed to find module 'vboxpci'

Comment 17 Steven Usdansky 2018-12-13 17:49:39 UTC
Created attachment 1514149 [details]
Log of failed boot of kernel-4.19.8 single

Booted in single-user mode before removing VBox guest additions; attached log file is journalctl -xb output

Removed VBox guest additions (the only vbox rpm in this installation).
Booted normally
Boot still hung. Copying last few lines from a screenshot; everything above looks normal

  Starting Light Display Manager...
  Starting Hold until boot process finishes up...
  Started Deferred execution scheduler.
  Started Hostname Service
  Started Hostname Service.
  Starting Network Manager Script Dispatcher Service...
  IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
  Generic PHY r8169-100:00: attached PHY driver [GenericPHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)

Comment 18 Steve 2018-12-13 19:18:41 UTC
(In reply to Steven Usdansky from comment #17)
...
> Boot still hung. Copying last few lines from a screenshot; everything above
> looks normal
...

Thanks for trying single user mode. That would seem to eliminate graphics entirely. Unfortunately, the boot logs aren't showing anything like error messages or kernel panics.

In looking over the kernel "Troubleshooting" guide[1], the last thing is "Bisecting the kernel". That is extremely time-consuming and requires a lot of processing power. However, you can do something easier by trying to reproduce the problem with earlier Fedora kernels, which can be found starting here:

https://bodhi.fedoraproject.org/updates/?packages=kernel&page=1

Click the "Builds" tab to get a link to RPM packages:

AFAICT, the last F29 4.18 kernel is kernel-4.18.18-300.fc29:
https://bodhi.fedoraproject.org/updates/FEDORA-2018-5454a04a74

And the first F29 4.19 kernel is kernel-4.19.2-300.fc29:
https://bodhi.fedoraproject.org/updates/FEDORA-2018-f55c305488

[1] https://docs.fedoraproject.org/en-US/quick-docs/kernel/troubleshooting/index.html

Comment 19 Steve 2018-12-13 19:32:31 UTC
Two other things to look into:

1. Updating the BIOS.
2. Installing to the SSD in a small test partition (you may need to shrink a Windows partition to make room). IIUC, you have only tested with USB drives.

Comment 20 Steven Usdansky 2018-12-13 21:56:53 UTC
Created attachment 1514229 [details]
journalctl -xb output for kernel-4.19.0-1.fc30.x86_64

Attached log for failed boot with kernel-4.19.0-1 from Fedora-LXDE-Live-x86_64-Rawhide-20181030.n.0.iso (earliest 4.19 kernel I could find with a quick search on a live iso that I could actually boot and login with). Result with F29 kernel-4.19.2 My best guess is the problem lies with:
   ACPI: Core revision 20180810  
but would need an earlier kernel to verify this. I see kernel-4.18.17 has
   ACPI: Core revision 20180531
  
Have looked for a BIOS update long before kernel 4.19; none available from vendor. Computer was purchased on Amazon in June, 2017:

XCY Mini PC 6th Gen Core i3-6100U 2.3 GHz Dual Core Four Threads Intel HD Graphics Supporting windows 8/10/Linux /Ubuntu System With VGA +HDMI And LAN(8G RAM 128G SSD Windows 10) 
 
/$ inxi -F -c0
System:    Host: Arsenopyrite Kernel: 4.18.16-300.fc29.x86_64 x86_64 bits: 64 Desktop: MATE 1.20.3 
           Distro: Fedora release 29 (Twenty Nine) 
Machine:   Type: Laptop Mobo: INTEL model: SKYBAY serial: <root required> UEFI: American Megatrends v: 5.11 date: 08/09/2016 
CPU:       Topology: Dual Core model: Intel Core i3-6100U bits: 64 type: MT MCP L2 cache: 3072 KiB 
           Speed: 2254 MHz min/max: 400/2300 MHz Core speeds (MHz): 1: 500 2: 500 3: 500 4: 500 
Graphics:  Device-1: Intel Skylake GT2 [HD Graphics 520] driver: i915 v: kernel 
           Display: x11 server: Fedora Project X.org 1.20.3 driver: modesetting unloaded: fbdev,vesa 
           resolution: 1920x1080~60Hz 
           OpenGL: renderer: Mesa DRI Intel HD Graphics 520 (Skylake GT2) v: 4.5 Mesa 18.2.6 
Audio:     Device-1: Intel Sunrise Point-LP HD Audio driver: snd_hda_intel 
           Sound Server: ALSA v: k4.18.16-300.fc29.x86_64 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 
           IF: enp1s0 state: up speed: 1000 Mbps duplex: full mac: 70:e0:4f:68:0a:a5 
           Device-2: Broadcom and subsidiaries BCM43227 802.11b/g/n driver: N/A 
Drives:    Local Storage: total: 1.93 TiB used: 767.89 GiB (38.8%) 
           ID-1: /dev/sda vendor: JingX model: 120G SSD size: 118.00 GiB 
           ID-2: /dev/sdb type: USB vendor: Toshiba model: External USB 3.0 size: 1.82 TiB 
Partition: ID-1: / size: 19.56 GiB used: 7.90 GiB (40.4%) fs: ext4 dev: /dev/sdb13 
Sensors:   System Temperatures: cpu: 38.5 C mobo: N/A 
           Fan Speeds (RPM): N/A 
Info:      Processes: 189 Uptime: 16m Memory: 7.70 GiB used: 986.0 MiB (12.5%) Shell: bash inxi: 3.0.28

Comment 21 Steve 2018-12-14 04:31:45 UTC
Could you check this?

3. Does the caps-lock key toggle the caps-lock light on the keyboard?


I haven't been able to find any recent bug reports that clearly match what you are seeing. I've tried searches with variants of "kernel boot hang skylake":

* google.com
* bugzilla.redhat.com
* bugzilla.kernel.org
* bugs.freedesktop.org


> ... Mobo: INTEL model: SKYBAY ...

Thanks for posting that. After Googling, I found several references to "SKYBAY", but, AFAICT, "SKYBAY" is not an official Intel code name:

Intel Processors and Chipsets by Platform Code Name
https://www.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/platform-codenames.html

Comment 22 Steven Usdansky 2018-12-14 12:16:23 UTC
Caps-lock key does not toggle the caps-lock light on the keyboard when the system locks up.

Comment 23 Steve 2018-12-14 15:47:57 UTC
(In reply to Steven Usdansky from comment #22)
> Caps-lock key does not toggle the caps-lock light on the keyboard when the
> system locks up.

OK. It sounds like the "SysRq" methodology described in the kernel troubleshooting guide[1] won't work, since it requires a working keyboard.

The guide suggests booting with "nmi_watchdog=1" on the kernel command line. That "will cause a panic when an NMI watchdog timeout occurs." 

[1] https://docs.fedoraproject.org/en-US/quick-docs/kernel/troubleshooting/index.html

Comment 24 Steve 2018-12-14 17:53:43 UTC
(In reply to Steve from comment #23)
...
> The guide suggests booting with "nmi_watchdog=1" on the kernel command line.
> That "will cause a panic when an NMI watchdog timeout occurs." 
...

I see you already tried that in Bug 1648366, Attachment 1503856 [details]:

Nov 09 07:25:37 Arsenopyrite kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.20.0-0.rc1.git1.2.fc30.x86_64 root=UUID=a8551686-2465-4ba5-9633-34e7b9fa579d ro selinux=0 LANG=en_US.UTF-8 rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 initcall_debug sysrq_always_enabled=1 nmi_watchdog=1

What does the display show? The log file ends with:

Nov 09 13:25:47 Arsenopyrite systemd[1]: Starting Flush Journal to Persistent Storage...
Nov 09 13:25:47 Arsenopyrite systemd-journald[565]: Time spent on flushing to /var is 27.790415s for 2396 entries.

Note, also, that the "flushing" time is very long. Compare with 33.857ms when running kernel 4.18.17:

$ grep -i flushing boot-4.18.17-300.fc29.x86_64.log
Dec 12 08:15:11 Arsenopyrite systemd-journald[505]: Time spent on flushing to /var is 33.857ms for 941 entries.
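
To compare the two flush times directly, both can be converted to milliseconds. A minimal sketch, assuming the journald message format shown above and handling only the "s" and "ms" units that appear in these two logs; the helper name `flush_ms` is hypothetical:

```shell
# Print the journal flush time from a saved boot log, in milliseconds.
# Only values reported in "s" or "ms" are handled; other units are skipped.
flush_ms() {
    sed -n 's/.*Time spent on flushing to \/var is \([0-9.]*\)\(m\{0,1\}s\) for.*/\1 \2/p' "$1" |
        awk '{ v = $1; if ($2 == "s") v *= 1000; printf "%.3f\n", v }'
}

# Example usage:
#   flush_ms boot-4.18.17-300.fc29.x86_64.log
```

With the two values quoted above, that is 27790.415 ms against 33.857 ms, roughly 800 times longer, although the number of entries flushed also differs (2396 vs. 941).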

Comment 25 Steve 2018-12-14 18:27:31 UTC
$ grep error boot-4.18.17-300.fc29.x86_64.log
Dec 12 02:15:01 Arsenopyrite kernel: usb 2-4: device descriptor read/all, error -110

"USB error -110 means power exceeded, the host could not provide enough electric power for the pendrive to operate."[1]

Do you have another computer that you could try booting from the Toshiba external USB drive?

A powered USB hub might be another alternative, although hubs can cause delays that cause other problems ...

[1] device descriptor read/64, error -110
https://stackoverflow.com/questions/13653692/device-descriptor-read-64-error-110

Comment 26 Steven Usdansky 2018-12-14 19:23:55 UTC
All 4.18 kernels seem to boot properly from the Toshiba drive on the problematic (XCY) PC.

The Rawhide Live LXDE iso does not let me log in when booting on either the XCY computer (the problem computer) or my HP s3707c desktop (a circa 2014 PC). Apparently a problem with the iso.

The same flash drive I used with the Rawhide Live LXDE iso was used to install F29 on the XCY computer.

Were changes made to any 4.18 kernel to mitigate the effects of Spectre and Meltdown? If those changes were only applied to 4.19 and subsequent kernels, it's possible one or more of those changes is incompatible with the XCY's firmware or UEFI. Manjaro kernel 4.19.6-1 boots without issue from the Manjaro-installed /boot/efi/EFI/Manjaro bits and pieces. Any idea how the Manjaro 4.19 kernel differs from the Fedora 4.19 kernels?

Comment 27 Steven Usdansky 2018-12-15 01:20:07 UTC
Created attachment 1514544 [details]
successful boot 4.19.2-300 requires nolapic

Not sure what level of progress this represents, but I was able to boot kernel 4.19.2-300.fc29.x86_64 by adding nolapic to the kernel line in grub.cfg (see attached log). The same trick enabled me to boot Fedora-Mate_Compiz-Live-Rawhide-20181212.n.0.iso with kernel-4.20.0-0.rc6.git1.1.fc30.x86_64. 

But why have a multi-core processor if only one core is usable?
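
When testing parameters such as nolapic, it helps to confirm what a given boot actually used. A minimal sketch that checks the kernel command line for an exact parameter; the helper name `has_kernel_param` is hypothetical, and the grubby invocation in the comments assumes grubby is available, as on a stock Fedora install:

```shell
# Succeed if the given parameter appears on the kernel command line.
# The second argument defaults to /proc/cmdline (the running kernel).
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example usage:
#   has_kernel_param nolapic && echo "booted with nolapic"
#
# Adding or removing the parameter persistently for one kernel with grubby
# (run as root; the kernel path is the one from this comment):
#   grubby --update-kernel=/boot/vmlinuz-4.19.2-300.fc29.x86_64 --args="nolapic"
#   grubby --update-kernel=/boot/vmlinuz-4.19.2-300.fc29.x86_64 --remove-args="nolapic"
```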

Comment 28 Steve 2018-12-15 05:30:39 UTC
(In reply to Steve from comment #25)
> $ grep error boot-4.18.17-300.fc29.x86_64.log
> Dec 12 02:15:01 Arsenopyrite kernel: usb 2-4: device descriptor read/all,
> error -110
> 
> "USB error -110 means power exceeded, the host could not provide enough
> electric power for the pendrive to operate."[1]
...

I should have been more skeptical ... :-)

#define	ETIMEDOUT	110	/* Connection timed out */
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/asm-generic/errno.h?h=v4.18.17#n93

USB Error codes
https://www.kernel.org/doc/html/v4.18/driver-api/usb/error-codes.html
(This doesn't have actual numbers, but it does have explanations.)

However, that doesn't mean there isn't a USB power issue. Check your BIOS settings for anything related to USB.
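
The same lookup can be done locally instead of relying on a web search. A minimal sketch that reads an errno header on stdin and prints the symbolic name for a number; the helper name `errno_name` is hypothetical, and the header path in the usage comment is an assumption (on Fedora it should come from the kernel-headers package):

```shell
# Print the symbolic name for an errno value, given a C header on stdin
# with lines of the form "#define NAME NUMBER".
errno_name() {
    awk -v n="$1" '$1 == "#define" && $3 == n { print $2 }'
}

# Example usage (kernel logs show the value negated, so -110 is errno 110):
#   errno_name 110 < /usr/include/asm-generic/errno.h
```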

Comment 29 Steven Usdansky 2018-12-15 13:11:06 UTC
(In reply to Steven Usdansky from comment #26)

> Were changes made to any 4.18 kernel to mitigate the effects of Spectre and
> Meltdown? If those changes were only applied to 4.19 and subsequent kernels,
> it's possible one or more of those changes is incompatible with the XCY's
> firmware or UEFI. Manjaro kernel 4.19.6-1 boots without issue from the
> Manjaro-installed /boot/efi/EFI/Manjaro bits and pieces. Any idea how the
> Manjaro 4.19 kernel differs from the Fedora 4.19 kernels?

A bit of speculation:
Manjaro kernel-4.19.6-1 and the Fedora kernels I've tried at least as far back as 4.18.16 show the following in the boot logs:

  Spectre V2 : Mitigation: Full generic retpoline
  Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
  Spectre V2 : Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier
  Spectre V2 : Enabling Restricted Speculation for firmware calls

However... Fedora kernel 4.19.2, which only booted for me with kernel parameter nolapic, also includes the following in the boot log:

  Spectre V2 : Spectre v2 cross-process SMT mitigation: Enabling STIBP

Could this be related to the problem?

Comment 30 Steve 2018-12-15 16:07:17 UTC
(In reply to Steven Usdansky from comment #29)
...
> A bit of speculation:
...
>   Spectre V2 : Spectre v2 cross-process SMT mitigation: Enabling STIBP
...

To see what the kernel "thinks", run:

$ grep bugs /proc/cpuinfo

There are dozens of kernel parameters, including ones related to "spectre", "spec_store_bypass", and "l1tf":

The kernel’s command-line parameters
https://www.kernel.org/doc/html/v4.19/admin-guide/kernel-parameters.html
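
Besides /proc/cpuinfo, kernels of this vintage also report the mitigation chosen for each issue under sysfs. A minimal sketch that prints every status file in that directory; the helper name `show_mitigations` is hypothetical, and the set of files present depends on the kernel version:

```shell
# Print each mitigation status file as "name: status".
# The directory argument defaults to the sysfs location used by the kernel.
show_mitigations() {
    dir=${1:-/sys/devices/system/cpu/vulnerabilities}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
}

# Example usage:
#   show_mitigations
```

Running this under a working and a failing kernel gives a compact view of which mitigations differ between them.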

Comment 31 Steve 2018-12-15 19:54:27 UTC
(In reply to Steven Usdansky from comment #29)
...
> However... Fedora kernel 4.19.2, which only booted for me with kernel
> parameter nolapic, also includes the following in the boot log:
> 
>   Spectre V2 : Spectre v2 cross-process SMT mitigation: Enabling STIBP
...

That code was reverted in 4.19.4 because "there are major slowdowns involved". See this commit in the log:

2018-11-23	Revert "x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation"

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.4

Comment 32 Steve 2018-12-16 02:18:07 UTC
This was reported 2018-12-14:

Cannot boot into console on kernel 4.19.x
https://bbs.archlinux.org/viewtopic.php?id=242645

If you follow the "journalctl when hangs" link, you will find a log file that looks familiar: :-)

$ less -N pastebin-5Ru51X7t.log
...
      3 дек 13 20:34:13 i3main kernel: Linux version 4.19.8-arch1-1-ARCH (builduser@heftig-1129) ...
...
     40 дек 13 20:34:13 i3main kernel: DMI: Default string Default string/SKYBAY, BIOS 5.11 08/09/2016
...
    235 дек 13 20:34:13 i3main kernel: smpboot: CPU0: Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz (family: 0x6, model: 0x4e, stepping: 0x3)
...


For the record, I found that by doing a web search for "kernel 4.19 boot hang".

Comment 33 Steve 2018-12-16 02:50:01 UTC
Could you attach the output from:

# lsusb -v > lsusb-1.txt # run as root for this one

And post the output from:

# lsusb -t

Comment 34 Steven Usdansky 2018-12-16 13:54:34 UTC
Created attachment 1514855 [details]
lsusb -v output requested in comment #33

(In reply to Steve from comment #33)
> Could you attach the output from:
> 
> # lsusb -v > lsusb-1.txt # run as root for this one
> 
> And post the output from:
> 
> # lsusb -t

note: all lsusb output is from boot of kernel-4.18.16-300.fc29.x86_64

/home/a# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    |__ Port 4: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
    |__ Port 5: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
    |__ Port 5: Dev 2, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M
    |__ Port 6: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M

============================

re: comments #31 and #32

I did get a log when booting kernel 4.19.8-300.fc29.x86_64 with kernel parameters "random.trust_cpu=on nomodeset 3"  Both my boot log, and the Arch user's log in the link from #32 (also kernel 4.19.8) contain the line
   
Dec 16 01:03:13 Arsenopyrite kernel: Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl

which does not appear in my Fedora boot log when booting kernel-4.18.16.fc29_x86_64 or in my Manjaro boot log when booting kernel 4.19.6-1-MANJARO
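
This kind of comparison can be done mechanically across saved logs. A minimal sketch that lists the "Spectre V2" kernel messages present in a second log but absent from a first, ignoring timestamps and host names; the helper name `spectre_only_in_second` and the file names in the usage comment are hypothetical:

```shell
# List "Spectre V2" kernel messages that appear in the second log
# but not in the first, ignoring timestamps and host names.
spectre_only_in_second() {
    a=$(mktemp) && b=$(mktemp) || return 1
    grep -o 'Spectre V2 : .*' "$1" | sort -u > "$a"
    grep -o 'Spectre V2 : .*' "$2" | sort -u > "$b"
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Example usage (hypothetical file names):
#   spectre_only_in_second boot-4.18.16.log boot-4.19.8.log
```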

Comment 35 Steve 2018-12-16 14:52:51 UTC
(In reply to Steven Usdansky from comment #34)
> Created attachment 1514855 [details]
> lsusb -v output requested in comment #33
...

Thanks.

> I did get a log when booting kernel 4.19.8-300.fc29.x86_64 with kernel
> parameters "random.trust_cpu=on nomodeset 3"

OK. However, you have "selinux=0" on your kernel command line. That is a non-standard configuration. Please remove it for all future tests.

After removing it, verify that selinux is enabled and in "Enforcing" mode by running:

$ getenforce
Enforcing

Comment 36 Steve 2018-12-18 02:05:05 UTC
(In reply to Steve from comment #32)
> This was reported 2018-12-14:
> 
> Cannot boot into console on kernel 4.19.x
> https://bbs.archlinux.org/viewtopic.php?id=242645
...

Here is an update from Blitz67:

"I've prepared bootable USB stick with base system, it successfully booted, I got console with any keyboard.
System works until I start NetworkManager."

So here is a simple test to try:

Boot with a working kernel, disable networking (In Xfce, uncheck "Automatically connect to this network when it is available".). Do that for the wired and wireless connections. Reboot with the non-working kernel. If it boots, manually start networking.

Alternatively, in your BIOS, temporarily disable all networking -- both wired and wireless.

Comment 37 Steven Usdansky 2018-12-18 14:05:10 UTC
re comment #35:

For all three of my F29 installations (kernels 4.18.16 and 4.18.17) on the problematic XCY PC, F29 does not want to boot without selinux=0. Found this to be true with earlier 4.18 kernels, also.

re comment #36:

using Mate, but should be similar to Xfce
 Booted kernel-4.18.16
 Unchecked Enable Networking in panel applet
 Rebooted into kernel 4.19.9 (selinux=0 but did not add nolapic)
 Boot worked, was able to get to Mate Desktop. Progress!
 Ran nm-applet program to display networking applet in panel
 Checked  Enable Networking (using panel applet) 
Tried two times, with two different results
 a) System totally locked up; no keyboard response - had to use PC's power button to poweroff
 b) System rebooted itself

Comment 38 Steve 2018-12-18 14:44:33 UTC
(In reply to Steven Usdansky from comment #37)
> re comment #35:
> 
> For all three of my F29 installations (kernels 4.18.16 and 4.18.17) on the
> problematic XCY PC, F29 does not want to boot without selinux=0. Found this
> to be true with earlier 4.18 kernels, also.

Please open a new bug for that issue.

> re comment #36:
> 
> using Mate, but should be similar to Xfce
>  Booted kernel-4.18.16
>  Unchecked Enable Networking in panel applet
>  Rebooted into kernel 4.19.9 (selinux=0 but did not add nolapic)
>  Boot worked, was able to get to Mate Desktop. Progress!
>  Ran nm-applet program to display networking applet in panel
>  Checked  Enable Networking (using panel applet) 
> Tried two times, with two different results
>  a) System totally locked up; no keyboard response - had to use PC's power
> button to poweroff
>  b) System rebooted itself

Excellent! We appear to have a reproducer.

Do the same thing again, but open a full-screen terminal window before enabling networking and run:

$ dmesg -w | tee dmesg-1.log

The file dmesg-1.log may not get written, so be prepared to take a photo of the display too, if you get a hang.
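
One reason the file may stay empty is that written data is still sitting in memory when the machine hangs. A minimal sketch that forces each line to disk as it arrives; the helper name `log_synced` is hypothetical, and this may still capture nothing if the kernel stops before printing any message:

```shell
# Echo each line from stdin, append it to a file, and force it to disk
# right away, so lines seen before a hang have a better chance of surviving.
log_synced() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
        printf '%s\n' "$line" >> "$1"
        sync
    done
}

# Example usage:
#   dmesg -w | log_synced dmesg-1.log
```

Calling sync after every line is slow, which is acceptable here because the goal is only to catch the last few messages before the hang.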

Comment 39 Steven Usdansky 2018-12-18 15:30:49 UTC
Created attachment 1515375 [details]
dmesg output on hang when enabling networking 4.19.9

Rebooted kernel-4.19.9 with networking disabled. Got the hang when I tried to enable networking. Screenshot shows nothing was written to the file at that time. 

Tried the same with Rawhide 20181216 nightly; got a reboot with nothing written to the file

Comment 40 Steve 2018-12-18 16:17:20 UTC
(In reply to Steven Usdansky from comment #39)
> Created attachment 1515375 [details]
> dmesg output on hang when enabling networking 4.19.9
> 
> Rebooted kernel-4.19.9 with networking disabled. Got the hang when I tried to
> enable networking. Screenshot shows nothing was written to the file at that time. 
> 
> Tried the same with Rawhide 20181216 nightly; got a reboot with nothing
> written to the file

Thanks. In looking at the attached boot-4.18.17-300.fc29.x86_64-1.log, you appear to have both the wired and wireless connections enabled.

Could you verify that the proprietary Broadcom wl driver has been removed and not just blacklisted (Comment 7)?

Also, please post the output from these commands (when booted from 4.18.16 and networking is enabled):

$ nmcli connection
$ nmcli device
$ ip link

Adding Thomas Haller to the CC list.

Thomas: We are getting a hang or reboot when networking is enabled with kernel 4.19.9. (See Comment 37 for the reproducer)
Could you suggest a way to proceed?

Comment 41 Steven Usdansky 2018-12-18 18:39:01 UTC
4.18.16 installation (on my computer, F29c) has no Broadcom wl or akmod rpms loaded.
From F29c: 
/etc/modprobe.d$ cat blacklist-b43.conf 
blacklist b43
blacklist bcma

~$ nmcli connection
NAME    UUID                                  TYPE      DEVICE 
enp1s0  448bff2d-8b2e-368b-b542-265a865ee2a4  ethernet  enp1s0 

~$ nmcli device
DEVICE  TYPE      STATE      CONNECTION 
enp1s0  ethernet  connected  enp1s0     
lo      loopback  unmanaged  --         

~$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 70:e0:4f:68:0a:a5 brd ff:ff:ff:ff:ff:ff
~$

Comment 42 Steve 2018-12-18 21:34:58 UTC
(In reply to Steven Usdansky from comment #41)
> 4.18.16 installation (on my computer, F29c) has no Broadcom wl or akmod rpms
> loaded.

The first attachment, "Good (current) boot", has a newer kernel, and F29c is not "/". You should be able to run both kernels from the same file system, so that you have the same networking configuration for both. Please clarify.

> From F29c: 
> /etc/modprobe.d$ cat blacklist-b43.conf 
> blacklist b43
> blacklist bcma

OK. Those are both Fedora kernel Broadcom modules:
$ modinfo b43
$ modinfo bcma

> ~$ nmcli connection
> NAME    UUID                                  TYPE      DEVICE 
> enp1s0  448bff2d-8b2e-368b-b542-265a865ee2a4  ethernet  enp1s0 
> 
> ~$ nmcli device
> DEVICE  TYPE      STATE      CONNECTION 
> enp1s0  ethernet  connected  enp1s0     
> lo      loopback  unmanaged  --         
> 
> ~$ ip link
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
> DEFAULT group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
> UP mode DEFAULT group default qlen 1000
>     link/ether 70:e0:4f:68:0a:a5 brd ff:ff:ff:ff:ff:ff
> ~$

OK, so wireless networking is completely disabled. If that is the networking configuration with kernel 4.19.9, then the problem appears to be with the Ethernet driver. nmcli doesn't show the driver in use, but the attached screenshot shows "r8169":

$ modinfo r8169 | fgrep description
description:    RealTek RTL-8169 Gigabit Ethernet driver
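
The driver bound to an interface can also be read from sysfs rather than from a screenshot. A minimal sketch, assuming the usual sysfs layout in which the interface's device directory has a "driver" symlink; the helper name `iface_driver` is hypothetical:

```shell
# Print the name of the kernel driver bound to a network interface.
# The second argument is the sysfs root and defaults to /sys.
iface_driver() {
    root=${2:-/sys}
    link=$(readlink -f "$root/class/net/$1/device/driver") || return 1
    printf '%s\n' "${link##*/}"
}

# Example usage:
#   iface_driver enp1s0
# If ethtool is installed, "ethtool -i enp1s0" reports the driver as well.
```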

Further debugging will get me out of my depth. I suggest opening a new bug with your reproducer and relevant log files, because this bug is too chaotic for maintainers to read. I suggest a bug summary something like this:

"r8169: hang or reboot when networking is enabled with kernel 4.19.9 on Intel Skylake system"

Include a link to this bug in the new bug report.

Comment 43 Steve 2018-12-19 18:17:12 UTC
(In reply to Steve from comment #42)
> (In reply to Steven Usdansky from comment #41)

> ... I suggest opening a new bug with your reproducer ...

Thanks:

Bug 1660649 - r8169: hang or reboot when networking is enabled with kernel 4.19.9 on Intel Skylake system

Comment 44 Justin M. Forbes 2019-01-29 16:14:01 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 4.20.5-200.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 45 Steven Usdansky 2019-01-30 14:05:45 UTC
Bug persists, but please close this bug and refer to https://bugzilla.redhat.com/show_bug.cgi?id=1660649

Comment 46 Justin M. Forbes 2019-01-30 15:13:35 UTC
Thanks for letting us know.

