Bug 1313296 - boot fails on new kernel 4.4.2, module dw_dmac, on Intel NUC 5CPYH
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Andy Shevchenko
QA Contact: Fedora Extras Quality Assurance
Duplicates: 1184273
Depends On:
Blocks:
Reported: 2016-03-01 05:59 EST by Alan Jenkins
Modified: 2016-09-23 16:58 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-23 16:58:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
boot log should show some details of hardware, taken on previous working kernel version (211.20 KB, text/plain)
2016-03-01 05:59 EST, Alan Jenkins


External Trackers
Tracker        ID      Priority  Status  Summary  Last Updated
Linux Kernel   101271  None      None    None     2016-08-10 10:49 EDT

Description Alan Jenkins 2016-03-01 05:59:40 EST
Created attachment 1131874
boot log should show some details of hardware, taken on previous working kernel version

Description of problem:
Kernel upgrade causes a boot failure at the point where the dw_dmac module is loaded.

Version-Release number of selected component (if applicable):
4.4.2-301.fc23.x86_64

Version-Release number, last known good version:
4.3.5-300.fc23.x86_64

How reproducible: always


Steps to Reproduce:
1. Use an Intel NUC, model 5CPYH (Atom-class CPU, Braswell N3050).
2. Install Fedora 23.
3. Upgrade to kernel 4.4.2 and reboot into it.

Actual results:
Boot hangs at the text console.
The flashing text cursor disappears.
The system stops responding, e.g. to Num Lock (the LED does not change) and to the power button (systemd does not initiate a clean shutdown).

Expected results:
Successful system boot, as with earlier kernel.

Additional info:
Unfortunately there is nothing captured in /sys/fs/pstore.

I was able to show that the hang happens after the dw_dmac module is loaded. I could then reproduce it simply by running `modprobe dw_dmac` on this kernel version.

(Debug method: I booted to emergency mode, started udev, and single-stepped `udevadm trigger`: `udevadm monitor & udevadm trigger -nv | while read i; do echo "$i"; read </dev/tty; echo add > "$i"/uevent; done`)
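The single-stepping loop from the debug method, written out with comments, might look like the sketch below. It assumes a root shell in emergency mode with systemd-udevd already running. The helper name `single_step_uevents`, its prompt-input argument, and the `run` guard are hypothetical additions for illustration. As far as I know, `udevadm trigger --dry-run` only prints the device paths when `--verbose` is also given, which is why both flags are used.

```shell
#!/bin/sh
# Sketch of the single-stepping udev coldplug loop (not the reporter's exact
# command). Assumes root in emergency mode with systemd-udevd running.

# single_step_uevents (hypothetical helper): read sysfs device paths on stdin.
# For each path, print it, wait for Enter on the prompt input (normally
# /dev/tty), then write "add" to the device's uevent file so udev replays its
# coldplug event and loads any matching module. The last path printed before
# the machine hangs points at the device whose driver caused the hang.
single_step_uevents() {
    prompt_in=${1:-/dev/tty}
    while IFS= read -r dev; do
        printf '%s\n' "$dev"
        # "|| true" keeps the loop going if the prompt input hits EOF.
        read -r reply < "$prompt_in" || true
        echo add > "$dev/uevent"
    done
}

# Hypothetical guard: only touch the real system when invoked with "run".
if [ "${1:-}" = run ]; then
    udevadm monitor &    # print uevents as udev processes them
    # List devices without triggering them, then trigger one at a time.
    udevadm trigger --dry-run --verbose | single_step_uevents /dev/tty
fi
```

Run it as `sh single-step.sh run` from the emergency shell and press Enter once per device. Keeping the pause on `/dev/tty` rather than stdin matters, because stdin is already occupied by the list of device paths.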
Comment 1 Alan Jenkins 2016-03-01 06:18:19 EST
This appears fixed in vanilla kernel 4.5.0-0.rc5.git0.2.vanilla.knurd.1.fc23.x86_64

(It also fixes another graphics issue I was having).

Suggested by https://bugzilla.kernel.org/show_bug.cgi?id=101271#c8
Found via https://bugs.launchpad.net/hwe-next/+bug/1501580
Comment 2 Andy Shevchenko 2016-03-01 06:51:14 EST
*** Bug 1184273 has been marked as a duplicate of this bug. ***
Comment 3 Alan Jenkins 2016-03-01 08:26:54 EST
Unfortunately, I must withdraw my endorsement of kernel 4.5-rc5-git-whatever. A few minutes later it panicked with a backtrace in an interrupt, apparently the timer interrupt. The screen was sprinkled with some green pixels.

*My system no longer boots; it displays nothing on the screen, not even the BIOS splash.*

I wonder whether it's something like the Samsung laptops a few years back that bricked themselves on any kernel panic, due to pstore (https://mjg59.dreamwidth.org/22855.html).

The NUC in question was purchased and updated to the latest firmware version around the start of this year.

I've tried unplugging it and leaving it for an hour, and it's still dead. I haven't yet tried removing the CMOS battery or any other NUC-specific un-bricking method.

Anyone looking for testing feedback is welcome to suggest un-bricking methods, or to report newer firmware versions known to fix the bricking problem. People with Intel email addresses are particularly welcome :-P. If I can't get it working, I'll have to send it back to Amazon under warranty, and I won't want to do that more than once.
Comment 4 Andy Shevchenko 2016-03-01 09:25:38 EST
I would suggest trying to re-flash the firmware and checking again.
Comment 5 Alan Jenkins 2016-03-01 11:31:04 EST
Thanks :).

I didn't actually need to re-flash the firmware. I was able to enter recovery mode by removing the internal jumper, and then "reset the CMOS" (following Intel's instructions) by entering Visual BIOS and resetting the firmware settings to their defaults. That was enough to recover. (You're also supposed to be able to "reset the CMOS" from the power button menu, but I couldn't get that to work.)

I've reset the firmware a second time, after triggering a second backtrace and bricking. Different backtrace, different graphics fun. It was triggered by some combination of hotplugging the VGA monitor and switching VTs. I've done similar messing around on previous kernels without getting backtraces.

This second, different backtrace was captured by journald (see below). It happened at the same time gdm was starting; it seems the VT switch caused gdm to segfault and restart.

I guess this still isn't very clear :(. We'd hope -rc5 is free of most independent bugs, so hopefully it's all related. But it might not be related to dw_dmac.

I think EFI pstore has been ripped out of these kernels, so that's not what's causing the firmware failure. Since 3.9 we've had CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE, which Fedora used. Now I can't even find CONFIG_EFI_VARS_PSTORE in the configs for the installed kernels, and `cat /sys/module/pstore/parameters/backend` shows `(null)`. I just checked the code, and I'm fairly sure that means there is no pstore backend.

I haven't really heard of anything else that would break the firmware configuration like this during a crash.
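The pstore backend check described above can be wrapped in a small shell function. This is a sketch: the function name `pstore_backend_status` and its return codes are hypothetical, and treating an empty value the same as `(null)` is an assumption. It takes an optional path so it can be pointed at a saved copy of the parameter file, and defaults to the live sysfs location used in the comment.

```shell
#!/bin/sh
# Sketch: report whether pstore has a backend registered, based on the
# module parameter /sys/module/pstore/parameters/backend.
# Return codes (hypothetical): 0 = backend present, 1 = no backend,
# 2 = parameter file not readable (e.g. pstore not built or not loaded).
pstore_backend_status() {
    param=${1:-/sys/module/pstore/parameters/backend}
    if [ ! -r "$param" ]; then
        echo "pstore parameters not available"
        return 2
    fi
    backend=$(cat "$param")   # command substitution drops the trailing newline
    case $backend in
        ''|'(null)')
            # Assumption: empty is treated like "(null)", i.e. no backend.
            echo "no pstore backend"
            return 1 ;;
        *)
            echo "pstore backend: $backend"
            return 0 ;;
    esac
}
```

On the kernel described above, where the parameter reads `(null)`, this would print `no pstore backend`. The matching config check on Fedora, which installs each kernel's config under /boot, is `grep CONFIG_EFI_VARS_PSTORE /boot/config-"$(uname -r)"`.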


Mar 01 15:24:53 alan-nuc /usr/libexec/gdm-x-session[1916]: (II) input device 'ITE8713 CIR transceiver', /dev/input/event7 is a keyboard
Mar 01 15:24:53 alan-nuc kernel: general protection fault: 0000 [#1] SMP 
Mar 01 15:24:53 alan-nuc kernel: Modules linked in: cmac ebtable_filter ebtables ip6table_filter ip6_tables bnep vfat fat btrfs xor raid6_pq intel_rapl coretemp arc4 kvm_intel iwlmvm kvm mac80211 f2fs irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support crc32c_intel ghash_clmulni_intel iwlwifi snd_hda_codec_hdmi snd_intel_sst_acpi snd_intel_sst_core snd_soc_sst_mfld_platform cfg80211 btusb snd_hda_codec_realtek snd_soc_rt5670 snd_soc_sst_match pcspkr btrtl snd_hda_codec_generic snd_soc_rl6231 snd_soc_core mei_txe snd_hda_intel snd_hda_codec mei shpchp snd_compress snd_hda_core snd_hwdep lpc_ich i2c_i801 hci_uart snd_pcm_dmaengine btbcm ac97_bus btqca snd_seq btintel snd_seq_device bluetooth snd_pcm ir_lirc_codec lirc_dev rfkill_gpio rc_rc6_mce ite_cir rc_core dw_dmac snd_timer rfkill dw_dmac_core snd
Mar 01 15:24:53 alan-nuc kernel:  soundcore i2c_designware_platform i2c_designware_core spi_pxa2xx_platform tpm_tis soc_button_array tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 i2c_algo_bit drm_kms_helper drm 8021q garp stp llc mrp r8169 serio_raw mii sdhci_acpi sdhci video mmc_core i2c_hid uas fjes usb_storage
Mar 01 15:24:53 alan-nuc kernel: CPU: 1 PID: 1928 Comm: gsettings Tainted: G        W       4.5.0-0.rc5.git0.2.vanilla.knurd.1.fc23.x86_64 #1
Mar 01 15:24:53 alan-nuc kernel: Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0047.2015.1123.0950 11/23/2015
Mar 01 15:24:53 alan-nuc kernel: task: ffff88016fd08000 ti: ffff880071e38000 task.ti: ffff880071e38000
Mar 01 15:24:53 alan-nuc kernel: RIP: 0010:[<ffffffff811daa36>]  [<ffffffff811daa36>] vma_interval_tree_insert+0x26/0x80
Mar 01 15:24:53 alan-nuc kernel: RSP: 0018:ffff880071e3bd38  EFLAGS: 00010202
Mar 01 15:24:53 alan-nuc kernel: RAX: 0e200e200e200e20 RBX: ffff88016fef8000 RCX: ffff88005c08cd58
Mar 01 15:24:53 alan-nuc kernel: RDX: ffff88005c08cd68 RSI: ffff8801780aac00 RDI: ffff88016fc31480
Mar 01 15:24:53 alan-nuc kernel: RBP: ffff880071e3bd40 R08: 0000000000000249 R09: 0000000000000000
Mar 01 15:24:53 alan-nuc kernel: R10: ffff88016fc31480 R11: 00007f476599e000 R12: ffff88016fc31480
Mar 01 15:24:53 alan-nuc kernel: R13: ffff88006a2dd860 R14: ffff88006a2dd870 R15: ffff8801780aac08
Mar 01 15:24:53 alan-nuc kernel: FS:  00007f4765bdc800(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
Mar 01 15:24:53 alan-nuc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 01 15:24:53 alan-nuc kernel: CR2: 00007f475bdd91f0 CR3: 000000016fcf6000 CR4: 00000000001006e0
Mar 01 15:24:53 alan-nuc kernel: Stack:
Mar 01 15:24:53 alan-nuc kernel:  ffffffff811e77b6 ffff880071e3bd88 ffffffff811e8424 ffff88017375ee40
Mar 01 15:24:53 alan-nuc kernel:  ffff8801780aabe0 ffff88006a2dd870 00007f475bb8f000 0000000000000075
Mar 01 15:24:53 alan-nuc kernel:  ffff88017375ee40 ffff88016fc31480 ffff880071e3be18 ffffffff811eb12d
Mar 01 15:24:53 alan-nuc kernel: Call Trace:
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff811e77b6>] ? __vma_link_file+0x46/0x50
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff811e8424>] vma_link+0x74/0xc0
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff811eb12d>] mmap_region+0x3bd/0x5f0
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff811eb644>] do_mmap+0x2e4/0x3e0
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff811ce35f>] vm_mmap_pgoff+0xaf/0xe0
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff811e9371>] SyS_mmap_pgoff+0x1c1/0x290
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff8101cd2b>] SyS_mmap+0x1b/0x30
Mar 01 15:24:53 alan-nuc kernel:  [<ffffffff817b7c2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Mar 01 15:24:53 alan-nuc kernel: Code: 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 08 48 2b 07 49 89 fa 4c 8b 8f 98 00 00 00 48 89 f2 31 c9 48 c1 e8 0c 4d 8d 44 01 ff eb 1d <4c> 39 40 18 73 04 4c 89 40 18 4c 3b 48 40 48 8d 48 10 48 8d 50 
Mar 01 15:24:53 alan-nuc kernel: RIP  [<ffffffff811daa36>] vma_interval_tree_insert+0x26/0x80
Comment 6 Andy Shevchenko 2016-03-01 12:27:13 EST
Okay, it clearly has nothing to do with dw_dmac. I suppose you could open another bug on the kernel Bugzilla to track this issue.
Comment 7 Andy Shevchenko 2016-08-10 10:49:39 EDT
I have no idea whether Fedora's kernel has the fix, but there have been no new reports since v4.5 was released.
Comment 8 Alan Jenkins 2016-08-20 03:35:54 EDT
I believe that is correct. I had no further boot problems testing 4.6 or using Fedora's 4.6 kernels, without blacklisting any modules or using custom boot options.

Dunno about the other (unrelated) problem; I probably haven't been messing around with the VGA connector as much.
Comment 9 Laura Abbott 2016-09-23 15:24:03 EDT
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.
Comment 10 Alan Jenkins 2016-09-23 16:58:46 EDT
Closing.  Thanks everyone!
