Bug 1788488 - BUG: kernel NULL pointer dereference, address: 0000000000000058 kernel panic on boot
Summary: BUG: kernel NULL pointer dereference, address: 0000000000000058 kernel panic ...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-07 11:09 UTC by Piotr Żurek
Modified: 2020-10-07 21:27 UTC
CC List: 24 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 22:27:15 UTC
Type: Bug
Embargoed:


Attachments
Kernel panic on boot (4.37 MB, image/jpeg)
2020-01-07 11:09 UTC, Piotr Żurek
Kernel Panic (2.96 MB, image/jpeg)
2020-01-08 01:18 UTC, Matthew Phillips
kernel panic on boot on ThinkPad A285 (1.58 MB, image/jpeg)
2020-01-08 06:45 UTC, Masami Ichikawa

Description Piotr Żurek 2020-01-07 11:09:45 UTC
Created attachment 1650357
Kernel panic on boot

Description of problem:

After a normal Fedora 31 dnf upgrade, part of which was a kernel update to 5.4.7-200.fc31 from the @updates repo, the system (a ThinkPad A475) panics on boot.

Version-Release number of selected component (if applicable):
"kernel.x86_64                                      5.4.7-200.fc31                      @updates"


How reproducible:
every boot

Steps to Reproduce:
1. Install/update to kernel.x86_64 5.4.7-200.fc31 from @updates

2. reboot
3. observe kernel panic

Actual results:
Kernel panic just before mounting root (also before decrypting the disk).

Expected results:
The system boots without issues.

Additional info:
The system boots fine with the previous kernel version: "kernel.x86_64                                      5.3.16-300.fc31                     @updates"

Comment 1 Matthew Phillips 2020-01-08 01:18:11 UTC
Created attachment 1650546
Kernel Panic

I came here to post the same thing. Are you also using a Lenovo ThinkPad E585? I attached a picture of my kernel panic as well; it looks fairly similar.

Comment 2 Matthew Phillips 2020-01-08 01:22:19 UTC
Sorry, I somehow missed your computer model in the original post.  So it looks like it might be some sort of issue with AMD hardware (my processor is Ryzen 7 2700U if that helps).  My panic occurred on Silverblue though.

Comment 3 Masami Ichikawa 2020-01-08 06:35:50 UTC
I have a Lenovo ThinkPad A285 (Ryzen 5 PRO 2500U) and hit what looks like the same issue.
I also have an i7-9700K machine, which works fine.

Comment 4 Masami Ichikawa 2020-01-08 06:45:26 UTC
Created attachment 1650593
kernel panic on boot on ThinkPad A285

Comment 5 Piotr Żurek 2020-01-09 11:35:17 UTC
My ThinkPad A475 has a previous-generation AMD CPU (Bristol Ridge), quite a different beast from Ryzen. I suspect a general ThinkPad/AMD BIOS problem, but that's only a suspicion.

Comment 6 Piotr Żurek 2020-01-10 15:11:34 UTC
FYI - The 5.4.8 kernel update issued today also panics.

Comment 7 Andrew Hutchings 2020-01-10 19:22:30 UTC
Also affects my Lenovo T495, Ryzen 7 PRO 3700U

Comment 9 Matthew Phillips 2020-01-11 20:04:32 UTC
How long does it usually take for something like this to get a look from a developer? I'm honestly asking; fortunately, I don't run into bugs very often on Fedora. But Silverblue doesn't have an upgrade option to exclude the kernel, at least as far as I can tell, and there is a high-severity Firefox update that needs to be installed. And AMD Ryzen hardware has been fairly popular.

Comment 10 hoppsz 2020-01-12 01:07:35 UTC
Try disabling Secure Boot.

This panic also affects my Lenovo IdeaPad 330S-15ARR (AMD Ryzen 5 2500U).
The kernel panics immediately when booting kernel 5.4.7-200.fc31 or 5.4.8. The panic looks identical to what others have attached, i.e.

  BUG: kernel NULL pointer dereference, address 0000000000000058

HOWEVER:

Disabling Secure Boot in the BIOS Setup Utility allows the laptop to boot.
  Secure Boot enabled:  panics every boot.
  Secure Boot disabled: boots and runs with no issues.

Comment 11 Masami Ichikawa 2020-01-12 02:01:15 UTC
I added "trace_clock=local" to the kernel command line. That works for me, even with Secure Boot on.
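A minimal shell sketch of applying this workaround persistently, assuming either a traditional Fedora install whose boot entries are managed by `grubby`, or Silverblue, where kernel arguments go through `rpm-ostree kargs`. The helper names `cmdline_has_arg`, `apply_trace_clock_workaround` and `check_trace_clock_workaround` are hypothetical; only the `grubby --update-kernel/--args` and `rpm-ostree kargs --append` options are real tool flags.

```shell
#!/bin/sh
# Sketch: make the trace_clock=local workaround persistent.
# Helper names are hypothetical; grubby and rpm-ostree are the real tools.

# Succeed if the exact argument $1 appears in the kernel command line $2.
cmdline_has_arg() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

apply_trace_clock_workaround() {
    if [ -e /run/ostree-booted ]; then
        # Silverblue/ostree: kernel arguments are managed by rpm-ostree.
        sudo rpm-ostree kargs --append=trace_clock=local
    else
        # Traditional install: add the argument to every installed kernel.
        sudo grubby --update-kernel=ALL --args="trace_clock=local"
    fi
}

# After rebooting, check that the running kernel actually received it.
check_trace_clock_workaround() {
    if cmdline_has_arg trace_clock=local "$(cat /proc/cmdline)"; then
        echo "trace_clock=local is active"
    else
        echo "trace_clock=local is NOT active"
    fi
}
```

Run `apply_trace_clock_workaround`, reboot, then `check_trace_clock_workaround`. For a one-off test, the argument can instead be appended by pressing `e` at the GRUB menu and editing the line that starts with `linux`.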

Comment 12 Björn 2020-01-12 09:09:44 UTC
I can confirm that masami256's workaround (adding the kernel command-line argument "trace_clock=local") resolved the issue for me on a Dell XPS 9750.

My specs: https://linux-hardware.org/?probe=6c4e2e9577

Comment 13 Andrew Hutchings 2020-01-12 12:03:38 UTC
Also confirmed that the "trace_clock=local" workaround works on my ThinkPad T495

Comment 14 Matthew Phillips 2020-01-12 19:18:27 UTC
(In reply to masami256 from comment #11)
> I added "trace_clock=local" in kernel command line. That works fine for me. 
> It works with secureboot on.

Thank you so much, this also worked for me.  I really appreciate the help because I was worried I might have to nuke-and-pave the OS.  Forgive my ignorance, but I'd like to ask a couple of other questions regarding this workaround, I understand if you can't answer all of them:

1.  Will there be a fix in a future kernel?
2.  How will we know when the kernel argument is no longer needed?  Will it be listed here?
3.  (not as important) How did you figure this out?  What does the kernel argument actually do?

Thanks again!
Matt

Comment 15 Piotr Żurek 2020-01-12 23:10:21 UTC
Thanks, I can confirm the workaround also works for the ThinkPad A475.


Re 3: I didn't think of this earlier, but removing "quiet" from the kernel boot parameters makes this message visible:

  Unstable clock detected, switching default tracing clock to "global"
  If you want to keep using the local clock, then add:
    "trace_clock=local"
  on the kernel command line

So that's probably why it was suggested.

But why it is needed in our case is beyond me.
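A sketch of surfacing these messages on a boot that succeeds (for example with the workaround applied), assuming `grubby` manages the boot entries. Since the panic happens before root is mounted, the failing boot presumably leaves nothing in the journal. `clock_messages` is a hypothetical filter; its patterns are taken from the messages quoted in this report.

```shell
#!/bin/sh
# Sketch: surface the tracing-clock, lockdown and TSC messages hidden by "quiet".
# Persistently drop "quiet" (restore later with --args=quiet):
#   sudo grubby --update-kernel=ALL --remove-args=quiet
# Or remove it for a single boot by pressing "e" at the GRUB menu.

# Keep only kernel log lines relevant to this bug (reads stdin).
clock_messages() {
    grep -iE 'unstable clock|trace_clock|tracing disabled|tsc'
}

# Typical use on a system that did boot:
#   journalctl -k -b | clock_messages
#   dmesg | clock_messages
```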

Comment 16 dani 2020-01-13 20:17:09 UTC
Same issue on a ThinkPad E595 (AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx), and the workaround works there too.
Issue encountered after upgrading to kernel 5.4.8-200 from 5.3.16-300.
Secure Boot is enabled, so the kernel is in lockdown, with the message 'Tracing disabled due to lockdown'.
Removing 'quiet' shows same error - "Unstable clock detected, switching default tracing clock to "global".

I'm wondering whether this has anything to do with Zen's aggressive power saving; see https://www.agner.org/optimize/blog/read.php?i=838

"The Ryzen is saving power quite aggressively. Unused units are clock gated, and the clock frequency is varying quite dramatically with the workload and the temperature. In my tests, I often saw a clock frequency as low as 8% of the nominal frequency in cases where disk access was the limiting factor, while the clock frequency could be as high as 114% of the nominal frequency after a very long sequence of CPU-intensive code. Such a high frequency cannot be obtained if all eight cores are active because of the increase in temperature.

The varying clock frequency was a big problem for my performance tests because it was impossible to get precise and reproducible measurements of computation times."

Other possibly relevant messages:
"TSC synchronization [CPU#0 -> CPU#1]:
Measured 8675990638 cycles TSC warp between CPUs, turning off TSC clock.
tsc: Marking TSC unstable due to check_tsc_sync_source failed"

Setting "trace_clock=" (that is, with no value after the equals sign), which presumably disables the trace clock, also seems to work.
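To check which clocksource the kernel settled on after marking the TSC unstable, the standard sysfs files can be read. A small sketch, assuming the usual `/sys/devices/system/clocksource/clocksource0` layout; `clocksource_info` is a hypothetical helper whose directory argument exists only so it can be pointed at another location.

```shell
#!/bin/sh
# Sketch: report the active and available clocksources.
# Defaults to the real sysfs directory; pass another directory to override.
clocksource_info() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    printf 'current:   %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}
```

On a machine where the TSC was rejected, as in the `check_tsc_sync_source failed` message above, `current` would typically show something other than `tsc` (for example `hpet`); this is a general expectation, not something measured in this report.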

Comment 17 Masami Ichikawa 2020-01-15 06:06:02 UTC
I built upstream kernels 5.4.7 and 5.5-rc5 (using /boot/config-5.4.7-200.fc31.x86_64 as the base configuration). They booted normally without the trace_clock option.
It might be Fedora's patches that cause this problem (or I missed some kernel configuration). Anyway, I'll check those patches.

Comment 18 dani 2020-01-15 07:35:42 UTC
(In reply to masami256 from comment #17)
> I build upstream kernels 5.4.7 and 5.5-rc5(I use
> /boot/config-5.4.7-200.fc31.x86_64 for base configuration). They booted
> normally without trace_clock option.
> It might be fedora patches cause this problem(or I missed some kernel
> configuration). anyway I'll check these patches.

Did you enable kernel lockdown? AFAICT the trace_clock issue only happens with lockdown, which is enabled by default with Secure Boot and is therefore the default for Fedora installs on ThinkPads, possibly due to their default BIOS/UEFI configuration.

Comment 19 dani 2020-01-15 07:42:22 UTC
see https://www.fosslinux.com/21502/linux-kernel-5-4-to-get-lockdown-functionality.htm
"Linux kernel 5.4 to get lockdown functionality"

Comment 20 Masami Ichikawa 2020-01-15 07:46:27 UTC
> Did you enable kernel lockdown? AFAICT the trace_clock issue only happens with lockdown (which is enabled by default with secureboot), and is the default fedora install on thinkpads, possibly due to default bios/uefi configurations of thinkpads.

Yes, I enabled it when I built the upstream kernel.

Comment 21 dani 2020-01-15 10:13:12 UTC
I'm sure you've already checked this, but could you please verify again? What are the contents of /sys/kernel/security/lsm?
Mine are: lockdown,capability,yama,selinux
And of course secureboot is also enabled (mokutil --sb-state).

Comment 22 Masami Ichikawa 2020-01-15 23:48:51 UTC
> I'm sure you've already checked this, but could you please verify again? what are the contents of /sys/kernel/security/lsm ?
> Mine are: lockdown,capability,yama,selinux
> And of course secureboot is also enabled (mokutil --sb-state).

Sure! I checked Linux 5.5-rc5.

Here are the command results:

$ cat /sys/kernel/security/lsm
lockdown,capability,yama,selinux

$ mokutil --sb-state
SecureBoot enabled
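Besides the LSM list, kernels with the lockdown LSM expose the active mode in `/sys/kernel/security/lockdown`, with the current level in brackets (e.g. `none integrity [confidentiality]`); reading it may require root. The sketch below, using a hypothetical `lockdown_mode` helper, extracts that level. `confidentiality` is the mode comment 24 forces via `CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY`.

```shell
#!/bin/sh
# Sketch: print the active kernel lockdown mode.
# Reads the securityfs file by default; the active level is shown in brackets.
lockdown_mode() {
    file=${1:-/sys/kernel/security/lockdown}
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}

# Typical use alongside the checks above:
#   mokutil --sb-state
#   sudo sh -c '. ./this-script.sh; lockdown_mode'
```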

Comment 23 Masami Ichikawa 2020-01-16 00:16:31 UTC
I found a difference between the Fedora kernel and the upstream kernel. The Fedora kernel adds efi-secureboot.patch (https://src.fedoraproject.org/rpms/kernel/blob/f31/f/efi-secureboot.patch), which introduces a new kernel option called CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT.
When I comment out this patch in kernel.spec, or unset the CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT option in kernel-x86_64-fedora.config, the kernel boots without a panic.

I checked out the kernel package from the git repository (https://src.fedoraproject.org/rpms/kernel/tree/f31) and built kernel 5.4.12 packages. When I cloned the repo, HEAD was 40a1cf57d3dd1365d4d4373ca2a278d9633cba17.
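To see whether an installed kernel was built with the options discussed here, the config file shipped in `/boot` can be searched. A minimal sketch; `kconfig_value` is a hypothetical helper that prints an option's value, or `n` when the option is marked "is not set" or absent.

```shell
#!/bin/sh
# Sketch: print the value of CONFIG_<name> from a kernel config file,
# or "n" if the option is "is not set" or missing entirely.
kconfig_value() {
    name=$1
    file=$2
    # The trailing "=" in the pattern avoids matching longer option names.
    val=$(sed -n "s/^CONFIG_${name}=//p" "$file")
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    else
        printf 'n\n'
    fi
}

# Typical use on an installed Fedora kernel:
#   cfg=/boot/config-$(uname -r)
#   kconfig_value LOCK_DOWN_IN_EFI_SECURE_BOOT "$cfg"
#   kconfig_value LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY "$cfg"
```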

Comment 24 Masami Ichikawa 2020-01-16 05:43:58 UTC
(In reply to masami256 from comment #23)
> I found difference between fedora kernel and upstream kernel. The fedora
> kernel added efi-secureboot.patch
> (https://src.fedoraproject.org/rpms/kernel/blob/f31/f/efi-secureboot.patch).
> This patch adds new kernel option which called
> CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT.
> When I comment out this patch in kernel.spec or unset
> CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT option in kernel-x86_64-fedora.config,
> kernel boots without kernel panic.
> 
> I checkout kernel package from git
> repository(https://src.fedoraproject.org/rpms/kernel/tree/f31) and build
> kernel 5.4.12 packages. When I clone git repo, HEAD was
> 40a1cf57d3dd1365d4d4373ca2a278d9633cba17.

I am able to reproduce this oops with the upstream kernel: if I set CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY=y, boot fails.

Comment 25 Masami Ichikawa 2020-01-21 14:50:19 UTC
I wrote a patch, and it was accepted (https://lkml.org/lkml/2020/1/21/586).
Once this patch is merged into the stable tree, this bug will be fixed in the Fedora kernel package in the future.

Comment 26 Matthew Phillips 2020-01-29 02:56:21 UTC
Masami, thank you for your help and hard work!

Comment 27 Masami Ichikawa 2020-01-30 14:11:09 UTC
(In reply to Matthew Phillips from comment #26)
> Masami, thank you for your help and hard work!

No problem :)

BTW, the patch has been included in Linux 5.5 and 5.4.16.
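Given that the fix landed in 5.5 and 5.4.16, one rough way to answer Matt's second question (when `trace_clock=local` is no longer needed) is to compare the running kernel's upstream version against 5.4.16. A sketch assuming GNU `sort -V` and Fedora-style release strings such as `5.4.17-200.fc31.x86_64`; the helper names are hypothetical, and pre-release strings like `5.5-rc5` are not handled correctly.

```shell
#!/bin/sh
# Sketch: decide whether a kernel release should already contain the fix
# (Linux 5.4.16 and later, including 5.5). Requires GNU sort -V.

# Succeed if version $1 >= version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Drop the distribution suffix: 5.4.17-200.fc31.x86_64 -> 5.4.17
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

kernel_has_fix() {
    version_ge "$(upstream_version "$1")" 5.4.16
}

# Typical use: once the running kernel has the fix, drop the workaround.
#   if kernel_has_fix "$(uname -r)"; then
#       sudo grubby --update-kernel=ALL --remove-args="trace_clock=local"
#   fi
```

On Silverblue the equivalent removal would be `rpm-ostree kargs --delete=trace_clock=local`.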

Comment 28 Masami Ichikawa 2020-02-05 00:09:04 UTC
Hi. I installed kernel-5.4.17-200.fc31, and it works fine without the trace_clock option :)

Comment 29 Matthew Phillips 2020-02-08 01:18:17 UTC
(In reply to Masami Ichikawa from comment #28)
> hi. I installed kernel-5.4.17-200.fc31, that works fine without trace_clock
> option :)

I can confirm, it worked for me as well.  Thanks one last time!

Comment 30 Justin M. Forbes 2020-03-03 16:32:24 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs.

Fedora 31 has now been rebased to 5.5.7-200.fc31.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32.

If you experience different issues, please open a new bug report for those.

Comment 31 Justin M. Forbes 2020-03-25 22:27:15 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

