Bug 1874117

Summary: 5.8.3-300.fc33.aarch64 kernel panic on boot (X-Gene PMU)
Product: [Fedora] Fedora
Reporter: Paul Whalen <pwhalen>
Component: kernel
Assignee: Peter Robinson <pbrobinson>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified    
Version: 33
CC: acaringi, airlied, awilliam, bskeggs, fzatlouk, hdegoede, ichavero, itamar, jarodwilson, jcm, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, mjg59, msalter, pbrobinson, robatino, steved
Target Milestone: ---
Keywords: Reopened
Target Release: ---   
Hardware: aarch64   
OS: Linux   
Whiteboard: AcceptedBlocker
Last Closed: 2020-10-13 20:49:50 UTC
Type: Bug
Bug Blocks: 245418, 1766775, 1766777    
Attachments:
Description                                       Flags
full boot log                                     none
Fix uninitialized variable in xgene PMU driver    none

Description Paul Whalen 2020-08-31 15:34:34 UTC
1. Please describe the problem:
When attempting to boot 5.8.3-300.fc33 on an Ampere eMAG, the kernel panics.

2. What is the Version-Release number of the kernel:
5.8.3-300.fc33

Panic:

[    9.925276] xgene-pmu APMC0D83:00: X-Gene PMU version 3
[    9.938064] Unable to handle kernel read from unreadable memory at virtual address 0000000000004006
[    9.947101] Mem abort info:
[    9.949882]   ESR = 0x96000004
[    9.952927]   EC = 0x25: DABT (current EL), IL = 32 bits
[    9.958225]   SET = 0, FnV = 0
[    9.961265]   EA = 0, S1PTW = 0
[    9.964395] Data abort info:
[    9.967262]   ISV = 0, ISS = 0x00000004
[    9.971083]   CM = 0, WnR = 0
[    9.974041] [0000000000004006] user address but active_mm is swapper
[    9.980381] Internal error: Oops: 96000004 [#1] SMP
[    9.985246] Modules linked in:
[    9.988289] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.3-300.fc33.aarch64 #1
[    9.995583] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS HVE104N-1.12 11/29/2019
[   10.005395] pstate: 00400005 (nzcv daif +PAN -UAO BTYPE=--)
[   10.010957] pc : string+0x50/0x100
[   10.014346] lr : vsnprintf+0x160/0x750
[   10.018081] sp : ffff800012b4b760
[   10.021381] x29: ffff800012b4b760 x28: 000000000000000c 
[   10.026679] x27: ffff8000113610d5 x26: ffff8000113610d5 
[   10.031977] x25: 0000000000000020 x24: 0000000000000000 
[   10.037275] x23: 00000000ffffffe8 x22: ffff800010f8e628 
[   10.042572] x21: ffff800012b4b8f0 x20: 0000000000000000 
[   10.047870] x19: 0000000000000000 x18: 00000000fffffffc 
[   10.053167] x17: 000000000000002d x16: 0000000000000001 
[   10.058465] x15: 0000000000000020 x14: 0000000000000000 
[   10.063762] x13: 0000000000000000 x12: 071c71c71c71c71c 
[   10.069060] x11: 00000000ffffff76 x10: ffff800012b4b8f0 
[   10.074357] x9 : ffff8000109e97d8 x8 : 00000000ffffffff 
[   10.079655] x7 : 000000000000000b x6 : 0000000000000000 
[   10.084952] x5 : 0000000000000000 x4 : 0000000000000000 
[   10.090250] x3 : ffff0a00ffffff04 x2 : 0000000000004006 
[   10.095547] x1 : ffffffffffffffff x0 : 000000000000000c 
[   10.100845] Call trace:
[   10.103280]  string+0x50/0x100
[   10.106321]  vsnprintf+0x160/0x750
[   10.109711]  devm_kvasprintf+0x5c/0xb4
[   10.113446]  devm_kasprintf+0x54/0x60
[   10.117096]  __devm_ioremap_resource+0xdc/0x1a0
[   10.121613]  devm_ioremap_resource+0x14/0x20
[   10.125871]  acpi_get_pmu_hw_inf.isra.0+0x84/0x15c
[   10.130648]  acpi_pmu_dev_add+0xbc/0x21c
[   10.134558]  acpi_ns_walk_namespace+0x16c/0x1e4
[   10.139075]  acpi_walk_namespace+0xb4/0xfc
[   10.143157]  xgene_pmu_probe_pmu_dev+0x7c/0xe0
[   10.147586]  xgene_pmu_probe.part.0+0x2c0/0x310
[   10.152103]  xgene_pmu_probe+0x54/0x64
[   10.155839]  platform_drv_probe+0x60/0xb4
[   10.159835]  really_probe+0xe8/0x4a0
[   10.163397]  driver_probe_device+0xe4/0x100
[   10.167566]  device_driver_attach+0xcc/0xd4
[   10.171736]  __driver_attach+0xb0/0x17c
[   10.175558]  bus_for_each_dev+0x6c/0xb0
[   10.179380]  driver_attach+0x30/0x40
[   10.182942]  bus_add_driver+0x154/0x250
[   10.186764]  driver_register+0x84/0x140
[   10.190586]  __platform_driver_register+0x54/0x60
[   10.195278]  xgene_pmu_driver_init+0x28/0x34
[   10.199535]  do_one_initcall+0x40/0x204
[   10.203358]  do_initcalls+0x104/0x144
[   10.207007]  kernel_init_freeable+0x198/0x210
[   10.211352]  kernel_init+0x20/0x12c
[   10.214827]  ret_from_fork+0x10/0x18
[   10.218391] Code: 91000400 110004e1 eb08009f 540000c0 (38646846) 
[   10.224484] ---[ end trace f08c10566496a703 ]---
[   10.229165] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   10.236815] SMP: stopping secondary CPUs
[   10.241945] Kernel Offset: 0x40000 from 0xffff800010000000
[   10.247416] PHYS_OFFSET: 0x80000000
[   10.250892] CPU features: 0x240002,20802008
[   10.255061] Memory Limit: none
[   10.258107] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
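
Note on reading the abort info in the log above: the ESR value 0x96000004 can be split into the fields the kernel prints (EC, IL, ISS). The following is a minimal sketch of that decoding, assuming the standard AArch64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]; for data aborts, WnR in ISS bit 6 and the fault status code in ISS bits [5:0]). The names `esr_fields`, `esr_decode` and `esr_describe` are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative container for the ESR_ELx fields (hypothetical, not a kernel type). */
struct esr_fields {
    unsigned ec;    /* exception class, bits [31:26] */
    unsigned il;    /* instruction length bit, bit 25 */
    uint32_t iss;   /* instruction specific syndrome, bits [24:0] */
    unsigned wnr;   /* data aborts only: write-not-read, ISS bit 6 */
    unsigned dfsc;  /* data aborts only: fault status code, ISS bits [5:0] */
};

/* Split a raw ESR value into its fields. */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;

    f.ec = (esr >> 26) & 0x3f;
    f.il = (esr >> 25) & 0x1;
    f.iss = esr & 0x1ffffff;
    f.wnr = (f.iss >> 6) & 0x1;
    f.dfsc = f.iss & 0x3f;
    return f;
}

/* Write a one-line summary into buf; returns what snprintf returns. */
static int esr_describe(char *buf, size_t size, uint32_t esr)
{
    struct esr_fields f;

    assert(buf != NULL && size > 0);
    f = esr_decode(esr);
    return snprintf(buf, size, "EC=0x%02x IL=%u ISS=0x%08x WnR=%u DFSC=0x%02x",
                    f.ec, f.il, (unsigned)f.iss, f.wnr, f.dfsc);
}
```

Calling `esr_decode(0x96000004)` from a small `main` gives EC = 0x25 (data abort at the current EL), IL = 1 (32-bit instruction), ISS = 0x4 and WnR = 0 (a read), matching the log. A fault status code of 0x04 is a level 0 translation fault, which fits a read through a bogus pointer such as 0x4006.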

Comment 1 Paul Whalen 2020-08-31 15:36:09 UTC
Created attachment 1713174 [details]
full boot log

Comment 2 Paul Whalen 2020-08-31 16:56:57 UTC
Last working kernel was 5.8.2-300.fc33.aarch64

5.8.3-300.fc33.aarch64 panics on boot.

Comment 3 Mark Salter 2020-09-02 01:10:04 UTC
Created attachment 1713385 [details]
Fix uninitialized variable in xgene PMU driver

A recent v5.9-rc1 patch uncovered a long-standing bug in the xgene PMU driver. This patch initializes the resource struct so that a later reference to a bad pointer is avoided.
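
For readers unfamiliar with this class of bug, here is a user-space sketch of the pattern described in this comment. It is not the attached patch and not the driver code: `demo_resource`, `demo_fill_range`, `demo_label` and `demo_probe` are hypothetical stand-ins, loosely modeled on the call trace (a resource struct on the stack, a helper that fills in only some fields, and a later "%s" format of the name field). It assumes the formatting code skips the name when the pointer is NULL, so zero-initializing the struct is enough to avoid the bad dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a resource descriptor; not the kernel's struct resource. */
struct demo_resource {
    unsigned long start;
    unsigned long end;
    const char *name;
};

/* Hypothetical helper: sets the address range but never touches name. */
static void demo_fill_range(struct demo_resource *res,
                            unsigned long start, unsigned long len)
{
    res->start = start;
    res->end = start + len - 1;
}

/*
 * Build a label of the form "<dev> <name>", or just "<dev>" when the
 * resource has no name. A garbage (non-NULL) name would be dereferenced
 * by "%s" here, which is the kind of crash seen in string()/vsnprintf().
 */
static int demo_label(char *out, size_t size, const char *dev,
                      const struct demo_resource *res)
{
    assert(out != NULL && size > 0 && dev != NULL && res != NULL);
    if (res->name)
        return snprintf(out, size, "%s %s", dev, res->name);
    return snprintf(out, size, "%s", dev);
}

static int demo_probe(char *out, size_t size)
{
    /*
     * Buggy pattern: "struct demo_resource res;" would leave name holding
     * whatever was on the stack. Zero-initializing makes name NULL.
     */
    struct demo_resource res = { 0 };

    demo_fill_range(&res, 0x1000, 0x100);
    return demo_label(out, size, "APMC0D83:00", &res);
}
```

With the zero initializer, `demo_probe` produces the label "APMC0D83:00" and never reads an indeterminate pointer.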

Comment 4 Mark Salter 2020-09-02 01:10:45 UTC
I'll send a patch upstream tomorrow.

Comment 5 Peter Robinson 2020-09-02 13:36:09 UTC
Patch pushed to 5.8.x for F-33/32/31.

Thanks for the patch Mark.

Comment 6 Paul Whalen 2020-09-02 15:30:06 UTC
Proposing as a blocker for F33 Beta: this greatly inhibits testing on aarch64.

Comment 7 Paul Whalen 2020-09-02 18:05:49 UTC
Affects any device that uses the X-Gene PMU driver, not just the Ampere eMag.

Comment 8 František Zatloukal 2020-09-04 08:09:13 UTC
Accepted as Beta Blocker per voting in https://pagure.io/fedora-qa/blocker-review/issue/59 .

Bug hinders execution of required Beta test plans or dramatically reduces test coverage on aarch64.

Comment 9 Paul Whalen 2020-09-04 13:46:14 UTC
5.8.6-301.fc33.aarch64 boots as expected on the eMAG.

Thanks again Mark.

Comment 10 Fedora Update System 2020-09-08 16:58:42 UTC
FEDORA-2020-5081eec059 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5081eec059

Comment 11 Fedora Update System 2020-09-08 17:04:36 UTC
FEDORA-2020-5081eec059 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 12 Peter Robinson 2020-10-07 08:32:03 UTC
This isn't properly fixed; there's a new fix headed upstream for 5.10:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/core&id=a76b8236edcf

Comment 13 Fedora Blocker Bugs Application 2020-10-07 08:35:18 UTC
Proposed as a Blocker for 33-final by Fedora user pbrobinson using the blocker tracking app because:

 Issues on enterprise aarch64 Ampere eMAG systems, including the HW we use for the builders.

Comment 14 Fedora Update System 2020-10-08 11:42:31 UTC
FEDORA-2020-9664e2f1d2 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2

Comment 15 Fedora Update System 2020-10-08 22:19:28 UTC
FEDORA-2020-9664e2f1d2 has been pushed to the Fedora 33 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-9664e2f1d2`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 16 Fedora Update System 2020-10-12 21:57:04 UTC
FEDORA-2020-9664e2f1d2 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 17 Adam Williamson 2020-10-13 17:11:39 UTC
The update apparently wasn't marked as fixing the bug; can we close it or is something else needed? Thanks!

Comment 18 Peter Robinson 2020-10-13 18:03:42 UTC
(In reply to Adam Williamson from comment #17)
> The update apparently wasn't marked as fixing the bug; can we close it or is
> something else needed? Thanks!

Which update? As part of the 5.8.14 kernel I switched to a newer, more robust fix that is landing upstream in 5.10; it seems the changelog was trimmed. So IMO this can be closed.

* Wed Oct  7 2020 Peter Robinson <pbrobinson>
- Fix aarch64 boot crash on BTI capable systems
- Fix boot crash on aarch64 Ampere eMAG systems (rhbz #1874117)
 
* Thu Oct  1 12:09:16 CDT 2020 Justin M. Forbes <jforbes> - 5.8.13-300
- Linux v5.8.13

Comment 19 Adam Williamson 2020-10-13 20:49:50 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2 - #c16 above says it was pushed to stable. That was the 5.8.14-300 update. So if you think that fixed it, let's go ahead and close.