1. Please describe the problem:
When attempting to boot 5.8.3-300.fc33 on an ampere eMag, it panics.

2. What is the Version-Release number of the kernel:
5.8.3-300.fc33

Panic:
[ 9.925276] xgene-pmu APMC0D83:00: X-Gene PMU version 3
[ 9.938064] Unable to handle kernel read from unreadable memory at virtual address 0000000000004006
[ 9.947101] Mem abort info:
[ 9.949882]   ESR = 0x96000004
[ 9.952927]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 9.958225]   SET = 0, FnV = 0
[ 9.961265]   EA = 0, S1PTW = 0
[ 9.964395] Data abort info:
[ 9.967262]   ISV = 0, ISS = 0x00000004
[ 9.971083]   CM = 0, WnR = 0
[ 9.974041] [0000000000004006] user address but active_mm is swapper
[ 9.980381] Internal error: Oops: 96000004 [#1] SMP
[ 9.985246] Modules linked in:
[ 9.988289] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.3-300.fc33.aarch64 #1
[ 9.995583] Hardware name: Lenovo HR350A 7X35CTO1WW /HR350A , BIOS HVE104N-1.12 11/29/2019
[ 10.005395] pstate: 00400005 (nzcv daif +PAN -UAO BTYPE=--)
[ 10.010957] pc : string+0x50/0x100
[ 10.014346] lr : vsnprintf+0x160/0x750
[ 10.018081] sp : ffff800012b4b760
[ 10.021381] x29: ffff800012b4b760 x28: 000000000000000c
[ 10.026679] x27: ffff8000113610d5 x26: ffff8000113610d5
[ 10.031977] x25: 0000000000000020 x24: 0000000000000000
[ 10.037275] x23: 00000000ffffffe8 x22: ffff800010f8e628
[ 10.042572] x21: ffff800012b4b8f0 x20: 0000000000000000
[ 10.047870] x19: 0000000000000000 x18: 00000000fffffffc
[ 10.053167] x17: 000000000000002d x16: 0000000000000001
[ 10.058465] x15: 0000000000000020 x14: 0000000000000000
[ 10.063762] x13: 0000000000000000 x12: 071c71c71c71c71c
[ 10.069060] x11: 00000000ffffff76 x10: ffff800012b4b8f0
[ 10.074357] x9 : ffff8000109e97d8 x8 : 00000000ffffffff
[ 10.079655] x7 : 000000000000000b x6 : 0000000000000000
[ 10.084952] x5 : 0000000000000000 x4 : 0000000000000000
[ 10.090250] x3 : ffff0a00ffffff04 x2 : 0000000000004006
[ 10.095547] x1 : ffffffffffffffff x0 : 000000000000000c
[ 10.100845] Call trace:
[ 10.103280]  string+0x50/0x100
[ 10.106321]  vsnprintf+0x160/0x750
[ 10.109711]  devm_kvasprintf+0x5c/0xb4
[ 10.113446]  devm_kasprintf+0x54/0x60
[ 10.117096]  __devm_ioremap_resource+0xdc/0x1a0
[ 10.121613]  devm_ioremap_resource+0x14/0x20
[ 10.125871]  acpi_get_pmu_hw_inf.isra.0+0x84/0x15c
[ 10.130648]  acpi_pmu_dev_add+0xbc/0x21c
[ 10.134558]  acpi_ns_walk_namespace+0x16c/0x1e4
[ 10.139075]  acpi_walk_namespace+0xb4/0xfc
[ 10.143157]  xgene_pmu_probe_pmu_dev+0x7c/0xe0
[ 10.147586]  xgene_pmu_probe.part.0+0x2c0/0x310
[ 10.152103]  xgene_pmu_probe+0x54/0x64
[ 10.155839]  platform_drv_probe+0x60/0xb4
[ 10.159835]  really_probe+0xe8/0x4a0
[ 10.163397]  driver_probe_device+0xe4/0x100
[ 10.167566]  device_driver_attach+0xcc/0xd4
[ 10.171736]  __driver_attach+0xb0/0x17c
[ 10.175558]  bus_for_each_dev+0x6c/0xb0
[ 10.179380]  driver_attach+0x30/0x40
[ 10.182942]  bus_add_driver+0x154/0x250
[ 10.186764]  driver_register+0x84/0x140
[ 10.190586]  __platform_driver_register+0x54/0x60
[ 10.195278]  xgene_pmu_driver_init+0x28/0x34
[ 10.199535]  do_one_initcall+0x40/0x204
[ 10.203358]  do_initcalls+0x104/0x144
[ 10.207007]  kernel_init_freeable+0x198/0x210
[ 10.211352]  kernel_init+0x20/0x12c
[ 10.214827]  ret_from_fork+0x10/0x18
[ 10.218391] Code: 91000400 110004e1 eb08009f 540000c0 (38646846)
[ 10.224484] ---[ end trace f08c10566496a703 ]---
[ 10.229165] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 10.236815] SMP: stopping secondary CPUs
[ 10.241945] Kernel Offset: 0x40000 from 0xffff800010000000
[ 10.247416] PHYS_OFFSET: 0x80000000
[ 10.250892] CPU features: 0x240002,20802008
[ 10.255061] Memory Limit: none
[ 10.258107] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
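For anyone reading the abort info in the panic above, here is a minimal sketch of how the ESR value 0x96000004 splits into the fields the kernel prints. The struct and function names are made up for illustration; only the bit positions come from the AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and the fault status code in ISS bits 5:0).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative container for the decoded fields (hypothetical name). */
struct esr_fields {
    unsigned ec;    /* bits [31:26]: exception class */
    unsigned il;    /* bit 25: 1 means a 32-bit instruction trapped */
    uint32_t iss;   /* bits [24:0]: instruction specific syndrome */
    unsigned wnr;   /* ISS bit 6: 0 = read, 1 = write (data aborts) */
    unsigned dfsc;  /* ISS bits [5:0]: data fault status code */
};

/* Split a raw ESR value into its fields using shifts and masks. */
struct esr_fields decode_esr(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.iss  = esr & 0x01ffffff;
    f.wnr  = (f.iss >> 6) & 0x1;
    f.dfsc = f.iss & 0x3f;
    return f;
}

/* Print the fields in roughly the same shape as the kernel's abort info. */
void print_esr(uint32_t esr)
{
    struct esr_fields f = decode_esr(esr);

    printf("ESR = 0x%08x\n", (unsigned)esr);
    printf("EC = 0x%02x, IL = %u\n", f.ec, f.il);
    printf("ISS = 0x%08x, WnR = %u, DFSC = 0x%02x\n",
           (unsigned)f.iss, f.wnr, f.dfsc);
}
```

Calling decode_esr(0x96000004) gives EC = 0x25 (data abort taken at the current EL), IL = 1, ISS = 0x4 and WnR = 0, which matches the "kernel read" wording and the DABT line in the log. The fault status code 0x04 is a level 0 translation fault, consistent with 0x4006 not being a mapped address.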
Created attachment 1713174 [details] full boot log
Last working kernel was 5.8.2-300.fc33.aarch64; 5.8.3-300.fc33.aarch64 panics on boot.
Created attachment 1713385 [details] Fix uninitialized variable in xgene PMU driver

A recent v5.9-rc1 patch uncovered a long-standing bug in the xgene PMU driver. This patch initializes the resource struct so that a later reference to a bad pointer is avoided.
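To illustrate the class of bug described here, a minimal userspace sketch follows. It is not the attached patch and does not use the kernel's struct resource: `fake_resource`, `fill_range`, `make_resource` and `describe_resource` are hypothetical names. It assumes a struct that is only partially filled in by a lookup routine and later has its name pointer formatted into a string, as the devm_kasprintf call in the trace does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a resource descriptor; only the fields
 * needed for the illustration are modelled. */
struct fake_resource {
    unsigned long start;
    unsigned long end;
    const char *name;   /* may legitimately be NULL */
};

/* Fills in the address range only; the name field is never touched,
 * so it keeps whatever value the caller's struct already held. */
void fill_range(struct fake_resource *res, unsigned long start,
                unsigned long size)
{
    res->start = start;
    res->end = start + size - 1;
}

/* Safe pattern: zero-initialize before the partial fill, so name is a
 * null pointer rather than leftover stack contents. */
struct fake_resource make_resource(unsigned long start, unsigned long size)
{
    struct fake_resource res = {0};

    fill_range(&res, start, size);
    return res;
}

/* Consumer that dereferences name when it is non-NULL. With an
 * uninitialized struct this check would pass on garbage and the
 * formatting code would read from a bad address. */
int describe_resource(const struct fake_resource *res, const char *dev,
                      char *buf, size_t len)
{
    if (res->name)
        return snprintf(buf, len, "%s %s", dev, res->name);
    return snprintf(buf, len, "%s", dev);
}
```

With the `= {0}` initializer, describe_resource sees a null name and falls back to the device name alone. Without it, the name field is indeterminate, which is the kind of value (0x4006 in the panic) that string formatting then tries to read.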
I'll send a patch upstream tomorrow.
Patch pushed to 5.8.x for F-33/32/31. Thanks for the patch Mark.
Proposing as a blocker for F33 Beta: this greatly inhibits testing on aarch64.
Affects any device that uses the X-Gene PMU driver, not just the Ampere eMag.
Accepted as Beta Blocker per voting in https://pagure.io/fedora-qa/blocker-review/issue/59 . Bug hinders execution of required Beta test plans or dramatically reduces test coverage on aarch64.
5.8.6-301.fc33.aarch64 boots as expected on the eMAG. Thanks again, Mark.
FEDORA-2020-5081eec059 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5081eec059
FEDORA-2020-5081eec059 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
This isn't properly fixed; there's a new fix headed upstream for 5.10: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/core&id=a76b8236edcf
Proposed as a Blocker for 33-final by Fedora user pbrobinson using the blocker tracking app because: Issues on enterprise aarch64 Ampere eMAG systems including the HW we use for the builders.
FEDORA-2020-9664e2f1d2 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2
FEDORA-2020-9664e2f1d2 has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-9664e2f1d2` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-9664e2f1d2 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
The update apparently wasn't marked as fixing the bug; can we close it or is something else needed? Thanks!
(In reply to Adam Williamson from comment #17)
> The update apparently wasn't marked as fixing the bug; can we close it or is
> something else needed? Thanks!

Which update? As part of the 5.8.14 kernel I updated to a newer, more robust fix that is landing upstream in 5.10; it seems the changelog was trimmed. So IMO this can be closed.

* Wed Oct 7 2020 Peter Robinson <pbrobinson>
- Fix aarch64 boot crash on BTI capable systems
- Fix boot crash on aarch64 Ampere eMAG systems (rhbz #1874117)

* Thu Oct 1 12:09:16 CDT 2020 Justin M. Forbes <jforbes> - 5.8.13-300
- Linux v5.8.13
https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2 - #c16 above says it was pushed to stable. That was the 5.8.14-300 update. So if you think that fixed it, let's go ahead and close.