Bug 2053214 - early kernel panic on aarch64 with kernel-5.17.0-0.rc2.20220202+
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-10 16:57 UTC by Paul Whalen
Modified: 2022-04-12 15:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Description Paul Whalen 2022-02-10 16:57:59 UTC
1. Please describe the problem:

Mustang and Seattle show nothing on the serial console with kernel-5.17.0-0.rc2.20220202+. The last working kernel is kernel-5.17.0-0.rc2.83.fc36.


2. What is the Version-Release number of the kernel:

kernel-5.17.0-0.rc3.20220208git555f3d7be91a.90.fc36

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

kernel-5.17.0-0.rc2.20220202+


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Attempt to boot 5.17 rc3 on affected hardware; no output is shown on the serial console.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

yes

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

no


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.


EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
L3C: 8MB
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x500f0001]
[    0.000000] Linux version 5.17.0-0.rc3.20220208git555f3d7be91a.90.fc36.aarch64 (mockbuild.fedoraproject.org) (gcc (GCC) 12.0.1 20220205 (Red Hat 12.0.1-0), GNU ld version 2.37-24.fc36) #1 SMP Tue Feb 8 19:34:02 UTC 2022
[    0.000000] Machine model: APM X-Gene Mustang board
[    0.000000] earlycon: uart0 at MMIO32 0x000000001c020000 (options '')
[    0.000000] printk: bootconsole [uart0] enabled
[    0.000000] printk: debug: skip boot console de-registration.
[    0.000000] efi: EFI v2.40 by X-Gene Mustang Board EFI Oct 17 2016 13:54:05
[    0.000000] efi: ACPI=0x43fa700000 ACPI 2.0=0x43fa700014 SMBIOS 3.0=0x43fa9db000 ESRT=0x43ff006d18 MOKvar=0x43fa3e0000 MEMRESERVE=0x43fa5e0798 
[    0.000000] esrt: Reserving ESRT space from 0x00000043ff006d18 to 0x00000043ff006d78.
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000004000000000-0x00000043ffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x43fdf3b0c0-0x43fdf51fff]
[    0.000000] Unable to handle kernel read from unreadable memory at virtual address 0000000000000000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000004
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000]   FSC = 0x04: level 0 translation fault
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000004
[    0.000000]   CM = 0, WnR = 0
[    0.000000] [0000000000000000] user address but active_mm is swapper
[    0.000000] Internal error: Oops: 96000004 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-0.rc3.20220208git555f3d7be91a.90.fc36.aarch64 #1
[    0.000000] Hardware name: APM X-Gene Mustang board (DT)
[    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : sparse_init+0xec/0x200
[    0.000000] lr : sparse_init+0xec/0x200
[    0.000000] sp : ffff80000a893d60
[    0.000000] x29: ffff80000a893d60 x28: 0000004391350018 x27: 00000043fa47aa88
[    0.000000] x26: 00000043fa47aa80 x25: 00000043ffa26428 x24: 00000043ffa26448
[    0.000000] x23: 0000000000000800 x22: 0000000000000000 x21: 0000000004000000
[    0.000000] x20: 0000000000000004 x19: ffff80000bfce800 x18: ffffffffffffffff
[    0.000000] x17: 0000000000010000 x16: 00000043ff9b0000 x15: 0000020000000000
[    0.000000] x14: 0000000000999000 x13: 0000000000000000 x12: 0000000000000000
[    0.000000] x11: ffff80000aa977b8 x10: 0000000000000002 x9 : 0000000000000000
[    0.000000] x8 : ffff0003fdf1b0c0 x7 : 0000000000000000 x6 : 000000000000003f
[    0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    0.000000] x2 : ffff0003fdf1b0c0 x1 : 0000000000000800 x0 : 0000000000000000
[    0.000000] Call trace:
[    0.000000]  sparse_init+0xec/0x200
[    0.000000]  bootmem_init+0x64/0x1d4
[    0.000000]  setup_arch+0x19c/0x21c
[    0.000000]  start_kernel+0x94/0x4dc
[    0.000000]  __primary_switched+0xc0/0xc8
[    0.000000] Code: aa1703e0 97c8afd1 aa1703e0 9792ae3a (f9400001) 
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Comment 1 Paul Whalen 2022-02-10 17:02:05 UTC
ACPI looks the same:

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
L3C: 8MB
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x500f0001]
[    0.000000] Linux version 5.17.0-0.rc3.20220208git555f3d7be91a.90.fc36.aarch64 (mockbuild.fedoraproject.org) (gcc (GCC) 12.0.1 20220205 (Red Hat 12.0.1-0), GNU ld version 2.37-24.fc36) #1 SMP Tue Feb 8 19:34:02 UTC 2022
[    0.000000] Machine model: APM X-Gene Mustang board
[    0.000000] earlycon: uart0 at MMIO32 0x000000001c020000 (options '')
[    0.000000] printk: bootconsole [uart0] enabled
[    0.000000] printk: debug: skip boot console de-registration.
[    0.000000] efi: EFI v2.40 by X-Gene Mustang Board EFI Oct 17 2016 13:54:05
[    0.000000] efi: ACPI=0x43fa700000 ACPI 2.0=0x43fa700014 SMBIOS 3.0=0x43fa9db000 ESRT=0x43ff006d18 MOKvar=0x43fa3e0000 MEMRESERVE=0x43fa5e0798 
[    0.000000] esrt: Reserving ESRT space from 0x00000043ff006d18 to 0x00000043ff006d78.
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000043FA700014 000024 (v02 APM   )
[    0.000000] ACPI: XSDT 0x00000043FA6F00E8 00007C (v01 APM    XGENE    00000003      01000013)
[    0.000000] ACPI: FACP 0x00000043FA6C0000 00010C (v05 APM    XGENE    00000003 INTL 20140724)
[    0.000000] ACPI: DSDT 0x00000043FA6D0000 005922 (v05 APM    APM88xxx 00000001 INTL 20140724)
[    0.000000] ACPI: DBG2 0x00000043FA6E0000 0000AA (v00 APMC0D XGENEDBG 00000000 INTL 20140724)
[    0.000000] ACPI: GTDT 0x00000043FA6A0000 000060 (v02 APM    XGENE    00000001 INTL 20140724)
[    0.000000] ACPI: MCFG 0x00000043FA690000 00003C (v01 APM    XGENE    00000002 INTL 20140724)
[    0.000000] ACPI: SPCR 0x00000043FA680000 000050 (v02 APMC0D XGENESPC 00000000 INTL 20140724)
[    0.000000] ACPI: SSDT 0x00000043FA670000 00002D (v02 APM    XGENE    00000001 INTL 20140724)
[    0.000000] ACPI: BERT 0x00000043FA660000 000030 (v01 APM    XGENE    00000002 INTL 20140724)
[    0.000000] ACPI: HEST 0x00000043FA650000 0002A8 (v01 APM    XGENE    00000002 INTL 20140724)
[    0.000000] ACPI: APIC 0x00000043FA640000 0002A4 (v03 APM    XGENE    00000003      01000013)
[    0.000000] ACPI: SSDT 0x00000043FA630000 000063 (v02 REDHAT MACADDRS 00000001      01000013)
[    0.000000] ACPI: SSDT 0x00000043FA620000 000032 (v02 REDHAT UARTCLKS 00000001      01000013)
[    0.000000] ACPI: SPCR: console: uart,mmio32,0x1c020000
[    0.000000] ACPI: CEDT not present
[    0.000000] NUMA: Failed to initialise from firmware
[    0.000000] NUMA: Faking a node at [mem 0x0000004000000000-0x00000043ffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x43fdf5e0c0-0x43fdf74fff]
[    0.000000] Unable to handle kernel read from unreadable memory at virtual address 0000000000000000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000004
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000]   FSC = 0x04: level 0 translation fault
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000004
[    0.000000]   CM = 0, WnR = 0
[    0.000000] [0000000000000000] user address but active_mm is swapper
[    0.000000] Internal error: Oops: 96000004 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-0.rc3.20220208git555f3d7be91a.90.fc36.aarch64 #1
[    0.000000] Hardware name: APM X-Gene Mustang board (DT)
[    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : sparse_init+0xec/0x200
[    0.000000] lr : sparse_init+0xec/0x200
[    0.000000] sp : ffff80000a893d60
[    0.000000] x29: ffff80000a893d60 x28: 0000004391350018 x27: 00000043fa47aa88
[    0.000000] x26: 00000043fa47aa80 x25: 00000043ffa263d8 x24: 00000043ffa263f8
[    0.000000] x23: 0000000000000800 x22: 0000000000000000 x21: 0000000004000000
[    0.000000] x20: 0000000000000004 x19: ffff80000bfce800 x18: ffffffffffffffff
[    0.000000] x17: 0000000000010000 x16: 00000043ff9b0000 x15: 0000020000000000
[    0.000000] x14: 0000000000999200 x13: 0000000000000000 x12: 0000000000000000
[    0.000000] x11: ffff80000aa977b8 x10: 0000000000000002 x9 : 0000000000000000
[    0.000000] x8 : ffff0003fdf3e0c0 x7 : 0000000000000000 x6 : 000000000000003f
[    0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    0.000000] x2 : ffff0003fdf3e0c0 x1 : 0000000000000800 x0 : 0000000000000000
[    0.000000] Call trace:
[    0.000000]  sparse_init+0xec/0x200
[    0.000000]  bootmem_init+0x64/0x1d4
[    0.000000]  setup_arch+0x19c/0x21c
[    0.000000]  start_kernel+0x94/0x4dc
[    0.000000]  __primary_switched+0xc0/0xc8
[    0.000000] Code: aa1703e0 97c8afd1 aa1703e0 9792ae3a (f9400001) 
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Comment 3 Jeremy Linton 2022-04-06 16:27:04 UTC
I just reproduced this on my Seattle as well (ACPI mode):

[    0.000000] [0000000000000000] user address but active_mm is swapper
[    0.000000] Internal error: Oops: 96000004 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.1-300.fc36.aarch64 #1
[    0.000000] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT)
[    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : sparse_init+0xec/0x200
[    0.000000] lr : sparse_init+0xec/0x200
[    0.000000] sp : ffff80000a653d60
[    0.000000] x29: ffff80000a653d60 x28: 00000083faef0018 x27: 0000000000000000
[    0.000000] x26: 0000000000000001 x25: 00000083fff42f28 x24: 00000083fff42f48
[    0.000000] x23: 0000000000001000 x22: 0000000000000000 x21: 0000000008000000
[    0.000000] x20: 0000000000000004 x19: ffff80000abe8a40 x18: ffffffffffffffff
[    0.000000] x17: 00000000001bd000 x16: 00000083ffe43000 x15: 0000020000000000
[    0.000000] x14: 000000000006e000 x13: 0000020000000000 x12: 00000000001bd000
[    0.000000] x11: 00000083ffe43000 x10: 0000020000000000 x9 : 0000000000000000
[    0.000000] x8 : ffff0003fd1c86c0 x7 : 0000000000000000 x6 : 000000000000003f
[    0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    0.000000] x2 : ffff0003fd1c86c0 x1 : 0000000000001000 x0 : 0000000000000000
[    0.000000] Call trace:
[    0.000000]  sparse_init+0xec/0x200
[    0.000000]  bootmem_init+0x64/0x1cc
[    0.000000]  setup_arch+0x198/0x218
[    0.000000]  start_kernel+0x7c/0x49c
[    0.000000]  __primary_switched+0xc0/0xc8
[    0.000000] Code: aa1703e0 97c93e89 aa1703e0 979555fe (f9400001)

Comment 4 Jeremy Linton 2022-04-06 16:52:39 UTC
Given that building 5.17/5.18 with gcc11 results in a working machine, and given this rather large bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160

it's probably worthwhile to merge the gcc12 fix and see if that fixes the problem.

Comment 5 Paul Whalen 2022-04-11 14:34:10 UTC
5.17.2-300.fc36.aarch64 is working on both the Seattle and Mustang.

Comment 6 Jeremy Linton 2022-04-11 17:49:23 UTC
It's possible something moved around. While tracking down the rpi4/genet bug, there were quite a few non-obvious changes that caused the compiler to keep the resulting enable_dma() routine in place and call it.

So it's not unexpected. I did the same A/B compiler test with this, changing only the gcc used for the build, and that makes the problem appear/disappear in my testing.

Comment 7 Mark Salter 2022-04-12 15:24:44 UTC
I don't see how the compiler is involved here. When I looked at it, the downstream-only patch in the MR mentioned in comment 2 was clearly at fault.
It incorrectly changed a test from "if (!mem_section)" to "if (!*mem_section)". On Seattle and Mustang, there is no sparse memory at mem_section[0], so it is NULL and __nr_to_section() always returns NULL, which then gets dereferenced. That's the cause of the splat.

Comment 8 Justin M. Forbes 2022-04-12 15:42:27 UTC
The patch mentioned in that ark MR was updated several times; the most recent version is the one that Linus finally merged, and that patch is queued for all stable releases.

