Created attachment 1711896
console.log

1. Please describe the problem:
In CKI testing we started seeing a kernel panic after booting into upstream kernels, starting with 5.8.0-rc1, on ppc64le. It occurs primarily on Power9 (bare metal and virtualized).

kernel BUG at arch/powerpc/mm/pgtable.c:304!

2. What is the Version-Release number of the kernel:
5.9.0-rc1

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Unsure when it first started, but we've seen it as early as 5.8.0 on mainline commit 30185b69a2d5.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
Yes. It is not reproducible every time, but it is easily reproducible on Power9: just install the kernel and boot.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
Here's a snippet of the kernel oops; the complete console.log is attached.

[ 2.659569] ------------[ cut here ]------------
[ 2.659596] WARNING: CPU: 28 PID: 1 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0xd8/0x1c0
[ 2.659629] Modules linked in:
[ 2.659650] CPU: 28 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc1-e622126.cki #1
[ 2.659684] NIP: c000000000080a48 LR: c000000001310dd0 CTR: 0000000000000000
[ 2.659718] REGS: c000201cbd40b840 TRAP: 0700 Not tainted (5.9.0-rc1-e622126.cki)
[ 2.659751] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44000222 XER: 00000000
[ 2.659789] CFAR: c000000000080994 IRQMASK: 0
[ 2.659789] GPR00: c000000001310dd0 c000201cbd40bad0 c000000001a1a800 c000201caa96a800
[ 2.659789] GPR04: 000fb49f39720000 c000001fde540090 05012d0100000080 00000000de540005
[ 2.659789] GPR08: 0000000000000080 07000000000000c0 05000000000000c0 0000000000000009
[ 2.659789] GPR12: 0000000000000000 c000001ffffd0800 f9ffffffffffffff 050054de1f000080
[ 2.659789] GPR16: 090097aa1c200080 c000000000eb3ea0 c000000001af1a58 c00c000007f79528
[ 2.659789] GPR20: c000000001200000 c000000001a4af18 0000000000000120 09e0ffde1f000080
[ 2.659789] GPR24: c000001fde2c76c0 000000000000012d c000001fdeffe3e0 8000000000000105
[ 2.659789] GPR28: 000fb49f39720000 c000201caa970e58 c000201caa96a800 c000001fde540090
[ 2.659915] NIP [c000000000080a48] set_pte_at+0xd8/0x1c0
[ 2.659937] LR [c000000001310dd0] debug_vm_pgtable+0x76c/0x1bd4
[ 2.659958] Call Trace:
[ 2.659976] [c000201cbd40bad0] [c0000000003d8158] __pmd_alloc+0x38/0x120 (unreliable)
[ 2.660024] [c000201cbd40bb10] [c000000001310bf4] debug_vm_pgtable+0x590/0x1bd4
[ 2.660049] [c000201cbd40bc10] [c000000000011d90] do_one_initcall+0x60/0x2b0
[ 2.660083] [c000201cbd40bce0] [c0000000012d4da0] kernel_init_freeable+0x2dc/0x360
[ 2.660108] [c000201cbd40bdb0] [c0000000000123c4] kernel_init+0x2c/0x158
[ 2.660121] [c000201cbd40be20] [c00000000000d3d0] ret_from_kernel_thread+0x5c/0x6c
[ 2.660144] Instruction dump:
[ 2.660162] ebc10030 fbe50000 38210040 7c0803a6 ebe1fff8 4e800020 3d200700 792907c6
[ 2.660200] 612900c0 7d4a4838 2c2a00c0 4182ff54 <0fe00000> 3d220003 39290718 60df0040
[ 2.660218] ---[ end trace 173cdb877073211b ]---
[ 2.660229] ------------[ cut here ]------------
[ 2.660248] kernel BUG at arch/powerpc/mm/pgtable.c:304!
[ 2.660269] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2.660290] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[ 2.660321] Modules linked in:
[ 2.660340] CPU: 28 PID: 1 Comm: swapper/0 Tainted: G W 5.9.0-rc1-e622126.cki #1
[ 2.660374] NIP: c000000000080ed0 LR: c000000000485ef4 CTR: 0000000000000000
[ 2.660419] REGS: c000201cbd40b820 TRAP: 0700 Tainted: G W (5.9.0-rc1-e622126.cki)
[ 2.660453] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44000222 XER: 00000000
[ 2.660489] CFAR: c000000000485ef0 IRQMASK: 0
[ 2.660489] GPR00: 00000000deffe009 c000201cbd40bab0 c000000001a1a800 0000000009e0ffde
[ 2.660489] GPR04: 000fb49f39720000 0000000000000005 09e0ffde1f000080 ffffffffffffffff
[ 2.660489] GPR08: 0000000000000000 0000000000000001 0000000000000001 0000000000000009
[ 2.660489] GPR12: 0000000000000000 c000001ffffd0800 f9ffffffffffffff 050054de1f000080
[ 2.660489] GPR16: 090097aa1c200080 c000000000eb3ea0 c000000001af1a58 c00c000007f79528
[ 2.660489] GPR20: c000000001200000 c000000001a4af18 0000000000000120 09e0ffde1f000080
[ 2.660489] GPR24: c000001fde2c76c0 000000000000012d c000001fdeffe3e0 8000000000000105
[ 2.660489] GPR28: 000fb49f39720000 c000201caa970e58 c0000000012d0105 c000001fde540090
[ 2.660709] NIP [c000000000080ed0] assert_pte_locked+0xf0/0x1a0
[ 2.660741] LR [c000000000485ef4] pte_update+0xd4/0x190
[ 2.660770] Call Trace:
[ 2.660788] [c000201cbd40bab0] [c000000000485ef4] pte_update+0xd4/0x190 (unreliable)
[ 2.660822] [c000201cbd40bb10] [c000000001310df0] debug_vm_pgtable+0x78c/0x1bd4
[ 2.660856] [c000201cbd40bc10] [c000000000011d90] do_one_initcall+0x60/0x2b0
[ 2.660890] [c000201cbd40bce0] [c0000000012d4da0] kernel_init_freeable+0x2dc/0x360
[ 2.660924] [c000201cbd40bdb0] [c0000000000123c4] kernel_init+0x2c/0x158
[ 2.660957] [c000201cbd40be20] [c00000000000d3d0] ret_from_kernel_thread+0x5c/0x6c
[ 2.660980] Instruction dump:
[ 2.660998] 39290010 7ce707b4 7c894c36 79081564 7d293838 7908f082 38e0ffff 79291f24
[ 2.661035] 78e8f00e 7d09402a 7d090074 7929d182 <0b090000> 79070022 5509c03e 5109421e
[ 2.661075] ---[ end trace 173cdb877073211c ]---
[ 2.745561]
[ 2.941933] usb 2-4: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 2.975769] usb 2-4: New USB device found, idVendor=0451, idProduct=8140, bcdDevice= 1.00
[ 2.975790] usb 2-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 2.989529] hub 2-4:1.0: USB hub found
[ 2.990397] hub 2-4:1.0: 4 ports detected
[ 3.121304] usb 1-3: new high-speed USB device number 2 using xhci_hcd
[ 3.303654] usb 1-3: New USB device found, idVendor=0557, idProduct=7000, bcdDevice= 0.00
[ 3.303683] usb 1-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 3.318042] hub 1-3:1.0: USB hub found
[ 3.318770] hub 1-3:1.0: 4 ports detected
[ 3.471304] usb 1-4: new high-speed USB device number 3 using xhci_hcd
[ 3.656509] usb 1-4: New USB device found, idVendor=0451, idProduct=8142, bcdDevice= 1.00
[ 3.656536] usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=1
[ 3.656550] usb 1-4: SerialNumber: 9E040849E165
[ 3.670034] hub 1-4:1.0: USB hub found
[ 3.670752] hub 1-4:1.0: 4 ports detected
[ 3.745631] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005
[ 5.497797] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005 ]---
[-- MARK -- Tue Aug 18 04:35:00 2020]
[-- MARK -- Tue Aug 18 04:40:00 2020]
FWIW I haven't seen this issue on my Power9 system (Talos) running kernel-5.8.0-1.fc33.ppc64le for the last 2 weeks.
------- Comment From mainamdar.com 2020-08-24 08:58 EDT-------
Any update on this?
------- Comment From gusld.com 2020-10-21 17:59 EDT-------
As I understand it, Aneesh is working on this upstream at
https://lore.kernel.org/linux-mm/20200902114222.181353-1-aneesh.kumar@linux.ibm.com/

As a workaround, CONFIG_DEBUG_VM_PGTABLE can be disabled.
And it looks to me like the patchset has already been merged into 5.10.
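
For anyone who wants to confirm whether the workaround mentioned above applies to a given kernel build, the following is a minimal sketch that reports whether CONFIG_DEBUG_VM_PGTABLE is set in a kernel config file. The function name debug_vm_pgtable_state is hypothetical, and the default path /boot/config-$(uname -r) is an assumption about where the distribution installs the config; pass another path if yours differs.

```shell
#!/bin/sh
# Hypothetical helper: print "enabled", "disabled", or "unknown" depending on
# whether CONFIG_DEBUG_VM_PGTABLE=y appears in the given kernel config file.
# Assumption: without an argument, the config of the running kernel is
# expected at /boot/config-$(uname -r).
debug_vm_pgtable_state() {
    config_file="${1:-/boot/config-$(uname -r)}"

    # No readable config file: we cannot tell either way.
    if [ ! -r "$config_file" ]; then
        echo "unknown"
        return 2
    fi

    # Match only the exact "=y" line, so a commented-out
    # "# CONFIG_DEBUG_VM_PGTABLE is not set" does not count as enabled.
    if grep -q '^CONFIG_DEBUG_VM_PGTABLE=y$' "$config_file"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

Calling ``debug_vm_pgtable_state`` with no argument checks the running kernel; calling it with a path checks a config file before the kernel is built or installed. If it prints "enabled" on an affected kernel, the debug_vm_pgtable initcall seen in the call trace will run at boot.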