Bug 1870213 - ppc64le panic kernel on boot - BUG at arch/powerpc/mm/pgtable.c:304!
Summary: ppc64le panic kernel on boot - BUG at arch/powerpc/mm/pgtable.c:304!
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
Reported: 2020-08-19 14:43 UTC by Rachel Sibley
Modified: 2022-01-26 20:17 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-26 20:17:01 UTC
Type: Bug
Embargoed:


Attachments
console.log (394.88 KB, text/plain)
2020-08-19 14:43 UTC, Rachel Sibley


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 187769 0 None None None 2020-08-21 08:33:31 UTC

Description Rachel Sibley 2020-08-19 14:43:49 UTC
Created attachment 1711896
console.log

1. Please describe the problem:
In CKI testing, we started seeing a kernel panic when booting upstream kernels on ppc64le, starting with 5.8.0-rc1.
It occurs primarily on Power9 (bare metal and virtualized).

kernel BUG at arch/powerpc/mm/pgtable.c:304! 

2. What is the Version-Release number of the kernel: 5.9.0-rc1

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Unsure exactly when it first appeared, but we have seen it as early as 5.8.0, on mainline commit 30185b69a2d5.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Yes. It does not reproduce on every boot, but it is easy to reproduce on Power9: just install the kernel and boot.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Here is a snippet of the kernel oops. The complete console.log is attached.

[    2.659569] ------------[ cut here ]------------ 
[    2.659596] WARNING: CPU: 28 PID: 1 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0xd8/0x1c0 
[    2.659629] Modules linked in: 
[    2.659650] CPU: 28 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc1-e622126.cki #1 
[    2.659684] NIP:  c000000000080a48 LR: c000000001310dd0 CTR: 0000000000000000 
[    2.659718] REGS: c000201cbd40b840 TRAP: 0700   Not tainted  (5.9.0-rc1-e622126.cki) 
[    2.659751] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44000222  XER: 00000000 
[    2.659789] CFAR: c000000000080994 IRQMASK: 0  
[    2.659789] GPR00: c000000001310dd0 c000201cbd40bad0 c000000001a1a800 c000201caa96a800  
[    2.659789] GPR04: 000fb49f39720000 c000001fde540090 05012d0100000080 00000000de540005  
[    2.659789] GPR08: 0000000000000080 07000000000000c0 05000000000000c0 0000000000000009  
[    2.659789] GPR12: 0000000000000000 c000001ffffd0800 f9ffffffffffffff 050054de1f000080  
[    2.659789] GPR16: 090097aa1c200080 c000000000eb3ea0 c000000001af1a58 c00c000007f79528  
[    2.659789] GPR20: c000000001200000 c000000001a4af18 0000000000000120 09e0ffde1f000080  
[    2.659789] GPR24: c000001fde2c76c0 000000000000012d c000001fdeffe3e0 8000000000000105  
[    2.659789] GPR28: 000fb49f39720000 c000201caa970e58 c000201caa96a800 c000001fde540090  
[    2.659915] NIP [c000000000080a48] set_pte_at+0xd8/0x1c0 
[    2.659937] LR [c000000001310dd0] debug_vm_pgtable+0x76c/0x1bd4 
[    2.659958] Call Trace: 
[    2.659976] [c000201cbd40bad0] [c0000000003d8158] __pmd_alloc+0x38/0x120 (unreliable) 
[    2.660024] [c000201cbd40bb10] [c000000001310bf4] debug_vm_pgtable+0x590/0x1bd4 
[    2.660049] [c000201cbd40bc10] [c000000000011d90] do_one_initcall+0x60/0x2b0 
[    2.660083] [c000201cbd40bce0] [c0000000012d4da0] kernel_init_freeable+0x2dc/0x360 
[    2.660108] [c000201cbd40bdb0] [c0000000000123c4] kernel_init+0x2c/0x158 
[    2.660121] [c000201cbd40be20] [c00000000000d3d0] ret_from_kernel_thread+0x5c/0x6c 
[    2.660144] Instruction dump: 
[    2.660162] ebc10030 fbe50000 38210040 7c0803a6 ebe1fff8 4e800020 3d200700 792907c6  
[    2.660200] 612900c0 7d4a4838 2c2a00c0 4182ff54 <0fe00000> 3d220003 39290718 60df0040  
[    2.660218] ---[ end trace 173cdb877073211b ]--- 
[    2.660229] ------------[ cut here ]------------ 
[    2.660248] kernel BUG at arch/powerpc/mm/pgtable.c:304! 
[    2.660269] Oops: Exception in kernel mode, sig: 5 [#1] 
[    2.660290] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV 
[    2.660321] Modules linked in: 
[    2.660340] CPU: 28 PID: 1 Comm: swapper/0 Tainted: G        W         5.9.0-rc1-e622126.cki #1 
[    2.660374] NIP:  c000000000080ed0 LR: c000000000485ef4 CTR: 0000000000000000 
[    2.660419] REGS: c000201cbd40b820 TRAP: 0700   Tainted: G        W          (5.9.0-rc1-e622126.cki) 
[    2.660453] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44000222  XER: 00000000 
[    2.660489] CFAR: c000000000485ef0 IRQMASK: 0  
[    2.660489] GPR00: 00000000deffe009 c000201cbd40bab0 c000000001a1a800 0000000009e0ffde  
[    2.660489] GPR04: 000fb49f39720000 0000000000000005 09e0ffde1f000080 ffffffffffffffff  
[    2.660489] GPR08: 0000000000000000 0000000000000001 0000000000000001 0000000000000009  
[    2.660489] GPR12: 0000000000000000 c000001ffffd0800 f9ffffffffffffff 050054de1f000080  
[    2.660489] GPR16: 090097aa1c200080 c000000000eb3ea0 c000000001af1a58 c00c000007f79528  
[    2.660489] GPR20: c000000001200000 c000000001a4af18 0000000000000120 09e0ffde1f000080  
[    2.660489] GPR24: c000001fde2c76c0 000000000000012d c000001fdeffe3e0 8000000000000105  
[    2.660489] GPR28: 000fb49f39720000 c000201caa970e58 c0000000012d0105 c000001fde540090  
[    2.660709] NIP [c000000000080ed0] assert_pte_locked+0xf0/0x1a0 
[    2.660741] LR [c000000000485ef4] pte_update+0xd4/0x190 
[    2.660770] Call Trace: 
[    2.660788] [c000201cbd40bab0] [c000000000485ef4] pte_update+0xd4/0x190 (unreliable) 
[    2.660822] [c000201cbd40bb10] [c000000001310df0] debug_vm_pgtable+0x78c/0x1bd4 
[    2.660856] [c000201cbd40bc10] [c000000000011d90] do_one_initcall+0x60/0x2b0 
[    2.660890] [c000201cbd40bce0] [c0000000012d4da0] kernel_init_freeable+0x2dc/0x360 
[    2.660924] [c000201cbd40bdb0] [c0000000000123c4] kernel_init+0x2c/0x158 
[    2.660957] [c000201cbd40be20] [c00000000000d3d0] ret_from_kernel_thread+0x5c/0x6c 
[    2.660980] Instruction dump: 
[    2.660998] 39290010 7ce707b4 7c894c36 79081564 7d293838 7908f082 38e0ffff 79291f24  
[    2.661035] 78e8f00e 7d09402a 7d090074 7929d182 <0b090000> 79070022 5509c03e 5109421e  
[    2.661075] ---[ end trace 173cdb877073211c ]--- 
[    2.745561]  
[    2.941933] usb 2-4: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd 
[    2.975769] usb 2-4: New USB device found, idVendor=0451, idProduct=8140, bcdDevice= 1.00 
[    2.975790] usb 2-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0 
[    2.989529] hub 2-4:1.0: USB hub found 
[    2.990397] hub 2-4:1.0: 4 ports detected 
[    3.121304] usb 1-3: new high-speed USB device number 2 using xhci_hcd 
[    3.303654] usb 1-3: New USB device found, idVendor=0557, idProduct=7000, bcdDevice= 0.00 
[    3.303683] usb 1-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0 
[    3.318042] hub 1-3:1.0: USB hub found 
[    3.318770] hub 1-3:1.0: 4 ports detected 
[    3.471304] usb 1-4: new high-speed USB device number 3 using xhci_hcd 
[    3.656509] usb 1-4: New USB device found, idVendor=0451, idProduct=8142, bcdDevice= 1.00 
[    3.656536] usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=1 
[    3.656550] usb 1-4: SerialNumber: 9E040849E165 
[    3.670034] hub 1-4:1.0: USB hub found 
[    3.670752] hub 1-4:1.0: 4 ports detected 
[    3.745631] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005 
[    5.497797] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005 ]--- 
[-- MARK -- Tue Aug 18 04:35:00 2020] 
[-- MARK -- Tue Aug 18 04:40:00 2020]
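
For triaging a large console.log like the attached one, the WARNING and BUG sites can be pulled out mechanically. The following is a minimal sketch, not part of the CKI tooling: the function name `find_bug_sites` and the regular expressions are assumptions based only on the line formats visible in the snippet above.

```python
import re

# Patterns are assumptions derived from the two line formats in the snippet above.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\S+)")


def find_bug_sites(log_text):
    """Return (kind, source_file, line_number) tuples in order of appearance."""
    sites = []
    for line in log_text.splitlines():
        match = BUG_RE.search(line)
        if match:
            sites.append(("BUG", match.group(1), int(match.group(2))))
            continue
        match = WARN_RE.search(line)
        if match:
            sites.append(("WARNING", match.group(1), int(match.group(2))))
    return sites


# Two lines copied from the oops snippet in this report.
sample = (
    "[    2.659596] WARNING: CPU: 28 PID: 1 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0xd8/0x1c0 \n"
    "[    2.660248] kernel BUG at arch/powerpc/mm/pgtable.c:304! \n"
)

for kind, path, lineno in find_bug_sites(sample):
    print(kind, path, lineno)
```

On the full log, the same function could be applied to the text of the attached console.log after reading it from disk.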

Comment 1 Dan Horák 2020-08-21 09:52:32 UTC
FWIW I haven't seen this issue on my Power9 system (Talos) running kernel-5.8.0-1.fc33.ppc64le for the last 2 weeks.

Comment 2 IBM Bug Proxy 2020-08-24 13:03:46 UTC
------- Comment From mainamdar.com 2020-08-24 08:58 EDT-------
Any update on this?

Comment 3 IBM Bug Proxy 2020-10-21 22:10:29 UTC
------- Comment From gusld.com 2020-10-21 17:59 EDT-------
As I understand it, Aneesh is working on this upstream at https://lore.kernel.org/linux-mm/20200902114222.181353-1-aneesh.kumar@linux.ibm.com/

As a workaround, CONFIG_DEBUG_VM_PGTABLE can be disabled.
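
To confirm whether a given kernel build has the option enabled before applying the workaround, the kernel config text can be inspected. This is a minimal sketch under assumptions: the helper name `config_option_state` is hypothetical, and the sample text is illustrative rather than taken from the affected build. On Fedora the config is typically installed as /boot/config-<kernel version>, which is also an assumption not stated in this report.

```python
import re


def config_option_state(config_text, option):
    """Return the value assigned to a kernel config option, or None.

    None covers both an absent option and the
    '# CONFIG_FOO is not set' form used for disabled options.
    """
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    if match:
        return match.group(1).strip()
    return None


# Illustrative config fragment (not from the affected kernel).
sample = """\
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM_RB is not set
"""

state = config_option_state(sample, "CONFIG_DEBUG_VM_PGTABLE")
if state == "y":
    print("CONFIG_DEBUG_VM_PGTABLE is enabled; debug_vm_pgtable() runs at boot")
else:
    print("CONFIG_DEBUG_VM_PGTABLE is not enabled")
```

A build with the workaround applied would show `# CONFIG_DEBUG_VM_PGTABLE is not set` in its config, for which the helper returns None.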

Comment 4 Dan Horák 2020-10-23 08:51:01 UTC
It also looks to me like the patchset has already been merged into 5.10.

