Bug 2160760 - [ppc64le] [eln] Workqueue: eval_map_wq tracer_init_tracefs_work_func
Summary: [ppc64le] [eln] Workqueue: eval_map_wq tracer_init_tracefs_work_func
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
Reported: 2023-01-13 15:39 UTC by Bruno Goncalves
Modified: 2023-04-03 19:35 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-04-03 19:35:11 UTC
Type: Bug
Embargoed:



Description Bruno Goncalves 2023-01-13 15:39:29 UTC
1. Please describe the problem:
Provisioning a Fedora ELN compose with kernel "6.2.0-0.rc3.20230112gite8f60cd7db24.27.eln125.ppc64le" fails early in boot with the following oops:

[    0.135887] clocksource: Switched to clocksource timebase 
[    0.137405] Callback from call_rcu_tasks_rude() invoked. 
[    0.405588] VFS: Disk quotas dquot_6.6.0 
[    0.405983] VFS: Dquot-cache hash table entries: 8192 (order 0, 65536 bytes) 
[    0.406455] BUG: Unable to handle kernel data access at 0xc00e0000017865e0 
[    0.406493] Faulting instruction address: 0xc0000000007af360 
[    0.406510] Oops: Kernel access of bad area, sig: 11 [#1] 
[    0.406520] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
[    0.406533] Modules linked in: 
[    0.406545] CPU: 2 PID: 9 Comm: kworker/u16:0 Tainted: G        W         -------  ---  6.2.0-0.rc3.20230112gite8f60cd7db24.27.eln125.ppc64le #1 
[    0.406562] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.10 (VL950_063) hv:phyp pSeries 
[    0.406575] Workqueue: eval_map_wq tracer_init_tracefs_work_func 
[    0.406595] NIP:  c0000000007af360 LR: c0000000007aba2c CTR: c000000000364190 
[    0.406606] REGS: c0000000089a7700 TRAP: 0380   Tainted: G        W         -------  ---   (6.2.0-0.rc3.20230112gite8f60cd7db24.27.eln125.ppc64le) 
[    0.406621] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44002004  XER: 00000000 
[    0.406663] CFAR: c0000000007aba28 IRQMASK: 0  
[    0.406663] GPR00: c0000000006b9924 c0000000089a79a0 c00000000254fc00 18000000017865e0  
[    0.406663] GPR04: c00000000047b758 0000000000000cc0 0000000000000040 0000000000000dc0  
[    0.406663] GPR08: a80e000000000000 0000000000000000 0000000000000001 0000000000000000  
[    0.406663] GPR12: c000000000364190 c00000001ec9df00 c000000008117068 0000000000000000  
[    0.406663] GPR16: c0000000084dfa08 c0000000084dfa10 c0000000018489e0 c000000004c60a50  
[    0.406663] GPR20: c0000000084dfa20 c0000000084dfa48 c0000000046335a8 c00000000d341980  
[    0.406663] GPR24: c0000000046335e4 c00000000bc32f10 c000000006546940 c000000004c61830  
[    0.406663] GPR28: 0000000000000cc0 c00000000047b758 c00000000bc32f00 c00000000bc32f00  
[    0.406835] NIP [c0000000007af360] kasan_byte_accessible+0x10/0x20 
[    0.406847] LR [c0000000007aba2c] __kasan_check_byte+0x2c/0xa0 
[    0.406861] Call Trace: 
[    0.406867] [c0000000089a79a0] [c0000000046335e4] global_trace+0x1d84/0x1ee0 (unreliable) 
[    0.406888] [c0000000089a79e0] [c0000000006b9924] krealloc+0x54/0x1b0 
[    0.406904] [c0000000089a7a30] [c00000000047b758] create_trace_option_files+0x258/0x520 
[    0.406922] [c0000000089a7b20] [c00000000047bb24] __update_tracer_options+0x74/0xb0 
[    0.406940] [c0000000089a7b60] [c000000003051bb4] tracer_init_tracefs_work_func+0x254/0x288 
[    0.406959] [c0000000089a7bf0] [c0000000001ef484] process_one_work+0x644/0xd80 
[    0.406975] [c0000000089a7d20] [c0000000001efcbc] worker_thread+0xfc/0x770 
[    0.406988] [c0000000089a7df0] [c0000000002037cc] kthread+0x1ec/0x200 
[    0.407006] [c0000000089a7e50] [c00000000000dfa4] ret_from_kernel_thread+0x5c/0x64 
[    0.407022] --- interrupt: 0 at 0x0 
[    0.407032] NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000 
[    0.407043] REGS: c0000000089a7e80 TRAP: 0000   Tainted: G        W         -------  ---   (6.2.0-0.rc3.20230112gite8f60cd7db24.27.eln125.ppc64le) 
[    0.407058] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000 
[    0.407075] CFAR: 0000000000000000 IRQMASK: 0  
[    0.407075] GPR00: 0000000000000000 c0000000089a8000 0000000000000000 0000000000000000  
[    0.407075] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.407075] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.407075] GPR12: 0000000000000000 0000000000000000 c0000000002035e8 c0000000080d2080  
[    0.407075] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.407075] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.407075] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.407075] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.407236] NIP [0000000000000000] 0x0 
[    0.407244] LR [0000000000000000] 0x0 
[    0.407253] --- interrupt: 0 
[    0.407261] Code: 3c4c01da 384208d0 7cc802a6 38a00001 4bfffc58 60000000 60000000 60000000 3d00a80e 7863e8c2 39400001 790807c6 <7d2340ae> 28090007 7c60505e 4e800020  
[    0.407329] ---[ end trace 0000000000000000 ]--- 
[    0.411315] pstore: backend (nvram) writing error (-1) 
[    0.411326]  
[    0.411394] kworker/u16:0 (9) used greatest stack depth: 26704 bytes left 
[    0.417049] NET: Registered PF_INET protocol family 
[    0.417273] IP idents hash table entries: 262144 (order: 5, 2097152 bytes, linear) 
[    0.423967] tcp_listen_portaddr_hash hash table entries: 8192 (order: 3, 655360 bytes, linear) 
[    0.424554] Table-perturb hash table entries: 65536 (order: 2, 262144 bytes, linear) 
[    0.424604] TCP established hash table entries: 131072 (order: 4, 1048576 bytes, linear) 
[    0.428483] TCP bind hash table entries: 65536 (order: 7, 10485760 bytes, linear) 
[    0.443639] TCP: Hash tables configured (established 131072 bind 65536) 
[    0.444174] MPTCP token hash table entries: 16384 (order: 4, 1572864 bytes, linear) 
[    0.445789] UDP hash table entries: 8192 (order: 4, 1572864 bytes, linear) 
[    0.447753] UDP-Lite hash table entries: 8192 (order: 4, 1572864 bytes, linear) 
[    0.449952] NET: Registered PF_UNIX/PF_LOCAL protocol family 
[    0.450017] NET: Registered PF_XDP protocol family 
[    0.450028] PCI: CLS 0 bytes, default 128 
[    0.450272] Trying to unpack rootfs image as initramfs... 
[    0.451869] IOMMU table initialized, virtual merging enabled 
[    0.461248] vas: API is supported only with radix page tables 
[    0.466889] hv-24x7: read 1530 catalog entries, created 509 event attrs (0 failures), 275 descs 
[   11.742061] Freeing initrd memory: 96384K 


2. What is the Version-Release number of the kernel:
kernel-6.2.0-0.rc3.20230112gite8f60cd7db24.27.eln125.ppc64le

Comment 2 Don Zickus 2023-01-23 16:46:53 UTC
@dhorak - do you have any thoughts here?  This is blocking rawhide ppc composes from booting.

Comment 3 Dan Horák 2023-01-23 17:04:29 UTC
I am running all the rawhide "nodebug" rcs and haven't seen anything like that. So it could be specific to the ELN config (if there is a difference), or to running under an LPAR. I am running on bare metal (aka powernv), and there isn't a problem within KVM guests either.

Is there a known build where it started to happen?

Comment 4 Dan Horák 2023-01-23 18:23:32 UTC
Fedora-ELN-20230118.1 is OK on a similar machine (9009-41A vs 9009-42A)

Comment 5 Bruno Goncalves 2023-01-24 09:47:07 UTC
(In reply to Dan Horák from comment #4)
> Fedora-ELN-20230118.1 is OK on a similar machine (9009-41A vs 9009-42A)


The last compose where we hit this failure was Fedora-ELN-20230117.0 (kernel-6.2.0-0.rc3.20230113gitd9fc1511728c.28.eln125.ppc64le).

Currently we are using Fedora-ELN-20230118.1 (kernel-6.2.0-0.rc4.31.eln125.ppc64le) and I didn't see the panic, but this kernel doesn't have the kernel debug options set...

Comment 6 Dan Horák 2023-01-24 13:04:43 UTC
Can you recheck with the latest debug-enabled kernel (kernel-debug package)? That is, install from an ELN compose, then install kernel-debug and reboot into it. Ideally install both the latest kernel and kernel-debug. If only the debug one fails, then there is something wrong and we will let the upstream powerpc kernel maintainers know.
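
Before rebooting, it may help to confirm which of the two installed kernels actually carries the debug options. The sketch below is only an illustration: it assumes the kernel config is available as text (Fedora normally installs it as /boot/config-<release>), and it looks for CONFIG_KASAN because the traces in this bug fault inside KASAN helpers. The function name and the sample fragments are hypothetical.

```python
import re


def config_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is built in (=y) in the given kernel config text."""
    # Match the whole line so CONFIG_KASAN does not match CONFIG_KASAN_GENERIC.
    pattern = re.compile(rf"^{re.escape(option)}=y$", re.MULTILINE)
    return bool(pattern.search(config_text))


# Hypothetical config fragments, not copied from the real packages.
debug_cfg = "CONFIG_KASAN=y\nCONFIG_KASAN_GENERIC=y\n"
nodebug_cfg = "# CONFIG_KASAN is not set\n"

print(config_enabled(debug_cfg, "CONFIG_KASAN"))    # True
print(config_enabled(nodebug_cfg, "CONFIG_KASAN"))  # False
```

On a real system the text would come from reading the config file of each installed kernel rather than from inline strings.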

Comment 8 Bruno Goncalves 2023-01-24 14:20:05 UTC
I also tested with `6.2.0-0.rc4.31.eln125.ppc64le` and it boots fine. Once I booted the debug kernel (6.2.0-0.rc4.31.eln125.ppc64le+debug), it panicked:

[    0.126194] clocksource: Switched to clocksource timebase 
[    0.127110] Callback from call_rcu_tasks_rude() invoked. 
[    0.380970] VFS: Disk quotas dquot_6.6.0 
[    0.381189] VFS: Dquot-cache hash table entries: 8192 (order 0, 65536 bytes) 
[    0.381482] BUG: Unable to handle kernel data access at 0xc00e000001445b98 
[    0.381518] Faulting instruction address: 0xc0000000007ac6a0 
[    0.381530] Oops: Kernel access of bad area, sig: 11 [#1] 
[    0.381537] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
[    0.381545] Modules linked in: 
[    0.381553] CPU: 6 PID: 9 Comm: kworker/u16:0 Tainted: G        W         -------  ---  6.2.0-0.rc4.31.eln125.ppc64le+debug #1 
[    0.381564] Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW940.02 (VL940_041) hv:phyp pSeries 
[    0.381572] Workqueue: eval_map_wq tracer_init_tracefs_work_func 
[    0.381585] NIP:  c0000000007ac6a0 LR: c0000000007a8d6c CTR: c000000000362b60 
[    0.381593] REGS: c000000008977710 TRAP: 0380   Tainted: G        W         -------  ---   (6.2.0-0.rc4.31.eln125.ppc64le+debug) 
[    0.381603] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44002004  XER: 00000000 
[    0.381628] CFAR: c0000000007a8d68 IRQMASK: 0  
[    0.381628] GPR00: c0000000006b7970 c0000000089779b0 c00000000254fc00 1800000001445b98  
[    0.381628] GPR04: c000000000479ba4 0000000000000cc0 0000000000000040 0000000000000dc0  
[    0.381628] GPR08: 0000000000000001 a80e000000000000 c000000008874e00 0000000000000000  
[    0.381628] GPR12: c000000000362b60 c00000001eca8b00 c000000008117068 0000000000000000  
[    0.381628] GPR16: 0000000000000000 c0000000084d4f20 c0000000084d4f10 c0000000084d4f08  
[    0.381628] GPR20: c0000000084d4f28 c0000000084c2405 c0000000046334a8 c00000000d22bf80  
[    0.381628] GPR24: c0000000046334e4 c00000000a22dcd0 c000000006546940 c000000004c61830  
[    0.381628] GPR28: 0000000000000cc0 c000000000479ba4 c00000000a22dcc0 c00000000a22dcc0  
[    0.381743] NIP [c0000000007ac6a0] kasan_byte_accessible+0x10/0x20 
[    0.381755] LR [c0000000007a8d6c] __kasan_check_byte+0x2c/0xa0 
[    0.381765] Call Trace: 
[    0.381769] [c0000000089779b0] [c0000000046334e4] global_trace+0x1d84/0x1ee0 (unreliable) 
[    0.381783] [c0000000089779f0] [c0000000006b7970] krealloc+0x50/0x1e0 
[    0.381795] [c000000008977a40] [c000000000479ba4] create_trace_option_files+0x264/0x520 
[    0.381807] [c000000008977b20] [c000000000479f64] __update_tracer_options+0x74/0xb0 
[    0.381819] [c000000008977b60] [c00000000305163c] tracer_init_tracefs_work_func+0x218/0x284 
[    0.381832] [c000000008977bf0] [c0000000001ee620] process_one_work+0x5e0/0xd70 
[    0.381844] [c000000008977d20] [c0000000001eeeac] worker_thread+0xfc/0x770 
[    0.381855] [c000000008977df0] [c00000000020296c] kthread+0x1ec/0x200 
[    0.381867] [c000000008977e50] [c00000000000dfa4] ret_from_kernel_thread+0x5c/0x64 
[    0.381878] --- interrupt: 0 at 0x0 
[    0.381885] NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000 
[    0.381892] REGS: c000000008977e80 TRAP: 0000   Tainted: G        W         -------  ---   (6.2.0-0.rc4.31.eln125.ppc64le+debug) 
[    0.381901] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000 
[    0.381912] CFAR: 0000000000000000 IRQMASK: 0  
[    0.381912] GPR00: 0000000000000000 c000000008978000 0000000000000000 0000000000000000  
[    0.381912] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.381912] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.381912] GPR12: 0000000000000000 0000000000000000 c000000000202788 c0000000080d2080  
[    0.381912] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.381912] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.381912] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.381912] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[    0.382021] NIP [0000000000000000] 0x0 
[    0.382028] LR [0000000000000000] 0x0 
[    0.382033] --- interrupt: 0 
[    0.382038] Code: 3c4c01da 38423590 7cc802a6 38a00001 4bfffc58 60000000 60000000 60000000 3d20a80e 7863e8c2 39000001 792907c6 <7d2348ae> 28090007 7c60405e 4e800020  
[    0.382083] ---[ end trace 0000000000000000 ]--- 
[    0.384868] pstore: backend (nvram) writing error (-1) 
[    0.384874]
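
Both traces fault in kasan_byte_accessible, and the faulting data address can be reproduced from the register dumps. The sketch below assumes the generic KASAN mapping, shadow = (addr >> 3) + offset, takes the offset 0xa80e000000000000 from GPR08 (first trace) and GPR09 (second trace), and takes the checked pointer from GPR30. That GPR30 holds the pointer passed to krealloc is an inference from the dumps, since GPR03 equals GPR30 shifted right by 3; it is not something the logs state.

```python
KASAN_SHADOW_SCALE_SHIFT = 3            # one shadow byte covers 8 bytes of memory
SHADOW_OFFSET = 0xA80E000000000000      # value seen in GPR08 / GPR09 in the traces
MASK64 = 0xFFFFFFFFFFFFFFFF


def shadow_address(addr: int) -> int:
    """Map a kernel address to its KASAN shadow byte (generic KASAN formula)."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + SHADOW_OFFSET) & MASK64


# Pointers taken from GPR30 in the first and second trace respectively.
first = shadow_address(0xC00000000BC32F00)
second = shadow_address(0xC00000000A22DCC0)

print(hex(first))   # 0xc00e0000017865e0, the address in the first "Unable to handle" line
print(hex(second))  # 0xc00e000001445b98, the address in the second one
```

If the assumptions hold, the crash is a load from the shadow region itself rather than from the krealloc'd object, which suggests the shadow memory is not mapped on this boot.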

Comment 9 Dan Horák 2023-01-24 15:03:58 UTC
Thanks for the check, adding Michael as the powerpc kernel maintainer then.

Comment 10 Michael Ellerman 2023-01-25 04:45:38 UTC
The combination of hash MMU and KASAN does not work; it crashes as shown above.

There was some discussion on how to handle it better, but it needs more development effort to reach a solution:

http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20221004223724.38707-1-nathanl@linux.ibm.com/

I'm not sure why your machine has booted with the hash MMU, it should default to Radix because it's a Power9.
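
The MMU mode is visible in the oops banner itself ("MMU=Hash" in both traces above). As a rough sketch, a log scanner could pull that field out to flag affected boots; the helper name below is hypothetical and the sample line is copied from the first trace in this bug.

```python
import re


def mmu_from_oops(log_text: str):
    """Return the MMU mode named in a powerpc oops banner, or None if absent."""
    match = re.search(r"\bMMU=(\w+)", log_text)
    return match.group(1) if match else None


banner = "[    0.406520] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries"
print(mmu_from_oops(banner))              # Hash
print(mmu_from_oops("no banner here"))    # None
```

A result of "Hash" on a KASAN-enabled kernel would match the failing combination described in this comment.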

Comment 12 Dan Horák 2023-01-27 12:35:02 UTC
Thanks for the comments, Michael.

Bruno, Don, is it intentional the VM/LPAR is configured with hash MMU?

Comment 15 Dan Horák 2023-04-03 19:35:11 UTC
Recent kernels boot on an LPAR with the hash MMU, so I am closing this bug.

