Bug 2160760
Summary: | [ppc64le] [eln] Workqueue: eval_map_wq tracer_init_tracefs_work_func | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bruno Goncalves <bgoncalv> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | acaringi, adscvr, airlied, alciregi, bskeggs, dan, dzickus, ellerman, hdegoede, hpa, jarodwilson, jglisse, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, scweaver, steved |
Target Milestone: | --- | Keywords: | Regression, TestBlocker |
Target Release: | --- | ||
Hardware: | ppc64le | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-04-03 19:35:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1071880 |
Description
Bruno Goncalves
2023-01-13 15:39:29 UTC
@dhorak - do you have any thoughts here? This is blocking rawhide ppc composes from booting.

I am running all the rawhide "nodebug" rcs and haven't seen anything like that. So it could be specific to the ELN config (if there is a difference), or to running under LPAR (I am running on bare metal (aka powernv), and there isn't a problem within KVM guests either) ...

Is there a known build where it started to happen?

Fedora-ELN-20230118.1 is OK on a similar machine (9009-41A vs 9009-42A)

(In reply to Dan Horák from comment #4)
> Fedora-ELN-20230118.1 is OK on a similar machine (9009-41A vs 9009-42A)

The last compose where we hit this failure was Fedora-ELN-20230117.0 (kernel-6.2.0-0.rc3.20230113gitd9fc1511728c.28.eln125.ppc64le)

Currently, we are using Fedora-ELN-20230118.1 (kernel-6.2.0-0.rc4.31.eln125.ppc64le) and I didn't see the panic, but this kernel doesn't have the kernel debug options set...

Can you recheck with the latest debug-enabled kernel (kernel-debug package)? I mean install from an ELN compose, then install kernel-debug and reboot into it. And maybe install both the latest kernel and kernel-debug. If only the debug one fails, then there is something wrong and we will let the upstream powerpc kernel maintainers know.

I also tested with `6.2.0-0.rc4.31.eln125.ppc64le` and it boots fine. Once I booted with the debug kernel (6.2.0-0.rc4.31.eln125.ppc64le+debug), it panics:

[ 0.126194] clocksource: Switched to clocksource timebase
[ 0.127110] Callback from call_rcu_tasks_rude() invoked.
[ 0.380970] VFS: Disk quotas dquot_6.6.0
[ 0.381189] VFS: Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
[ 0.381482] BUG: Unable to handle kernel data access at 0xc00e000001445b98
[ 0.381518] Faulting instruction address: 0xc0000000007ac6a0
[ 0.381530] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.381537] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 0.381545] Modules linked in:
[ 0.381553] CPU: 6 PID: 9 Comm: kworker/u16:0 Tainted: G W ------- --- 6.2.0-0.rc4.31.eln125.ppc64le+debug #1
[ 0.381564] Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW940.02 (VL940_041) hv:phyp pSeries
[ 0.381572] Workqueue: eval_map_wq tracer_init_tracefs_work_func
[ 0.381585] NIP: c0000000007ac6a0 LR: c0000000007a8d6c CTR: c000000000362b60
[ 0.381593] REGS: c000000008977710 TRAP: 0380 Tainted: G W ------- --- (6.2.0-0.rc4.31.eln125.ppc64le+debug)
[ 0.381603] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44002004 XER: 00000000
[ 0.381628] CFAR: c0000000007a8d68 IRQMASK: 0
[ 0.381628] GPR00: c0000000006b7970 c0000000089779b0 c00000000254fc00 1800000001445b98
[ 0.381628] GPR04: c000000000479ba4 0000000000000cc0 0000000000000040 0000000000000dc0
[ 0.381628] GPR08: 0000000000000001 a80e000000000000 c000000008874e00 0000000000000000
[ 0.381628] GPR12: c000000000362b60 c00000001eca8b00 c000000008117068 0000000000000000
[ 0.381628] GPR16: 0000000000000000 c0000000084d4f20 c0000000084d4f10 c0000000084d4f08
[ 0.381628] GPR20: c0000000084d4f28 c0000000084c2405 c0000000046334a8 c00000000d22bf80
[ 0.381628] GPR24: c0000000046334e4 c00000000a22dcd0 c000000006546940 c000000004c61830
[ 0.381628] GPR28: 0000000000000cc0 c000000000479ba4 c00000000a22dcc0 c00000000a22dcc0
[ 0.381743] NIP [c0000000007ac6a0] kasan_byte_accessible+0x10/0x20
[ 0.381755] LR [c0000000007a8d6c] __kasan_check_byte+0x2c/0xa0
[ 0.381765] Call Trace:
[ 0.381769] [c0000000089779b0] [c0000000046334e4] global_trace+0x1d84/0x1ee0 (unreliable)
[ 0.381783] [c0000000089779f0] [c0000000006b7970] krealloc+0x50/0x1e0
[ 0.381795] [c000000008977a40] [c000000000479ba4] create_trace_option_files+0x264/0x520
[ 0.381807] [c000000008977b20] [c000000000479f64] __update_tracer_options+0x74/0xb0
[ 0.381819] [c000000008977b60] [c00000000305163c] tracer_init_tracefs_work_func+0x218/0x284
[ 0.381832] [c000000008977bf0] [c0000000001ee620] process_one_work+0x5e0/0xd70
[ 0.381844] [c000000008977d20] [c0000000001eeeac] worker_thread+0xfc/0x770
[ 0.381855] [c000000008977df0] [c00000000020296c] kthread+0x1ec/0x200
[ 0.381867] [c000000008977e50] [c00000000000dfa4] ret_from_kernel_thread+0x5c/0x64
[ 0.381878] --- interrupt: 0 at 0x0
[ 0.381885] NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
[ 0.381892] REGS: c000000008977e80 TRAP: 0000 Tainted: G W ------- --- (6.2.0-0.rc4.31.eln125.ppc64le+debug)
[ 0.381901] MSR: 0000000000000000 <> CR: 00000000 XER: 00000000
[ 0.381912] CFAR: 0000000000000000 IRQMASK: 0
[ 0.381912] GPR00: 0000000000000000 c000000008978000 0000000000000000 0000000000000000
[ 0.381912] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.381912] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.381912] GPR12: 0000000000000000 0000000000000000 c000000000202788 c0000000080d2080
[ 0.381912] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.381912] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.381912] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.381912] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.382021] NIP [0000000000000000] 0x0
[ 0.382028] LR [0000000000000000] 0x0
[ 0.382033] --- interrupt: 0
[ 0.382038] Code: 3c4c01da 38423590 7cc802a6 38a00001 4bfffc58 60000000 60000000 60000000 3d20a80e 7863e8c2 39000001 792907c6 <7d2348ae> 28090007 7c60405e 4e800020
[ 0.382083] ---[ end trace 0000000000000000 ]---
[ 0.384868] pstore: backend (nvram) writing error (-1)
[ 0.384874]

Thanks for the check, adding Michael as the powerpc kernel maintainer then.

The combination of hash MMU and KASAN does not work; it crashes as shown above. There was some discussion on how to handle it better, but it needs more development effort to reach a solution:

http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20221004223724.38707-1-nathanl@linux.ibm.com/

I'm not sure why your machine has booted with the hash MMU; it should default to Radix because it's a Power9.

Thanks for the comments, Michael. Bruno, Don, is it intentional that the VM/LPAR is configured with hash MMU?

Recent kernels boot on LPAR with hash MMU, thus closing.
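
For anyone triaging similar boot logs, the two markers discussed in this bug (the "MMU=Hash" field in the oops banner and a faulting NIP inside a KASAN helper) can be checked mechanically. The following is only a minimal sketch, not a tool used in this report: the function name triage_oops and the SAMPLE excerpt are hypothetical, and the regular expressions assume the oops format seen in the log above.

```python
import re

# Two lines copied from the oops above, used only as sample input.
SAMPLE = (
    "[ 0.381537] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries\n"
    "[ 0.381743] NIP [c0000000007ac6a0] kasan_byte_accessible+0x10/0x20\n"
)


def triage_oops(log: str) -> dict:
    """Extract the MMU mode and the faulting NIP symbol from a ppc64le oops.

    Hypothetical helper: it only pattern-matches text in the format shown
    in this bug and makes no claim about other kernel versions.
    """
    # Oops banner, e.g. "LE PAGE_SIZE=64K MMU=Hash SMP ..."
    mmu = re.search(r"\bMMU=(\w+)", log)
    # Symbolised NIP line, e.g. "NIP [c0...] kasan_byte_accessible+0x10/0x20"
    nip = re.search(r"NIP \[[0-9a-f]+\] (\S+?)\+0x", log)

    mmu_mode = mmu.group(1) if mmu else None
    nip_symbol = nip.group(1) if nip else None
    return {
        "mmu": mmu_mode,
        "nip_symbol": nip_symbol,
        # True only when both markers from this bug are present.
        "hash_mmu_with_kasan": (
            mmu_mode == "Hash" and bool(nip_symbol) and "kasan" in nip_symbol
        ),
    }


if __name__ == "__main__":
    print(triage_oops(SAMPLE))
```

A positive result only says the log looks like the crash in this report; it does not explain why the LPAR booted with the hash MMU in the first place.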