Bug 2343283 - crashes with 6.12.x f41 kernels on power9 hypervisors
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 41
Hardware: powerpc
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2025-01-31 21:14 UTC by Kevin Fenzi
Modified: 2025-06-05 04:04 UTC (History)

Description Kevin Fenzi 2025-01-31 21:14:38 UTC
We have been getting lockups/crashes on our POWER9 hypervisors (which run a bunch of buildvms).

They are on F41, and when this happens the machine just becomes unresponsive and we have to power cycle it. ;(

In remote (but not local) logs we see:

Jan 31 15:57:09 bvmhost-p09-03.iad2.fedoraproject.org kernel: Failed to allocate a TCE memory, level shift=17
Jan 31 15:57:09 bvmhost-p09-03.iad2.fedoraproject.org kernel: Kernel attempted to write user page (0) - exploit attempt? (uid: 0)

It's sporadic, and I am not sure what triggers it.
It seems to happen more often with 6.12.10...

Reproducible: Always
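
For anyone who wants to check other hosts for the same signature, here is a minimal sketch, assuming the remote logs are plain syslog text with each message on one line. The function name `find_tce_failures` and the regular expression are illustrative only. The size calculation assumes that `shift` is log2 of the requested allocation in bytes, which is an inference and not something the log states.

```python
import re

# Matches the allocation-failure message in syslog form:
# "<Mon DD HH:MM:SS> <host> kernel: Failed to allocate a TCE memory, level shift=N"
TCE_FAIL = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"Failed to allocate a TCE memory, level shift=(?P<shift>\d+)"
)


def find_tce_failures(lines):
    """Return (timestamp, host, shift, size_bytes) for each matching line."""
    hits = []
    for line in lines:
        m = TCE_FAIL.match(line)
        if m:
            shift = int(m.group("shift"))
            # Assumption: shift is log2 of the allocation size in bytes.
            hits.append((m.group("stamp"), m.group("host"), shift, 1 << shift))
    return hits


sample = [
    "Jan 31 15:57:09 bvmhost-p09-03.iad2.fedoraproject.org kernel: "
    "Failed to allocate a TCE memory, level shift=17",
    "Jan 31 15:57:09 bvmhost-p09-03.iad2.fedoraproject.org kernel: "
    "Kernel attempted to write user page (0) - exploit attempt? (uid: 0)",
]

for hit in find_tce_failures(sample):
    print(hit)
```

Under that assumption, shift=17 corresponds to a 131072-byte (128 KiB) request.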

Comment 1 Aditi Mishra 2025-05-07 07:08:44 UTC
Hey Kevin Fenzi, what machine setup are you using: an LPAR or a QEMU/KVM VM?

Comment 2 Kevin Fenzi 2025-05-08 01:11:54 UTC
    description: PowerNV
    product: 9006-22P (supermicro,p9dsu2u)
    vendor: IBM

I am still seeing this crash sporadically on kernel-6.14.4-200.fc41.ppc64le

Comment 3 Aditi Mishra 2025-05-09 07:15:26 UTC
Hey Kevin Fenzi, 

Can you send logs? I'll try to reproduce it on my end as well. Just to confirm: you are using a P9 bare-metal machine running F41. One question: what exactly are you running on that machine? You mentioned a "bunch of buildvm's", which I'm not familiar with. Could you clarify?

Thanks and regards,
Aditi Mishra

Comment 4 Kevin Fenzi 2025-05-17 19:32:24 UTC
So, this seems to be happening with newer kernels too. ;( 

Here's an oops I managed to get from the serial console from a crash on 6.14.4-200.fc41.ppc64le:

[29824.091547] Failed to allocate a TCE memory, level shift=17
[29824.094831] Kernel attempted to write user page (0) - exploit attempt? (uid: 0)
[29824.101082] BUG: Kernel NULL pointer dereference on write at 0x00000000
[29824.106291] Faulting instruction address: 0xc0000000001ba12c
[29824.110483] Oops: Kernel access of bad area, sig: 11 [#1]
[29824.115669] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[29824.120860] Modules linked in: vhost_net vhost vhost_iotlb tap tun kvm_hv kvm dm_service_time binfmt_misc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp llc rfkill nft_reject_ipv6 nf_reject_ipv6 nft_chain_nat nf_nat nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables vfat fat ses ofpart powernv_flash enclosure tpm_i2c_nuvoton scsi_transport_sas at24 onboard_usb_dev mtd ipmi_powernv opal_prd vmx_crypto i2c_opal rtc_opal ipmi_devintf ipmi_msghandler joydev auth_rpcgss fuse dm_multipath loop sunrpc nfnetlink zram lz4hc_compress lz4_compress xfs dm_crypt raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 i40e megaraid_sas aacraid ast i2c_algo_bit libie scsi_dh_rdac scsi_dh_emc scsi_dh_alua aes_gcm_p10_crypto crypto_simd cryptd
[29824.171150] CPU: 122 UID: 0 PID: 4660 Comm: md2_raid6 Not tainted 6.14.4-200.fc41.ppc64le #1
[29824.174438] Hardware name: 9006-22P POWER9 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[29824.180661] NIP:  c0000000001ba12c LR: c0000000001ba128 CTR: 0000000000000000
[29824.185901] REGS: c00020006c0331d0 TRAP: 0300   Not tainted  (6.14.4-200.fc41.ppc64le)
[29824.191144] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 24004422  XER: 2004002d
[29824.195428] CFAR: c0000000001b9d90 DAR: 0000000000000000 DSISR: 42000000 IRQMASK: 0
[29824.195428] GPR00: c0000000001ba128 c00020006c033470 c00000000250aa00 0000000000000000
[29824.195428] GPR04: c000201ffabd7188 c000201ffabe5280 c00020006c0331b8 0000201ff7a00000
[29824.195428] GPR08: 0000000000000027 0000000000000000 0000000000000000 000000008800482c
[29824.195428] GPR12: 0000000000000004 c000201fff66f380 0000000000000001 0000000000000001
[29824.195428] GPR16: c0002000980202c0 7fffffffffffffff c000200e3cf84000 c0002000980202c0
[29824.195428] GPR20: 0000000000000000 c0002000203ed0a8 0000000000000001 0000000000001000
[29824.195428] GPR24: 0000000000000003 0000000001d34000 00000000200e3cf8 0000000000000010
[29824.195428] GPR28: c00020002027f000 0000000000000000 0000000000000001 0000200e3cf80003
[29824.250227] NIP [c0000000001ba12c] pnv_tce_build+0xcc/0x140
[29824.254986] LR [c0000000001ba128] pnv_tce_build+0xc8/0x140
[29824.259719] Call Trace:
[29824.261353] [c00020006c033470] [c0000000001ba128] pnv_tce_build+0xc8/0x140 (unreliable)
[29824.268155] [c00020006c0334d0] [c0000000001b5958] pnv_ioda2_tce_build+0x38/0xb0
[29824.273930] [c00020006c033510] [c00000000005be68] ppc_iommu_map_sg+0x1f8/0x6c0
[29824.278697] [c00020006c033640] [c00000000005a564] dma_iommu_map_sg+0x54/0x70
[29824.284459] [c00020006c033660] [c00000000037ad38] __dma_map_sg_attrs+0x1f8/0x2c0
[29824.289241] [c00020006c0336c0] [c00000000037ae1c] dma_map_sg_attrs+0x1c/0x40
[29824.295056] [c00020006c0336e0] [c0000000010c202c] scsi_dma_map+0x5c/0x90
[29824.299806] [c00020006c033700] [c008000017283f0c] megasas_build_io_fusion+0x264/0x3e0 [megaraid_sas]
[29824.303661] [c00020006c0337b0] [c008000017285868] megasas_build_and_issue_cmd_fusion+0xd0/0x2f0 [megaraid_sas]
[29824.309541] [c00020006c033840] [c008000017271844] megasas_queue_command+0x11c/0x280 [megaraid_sas]
[29824.316385] [c00020006c0338a0] [c0000000010bf0ec] scsi_dispatch_cmd+0xbc/0x2e0
[29824.322203] [c00020006c033920] [c0000000010c1078] scsi_queue_rq+0x598/0x880
[29824.326985] [c00020006c0339d0] [c000000000c712d0] blk_mq_dispatch_rq_list+0x170/0x690
[29824.329793] [c00020006c033a70] [c000000000c7a898] __blk_mq_do_dispatch_sched+0x458/0x480
[29824.334596] [c00020006c033b20] [c000000000c7af28] __blk_mq_sched_dispatch_requests+0x1d8/0x250
[29824.338453] [c00020006c033b90] [c000000000c7b034] blk_mq_sched_dispatch_requests+0x44/0xb0
[29824.345286] [c00020006c033bc0] [c000000000c6b1b8] blk_mq_run_hw_queue+0x348/0x3e0
[29824.350088] [c00020006c033c10] [c000000000c705d4] blk_mq_dispatch_plug_list+0x1c4/0x470
[29824.354890] [c00020006c033cc0] [c000000000c7196c] blk_mq_flush_plug_list+0x17c/0x230
[29824.358711] [c00020006c033d10] [c000000000c593ac] __blk_flush_plug+0x14c/0x1e0
[29824.362529] [c00020006c033d90] [c000000000c59710] blk_finish_plug+0x40/0x60
[29824.367340] [c00020006c033dc0] [c008000016348cd8] raid5d+0x530/0x720 [raid456]
[29824.371142] [c00020006c033f10] [c00000000126f15c] md_thread+0x10c/0x270
[29824.375923] [c00020006c033f90] [c00000000028a1b8] kthread+0x138/0x150
[29824.380658] [c00020006c033fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18
[29824.383404] Code: 41810088 e8bc0028 38c00001 38800000 7f83e378 7fffd836 7fffc378 7ca5c850 7ca5f214 3bde0001 4bfffa05 37bdffff <7fe01d28> 4082ffc0 eb610038 38210060
[29824.392999] ---[ end trace 0000000000000000 ]---
[29824.490538] pstore: backend (nvram) writing error (-1)
[29824.494386]
[29824.495028] note: md2_raid6[4660] exited with irqs disabled
[29824.495911] ------------[ cut here ]------------
[29824.499027] WARNING: CPU: 122 PID: 4660 at kernel/exit.c:885 do_exit+0x98/0x5d0
[29824.502822] Modules linked in: vhost_net vhost vhost_iotlb tap tun kvm_hv kvm dm_service_time binfmt_misc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp llc rfkill nft_reject_ipv6 nf_reject_ipv6 nft_chain_nat nf_nat nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables vfat fat ses ofpart powernv_flash enclosure tpm_i2c_nuvoton scsi_transport_sas at24 onboard_usb_dev mtd ipmi_powernv opal_prd vmx_crypto i2c_opal rtc_opal ipmi_devintf ipmi_msghandler joydev auth_rpcgss fuse dm_multipath loop sunrpc nfnetlink zram lz4hc_compress lz4_compress xfs dm_crypt raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 i40e megaraid_sas aacraid ast i2c_algo_bit libie scsi_dh_rdac scsi_dh_emc scsi_dh_alua aes_gcm_p10_crypto crypto_simd cryptd
[29824.530090] CPU: 122 UID: 0 PID: 4660 Comm: md2_raid6 Tainted: G      D            6.14.4-200.fc41.ppc64le #1
[29824.535012] Tainted: [D]=DIE
[29824.537720] Hardware name: 9006-22P POWER9 0x4e1202 opal:skiboot-v6.0.23 PowerNV
[29824.539554] NIP:  c000000000251a68 LR: c000000000251a5c CTR: 0000000000000000
[29824.543412] REGS: c00020006c032cf0 TRAP: 0700   Tainted: G      D             (6.14.4-200.fc41.ppc64le)
[29824.548332] MSR:  9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 24004422  XER: 2004002d
[29824.555239] CFAR: c00000000024f468 IRQMASK: 0
[29824.555239] GPR00: c000000000251a5c c00020006c032f90 c00000000250aa00 0000000000000000
[29824.555239] GPR04: 0000000000002710 00000000ffff7fff 00000000000000f5 fffffffffffe0000
[29824.555239] GPR08: 0000000000000000 0000000000000001 c00020006c033e38 0000000000004000
[29824.555239] GPR12: 0000000000000004 c000201fff66f380 0000000000000000 0000000000000000
[29824.555239] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[29824.555239] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[29824.555239] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[29824.555239] GPR28: 000000000000000b c00000004fe31000 c0000000500a7500 c00020006afdff80
[29824.594987] NIP [c000000000251a68] do_exit+0x98/0x5d0
[29824.598232] LR [c000000000251a5c] do_exit+0x8c/0x5d0
[29824.600449] Call Trace:
[29824.602566] [c00020006c032f90] [c000000000251a5c] do_exit+0x8c/0x5d0 (unreliable)
[29824.608917] [c00020006c033030] [c00000000025204c] make_task_dead+0xac/0x1b0
[29824.613237] [c00020006c0330b0] [c000000000029524] oops_end+0x164/0x1a0
[29824.616517] [c00020006c033130] [c000000000151ca8] __bad_page_fault+0x188/0x1b0
[29824.621805] [c00020006c0331a0] [c000000000008be0] data_access_common_virt+0x210/0x220
[29824.626106] --- interrupt: 300 at pnv_tce_build+0xcc/0x140
[29824.629350] NIP:  c0000000001ba12c LR: c0000000001ba128 CTR: 0000000000000000
[29824.634667] REGS: c00020006c0331d0 TRAP: 0300   Tainted: G      D             (6.14.4-200.fc41.ppc64le)
[29824.641094] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 24004422  XER: 2004002d
[29824.646489] CFAR: c0000000001b9d90 DAR: 0000000000000000 DSISR: 42000000 IRQMASK: 0
[29824.646489] GPR00: c0000000001ba128 c00020006c033470 c00000000250aa00 0000000000000000
[29824.646489] GPR04: c000201ffabd7188 c000201ffabe5280 c00020006c0331b8 0000201ff7a00000
[29824.646489] GPR08: 0000000000000027 0000000000000000 0000000000000000 000000008800482c
[29824.646489] GPR12: 0000000000000004 c000201fff66f380 0000000000000001 0000000000000001
[29824.646489] GPR16: c0002000980202c0 7fffffffffffffff c000200e3cf84000 c0002000980202c0
[29824.646489] GPR20: 0000000000000000 c0002000203ed0a8 0000000000000001 0000000000001000
[29824.646489] GPR24: 0000000000000003 0000000001d34000 00000000200e3cf8 0000000000000010
[29824.646489] GPR28: c00020002027f000 0000000000000000 0000000000000001 0000200e3cf80003
[29824.686780] NIP [c0000000001ba12c] pnv_tce_build+0xcc/0x140
[29824.692046] LR [c0000000001ba128] pnv_tce_build+0xc8/0x140
[29824.694330] --- interrupt: 300
[29824.697525] [c00020006c0334d0] [c0000000001b5958] pnv_ioda2_tce_build+0x38/0xb0
[29824.699867] [c00020006c033510] [c00000000005be68] ppc_iommu_map_sg+0x1f8/0x6c0
[29824.703189] [c00020006c033640] [c00000000005a564] dma_iommu_map_sg+0x54/0x70
[29824.707498] [c00020006c033660] [c00000000037ad38] __dma_map_sg_attrs+0x1f8/0x2c0
[29824.712747] [c00020006c0336c0] [c00000000037ae1c] dma_map_sg_attrs+0x1c/0x40
[29824.716991] [c00020006c0336e0] [c0000000010c202c] scsi_dma_map+0x5c/0x90
[29824.722191] [c00020006c033700] [c008000017283f0c] megasas_build_io_fusion+0x264/0x3e0 [megaraid_sas]
[29824.728473] [c00020006c0337b0] [c008000017285868] megasas_build_and_issue_cmd_fusion+0xd0/0x2f0 [megaraid_sas]
[29824.734754] [c00020006c033840] [c008000017271844] megasas_queue_command+0x11c/0x280 [megaraid_sas]
[29824.740010] [c00020006c0338a0] [c0000000010bf0ec] scsi_dispatch_cmd+0xbc/0x2e0
[29824.745178] [c00020006c033920] [c0000000010c1078] scsi_queue_rq+0x598/0x880
[29824.746299] [c00020006c0339d0] [c000000000c712d0] blk_mq_dispatch_rq_list+0x170/0x690
[29824.751431] [c00020006c033a70] [c000000000c7a898] __blk_mq_do_dispatch_sched+0x458/0x480
[29824.756567] [c00020006c033b20] [c000000000c7af28] __blk_mq_sched_dispatch_requests+0x1d8/0x250
[29824.763718] [c00020006c033b90] [c000000000c7b034] blk_mq_sched_dispatch_requests+0x44/0xb0
[29824.767833] [c00020006c033bc0] [c000000000c6b1b8] blk_mq_run_hw_queue+0x348/0x3e0
[29824.773912] [c00020006c033c10] [c000000000c705d4] blk_mq_dispatch_plug_list+0x1c4/0x470
[29824.775008] [c00020006c033cc0] [c000000000c7196c] blk_mq_flush_plug_list+0x17c/0x230
[29824.781103] [c00020006c033d10] [c000000000c593ac] __blk_flush_plug+0x14c/0x1e0
[29824.785198] [c00020006c033d90] [c000000000c59710] blk_finish_plug+0x40/0x60
[29824.790259] [c00020006c033dc0] [c008000016348cd8] raid5d+0x530/0x720 [raid456]
[29824.792377] [c00020006c033f10] [c00000000126f15c] md_thread+0x10c/0x270
[29824.796501] [c00020006c033f90] [c00000000028a1b8] kthread+0x138/0x150
[29824.797608] [c00020006c033fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18
[29824.802743] Code: 3929ffff 2c090000 913e000c 40820010 813e0074 71290004 418203f4 7fa3eb78 4bffd9c1 e95f0c00 312affff 7d295110 <0b090000> e87f0b30 4959d839 60000000
[29824.809068] ---[ end trace 0000000000000000 ]---
[30007.778871] sd 1:0:13:0: [sde] tag#115 Command for which abort is issued is not found in outstanding commands
[30007.786669] sd 1:0:13:0: [sde] tag#115 CDB: Read(16) 88 00 00 00 00 00 01 67 64 08 00 00 00 08 00 00

This machine is:
    description: PowerNV
    product: 9006-22P (supermicro,p9dsu2u)
    vendor: IBM

And yes, it just runs a bunch of Fedora 41 VMs that are Koji builders.

Happy to provide more info...
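
To make the failing path easier to read, here is a minimal sketch of how the call-trace lines in the oops could be reduced to a chain of function names. It assumes each frame has the form `[stack] [address] symbol+0xoff/0xsize`, optionally followed by `[module]`, and that each frame is on a single line. The names `FRAME` and `call_chain` are made up for this example.

```python
import re

# One frame: "[stack] [address] symbol+0xoff/0xsize", optionally " [module]".
# The leading "[29824.261353]" timestamp does not match because it is not
# 16 hex digits.
FRAME = re.compile(
    r"\[[0-9a-f]{16}\] \[[0-9a-f]{16}\] "
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


def call_chain(lines):
    """Return function names, innermost first, collapsing consecutive repeats."""
    chain = []
    for line in lines:
        m = FRAME.search(line)
        if not m:
            continue
        name = m.group("sym")
        if m.group("mod"):
            name += " [" + m.group("mod") + "]"
        if not chain or chain[-1] != name:
            chain.append(name)
    return chain


sample = [
    "[29824.261353] [c00020006c033470] [c0000000001ba128] pnv_tce_build+0xc8/0x140 (unreliable)",
    "[29824.268155] [c00020006c0334d0] [c0000000001b5958] pnv_ioda2_tce_build+0x38/0xb0",
    "[29824.299806] [c00020006c033700] [c008000017283f0c] megasas_build_io_fusion+0x264/0x3e0 [megaraid_sas]",
    "[29824.367340] [c00020006c033dc0] [c008000016348cd8] raid5d+0x530/0x720 [raid456]",
]

print(" <- ".join(call_chain(sample)))
```

Run over the full trace, this shows the path from the md RAID thread through the block layer and megaraid_sas into the PowerNV TCE mapping code where the NULL write occurs.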

Comment 5 Aditi Mishra 2025-05-18 13:29:47 UTC
Hey Kevin Fenzi, 

Thanks for sharing the info. Is this the full log?

Thanks, 
Aditi

Comment 6 Kevin Fenzi 2025-05-31 19:43:16 UTC
Yeah, that's all of the oops, I think...

Interestingly, I have been updating them as they crash, and once they updated to 6.14.6-200.fc41.ppc64le I haven't seen any crashes for the last week.

Was there anything POWER-related in 6.14.6 and/or the Fedora kernel update for it?

Comment 7 Justin M. Forbes 2025-06-02 13:29:24 UTC
The 6.14.6 update contained:
powerpc64-ftrace-fix-module-loading-without-patchabl.patch
powerpc-boot-check-for-ld-option-support.patch
powerpc-boot-fix-dash-warning.patch

So I wouldn't expect any of those to address this issue in particular.

Comment 8 Aditi Mishra 2025-06-05 04:04:04 UTC
Hey Kevin Fenzi, can you clarify one thing? I am not able to reproduce the error you mentioned.

        PowerNV
           |
           |
  Bunch of Fedora 41 VM(s)

Can you clarify how many VMs are running? Also, you mentioned that this error is seen in "remote (but not local)" logs; I don't understand what that means.

Thanks and regards,
Aditi

