I have two machines with similar hardware, similar workload and similar disk partitioning using XFS, both running Fedora 37. Some time ago I encountered a kernel crash on the first machine (kernel 6.3.4-101.fc37.x86_64): watchdog: BUG: soft lockup - CPU#14 stuck for 26s! [rocksdb:low:37079] Modules linked in: nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel rfkill tcp_bbr ip_set nf_tables nfnetlink tun nct6775 nct6775_core hwmon_vid vfat fat ipmi_ssif intel_rapl_msr intel_rapl_common edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_amd snd_hda_codec snd_hda_core snd_hwdep kvm snd_pcm snd_timer acpi_ipmi ipmi_si cdc_ether joydev snd usbnet ipmi_devintf irqbypass wmi_bmof soundcore i2c_piix4 k10temp rapl mii ipmi_msghandler fuse loop xfs raid1 crct10dif_pclmul nvme crc32_pclmul crc32c_intel igb polyval_clmulni polyval_generic nvme_core ghash_clmulni_intel ast ccp dca sha512_ssse3 wmi sp5100_tco nvme_common i2c_algo_bit CPU: 14 PID: 37079 Comm: rocksdb:low Not tainted 6.3.4-101.fc37.x86_64 #1 Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./X570D4U, BIOS P1.20 05/19/2021 RIP: 0010:xas_load+0x34/0x50 Code: 22 ff ff ff 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 00 10 00 00 77 07 5b 5d c3 cc cc cc cc 0f b6 4b 10 48 8d 68 fe 38 48 fe <72> ec 48 89 ee 48 89 df e8 cf fd ff ff 80 7d 00 00 75 c7 eb d9 0f RSP: 0018:ffffb0a0c17cfb08 EFLAGS: 00000246 RAX: ffff9a47734716d2 RBX: ffffb0a0c17cfb20 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffff9a4356eb8248 RDI: ffffb0a0c17cfb20 RBP: ffff9a47734716d0 R08: 0000000000000000 R09: 000000000000121c R10: ffff9a4c705a06b0 R11: 0000000000000000 R12: 000000000000d405 R13: 000000000000d403 R14: 000000000000d403 R15: ffffb0a0c17cfdb8 FS: 00007f3b517ff6c0(0000) GS:ffff9a5e3ed80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005609c3547510 CR3: 00000007537ce000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> filemap_get_read_batch+0x179/0x270 filemap_get_pages+0xab/0x6a0 ? filemap_get_pages+0xab/0x6a0 ? _copy_to_iter+0xc4/0x650 filemap_read+0xdf/0x350 xfs_file_buffered_read+0x4f/0xd0 [xfs] xfs_file_read_iter+0x74/0xe0 [xfs] vfs_read+0x240/0x310 __x64_sys_pread64+0x98/0xd0 do_syscall_64+0x5f/0x90 ? __x64_sys_pread64+0xa8/0xd0 ? syscall_exit_to_user_mode+0x1b/0x40 ? do_syscall_64+0x6b/0x90 ? irqtime_account_irq+0x40/0xc0 ? 
__irq_exit_rcu+0x4b/0xf0 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f3b6743c227 Code: 08 89 3c 24 48 89 4c 24 18 e8 b5 e3 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 05 e4 f8 ff 48 8b RSP: 002b:00007f3b517f9330 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 RAX: ffffffffffffffda RBX: 000000000000121c RCX: 00007f3b6743c227 RDX: 000000000000121c RSI: 00007f3b4ea25800 RDI: 000000000000008c RBP: 00007f3b517f9480 R08: 0000000000000000 R09: 00007f3b517f94c0 R10: 000000000d403fdd R11: 0000000000000293 R12: 000000000000121c R13: 000000000d403fdd R14: 00007f3b4ea25800 R15: 0000000000000000 </TASK> Today similar crash happened on the second machine (kernel 6.3.5-100.fc37.x86_64): kernel: watchdog: BUG: soft lockup - CPU#28 stuck for 26s! [rocksdb:low:2195] kernel: Modules linked in: tls nf_conntrack_netbios_ns nf_conntrack_broadcast nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel tcp_bbr rfkill ip_set nf_tables nfnetlink nct6775 nct6775_core tun hwmon_vid jc42 vfat fat ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm snd_timer cdc_ether irqbypass acpi_ipmi snd usbnet wmi_bmof rapl ipmi_si k10temp soundcore i2c_piix4 joydev mii ipmi_devintf ipmi_msghandler fuse loop xfs uas usb_storage raid1 hid_cp2112 igb crct10dif_pclmul ast crc32_pclmul nvme crc32c_intel polyval_clmulni dca polyval_generic i2c_algo_bit nvme_core ghash_clmulni_intel ccp sha512_ssse3 wmi sp5100_tco nvme_common kernel: CPU: 28 PID: 2195 Comm: rocksdb:low Not tainted 6.3.5-100.fc37.x86_64 #1 kernel: Hardware name: To Be Filled By O.E.M. 
X570D4U/X570D4U, BIOS T1.29b 05/17/2022 kernel: RIP: 0010:xas_load+0x45/0x50 kernel: Code: 3d 00 10 00 00 77 07 5b 5d c3 cc cc cc cc 0f b6 4b 10 48 8d 68 fe 38 48 fe 72 ec 48 89 ee 48 89 df e8 cf fd ff ff 80 7d 00 00 <75> c7 eb d9 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 kernel: RSP: 0018:ffffaab80392fb40 EFLAGS: 00000246 kernel: RAX: fffff69f82a7c000 RBX: ffffaab80392fb58 RCX: 0000000000000000 kernel: RDX: 0000000000000010 RSI: ffff94a4268a6480 RDI: ffffaab80392fb58 kernel: RBP: ffff94a4268a6480 R08: 0000000000000000 R09: 000000000000424a kernel: R10: ffff94af1ec69ab0 R11: 0000000000000000 R12: 0000000000001610 kernel: R13: 000000000000160c R14: 000000000000160c R15: ffffaab80392fdf0 kernel: FS: 00007f49f7bfe6c0(0000) GS:ffff94b63f100000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007f01446e9000 CR3: 000000014a4be000 CR4: 0000000000750ee0 kernel: PKRU: 55555554 kernel: Call Trace: kernel: <IRQ> kernel: ? watchdog_timer_fn+0x1a8/0x210 kernel: ? __pfx_watchdog_timer_fn+0x10/0x10 kernel: ? __hrtimer_run_queues+0x112/0x2b0 kernel: ? hrtimer_interrupt+0xf8/0x230 kernel: ? __sysvec_apic_timer_interrupt+0x61/0x130 kernel: ? sysvec_apic_timer_interrupt+0x6d/0x90 kernel: </IRQ> kernel: <TASK> kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 kernel: ? xas_load+0x45/0x50 kernel: filemap_get_read_batch+0x179/0x270 kernel: filemap_get_pages+0xab/0x6a0 kernel: ? touch_atime+0x48/0x1b0 kernel: ? filemap_read+0x33f/0x350 kernel: filemap_read+0xdf/0x350 kernel: xfs_file_buffered_read+0x4f/0xd0 [xfs] kernel: xfs_file_read_iter+0x74/0xe0 [xfs] kernel: vfs_read+0x240/0x310 kernel: __x64_sys_pread64+0x98/0xd0 kernel: do_syscall_64+0x5f/0x90 kernel: ? native_flush_tlb_local+0x34/0x40 kernel: ? flush_tlb_func+0x10d/0x240 kernel: ? do_syscall_64+0x6b/0x90 kernel: ? sched_clock_cpu+0xf/0x190 kernel: ? irqtime_account_irq+0x40/0xc0 kernel: ? 
__irq_exit_rcu+0x4b/0xf0 kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: RIP: 0033:0x7f4a0c23c227 kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 b5 e3 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 05 e4 f8 ff 48 8b kernel: RSP: 002b:00007f49f7bf8310 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 kernel: RAX: ffffffffffffffda RBX: 000000000000424a RCX: 00007f4a0c23c227 kernel: RDX: 000000000000424a RSI: 00007f04294a35c0 RDI: 00000000000004be kernel: RBP: 00007f49f7bf8460 R08: 0000000000000000 R09: 00007f49f7bf84a0 kernel: R10: 000000000160c718 R11: 0000000000000293 R12: 000000000000424a kernel: R13: 000000000160c718 R14: 00007f04294a35c0 R15: 0000000000000000 kernel: </TASK> ... kernel: ------------[ cut here ]------------ kernel: kernel BUG at fs/inode.c:612! kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI kernel: CPU: 21 PID: 2195 Comm: rocksdb:low Tainted: G L 6.3.5-100.fc37.x86_64 #1 kernel: Hardware name: To Be Filled By O.E.M. 
X570D4U/X570D4U, BIOS T1.29b 05/17/2022 kernel: RIP: 0010:clear_inode+0x76/0x80 kernel: Code: 2d a8 40 75 2b 48 8b 93 28 01 00 00 48 8d 83 28 01 00 00 48 39 c2 75 1a 48 c7 83 98 00 00 00 60 00 00 00 5b 5d c3 cc cc cc cc <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 90 90 90 90 90 90 90 90 90 90 90 90 kernel: RSP: 0018:ffffaab80392fe58 EFLAGS: 00010002 kernel: RAX: 0000000000000000 RBX: ffff94af1ec69938 RCX: 0000000000000000 kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff94af1ec69ab8 kernel: RBP: ffff94af1ec69ab8 R08: ffffaab80392fd38 R09: 0000000000000002 kernel: R10: 0000000000000001 R11: 0000000000000005 R12: ffffffffc08b9860 kernel: R13: ffff94af1ec69938 R14: 00000000ffffff9c R15: ffff94979dd5da40 kernel: FS: 00007f49f7bfe6c0(0000) GS:ffff94b63ef40000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007eefca8e2000 CR3: 000000014a4be000 CR4: 0000000000750ee0 kernel: PKRU: 55555554 kernel: Call Trace: kernel: <TASK> kernel: ? die+0x36/0x90 kernel: ? do_trap+0xda/0x100 kernel: ? clear_inode+0x76/0x80 kernel: ? do_error_trap+0x6a/0x90 kernel: ? clear_inode+0x76/0x80 kernel: ? exc_invalid_op+0x50/0x70 kernel: ? clear_inode+0x76/0x80 kernel: ? asm_exc_invalid_op+0x1a/0x20 kernel: ? clear_inode+0x76/0x80 kernel: ? clear_inode+0x1d/0x80 kernel: evict+0x1b8/0x1d0 kernel: do_unlinkat+0x174/0x320 kernel: __x64_sys_unlink+0x42/0x70 kernel: do_syscall_64+0x5f/0x90 kernel: ? 
__irq_exit_rcu+0x4b/0xf0 kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: RIP: 0033:0x7f4a0c23faab kernel: Code: f0 ff ff 73 01 c3 48 8b 0d 82 63 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 63 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007f49f7bfab58 EFLAGS: 00000206 ORIG_RAX: 0000000000000057 kernel: RAX: ffffffffffffffda RBX: 00007f49f7bfac38 RCX: 00007f4a0c23faab kernel: RDX: 00007f49f7bfadd0 RSI: 00007f4a0bc2fd30 RDI: 00007f49dd3c32d0 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: ffffffffffffdf58 R11: 0000000000000206 R12: 0000000000280bc0 kernel: R13: 00007f4a0bca77b8 R14: 00007f49f7bfadd0 R15: 00007f49f7bfadd0 kernel: </TASK> kernel: Modules linked in: tls nf_conntrack_netbios_ns nf_conntrack_broadcast nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel tcp_bbr rfkill ip_set nf_tables nfnetlink nct6775 nct6775_core tun hwmon_vid jc42 vfat fat ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm snd_timer cdc_ether irqbypass acpi_ipmi snd usbnet wmi_bmof rapl ipmi_si k10temp soundcore i2c_piix4 joydev mii ipmi_devintf ipmi_msghandler fuse loop xfs uas usb_storage raid1 hid_cp2112 igb crct10dif_pclmul ast crc32_pclmul nvme crc32c_intel polyval_clmulni dca polyval_generic i2c_algo_bit nvme_core ghash_clmulni_intel ccp sha512_ssse3 wmi sp5100_tco nvme_common kernel: ---[ end trace 0000000000000000 ]--- kernel: RIP: 0010:clear_inode+0x76/0x80 kernel: Code: 2d a8 40 75 2b 48 8b 93 28 01 00 00 48 8d 83 28 01 00 00 48 39 c2 75 1a 48 c7 83 98 00 00 00 60 00 00 00 5b 5d c3 cc cc cc cc <0f> 0b 0f 0b 0f 0b 0f 
0b 0f 0b 90 90 90 90 90 90 90 90 90 90 90 90
kernel: RSP: 0018:ffffaab80392fe58 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: ffff94af1ec69938 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff94af1ec69ab8
kernel: RBP: ffff94af1ec69ab8 R08: ffffaab80392fd38 R09: 0000000000000002
kernel: R10: 0000000000000001 R11: 0000000000000005 R12: ffffffffc08b9860
kernel: R13: ffff94af1ec69938 R14: 00000000ffffff9c R15: ffff94979dd5da40
kernel: FS: 00007f49f7bfe6c0(0000) GS:ffff94b63ef40000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007eefca8e2000 CR3: 000000014a4be000 CR4: 0000000000750ee0
kernel: PKRU: 55555554
kernel: note: rocksdb:low[2195] exited with irqs disabled
kernel: note: rocksdb:low[2195] exited with preempt_count 1

Reproducible: Sometimes

Steps to Reproduce:
Not sure how to reproduce this, but it seems to be related to a RocksDB multithreaded, mostly-write workload on XFS on NVMe. It never happened on kernel 6.2, so it looks like a regression.

Actual Results:
BUG: soft lockup

Expected Results:
No soft lockup, no crashes, just normal operation.
It happened again, now with Fedora 38 with kernel 6.3.6-200.fc38.x86_64: [ 1088.788665] watchdog: BUG: soft lockup - CPU#23 stuck for 27s! [rocksdb:low:1855] [ 1088.788689] Modules linked in: nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill tcp_bbr ip_set nf_tables nfnetlink tun nct6775 nct6775_core hwmon_vid vfat ipmi_ssif fat intel_rapl_msr intel_rapl_common edac_mce_amd snd_hda_intel kvm_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm acpi_ipmi snd_timer ipmi_si snd cdc_ether usbnet ipmi_devintf irqbypass wmi_bmof soundcore k10temp i2c_piix4 mii ipmi_msghandler rapl joydev fuse loop xfs raid1 igb nvme ast crct10dif_pclmul crc32_pclmul dca crc32c_intel nvme_core i2c_algo_bit polyval_clmulni polyval_generic ghash_clmulni_intel ccp sp5100_tco nvme_common wmi sha512_ssse3 [ 1088.788742] CPU: 23 PID: 1855 Comm: rocksdb:low Not tainted 6.3.6-200.fc38.x86_64 #1 [ 1088.788744] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./X570D4U, BIOS P1.20 05/19/2021 [ 1088.788746] RIP: 0010:xas_descend+0xa/0x70 [ 1088.788755] Code: 07 48 c1 e8 20 48 89 57 08 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f b6 0e 48 8b 57 08 48 d3 ea <83> e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 [ 1088.788756] RSP: 0018:ffffbe2701ecfbc8 EFLAGS: 00000246 [ 1088.788758] RAX: ffff974ae7875daa RBX: ffffbe2701ecfbe8 RCX: 0000000000000000 [ 1088.788760] RDX: 000000000000f601 RSI: ffff974ae7875da8 RDI: ffffbe2701ecfbe8 [ 1088.788761] RBP: ffff974ae7875da8 R08: 0000000000000000 R09: 0000000000000ea7 [ 1088.788762] R10: ffff9749f24fdab0 R11: 0000000000000000 R12: 000000000000f601 [ 1088.788763] R13: 000000000000f600 R14: 000000000000f600 R15: ffffbe2701ecfe80 [ 1088.788764] FS: 00007f3a499ff6c0(0000) GS:ffff97683efc0000(0000) knlGS:0000000000000000 [ 1088.788766] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1088.788767] CR2: 00007ef226b27020 CR3: 00000001067f8000 CR4: 0000000000750ee0 [ 1088.788768] PKRU: 55555554 [ 1088.788769] Call Trace: [ 1088.788772] <IRQ> [ 1088.788777] ? watchdog_timer_fn+0x1a8/0x210 [ 1088.788782] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 1088.788784] ? __hrtimer_run_queues+0x112/0x2b0 [ 1088.788787] ? hrtimer_interrupt+0xf8/0x230 [ 1088.788790] ? __sysvec_apic_timer_interrupt+0x61/0x130 [ 1088.788793] ? sysvec_apic_timer_interrupt+0x6d/0x90 [ 1088.788796] </IRQ> [ 1088.788796] <TASK> [ 1088.788797] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 1088.788802] ? xas_descend+0xa/0x70 [ 1088.788804] xas_load+0x41/0x50 [ 1088.788807] filemap_get_read_batch+0x179/0x270 [ 1088.788810] filemap_get_pages+0xab/0x690 [ 1088.788813] ? touch_atime+0x48/0x1b0 [ 1088.788816] ? 
filemap_read+0x33f/0x350 [ 1088.788818] filemap_read+0xdf/0x350 [ 1088.788822] xfs_file_buffered_read+0x4f/0xd0 [xfs] [ 1088.788945] xfs_file_read_iter+0x74/0xe0 [xfs] [ 1088.789030] vfs_read+0x240/0x310 [ 1088.789034] __x64_sys_pread64+0x98/0xd0 [ 1088.789037] do_syscall_64+0x60/0x90 [ 1088.789040] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1088.789044] RIP: 0033:0x7f3a71721115 [ 1088.789065] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 84 99 f8 ff 4c 8b 55 e0 48 8b 55 e8 41 89 c0 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 48 89 45 f8 e8 d7 99 f8 ff 48 8b [ 1088.789066] RSP: 002b:00007f3a499f9390 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 [ 1088.789068] RAX: ffffffffffffffda RBX: 00007f3a499f94e0 RCX: 00007f3a71721115 [ 1088.789069] RDX: 0000000000000ea7 RSI: 00007f3a43452000 RDI: 0000000000000427 [ 1088.789070] RBP: 00007f3a499f93b0 R08: 0000000000000000 R09: 00007f3a499f9528 [ 1088.789071] R10: 000000000f600496 R11: 0000000000000293 R12: 000000000f600496 [ 1088.789072] R13: 0000000000000ea7 R14: 00007f3a43452000 R15: 00007f3a3d20b940 [ 1088.789074] </TASK> [ 1116.788144] watchdog: BUG: soft lockup - CPU#23 stuck for 53s! 
[rocksdb:low:1855] [ 1116.788177] Modules linked in: nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill tcp_bbr ip_set nf_tables nfnetlink tun nct6775 nct6775_core hwmon_vid vfat ipmi_ssif fat intel_rapl_msr intel_rapl_common edac_mce_amd snd_hda_intel kvm_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm acpi_ipmi snd_timer ipmi_si snd cdc_ether usbnet ipmi_devintf irqbypass wmi_bmof soundcore k10temp i2c_piix4 mii ipmi_msghandler rapl joydev fuse loop xfs raid1 igb nvme ast crct10dif_pclmul crc32_pclmul dca crc32c_intel nvme_core i2c_algo_bit polyval_clmulni polyval_generic ghash_clmulni_intel ccp sp5100_tco nvme_common wmi sha512_ssse3 [ 1116.788225] CPU: 23 PID: 1855 Comm: rocksdb:low Tainted: G L 6.3.6-200.fc38.x86_64 #1 [ 1116.788228] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./X570D4U, BIOS P1.20 05/19/2021 [ 1116.788229] RIP: 0010:xas_start+0x50/0xc0 [ 1116.788238] Code: 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d 00 10 00 00 77 12 48 85 d2 75 1a 48 c7 47 18 00 00 00 00 c3 cc cc cc cc 0f b6 48 fe <48> d3 ea 48 83 fa 3f 76 e6 48 c7 47 18 01 00 00 00 31 c0 c3 cc cc [ 1116.788240] RSP: 0018:ffffbe2701ecfbc8 EFLAGS: 00000282 [ 1116.788242] RAX: ffff97493820824a RBX: ffffbe2701ecfbe8 RCX: 000000000000000c [ 1116.788244] RDX: 000000000000f601 RSI: ffff974ae7875da8 RDI: ffffbe2701ecfbe8 [ 1116.788245] RBP: 000000000000f601 R08: 0000000000000000 R09: 0000000000000ea7 [ 1116.788246] R10: ffff9749f24fdab0 R11: 0000000000000000 R12: 000000000000f601 [ 1116.788247] R13: 000000000000f600 R14: 000000000000f600 R15: ffffbe2701ecfe80 [ 1116.788248] FS: 00007f3a499ff6c0(0000) GS:ffff97683efc0000(0000) knlGS:0000000000000000 [ 1116.788250] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1116.788252] CR2: 00007ef226b27020 CR3: 00000001067f8000 CR4: 0000000000750ee0 [ 1116.788253] PKRU: 55555554 [ 1116.788254] Call Trace: [ 1116.788257] <IRQ> [ 1116.788261] ? watchdog_timer_fn+0x1a8/0x210 [ 1116.788267] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 1116.788268] ? __hrtimer_run_queues+0x112/0x2b0 [ 1116.788272] ? hrtimer_interrupt+0xf8/0x230 [ 1116.788274] ? __sysvec_apic_timer_interrupt+0x61/0x130 [ 1116.788277] ? sysvec_apic_timer_interrupt+0x6d/0x90 [ 1116.788279] </IRQ> [ 1116.788279] <TASK> [ 1116.788280] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 1116.788285] ? xas_start+0x50/0xc0 [ 1116.788287] xas_load+0xe/0x50 [ 1116.788289] filemap_get_read_batch+0x179/0x270 [ 1116.788293] filemap_get_pages+0xab/0x690 [ 1116.788295] ? touch_atime+0x48/0x1b0 [ 1116.788298] ? 
filemap_read+0x33f/0x350 [ 1116.788300] filemap_read+0xdf/0x350 [ 1116.788304] xfs_file_buffered_read+0x4f/0xd0 [xfs] [ 1116.788404] xfs_file_read_iter+0x74/0xe0 [xfs] [ 1116.788474] vfs_read+0x240/0x310 [ 1116.788477] __x64_sys_pread64+0x98/0xd0 [ 1116.788479] do_syscall_64+0x60/0x90 [ 1116.788482] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1116.788485] RIP: 0033:0x7f3a71721115 [ 1116.788506] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 84 99 f8 ff 4c 8b 55 e0 48 8b 55 e8 41 89 c0 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 48 89 45 f8 e8 d7 99 f8 ff 48 8b [ 1116.788507] RSP: 002b:00007f3a499f9390 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 [ 1116.788508] RAX: ffffffffffffffda RBX: 00007f3a499f94e0 RCX: 00007f3a71721115 [ 1116.788510] RDX: 0000000000000ea7 RSI: 00007f3a43452000 RDI: 0000000000000427 [ 1116.788510] RBP: 00007f3a499f93b0 R08: 0000000000000000 R09: 00007f3a499f9528 [ 1116.788511] R10: 000000000f600496 R11: 0000000000000293 R12: 000000000f600496 [ 1116.788512] R13: 0000000000000ea7 R14: 00007f3a43452000 R15: 00007f3a3d20b940 [ 1116.788515] </TASK> [ 1123.770238] rcu: INFO: rcu_preempt self-detected stall on CPU [ 1123.770243] rcu: 23-....: (59997 ticks this GP) idle=aaac/1/0x4000000000000000 softirq=172344/172344 fqs=13940 [ 1123.770248] rcu: (t=60000 jiffies g=479441 q=95799 ncpus=32) [ 1123.770250] CPU: 23 PID: 1855 Comm: rocksdb:low Tainted: G L 6.3.6-200.fc38.x86_64 #1 [ 1123.770253] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./X570D4U, BIOS P1.20 05/19/2021 [ 1123.770254] RIP: 0010:xas_start+0xa/0xc0 [ 1123.770262] Code: c0 eb a5 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 8b 57 18 48 89 d0 83 e0 03 <74> 5c 48 81 fa 05 c0 ff ff 76 06 48 83 f8 02 74 46 48 8b 07 48 8b [ 1123.770264] RSP: 0018:ffffbe2701ecfbc8 EFLAGS: 00000206 [ 1123.770266] RAX: 0000000000000003 RBX: ffffbe2701ecfbe8 RCX: 0000000000000000 [ 1123.770268] RDX: 0000000000000003 RSI: ffff974ae7875da8 RDI: ffffbe2701ecfbe8 [ 1123.770269] RBP: 000000000000f601 R08: 0000000000000000 R09: 0000000000000ea7 [ 1123.770270] R10: ffff9749f24fdab0 R11: 0000000000000000 R12: 000000000000f601 [ 1123.770271] R13: 000000000000f600 R14: 000000000000f600 R15: ffffbe2701ecfe80 [ 1123.770272] FS: 00007f3a499ff6c0(0000) GS:ffff97683efc0000(0000) knlGS:0000000000000000 [ 1123.770274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1123.770275] CR2: 00007ef226b27020 CR3: 00000001067f8000 CR4: 0000000000750ee0 [ 1123.770276] PKRU: 55555554 [ 1123.770277] Call Trace: [ 1123.770279] <IRQ> [ 1123.770282] ? rcu_dump_cpu_stacks+0xc4/0x100 [ 1123.770287] ? rcu_sched_clock_irq+0x4f2/0x1170 [ 1123.770289] ? sched_slice+0x87/0x140 [ 1123.770293] ? task_tick_fair+0x2fc/0x400 [ 1123.770295] ? trigger_load_balance+0x72/0x350 [ 1123.770298] ? update_process_times+0x74/0xb0 [ 1123.770301] ? tick_sched_handle+0x22/0x60 [ 1123.770304] ? tick_sched_timer+0x67/0x80 [ 1123.770306] ? __pfx_tick_sched_timer+0x10/0x10 [ 1123.770308] ? __hrtimer_run_queues+0x112/0x2b0 [ 1123.770310] ? hrtimer_interrupt+0xf8/0x230 [ 1123.770312] ? __sysvec_apic_timer_interrupt+0x61/0x130 [ 1123.770315] ? sysvec_apic_timer_interrupt+0x6d/0x90 [ 1123.770317] </IRQ> [ 1123.770318] <TASK> [ 1123.770318] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 1123.770326] ? 
xas_start+0xa/0xc0 [ 1123.770329] xas_load+0xe/0x50 [ 1123.770331] filemap_get_read_batch+0x179/0x270 [ 1123.770335] filemap_get_pages+0xab/0x690 [ 1123.770337] ? touch_atime+0x48/0x1b0 [ 1123.770341] ? filemap_read+0x33f/0x350 [ 1123.770342] filemap_read+0xdf/0x350 [ 1123.770347] xfs_file_buffered_read+0x4f/0xd0 [xfs] [ 1123.770477] xfs_file_read_iter+0x74/0xe0 [xfs] [ 1123.770560] vfs_read+0x240/0x310 [ 1123.770564] __x64_sys_pread64+0x98/0xd0 [ 1123.770566] do_syscall_64+0x60/0x90 [ 1123.770569] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 1123.770572] RIP: 0033:0x7f3a71721115 [ 1123.770595] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 84 99 f8 ff 4c 8b 55 e0 48 8b 55 e8 41 89 c0 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 48 89 45 f8 e8 d7 99 f8 ff 48 8b [ 1123.770596] RSP: 002b:00007f3a499f9390 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 [ 1123.770598] RAX: ffffffffffffffda RBX: 00007f3a499f94e0 RCX: 00007f3a71721115 [ 1123.770599] RDX: 0000000000000ea7 RSI: 00007f3a43452000 RDI: 0000000000000427 [ 1123.770600] RBP: 00007f3a499f93b0 R08: 0000000000000000 R09: 00007f3a499f9528 [ 1123.770601] R10: 000000000f600496 R11: 0000000000000293 R12: 000000000f600496 [ 1123.770601] R13: 0000000000000ea7 R14: 00007f3a43452000 R15: 00007f3a3d20b940 [ 1123.770603] </TASK>
It happened yet again on Fedora 38 with kernel 6.3.6-200.fc38.x86_64. This is unusable; I will try downgrading to 6.2.16-300.fc38.
It looks like I am not the only one affected by this: https://www.spinics.net/lists/kernel/msg4783004.html
Yet another possibly related bug report: https://bugzilla.kernel.org/show_bug.cgi?id=217572

I am not 100% sure that this happens only on 6.3.* kernels, but downgrading to 6.2.16-300.fc38.x86_64 fixed this for me. Alternatively, it may have been caused by some very specific conditions that existed only before 2023-06-15.
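To check which kernel releases actually logged the lockup, saved kernel logs can be scanned for the two message shapes quoted in this report. The following is only a sketch: the regular expressions are derived solely from the log excerpts above, the function name `parse_lockups` is made up for illustration, and other kernels may format these lines differently.

```python
import re

# Shape of the watchdog line as quoted in this report, for example:
#   watchdog: BUG: soft lockup - CPU#14 stuck for 26s! [rocksdb:low:37079]
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)

# Shape of the following "CPU: ... Comm: ..." line, which carries the kernel
# release just before " #1" (assumption based on the excerpts above).
RELEASE_RE = re.compile(
    r"CPU: \d+ PID: \d+ Comm: .*? (?P<release>\d+\.\d+\.\d+\S*) #\d+"
)


def parse_lockups(lines):
    """Return one dict per soft lockup report found in an iterable of log lines.

    Each lockup line is paired with the kernel release from the next
    "CPU: ... Comm: ..." line that follows it.
    """
    reports = []
    pending = None
    for line in lines:
        lockup = LOCKUP_RE.search(line)
        if lockup:
            pending = {
                "cpu": int(lockup.group("cpu")),
                "stuck_seconds": int(lockup.group("secs")),
                "comm": lockup.group("comm"),
                "pid": int(lockup.group("pid")),
            }
            continue
        release = RELEASE_RE.search(line)
        if release and pending is not None:
            pending["release"] = release.group("release")
            reports.append(pending)
            pending = None
    return reports
```

One way to use it is to save the kernel log to a text file, pass the open file object to `parse_lockups`, and group the resulting dicts by their `release` value.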
This is not an XFS bug, nor is it in any way related to recent XFS issues in early 6.3 kernels. As I commented in https://bugzilla.kernel.org/show_bug.cgi?id=217572:

"No, that has nothing to do with the problem you are seeing on 6.1.31 kernels. That was a fix for a regression introduced in 6.3-rc1, and hence does not exist in 6.1.y kernels. The problem you are tripping over appears to be a livelock in the page cache iterator infrastructure, not an issue with the filesystem itself. This has been seen occasionally (maybe once every couple of months of testing across the entire dev community) during testing since large folios were enabled in the page cache, but nobody has been able to reproduce it reliably enough to be able to isolate the root cause and fix it yet. If you can reproduce it reliably and quickly, then putting together a recipe that we can use to trigger it would be a great help."

The issue has been around since ~5.17 (IIRC) and it is largely impossible to reproduce, so any help you can provide in crafting a reliable reproducer that we can use to diagnose the root cause and test the fix would be appreciated.

-Dave.
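As a possible starting point for such a recipe, here is a rough sketch of a stress harness that mimics the syscalls visible in the traces above: buffered `pread` on files that other threads are concurrently writing and unlinking. It is not known to trigger the lockup. The function name `run_stress` and all parameters and defaults are assumptions made for illustration, and the target directory would need to be on the XFS filesystem under test.

```python
import os
import random
import threading
import time


def run_stress(directory, n_files=4, n_readers=4, duration_s=1.0,
               chunk=64 * 1024, max_chunks=64):
    """Write, pread and unlink files concurrently; return operation counts."""
    stop = threading.Event()
    lock = threading.Lock()
    counts = {"writes": 0, "reads": 0, "unlinks": 0}
    paths = [os.path.join(directory, f"stress-{i}.dat") for i in range(n_files)]

    def bump(key):
        with lock:
            counts[key] += 1

    def writer(path):
        # Mostly-write side: fill a file with buffered writes, unlink it,
        # then start over with a fresh file of the same name.
        payload = os.urandom(chunk)
        while not stop.is_set():
            with open(path, "wb") as f:
                for _ in range(max_chunks):
                    if stop.is_set():
                        break
                    f.write(payload)
                    bump("writes")
            os.unlink(path)
            bump("unlinks")

    def reader():
        # Read side: buffered pread at a random offset of a random file.
        rng = random.Random()
        while not stop.is_set():
            try:
                fd = os.open(rng.choice(paths), os.O_RDONLY)
            except FileNotFoundError:
                continue  # the writer has just unlinked this file
            try:
                size = os.fstat(fd).st_size
                if size:
                    os.pread(fd, 4096, rng.randrange(size))
                    bump("reads")
            finally:
                os.close(fd)

    threads = [threading.Thread(target=writer, args=(p,)) for p in paths]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return counts
```

Calling `run_stress` with a directory on the affected filesystem and a long `duration_s` gives a baseline to vary (file sizes, thread counts, read sizes) while watching the kernel log for the soft lockup message. Each writer unlinks its file before exiting, so the directory should be left without stress files afterwards.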
This message is a reminder that Fedora Linux 37 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '37'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 37 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05. Fedora Linux 37 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.
Still reproducible on Fedora 39.