1. Please describe the problem:
Our Cockpit integration tests found [1] a kernel regression in 6.8.10-300 [2]. Running `sos report` while the NFS server is running triggers a kernel crash and hangs.

[1] https://github.com/cockpit-project/cockpit/issues/20488
[2] https://bodhi.fedoraproject.org/updates/FEDORA-2024-92664ae6fe

2. What is the Version-Release number of the kernel:
kernel-6.8.10-300.fc40

3. Did it work previously in Fedora? If so, in what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
It still worked up to 6.8.9-300; the regression was introduced in 6.8.10-300.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

systemctl start nfs-server
sos report --batch

This hangs at

Starting 50/101 multipath [Running: dnf logs memory multipath]
Starting 51/101 networking [Running: dnf logs multipath networking]
Starting 52/101 networkmanager [Running: dnf logs networking networkmanager]
Plugin dnf timed out
Plugin logs timed out
Plugin networking timed out
Plugin networkmanager timed out

and dmesg/journal show a kernel crash:

[ 70.663153] general protection fault, probably for non-canonical address 0x207325000a646c74: 0000 [#1] PREEMPT SMP NOPTI
[ 70.664352] CPU: 0 PID: 5630 Comm: sos Not tainted 6.8.10-300.fc40.x86_64 #1
[ 70.665163] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
[ 70.666123] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[ 70.666668] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 48 c5 ec 7d 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[ 70.668754] RSP: 0018:ffffa42044247a30 EFLAGS: 00010046
[ 70.669346] RAX: 0000000000000000 RBX: 0000000000000282 RCX: 000000000000001d
[ 70.670083] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 207325000a646c74
[ 70.670818] RBP: 207325000a646c74 R08: 0000000000000001 R09: 0000000000000000
[ 70.671522] R10: ffffa42044247ac0 R11: 0000000000000000 R12: ffff91c5c94e4bb8
[ 70.672260] R13: 207325000a646974 R14: 0000000000000001 R15: 0000000000000001
[ 70.672966] FS: 00007eff74c006c0(0000) GS:ffff91c606a00000(0000) knlGS:0000000000000000
[ 70.673814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.674444] CR2: 00007eff70003270 CR3: 0000000007dc2006 CR4: 0000000000370ef0
[ 70.675135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 70.675869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 70.676596] Call Trace:
[ 70.676851] <TASK>
[ 70.677072] ? die_addr+0x36/0x90
[ 70.677428] ? exc_general_protection+0x17c/0x450
[ 70.677918] ? asm_exc_general_protection+0x26/0x30
[ 70.678442] ? _raw_spin_lock_irqsave+0x27/0x50
[ 70.678903] __percpu_counter_sum+0x18/0xb0
[ 70.679336] nfsd_show+0x53/0x1f0 [nfsd]
[ 70.679814] seq_read_iter+0x11f/0x480
[ 70.680214] seq_read+0x12f/0x170
[ 70.680554] proc_reg_read+0x5a/0xa0
[ 70.681182] vfs_read+0xac/0x380
[ 70.681711] ? do_syscall_64+0x8f/0x170
[ 70.682323] ksys_read+0x6d/0xf0
[ 70.682856] do_syscall_64+0x83/0x170
[ 70.683483] ? syscall_exit_to_user_mode+0x83/0x230
[ 70.684264] ? do_syscall_64+0x8f/0x170
[ 70.684909] ? current_time+0x3e/0xf0
[ 70.685537] ? atime_needs_update+0x9c/0x110
[ 70.686229] ? touch_atime+0x1e/0x120
[ 70.686848] ? splice_direct_to_actor+0x1e4/0x260
[ 70.687585] ? __pfx_direct_splice_actor+0x10/0x10
[ 70.688349] ? do_splice_direct+0x77/0xc0
[ 70.689012] ? __pfx_direct_file_splice_eof+0x10/0x10
[ 70.689817] ? do_sendfile+0x211/0x440
[ 70.690460] ? __x64_sys_sendfile64+0x78/0xd0
[ 70.691182] ? syscall_exit_to_user_mode+0x83/0x230
[ 70.691969] ? do_syscall_64+0x8f/0x170
[ 70.692592] ? syscall_exit_to_user_mode+0x83/0x230
[ 70.693365] ? do_syscall_64+0x8f/0x170
[ 70.694020] ? do_syscall_64+0x8f/0x170
[ 70.694654] ? switch_fpu_return+0x4f/0xe0
[ 70.695302] ? clear_bhb_loop+0x55/0xb0
[ 70.695916] ? clear_bhb_loop+0x55/0xb0
[ 70.696540] ? clear_bhb_loop+0x55/0xb0
[ 70.697166] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 70.697918] RIP: 0033:0x7eff8351dcfa
[ 70.698504] Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 e8 74 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 42 75 f8 ff 48 8b
[ 70.701071] RSP: 002b:00007eff74bff710 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 70.702129] RAX: ffffffffffffffda RBX: 00007eff74c00638 RCX: 00007eff8351dcfa
[ 70.703156] RDX: 0000000000010000 RSI: 00007eff4c00ad70 RDI: 0000000000000007
[ 70.704162] RBP: 00007eff74bff730 R08: 0000000000000000 R09: 0000000000000000
[ 70.705191] R10: 00007eff7f7ad780 R11: 0000000000000246 R12: 0000000000010000
[ 70.706207] R13: 00007eff4c00ad70 R14: 0000000000000007 R15: 00007eff78002120
[ 70.707210] </TASK>
[ 70.707625] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nf_conntrack_tftp bridge stp llc overlay nfsd auth_rpcgss nfs_acl lockd grace nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables binfmt_misc intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass rapl virtio_balloon i2c_piix4 pktcdvd cirrus joydev vfat fat loop nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel virtio_net sha512_ssse3 sha256_ssse3 sha1_ssse3 net_failover virtio_blk virtio_scsi failover serio_raw ata_generic pata_acpi sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse dm_multipath qemu_fw_cfg
[ 70.718141] ---[ end trace 0000000000000000 ]---
[ 70.718911] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[ 70.719751] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 48 c5 ec 7d 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[ 70.722431] RSP: 0018:ffffa42044247a30 EFLAGS: 00010046
[ 70.723273] RAX: 0000000000000000 RBX: 0000000000000282 RCX: 000000000000001d
[ 70.724377] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 207325000a646c74
[ 70.725468] RBP: 207325000a646c74 R08: 0000000000000001 R09: 0000000000000000
[ 70.726576] R10: ffffa42044247ac0 R11: 0000000000000000 R12: ffff91c5c94e4bb8
[ 70.727693] R13: 207325000a646974 R14: 0000000000000001 R15: 0000000000000001
[ 70.728802] FS: 00007eff74c006c0(0000) GS:ffff91c606a00000(0000) knlGS:0000000000000000
[ 70.730057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.730986] CR2: 00007eff70003270 CR3: 0000000007dc2006 CR4: 0000000000370ef0
[ 70.732090] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 70.733194] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 70.734314] note: sos[5630] exited with irqs disabled
[ 70.735214] note: sos[5630] exited with preempt_count 1

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
Done (with ``kernel-core``, though), and with 6.9.0-64.fc41 it does not crash.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No, a standard Fedora cloud image with no additional repos.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
The crash excerpt is above; the full journal is here: https://cockpit-logs.us-east-1.linodeobjects.com/pull-0-f0d0c718-20240520-012823-fedora-40-updates-testing/TestSOS-testVerbose-fedora-40-127.0.0.2-2201-FAIL-1.log.gz

Reproducible: Always
This issue is causing sysstat-collect.service to fail with SIGSEGV:

× sysstat-collect.service - system activity accounting tool
     Loaded: loaded (/usr/lib/systemd/system/sysstat-collect.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: signal) since Thu 2024-05-23 20:30:05 EDT; 7min ago
TriggeredBy: ● sysstat-collect.timer
       Docs: man:sa1(8)
    Process: 140780 ExecStart=/usr/lib64/sa/sa1 1 1 (code=killed, signal=SEGV)
   Main PID: 140780 (code=killed, signal=SEGV)
        CPU: 34ms

May 23 20:30:05 myhost systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
May 23 20:30:05 myhost systemd[1]: sysstat-collect.service: Main process exited, code=killed, status=11/SEGV
May 23 20:30:05 myhost systemd[1]: sysstat-collect.service: Failed with result 'signal'.
May 23 20:30:05 myhost systemd[1]: Failed to start sysstat-collect.service - system activity accounting tool.

The "journalctl -k" output shows the nfsd_show call trace listed above at the same time as this service failure.
I'm seeing a similar issue on NFS servers with the 8.6.10 kernel, but the system isn't crashing; it just generates periodic backtraces similar to the above.

May 27 23:50:02 star kernel: RSP: 0018:ffff98434a423a30 EFLAGS: 00010046
May 27 23:50:02 star kernel: RAX: 0000000000000000 RBX: 0000000000000282 RCX: 000000000000001d
May 27 23:50:02 star kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
May 27 23:50:02 star kernel: RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
May 27 23:50:02 star kernel: R10: ffff98434a423ac0 R11: 0000000000000000 R12: ffff8d1849e1bb40
May 27 23:50:02 star kernel: R13: 7325203a53464e00 R14: 0000000000000001 R15: ffff98434a423c48
May 27 23:50:02 star kernel: FS: 00007f8ce93c0740(0000) GS:ffff8d196fc80000(0000) knlGS:0000000000000000
May 27 23:50:02 star kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 23:50:02 star kernel: CR2: 00007ffe5f054ff0 CR3: 0000000172a58000 CR4: 00000000000406f0
May 27 23:50:02 star kernel: Call Trace:
May 27 23:50:02 star kernel: <TASK>
May 27 23:50:02 star kernel: ? die_addr+0x36/0x90
May 27 23:50:02 star kernel: ? exc_general_protection+0x1dd/0x450
May 27 23:50:02 star kernel: ? asm_exc_general_protection+0x26/0x30
May 27 23:50:02 star kernel: ? _raw_spin_lock_irqsave+0x27/0x50
May 27 23:50:02 star kernel: __percpu_counter_sum+0x18/0xb0
May 27 23:50:02 star kernel: ? __kmalloc_node+0x48c/0x4f0
May 27 23:50:02 star kernel: nfsd_show+0x53/0x1f0 [nfsd]
May 27 23:50:02 star kernel: seq_read_iter+0x123/0x480
May 27 23:50:02 star kernel: seq_read+0x12f/0x170
May 27 23:50:02 star kernel: proc_reg_read+0x5d/0xa0
May 27 23:50:02 star kernel: vfs_read+0xaf/0x380
May 27 23:50:02 star kernel: ? _copy_to_user+0x24/0x40
May 27 23:50:02 star kernel: ? cp_new_stat+0x135/0x170
May 27 23:50:02 star kernel: ksys_read+0x6f/0xf0
May 27 23:50:02 star kernel: do_syscall_64+0x83/0x170
May 27 23:50:02 star kernel: ? __do_sys_newfstatat+0x4e/0x80
May 27 23:50:02 star kernel: ? syscall_exit_to_user_mode+0x83/0x230
May 27 23:50:02 star kernel: ? do_syscall_64+0x90/0x170
May 27 23:50:02 star kernel: ? do_filp_open+0xb3/0x160
May 27 23:50:02 star kernel: ? __pfx_proc_put_link+0x10/0x10
May 27 23:50:02 star kernel: ? __pfx_kfree_link+0x10/0x10
May 27 23:50:02 star kernel: ? do_sys_openat2+0x97/0xe0
May 27 23:50:02 star kernel: ? syscall_exit_to_user_mode+0x83/0x230
May 27 23:50:02 star kernel: ? do_syscall_64+0x90/0x170
May 27 23:50:02 star kernel: ? __irq_exit_rcu+0x4b/0xc0
That should read "6.8.10 kernel". For now, I've just reverted to the previous kernel I had handy, 6.8.4.
The problem still exists on kernel 6.8.11 (on Fedora 39). On systems running nfs-server, the crash is triggered by sysstat-collect.service, which sysstat-collect.timer starts every ten minutes. I have stopped the timer temporarily. The crash is also triggered by /usr/libexec/pcp/pmdas/linux/pmdalinux, which is run by some services of the pcp package.
I have a similar problem on my FC39 install; I also only noticed it because of a recurring general protection fault in my syslog. I asked about it here, with no response: https://forums.fedoraforum.org/showthread.php?332724-Kernel-general-protection-warning-every-10-minutes&p=1883838#post1883838

Today I uninstalled sysstat, and I no longer get the warnings in syslog.
A local GitLab installation also triggers this, in addition to sysstat. Currently running 6.8.11:

[Mon Jun 3 11:23:46 2024] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[Mon Jun 3 11:23:46 2024] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 28 2a ee 4a 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[Mon Jun 3 11:23:46 2024] RSP: 0018:ffffb880cd8a7950 EFLAGS: 00010046
[Mon Jun 3 11:23:46 2024] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 000000000000003d
[Mon Jun 3 11:23:46 2024] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
[Mon Jun 3 11:23:46 2024] RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
[Mon Jun 3 11:23:46 2024] R10: ffffb880cd8a79e0 R11: 0000000000000000 R12: ffff99cb86e3bca8
[Mon Jun 3 11:23:46 2024] R13: 7325203a53464e00 R14: 0000000000000001 R15: ffffb880cd8a7b68
[Mon Jun 3 11:23:46 2024] FS: 000000c000100090(0000) GS:ffff99ceaf380000(0000) knlGS:0000000000000000
[Mon Jun 3 11:23:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jun 3 11:23:46 2024] CR2: 000000c0003d7000 CR3: 0000000388ad2003 CR4: 00000000000606f0
[Mon Jun 3 11:23:46 2024] note: node_exporter[173614] exited with irqs disabled
[Mon Jun 3 11:23:46 2024] note: node_exporter[173614] exited with preempt_count 1
[Mon Jun 3 11:24:01 2024] general protection fault, probably for non-canonical address 0x7325203a53465100: 0000 [#13] PREEMPT SMP PTI
[Mon Jun 3 11:24:01 2024] CPU: 1 PID: 173602 Comm: node_exporter Tainted: P D OE 6.8.11-200.fc39.x86_64 #1
[Mon Jun 3 11:24:01 2024] Hardware name: System manufacturer System Product Name/P8P67 DELUXE, BIOS 1502 03/02/2011
[Mon Jun 3 11:24:01 2024] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[Mon Jun 3 11:24:01 2024] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 28 2a ee 4a 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[Mon Jun 3 11:24:01 2024] RSP: 0018:ffffb880d5d3fa38 EFLAGS: 00010046
[Mon Jun 3 11:24:01 2024] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 000000000000000c
[Mon Jun 3 11:24:01 2024] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
[Mon Jun 3 11:24:01 2024] RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
[Mon Jun 3 11:24:01 2024] R10: ffffb880d5d3fac8 R11: 0000000000000000 R12: ffff99cb8c830690
[Mon Jun 3 11:24:01 2024] R13: 7325203a53464e00 R14: 0000000000000001 R15: ffffb880d5d3fc50
[Mon Jun 3 11:24:01 2024] FS: 000000000112e250(0000) GS:ffff99ceaf280000(0000) knlGS:0000000000000000
[Mon Jun 3 11:24:01 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jun 3 11:24:01 2024] CR2: 000000c0007a9008 CR3: 0000000388ad2005 CR4: 00000000000606f0
[Mon Jun 3 11:24:01 2024] Call Trace:
[Mon Jun 3 11:24:01 2024] <TASK>
[Mon Jun 3 11:24:01 2024] ? die_addr+0x36/0x90
[Mon Jun 3 11:24:01 2024] ? exc_general_protection+0x1dd/0x450
[Mon Jun 3 11:24:01 2024] ? asm_exc_general_protection+0x26/0x30
[Mon Jun 3 11:24:01 2024] ? _raw_spin_lock_irqsave+0x27/0x50
[Mon Jun 3 11:24:01 2024] __percpu_counter_sum+0x18/0xb0
[Mon Jun 3 11:24:01 2024] nfsd_show+0x53/0x1f0 [nfsd]
[Mon Jun 3 11:24:01 2024] seq_read_iter+0x123/0x480
[Mon Jun 3 11:24:01 2024] seq_read+0x12f/0x170
[Mon Jun 3 11:24:01 2024] proc_reg_read+0x5d/0xa0
[Mon Jun 3 11:24:01 2024] vfs_read+0xaf/0x380
[Mon Jun 3 11:24:01 2024] ? do_syscall_64+0x90/0x170
[Mon Jun 3 11:24:01 2024] ksys_read+0x6f/0xf0
[Mon Jun 3 11:24:01 2024] do_syscall_64+0x83/0x170
[Mon Jun 3 11:24:01 2024] ? __x64_sys_fcntl+0x81/0xc0
[Mon Jun 3 11:24:01 2024] ? syscall_exit_to_user_mode+0x83/0x230
[Mon Jun 3 11:24:01 2024] ? __memcg_slab_post_alloc_hook+0x17d/0x210
[Mon Jun 3 11:24:01 2024] ? kmem_cache_alloc+0x326/0x330
[Mon Jun 3 11:24:01 2024] ? syscall_exit_to_user_mode+0x83/0x230
[Mon Jun 3 11:24:01 2024] ? do_epoll_ctl+0x756/0x1000
[Mon Jun 3 11:24:01 2024] ? do_syscall_64+0x90/0x170
[Mon Jun 3 11:24:01 2024] ? ep_item_poll.isra.0+0x30/0x50
[Mon Jun 3 11:24:01 2024] ? do_epoll_ctl+0x1ce/0x1000
[Mon Jun 3 11:24:01 2024] ? __pfx_ep_ptable_queue_proc+0x10/0x10
[Mon Jun 3 11:24:01 2024] ? __x64_sys_epoll_ctl+0x70/0xa0
[Mon Jun 3 11:24:01 2024] ? syscall_exit_to_user_mode+0x83/0x230
[Mon Jun 3 11:24:01 2024] ? do_syscall_64+0x90/0x170
[Mon Jun 3 11:24:01 2024] ? do_syscall_64+0x90/0x170
[Mon Jun 3 11:24:01 2024] ? exc_page_fault+0x7f/0x180
[Mon Jun 3 11:24:01 2024] entry_SYSCALL_64_after_hwframe+0x78/0x80
[Mon Jun 3 11:24:01 2024] RIP: 0033:0x40720e
[Mon Jun 3 11:24:01 2024] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[Mon Jun 3 11:24:01 2024] RSP: 002b:000000c0005291d0 EFLAGS: 00000216 ORIG_RAX: 0000000000000000
[Mon Jun 3 11:24:01 2024] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 000000000040720e
[Mon Jun 3 11:24:01 2024] RDX: 0000000000001000 RSI: 000000c00066a000 RDI: 000000000000000b
[Mon Jun 3 11:24:01 2024] RBP: 000000c000529210 R08: 0000000000000000 R09: 0000000000000000
[Mon Jun 3 11:24:01 2024] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c000529350
[Mon Jun 3 11:24:01 2024] R13: 000000000112e1c0 R14: 000000c0002f56c0 R15: 0000000000000002
[Mon Jun 3 11:24:01 2024] </TASK>
[Mon Jun 3 11:24:01 2024] Modules linked in: 8021q garp mrp overlay xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc qrtr rpcrdma rdma_cm iw_cm ib_cm ib_core nct6775 nct6775_core hwmon_vid nfsd auth_rpcgss nfs_acl lockd grace sunrpc tls bnep nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) binfmt_misc btusb btrtl btintel btbcm btmtk snd_hda_codec_realtek snd_hda_codec_generic bluetooth snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep pktcdvd xfs snd_seq iTCO_wdt intel_pmc_bxt raid456 async_raid6_recov intel_rapl_msr async_memcpy async_pq snd_seq_device iTCO_vendor_support async_xor async_tx snd_pcm at24 mei_me mei snd_timer snd soundcore intel_rapl_common i2c_i801 eeepc_wmi asus_wmi lpc_ich x86_pkg_temp_thermal intel_powerclamp i2c_smbus ledtrig_audio coretemp sparse_keymap platform_profile rapl
[Mon Jun 3 11:24:01 2024] intel_cstate rfkill intel_uncore video wmi_bmof loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic firewire_ohci ghash_clmulni_intel sha512_ssse3 mxm_wmi raid1 sha256_ssse3 sha1_ssse3 r8169 realtek firewire_core crc_itu_t sata_mv e1000e wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse i2c_dev
[Mon Jun 3 11:24:01 2024] ---[ end trace 0000000000000000 ]---
[Mon Jun 3 11:24:01 2024] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[Mon Jun 3 11:24:01 2024] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 28 2a ee 4a 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[Mon Jun 3 11:24:01 2024] RSP: 0018:ffffb880cd8a7950 EFLAGS: 00010046
[Mon Jun 3 11:24:01 2024] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 000000000000003d
[Mon Jun 3 11:24:01 2024] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
[Mon Jun 3 11:24:01 2024] RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
[Mon Jun 3 11:24:01 2024] R10: ffffb880cd8a79e0 R11: 0000000000000000 R12: ffff99cb86e3bca8
[Mon Jun 3 11:24:01 2024] R13: 7325203a53464e00 R14: 0000000000000001 R15: ffffb880cd8a7b68
[Mon Jun 3 11:24:01 2024] FS: 000000000112e250(0000) GS:ffff99ceaf280000(0000) knlGS:0000000000000000
[Mon Jun 3 11:24:01 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jun 3 11:24:01 2024] CR2: 000000c0007a9008 CR3: 0000000388ad2005 CR4: 00000000000606f0
[Mon Jun 3 11:24:01 2024] note: node_exporter[173602] exited with irqs disabled
[Mon Jun 3 11:24:01 2024] note: node_exporter[173602] exited with preempt_count 1
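The faulting address in these oopses (0x207325000a646c74 in the first report, 0x7325203a53465100 in the later ones) is the same value that sits in RDI, which on x86-64 carries the first function argument, here the lock pointer handed to _raw_spin_lock_irqsave. The minimal sketch below, assuming x86-64 with 48-bit virtual addresses (4-level paging), checks whether a value is canonical and reinterprets it as the eight little-endian bytes it would occupy in memory. Both values decode to fragments of text containing "%s", which hints, without proving it, that __percpu_counter_sum was handed a pointer loaded from memory holding string data rather than a valid counter. The helper names are illustrative only.

```python
# Sketch: examine the faulting addresses from the oopses in this thread.
# Assumption: x86-64 with 48-bit virtual addresses (4-level paging);
# pass va_bits=57 for 5-level paging.


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all 0 or all 1 (sign-extended)."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def as_bytes(addr: int) -> bytes:
    """Reinterpret a 64-bit register value as the 8 bytes it occupies in
    memory; x86-64 is little-endian, so the lowest byte comes first."""
    return addr.to_bytes(8, "little")


if __name__ == "__main__":
    # RDI values from the first report and from the later reports.
    for addr in (0x207325000A646C74, 0x7325203A53465100):
        print(hex(addr), is_canonical(addr), as_bytes(addr))
    # 0x207325000a646c74 False b'tld\n\x00%s '
    # 0x7325203a53465100 False b'\x00QFS: %s'

    # For comparison, a kernel pointer from the same oops (R12) is canonical.
    print(is_canonical(0xFFFF91C5C94E4BB8))  # True
```

The byte decoding only shows what the bad pointer looks like; it does not by itself identify which structure was stale or freed.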
I don't see this crash on kernel 6.8.12.
Edgar, is nfsd running error-free for you on 6.8.12, or do you see different crashes? There is a discussion at https://bodhi.fedoraproject.org/updates/FEDORA-2024-2c08de9311 of a different nfsd issue that was apparently introduced in 6.8.11 and persists.
Thomas, I don't see an NFS crash with kernel 6.8.12, on either Fedora 39 or Fedora 40. But I don't use auth_rpcgss, which is mentioned in bug 2284279 and https://bodhi.fedoraproject.org/updates/FEDORA-2024-2c08de9311 . sysstat-collect.timer is running without causing a crash.

kernel-6.8.12-200.fc39.x86_64
systemd-254.13-1.fc39.x86_64
sysstat-12.7.4-2.fc39.x86_64

kernel-6.8.12-300.fc40.x86_64
systemd-255.7-1.fc40.x86_64
sysstat-12.7.5-2.fc40.x86_64
Seeing the same across ~10 F40 machines with both 6.8.10 and 6.8.11. Reverting to 6.8.5 (the release kernel) solves the problem.
Possibly the easiest way to reproduce this bug is to read /proc/net/rpc/nfsd, which is where /usr/lib64/sa/sa1 (as called by sysstat-collect.service) dies:

[root@opus ~]# cat /proc/net/rpc/nfsd
Segmentation fault
[root@opus ~]#

Reproduced on Fedora 39 running 6.8.11-200.fc39.x86_64.
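Since the oops kills whichever process is reading the file, an automated check for this regression has to do the read in a child process and inspect how that child ended. A minimal sketch, assuming cat(1) is available and using hypothetical helper names (probe, classify): Python's subprocess reports a child terminated by signal N as returncode -N, which is how the "Segmentation fault" seen in the shell above would surface.

```python
import signal
import subprocess

PROC_NFSD = "/proc/net/rpc/nfsd"


def classify(returncode: int, stderr: str = "") -> str:
    """Turn a child's exit status into a short verdict."""
    # subprocess reports a child terminated by signal N as returncode -N (POSIX).
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode != 0:
        return f"error ({returncode}): {stderr.strip()}"
    return "ok"


def probe(path: str = PROC_NFSD, timeout: float = 30.0) -> str:
    """Read `path` with cat(1) in a child process, so that if the kernel
    oopses and kills the reader, this script survives to report it."""
    try:
        result = subprocess.run(
            ["cat", path], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # Caveat: subprocess.run kills the child on timeout and then waits
        # for it, so a reader stuck in uninterruptible sleep can still block.
        return "timed out"
    return classify(result.returncode, result.stderr)


if __name__ == "__main__":
    print(probe())
```

Based on the reports in this thread, one would expect "killed by SIGSEGV" on an NFS server running 6.8.10 or 6.8.11 and "ok" on 6.8.12; that expectation has not been verified with this script. Note that each failed probe triggers a real oops, and the first report shows that this can leave the system hanging.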
The upcoming kernel https://bodhi.fedoraproject.org/updates/FEDORA-2024-f0bbf1af25 fixed that for me.

# uname -r
6.8.12-200.fc39.x86_64
# cat /proc/net/rpc/nfsd
rc 0 0 0
fh 0 0 0 0 0
io 0 0
th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 0 0 0 0 0 0 0 0 0 0 0 0
net 0 0 0 0
rpc 0 0 0 0 0
proc3 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc4 2 0 0
proc4ops 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
wdeleg_getattr 0
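Once the read succeeds again, the file is plain text: each line is a label followed by whitespace-separated numeric fields, as in the 6.8.12 output above. The sketch below is a minimal, hypothetical parser for that shape, roughly what a collector such as sa1 has to do before it can use the numbers. It assumes that a field containing a decimal point is a float and anything else is an integer (true for the output above, not checked against other kernel versions), and it does not interpret what the individual fields mean.

```python
def parse_nfsd_stats(text: str) -> dict[str, list[float]]:
    """Map each line's leading label to its numeric fields.

    Assumption: a field is a float if it contains '.', otherwise an int
    (ints are kept as int; type checkers accept them where float is expected).
    """
    stats: dict[str, list[float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, values = fields[0], fields[1:]
        stats[label] = [float(v) if "." in v else int(v) for v in values]
    return stats


if __name__ == "__main__":
    # Shortened excerpt of the 6.8.12 output quoted above.
    sample = "rc 0 0 0\nth 8 0 0.000 0.000\nproc4 2 0 0\nwdeleg_getattr 0\n"
    stats = parse_nfsd_stats(sample)
    print(stats["th"][0], stats["proc4"])  # 8 [2, 0, 0]
```

On a live system the input would come from reading /proc/net/rpc/nfsd, which on the affected 6.8.10/6.8.11 kernels is exactly the read that oopses.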
Uwe's referenced Bodhi update was for Fedora 39, but our automatic tracker [1] indeed confirms that this has been fixed since June 21.

[1] https://github.com/cockpit-project/bots/issues/6411