2282287 – 6.8.10-300 regression: general protection fault in nfsd_show when running sosreport with running NFS server


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	40
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://cockpit-logs.us-east-1.linode...
Whiteboard:	CockpitTest
Depends On:
Blocks:

Reported:	2024-05-22 06:20 UTC by Martin Pitt
Modified:	2024-07-14 05:00 UTC
CC List:	26 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2024-07-14 05:00:46 UTC
Type:	---
Embargoed:
Dependent Products:


Description Martin Pitt 2024-05-22 06:20:42 UTC

1. Please describe the problem: Our Cockpit integration tests found [1] a kernel regression in 6.8.10-300 [2]. Running `sos report` while an NFS server is running triggers a kernel crash, and the command hangs.

[1] https://github.com/cockpit-project/cockpit/issues/20488
[2] https://bodhi.fedoraproject.org/updates/FEDORA-2024-92664ae6fe


2. What is the Version-Release number of the kernel:

kernel-6.8.10-300.fc40


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

It still worked up to 6.8.9-300; the regression was introduced in 6.8.10-300.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below: 

systemctl start nfs-server
sos report --batch

This hangs at

  Starting 50/101 multipath       [Running: dnf logs memory multipath]
  Starting 51/101 networking      [Running: dnf logs multipath networking]
  Starting 52/101 networkmanager  [Running: dnf logs networking networkmanager]

 Plugin dnf timed out


 Plugin logs timed out


 Plugin networking timed out


 Plugin networkmanager timed out

and dmesg/journal show a kernel crash:

[   70.663153] general protection fault, probably for non-canonical address 0x207325000a646c74: 0000 [#1] PREEMPT SMP NOPTI
[   70.664352] CPU: 0 PID: 5630 Comm: sos Not tainted 6.8.10-300.fc40.x86_64 #1
[   70.665163] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
[   70.666123] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[   70.666668] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 48 c5 ec 7d 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[   70.668754] RSP: 0018:ffffa42044247a30 EFLAGS: 00010046
[   70.669346] RAX: 0000000000000000 RBX: 0000000000000282 RCX: 000000000000001d
[   70.670083] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 207325000a646c74
[   70.670818] RBP: 207325000a646c74 R08: 0000000000000001 R09: 0000000000000000
[   70.671522] R10: ffffa42044247ac0 R11: 0000000000000000 R12: ffff91c5c94e4bb8
[   70.672260] R13: 207325000a646974 R14: 0000000000000001 R15: 0000000000000001
[   70.672966] FS:  00007eff74c006c0(0000) GS:ffff91c606a00000(0000) knlGS:0000000000000000
[   70.673814] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.674444] CR2: 00007eff70003270 CR3: 0000000007dc2006 CR4: 0000000000370ef0
[   70.675135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   70.675869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   70.676596] Call Trace:
[   70.676851]  <TASK>
[   70.677072]  ? die_addr+0x36/0x90
[   70.677428]  ? exc_general_protection+0x17c/0x450
[   70.677918]  ? asm_exc_general_protection+0x26/0x30
[   70.678442]  ? _raw_spin_lock_irqsave+0x27/0x50
[   70.678903]  __percpu_counter_sum+0x18/0xb0
[   70.679336]  nfsd_show+0x53/0x1f0 [nfsd]
[   70.679814]  seq_read_iter+0x11f/0x480
[   70.680214]  seq_read+0x12f/0x170
[   70.680554]  proc_reg_read+0x5a/0xa0
[   70.681182]  vfs_read+0xac/0x380
[   70.681711]  ? do_syscall_64+0x8f/0x170
[   70.682323]  ksys_read+0x6d/0xf0
[   70.682856]  do_syscall_64+0x83/0x170
[   70.683483]  ? syscall_exit_to_user_mode+0x83/0x230
[   70.684264]  ? do_syscall_64+0x8f/0x170
[   70.684909]  ? current_time+0x3e/0xf0
[   70.685537]  ? atime_needs_update+0x9c/0x110
[   70.686229]  ? touch_atime+0x1e/0x120
[   70.686848]  ? splice_direct_to_actor+0x1e4/0x260
[   70.687585]  ? __pfx_direct_splice_actor+0x10/0x10
[   70.688349]  ? do_splice_direct+0x77/0xc0
[   70.689012]  ? __pfx_direct_file_splice_eof+0x10/0x10
[   70.689817]  ? do_sendfile+0x211/0x440
[   70.690460]  ? __x64_sys_sendfile64+0x78/0xd0
[   70.691182]  ? syscall_exit_to_user_mode+0x83/0x230
[   70.691969]  ? do_syscall_64+0x8f/0x170
[   70.692592]  ? syscall_exit_to_user_mode+0x83/0x230
[   70.693365]  ? do_syscall_64+0x8f/0x170
[   70.694020]  ? do_syscall_64+0x8f/0x170
[   70.694654]  ? switch_fpu_return+0x4f/0xe0
[   70.695302]  ? clear_bhb_loop+0x55/0xb0
[   70.695916]  ? clear_bhb_loop+0x55/0xb0
[   70.696540]  ? clear_bhb_loop+0x55/0xb0
[   70.697166]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   70.697918] RIP: 0033:0x7eff8351dcfa
[   70.698504] Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 e8 74 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 42 75 f8 ff 48 8b
[   70.701071] RSP: 002b:00007eff74bff710 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   70.702129] RAX: ffffffffffffffda RBX: 00007eff74c00638 RCX: 00007eff8351dcfa
[   70.703156] RDX: 0000000000010000 RSI: 00007eff4c00ad70 RDI: 0000000000000007
[   70.704162] RBP: 00007eff74bff730 R08: 0000000000000000 R09: 0000000000000000
[   70.705191] R10: 00007eff7f7ad780 R11: 0000000000000246 R12: 0000000000010000
[   70.706207] R13: 00007eff4c00ad70 R14: 0000000000000007 R15: 00007eff78002120
[   70.707210]  </TASK>
[   70.707625] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nf_conntrack_tftp bridge stp llc overlay nfsd auth_rpcgss nfs_acl lockd grace nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables binfmt_misc intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass rapl virtio_balloon i2c_piix4 pktcdvd cirrus joydev vfat fat loop nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel virtio_net sha512_ssse3 sha256_ssse3 sha1_ssse3 net_failover virtio_blk virtio_scsi failover serio_raw ata_generic pata_acpi sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse dm_multipath qemu_fw_cfg
[   70.718141] ---[ end trace 0000000000000000 ]---
[   70.718911] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[   70.719751] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 48 c5 ec 7d 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[   70.722431] RSP: 0018:ffffa42044247a30 EFLAGS: 00010046
[   70.723273] RAX: 0000000000000000 RBX: 0000000000000282 RCX: 000000000000001d
[   70.724377] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 207325000a646c74
[   70.725468] RBP: 207325000a646c74 R08: 0000000000000001 R09: 0000000000000000
[   70.726576] R10: ffffa42044247ac0 R11: 0000000000000000 R12: ffff91c5c94e4bb8
[   70.727693] R13: 207325000a646974 R14: 0000000000000001 R15: 0000000000000001
[   70.728802] FS:  00007eff74c006c0(0000) GS:ffff91c606a00000(0000) knlGS:0000000000000000
[   70.730057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.730986] CR2: 00007eff70003270 CR3: 0000000007dc2006 CR4: 0000000000370ef0
[   70.732090] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   70.733194] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   70.734314] note: sos[5630] exited with irqs disabled
[   70.735214] note: sos[5630] exited with preempt_count 1
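
The "non-canonical address" in the first line of the trace can be checked by hand. The following is a minimal sketch, assuming 48-bit virtual addresses (4-level paging); the `va_bits` parameter and both helper names are illustrative and do not come from the kernel. It tests the RDI and RSP values from the trace above and also prints their bytes in memory order. The RDI bytes decode to ASCII text, which could suggest the lock pointer was read from memory holding string data; that is only an observation, and the report itself states no root cause.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumption: va_bits-wide virtual addresses (48 with 4-level paging).
    Bits 63..(va_bits - 1) must then all be copies of bit (va_bits - 1).
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def as_memory_bytes(addr: int) -> bytes:
    """Return the 8 bytes of a 64-bit value in little-endian (in-memory) order."""
    return addr.to_bytes(8, "little")


if __name__ == "__main__":
    # Register values copied from the trace above.
    for name, value in [
        ("RDI (pointer passed to the spinlock)", 0x207325000A646C74),
        ("RSP (kernel stack pointer)", 0xFFFFA42044247A30),
    ]:
        print(f"{name}: canonical={is_canonical(value)} "
              f"bytes={as_memory_bytes(value)!r}")
```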




5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Done that (with ``kernel-core`` though), and with 6.9.0-64.fc41 it does not crash.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No, standard Fedora cloud image, no additional repos.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Crash excerpt is above, full journal is here:
https://cockpit-logs.us-east-1.linodeobjects.com/pull-0-f0d0c718-20240520-012823-fedora-40-updates-testing/TestSOS-testVerbose-fedora-40-127.0.0.2-2201-FAIL-1.log.gz

Reproducible: Always

Comment 1 John F Sullivan 2024-05-24 00:42:15 UTC

This issue is causing sysstat-collect.service to fail with SIGSEGV:

× sysstat-collect.service - system activity accounting tool
     Loaded: loaded (/usr/lib/systemd/system/sysstat-collect.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: signal) since Thu 2024-05-23 20:30:05 EDT; 7min ago
TriggeredBy: ● sysstat-collect.timer
       Docs: man:sa1(8)
    Process: 140780 ExecStart=/usr/lib64/sa/sa1 1 1 (code=killed, signal=SEGV)
   Main PID: 140780 (code=killed, signal=SEGV)
        CPU: 34ms

May 23 20:30:05 myhost systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
May 23 20:30:05 myhost systemd[1]: sysstat-collect.service: Main process exited, code=killed, status=11/SEGV
May 23 20:30:05 myhost systemd[1]: sysstat-collect.service: Failed with result 'signal'.
May 23 20:30:05 myhost systemd[1]: Failed to start sysstat-collect.service - system activity accounting tool.

The "journalctl -k" output shows the nfsd_show call trace listed above at the same time as this service failure.

Comment 2 Ian Donaldson 2024-05-28 06:10:20 UTC

I'm seeing a similar issue on NFS servers with the 8.6.10 kernel, but the system isn't crashing; it just generates periodic
backtraces similar to the above.

May 27 23:50:02 star kernel: RSP: 0018:ffff98434a423a30 EFLAGS: 00010046
May 27 23:50:02 star kernel: RAX: 0000000000000000 RBX: 0000000000000282 RCX: 000000000000001d
May 27 23:50:02 star kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
May 27 23:50:02 star kernel: RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
May 27 23:50:02 star kernel: R10: ffff98434a423ac0 R11: 0000000000000000 R12: ffff8d1849e1bb40
May 27 23:50:02 star kernel: R13: 7325203a53464e00 R14: 0000000000000001 R15: ffff98434a423c48
May 27 23:50:02 star kernel: FS:  00007f8ce93c0740(0000) GS:ffff8d196fc80000(0000) knlGS:0000000000000000
May 27 23:50:02 star kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 23:50:02 star kernel: CR2: 00007ffe5f054ff0 CR3: 0000000172a58000 CR4: 00000000000406f0
May 27 23:50:02 star kernel: Call Trace:
May 27 23:50:02 star kernel: <TASK>
May 27 23:50:02 star kernel: ? die_addr+0x36/0x90
May 27 23:50:02 star kernel: ? exc_general_protection+0x1dd/0x450
May 27 23:50:02 star kernel: ? asm_exc_general_protection+0x26/0x30
May 27 23:50:02 star kernel: ? _raw_spin_lock_irqsave+0x27/0x50
May 27 23:50:02 star kernel: __percpu_counter_sum+0x18/0xb0
May 27 23:50:02 star kernel: ? __kmalloc_node+0x48c/0x4f0
May 27 23:50:02 star kernel: nfsd_show+0x53/0x1f0 [nfsd]
May 27 23:50:02 star kernel: seq_read_iter+0x123/0x480
May 27 23:50:02 star kernel: seq_read+0x12f/0x170
May 27 23:50:02 star kernel: proc_reg_read+0x5d/0xa0
May 27 23:50:02 star kernel: vfs_read+0xaf/0x380
May 27 23:50:02 star kernel: ? _copy_to_user+0x24/0x40
May 27 23:50:02 star kernel: ? cp_new_stat+0x135/0x170
May 27 23:50:02 star kernel: ksys_read+0x6f/0xf0
May 27 23:50:02 star kernel: do_syscall_64+0x83/0x170
May 27 23:50:02 star kernel: ? __do_sys_newfstatat+0x4e/0x80
May 27 23:50:02 star kernel: ? syscall_exit_to_user_mode+0x83/0x230
May 27 23:50:02 star kernel: ? do_syscall_64+0x90/0x170
May 27 23:50:02 star kernel: ? do_filp_open+0xb3/0x160
May 27 23:50:02 star kernel: ? __pfx_proc_put_link+0x10/0x10
May 27 23:50:02 star kernel: ? __pfx_kfree_link+0x10/0x10
May 27 23:50:02 star kernel: ? do_sys_openat2+0x97/0xe0
May 27 23:50:02 star kernel: ? syscall_exit_to_user_mode+0x83/0x230
May 27 23:50:02 star kernel: ? do_syscall_64+0x90/0x170
May 27 23:50:02 star kernel: ? __irq_exit_rcu+0x4b/0xc0

Comment 3 Ian Donaldson 2024-05-28 06:13:06 UTC

That should read 6.8.10 kernel ...

For now I've just reverted to the previous kernel I had handy, 6.8.4

Comment 4 Edgar Hoch 2024-05-28 12:17:25 UTC

The problem still exists on kernel 6.8.11 (on Fedora 39).

The crash is triggered on systems running nfs-server by sysstat-collect.service, which is called by sysstat-collect.timer every ten minutes. I have stopped the timer temporarily.

The crash is also triggered by /usr/libexec/pcp/pmdas/linux/pmdalinux which is called by some services of package pcp.

Comment 5 Anthony 2024-05-30 15:20:27 UTC

I have a similar problem on my FC39 install. I also only noticed it because of a recurring general protection fault in my syslog. I asked about it here, with no response: https://forums.fedoraforum.org/showthread.php?332724-Kernel-general-protection-warning-every-10-minutes&p=1883838#post1883838

Today I uninstalled sysstat and I no longer get the warnings in syslog

Comment 6 Kjell Randa 2024-06-03 10:11:16 UTC

A local GitLab installation also triggers this, in addition to sysstat.
Currently running 6.8.11.

[Mon Jun  3 11:23:46 2024] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[Mon Jun  3 11:23:46 2024] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 28 2a ee 4a 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[Mon Jun  3 11:23:46 2024] RSP: 0018:ffffb880cd8a7950 EFLAGS: 00010046
[Mon Jun  3 11:23:46 2024] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 000000000000003d
[Mon Jun  3 11:23:46 2024] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
[Mon Jun  3 11:23:46 2024] RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
[Mon Jun  3 11:23:46 2024] R10: ffffb880cd8a79e0 R11: 0000000000000000 R12: ffff99cb86e3bca8
[Mon Jun  3 11:23:46 2024] R13: 7325203a53464e00 R14: 0000000000000001 R15: ffffb880cd8a7b68
[Mon Jun  3 11:23:46 2024] FS:  000000c000100090(0000) GS:ffff99ceaf380000(0000) knlGS:0000000000000000
[Mon Jun  3 11:23:46 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jun  3 11:23:46 2024] CR2: 000000c0003d7000 CR3: 0000000388ad2003 CR4: 00000000000606f0
[Mon Jun  3 11:23:46 2024] note: node_exporter[173614] exited with irqs disabled
[Mon Jun  3 11:23:46 2024] note: node_exporter[173614] exited with preempt_count 1
[Mon Jun  3 11:24:01 2024] general protection fault, probably for non-canonical address 0x7325203a53465100: 0000 [#13] PREEMPT SMP PTI
[Mon Jun  3 11:24:01 2024] CPU: 1 PID: 173602 Comm: node_exporter Tainted: P      D    OE      6.8.11-200.fc39.x86_64 #1
[Mon Jun  3 11:24:01 2024] Hardware name: System manufacturer System Product Name/P8P67 DELUXE, BIOS 1502 03/02/2011
[Mon Jun  3 11:24:01 2024] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[Mon Jun  3 11:24:01 2024] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 28 2a ee 4a 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[Mon Jun  3 11:24:01 2024] RSP: 0018:ffffb880d5d3fa38 EFLAGS: 00010046
[Mon Jun  3 11:24:01 2024] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 000000000000000c
[Mon Jun  3 11:24:01 2024] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
[Mon Jun  3 11:24:01 2024] RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
[Mon Jun  3 11:24:01 2024] R10: ffffb880d5d3fac8 R11: 0000000000000000 R12: ffff99cb8c830690
[Mon Jun  3 11:24:01 2024] R13: 7325203a53464e00 R14: 0000000000000001 R15: ffffb880d5d3fc50
[Mon Jun  3 11:24:01 2024] FS:  000000000112e250(0000) GS:ffff99ceaf280000(0000) knlGS:0000000000000000
[Mon Jun  3 11:24:01 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jun  3 11:24:01 2024] CR2: 000000c0007a9008 CR3: 0000000388ad2005 CR4: 00000000000606f0
[Mon Jun  3 11:24:01 2024] Call Trace:
[Mon Jun  3 11:24:01 2024]  <TASK>
[Mon Jun  3 11:24:01 2024]  ? die_addr+0x36/0x90
[Mon Jun  3 11:24:01 2024]  ? exc_general_protection+0x1dd/0x450
[Mon Jun  3 11:24:01 2024]  ? asm_exc_general_protection+0x26/0x30
[Mon Jun  3 11:24:01 2024]  ? _raw_spin_lock_irqsave+0x27/0x50
[Mon Jun  3 11:24:01 2024]  __percpu_counter_sum+0x18/0xb0
[Mon Jun  3 11:24:01 2024]  nfsd_show+0x53/0x1f0 [nfsd]
[Mon Jun  3 11:24:01 2024]  seq_read_iter+0x123/0x480
[Mon Jun  3 11:24:01 2024]  seq_read+0x12f/0x170
[Mon Jun  3 11:24:01 2024]  proc_reg_read+0x5d/0xa0
[Mon Jun  3 11:24:01 2024]  vfs_read+0xaf/0x380
[Mon Jun  3 11:24:01 2024]  ? do_syscall_64+0x90/0x170
[Mon Jun  3 11:24:01 2024]  ksys_read+0x6f/0xf0
[Mon Jun  3 11:24:01 2024]  do_syscall_64+0x83/0x170
[Mon Jun  3 11:24:01 2024]  ? __x64_sys_fcntl+0x81/0xc0
[Mon Jun  3 11:24:01 2024]  ? syscall_exit_to_user_mode+0x83/0x230
[Mon Jun  3 11:24:01 2024]  ? __memcg_slab_post_alloc_hook+0x17d/0x210
[Mon Jun  3 11:24:01 2024]  ? kmem_cache_alloc+0x326/0x330
[Mon Jun  3 11:24:01 2024]  ? syscall_exit_to_user_mode+0x83/0x230
[Mon Jun  3 11:24:01 2024]  ? do_epoll_ctl+0x756/0x1000
[Mon Jun  3 11:24:01 2024]  ? do_syscall_64+0x90/0x170
[Mon Jun  3 11:24:01 2024]  ? ep_item_poll.isra.0+0x30/0x50
[Mon Jun  3 11:24:01 2024]  ? do_epoll_ctl+0x1ce/0x1000
[Mon Jun  3 11:24:01 2024]  ? __pfx_ep_ptable_queue_proc+0x10/0x10
[Mon Jun  3 11:24:01 2024]  ? __x64_sys_epoll_ctl+0x70/0xa0
[Mon Jun  3 11:24:01 2024]  ? syscall_exit_to_user_mode+0x83/0x230
[Mon Jun  3 11:24:01 2024]  ? do_syscall_64+0x90/0x170
[Mon Jun  3 11:24:01 2024]  ? do_syscall_64+0x90/0x170
[Mon Jun  3 11:24:01 2024]  ? exc_page_fault+0x7f/0x180
[Mon Jun  3 11:24:01 2024]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[Mon Jun  3 11:24:01 2024] RIP: 0033:0x40720e
[Mon Jun  3 11:24:01 2024] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[Mon Jun  3 11:24:01 2024] RSP: 002b:000000c0005291d0 EFLAGS: 00000216 ORIG_RAX: 0000000000000000
[Mon Jun  3 11:24:01 2024] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 000000000040720e
[Mon Jun  3 11:24:01 2024] RDX: 0000000000001000 RSI: 000000c00066a000 RDI: 000000000000000b
[Mon Jun  3 11:24:01 2024] RBP: 000000c000529210 R08: 0000000000000000 R09: 0000000000000000
[Mon Jun  3 11:24:01 2024] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c000529350
[Mon Jun  3 11:24:01 2024] R13: 000000000112e1c0 R14: 000000c0002f56c0 R15: 0000000000000002
[Mon Jun  3 11:24:01 2024]  </TASK>
[Mon Jun  3 11:24:01 2024] Modules linked in: 8021q garp mrp overlay xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc qrtr rpcrdma rdma_cm iw_cm ib_cm ib_core nct6775 nct6775_core hwmon_vid nfsd auth_rpcgss nfs_acl lockd grace sunrpc tls bnep nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) binfmt_misc btusb btrtl btintel btbcm btmtk snd_hda_codec_realtek snd_hda_codec_generic bluetooth snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep pktcdvd xfs snd_seq iTCO_wdt intel_pmc_bxt raid456 async_raid6_recov intel_rapl_msr async_memcpy async_pq snd_seq_device iTCO_vendor_support async_xor async_tx snd_pcm at24 mei_me mei snd_timer snd soundcore intel_rapl_common i2c_i801 eeepc_wmi asus_wmi lpc_ich x86_pkg_temp_thermal intel_powerclamp i2c_smbus ledtrig_audio coretemp sparse_keymap platform_profile rapl
[Mon Jun  3 11:24:01 2024]  intel_cstate rfkill intel_uncore video wmi_bmof loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic firewire_ohci ghash_clmulni_intel sha512_ssse3 mxm_wmi raid1 sha256_ssse3 sha1_ssse3 r8169 realtek firewire_core crc_itu_t sata_mv e1000e wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse i2c_dev
[Mon Jun  3 11:24:01 2024] ---[ end trace 0000000000000000 ]---
[Mon Jun  3 11:24:01 2024] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x50
[Mon Jun  3 11:24:01 2024] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 65 ff 05 28 2a ee 4a 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 93 08 00 00
[Mon Jun  3 11:24:01 2024] RSP: 0018:ffffb880cd8a7950 EFLAGS: 00010046
[Mon Jun  3 11:24:01 2024] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 000000000000003d
[Mon Jun  3 11:24:01 2024] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 7325203a53465100
[Mon Jun  3 11:24:01 2024] RBP: 7325203a53465100 R08: 0000000000000001 R09: 0000000000000000
[Mon Jun  3 11:24:01 2024] R10: ffffb880cd8a79e0 R11: 0000000000000000 R12: ffff99cb86e3bca8
[Mon Jun  3 11:24:01 2024] R13: 7325203a53464e00 R14: 0000000000000001 R15: ffffb880cd8a7b68
[Mon Jun  3 11:24:01 2024] FS:  000000000112e250(0000) GS:ffff99ceaf280000(0000) knlGS:0000000000000000
[Mon Jun  3 11:24:01 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jun  3 11:24:01 2024] CR2: 000000c0007a9008 CR3: 0000000388ad2005 CR4: 00000000000606f0
[Mon Jun  3 11:24:01 2024] note: node_exporter[173602] exited with irqs disabled
[Mon Jun  3 11:24:01 2024] note: node_exporter[173602] exited with preempt_count 1

Comment 7 Edgar Hoch 2024-06-03 10:21:44 UTC

I don't see this crash on kernel 6.8.12.

Comment 8 Thomas Clark 2024-06-05 20:16:23 UTC

Edgar, is nfsd running error-free for you on 6.8.12, or do you see different crashes? There is a discussion on https://bodhi.fedoraproject.org/updates/FEDORA-2024-2c08de9311 of a different nfsd issue apparently introduced in 6.8.11 and continuing.

Comment 9 Edgar Hoch 2024-06-06 00:08:15 UTC

Thomas, I don't see an NFS crash with kernel 6.8.12, neither on Fedora 39 nor on Fedora 40.
But I don't use auth_rpcgss, which is mentioned in bug 2284279 and https://bodhi.fedoraproject.org/updates/FEDORA-2024-2c08de9311 .

sysstat-collect.timer is running without causing a crash.

kernel-6.8.12-200.fc39.x86_64
systemd-254.13-1.fc39.x86_64
sysstat-12.7.4-2.fc39.x86_64

kernel-6.8.12-300.fc40.x86_64
systemd-255.7-1.fc40.x86_64
sysstat-12.7.5-2.fc40.x86_64
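
The kernel versions mentioned in this thread can be turned into a quick check of a `uname -r` string. This is only a sketch: the set of crashing versions is taken from the comments here (6.8.10 and 6.8.11 reported crashing; 6.8.9 and 6.8.12 reported working) and is not an authoritative list of affected builds. The function and constant names are hypothetical.

```python
import re
from typing import Tuple

# Assumption: versions collected only from reports in this bug, not from
# the upstream changelog. Other builds may or may not be affected.
REPORTED_CRASHING = {(6, 8, 10), (6, 8, 11)}


def kernel_version(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string like '6.8.10-300.fc40.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def reported_as_crashing(release: str) -> bool:
    """True if this thread contains a crash report for the given kernel version."""
    return kernel_version(release) in REPORTED_CRASHING


if __name__ == "__main__":
    import platform

    release = platform.release()
    print(release, "reported as crashing:", reported_as_crashing(release))
```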

Comment 10 Gilboa Davara 2024-06-08 08:35:11 UTC

Seeing the same across ~10 F40 machines, with both 6.8.10 and 6.8.11.
Reverting to 6.8.5 (the release kernel) solves the problem.

Comment 11 Richard G 2024-06-09 02:35:20 UTC

Possibly the easiest way to reproduce this bug is to read /proc/net/rpc/nfsd, which is where /usr/lib64/sa/sa1 (as called by sysstat-collect.service) dies.

[root@opus ~]# cat /proc/net/rpc/nfsd
Segmentation fault
[root@opus ~]#

Reproduced on Fedora 39 running 6.8.11-200.fc39.x86_64.
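
The same read can be scripted so that the reader runs in a child process and the caller only sees its exit status. This is a minimal sketch; the `probe` helper is hypothetical. On an affected kernel the read itself triggers the kernel fault shown earlier in this bug, so it should only be run on a disposable test machine.

```python
import subprocess
import sys

# One-line reader executed in a child interpreter, so that a reader killed
# by a signal does not take the calling process down with it.
_READER = "import sys; sys.stdout.write(open(sys.argv[1]).read())"


def probe(path="/proc/net/rpc/nfsd", timeout=10.0):
    """Read `path` in a child process and return (returncode, text).

    A negative return code means the child was killed by a signal
    (for example -11 for SIGSEGV). subprocess.TimeoutExpired is raised
    if the child does not finish within `timeout` seconds.
    """
    result = subprocess.run(
        [sys.executable, "-c", _READER, path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout
```

Calling `probe()` with no arguments reads /proc/net/rpc/nfsd; a return code of 0 together with non-empty text would correspond to the working output that `cat` prints on a fixed kernel.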

Comment 12 Uwe Menges 2024-06-12 12:19:52 UTC

The upcoming kernel https://bodhi.fedoraproject.org/updates/FEDORA-2024-f0bbf1af25 fixed that for me.
# uname -r
6.8.12-200.fc39.x86_64
# cat /proc/net/rpc/nfsd
rc 0 0 0
fh 0 0 0 0 0
io 0 0
th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 0 0 0 0 0 0 0 0 0 0 0 0
net 0 0 0 0
rpc 0 0 0 0 0
proc3 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc4 2 0 0
proc4ops 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
wdeleg_getattr 0
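
For monitoring scripts that consume this file, the output above can be parsed line by line. The following is a sketch that assumes only what is visible in the sample: each line is a label followed by whitespace-separated numbers. It does not interpret what the individual counters mean, and `parse_nfsd_stats` is a hypothetical name.

```python
def parse_nfsd_stats(text):
    """Parse /proc/net/rpc/nfsd-style text into {label: [numbers]}.

    Values containing a '.' are converted to float, all others to int.
    Blank lines are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        label, values = fields[0], fields[1:]
        stats[label] = [float(v) if "." in v else int(v) for v in values]
    return stats


if __name__ == "__main__":
    # A few lines copied from the output above.
    sample = "rc 0 0 0\nth 8 0 0.000 0.000\nproc4 2 0 0\nwdeleg_getattr 0\n"
    for label, values in parse_nfsd_stats(sample).items():
        print(label, values)
```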

Comment 13 Martin Pitt 2024-07-14 05:00:46 UTC

Uwe's referenced bodhi update was for Fedora 39. But indeed our automatic tracker [1] confirms that this is fixed since June 21.

[1] https://github.com/cockpit-project/bots/issues/6411
