Bug 2150630 - kernel: general protection fault, probably for non-canonical address PREEMPT SMP PT
Summary: kernel: general protection fault, probably for non-canonical address PREEMPT ...
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 37
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2148276 2154477 2171185 (view as bug list)
Depends On:
Blocks: 2179342
 
Reported: 2022-12-04 16:19 UTC by Dario Lesca
Modified: 2023-03-22 09:55 UTC (History)
CC: 29 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2179342 (view as bug list)
Environment:
Last Closed: 2023-03-22 09:55:19 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
kernel logs (54.25 KB, application/gzip)
2022-12-04 16:19 UTC, Dario Lesca

Description Dario Lesca 2022-12-04 16:19:29 UTC
Created attachment 1929862 [details]
kernel logs

1. Please describe the problem:

On an up-to-date Fedora 37 Workstation used as a server (kernel 6.0.10-300.fc37.x86_64), I sometimes get "general protection fault" errors, shown in the attached file "kernel-6.0.10-300.fc37.x86_64.txt".

This problem occurs most often when I am using a shared folder via NFS from another PC, and if I then try to restart nfs on the server via systemctl (or kill nfs) the command stays stuck and I have to press the power button and restart the server.

Other times the server simply freezes and I have to power it off and restart it.

I have tried booting the previous kernel, 5.18.6-200.fc36.x86_64, fortunately still installed from Fedora 36, without changing anything in the hardware. So far the server seems to work fine and the "general protection fault" problem is gone.
See the attached file "kernel-5.18.6-200.fc36.x86_64.txt".
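
For what it's worth, here is a rough sketch of the kind of activity that is usually in progress when the fault shows up; the export path, mount point and directory name below are only examples, not my exact configuration:

  # on the client PC: mount the exported folder and generate some NFS traffic
  sudo mount -t nfs igloo.home.solinos.it:/multimedia /mnt/multimedia
  cp -r /mnt/multimedia/some-directory /tmp/

  # on the server, after the fault has appeared: trying to restart nfsd hangs
  sudo systemctl restart nfs-server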

2. What is the Version-Release number of the kernel:

kernel 6.0.10-300.fc37.x86_64

With kernel 5.18.6-200.fc36.x86_64 the problem does not seem to occur.

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

The problem seems to have started after I updated from Fedora 36 (not fully up to date, with kernel 5.18.x) to Fedora 37.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

I cannot reproduce it with any specific command; the problem occurs at random, but only with kernel 6.0.x. With 5.18.x it does not seem to occur.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

This is a production server, and for now I have not tried a Rawhide kernel.
If you think I should try it, let me know.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I have produced two logs, one with kernel 6.0.x where the problem occurs and one with kernel 5.18.x where it does not.
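
For reference, logs like these can be produced with the journalctl commands suggested above, for example:

  # current boot (kernel 6.0.x, where the problem occurs)
  journalctl --no-hostname -k > kernel-6.0.10-300.fc37.x86_64.txt

  # a previous boot (kernel 5.18.x, where it does not occur); the -b index
  # depends on the actual boot order and is only an example
  journalctl --no-hostname -k -b -1 > kernel-5.18.6-200.fc36.x86_64.txt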

Here are some of the errors logged[1]

Let me know if I should run any other tests.

Many thanks.
Dario


[1] Some of the errors logged

[  155.090391] kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#3] PREEMPT SMP PTI
[  155.090409] kernel: CPU: 7 PID: 2101 Comm: nfsd Tainted: G      D            6.0.10-300.fc37.x86_64 #1
[  155.090420] kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
[  155.090425] kernel: RIP: 0010:release_pages+0x46/0x590
[  155.090445] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[  155.090453] kernel: RSP: 0018:ffffa9a901687e40 EFLAGS: 00010216
[  155.090463] kernel: RAX: 00000000ffff8c03 RBX: 0000000000000000 RCX: 0000000000000000
[  155.090470] kernel: RDX: fffffc6304b365c8 RSI: ffffa9a901687e68 RDI: fffffc6304aec988
[  155.090477] kernel: RBP: ffff8c03de810b70 R08: fffffc6304b365c8 R09: 0000000000000000
[  155.090483] kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c03de810b28
[  155.090489] kernel: R13: 0000000000000000 R14: fffffc6304aec988 R15: 0017ffffc0000000
[  155.090495] kernel: FS:  0000000000000000(0000) GS:ffff8c0b0ddc0000(0000) knlGS:0000000000000000
[  155.090503] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.090510] kernel: CR2: 00007f58641e3020 CR3: 000000035d010001 CR4: 00000000003726e0
[  155.090517] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  155.090522] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  155.090528] kernel: Call Trace:
[  155.090534] kernel:  <TASK>
[  155.090547] kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  155.090653] kernel:  __pagevec_release+0x1b/0x30
[  155.090670] kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
[  155.090841] kernel:  svc_send+0x59/0x160 [sunrpc]
[  155.090997] kernel:  nfsd+0xd5/0x190 [nfsd]
[  155.091098] kernel:  kthread+0xe6/0x110
[  155.091111] kernel:  ? kthread_complete_and_exit+0x20/0x20
[  155.091124] kernel:  ret_from_fork+0x1f/0x30
[  155.091147] kernel:  </TASK>
[  155.091151] kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc rpcrdma nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp rdma_cm nf_conntrack_amanda nf_nat iw_cm nf_conntrack_sane ib_cm nf_conntrack_tftp nf_conntrack_sip ib_core nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic iwlmvm ledtrig_audio intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp
[  155.091332] kernel:  snd_intel_dspcfg kvm_intel snd_intel_sdw_acpi mac80211 snd_hda_codec kvm libarc4 iTCO_wdt ee1004 mei_wdt intel_pmc_bxt mei_hdcp iTCO_vendor_support mei_pxp snd_hda_core iwlwifi snd_hwdep irqbypass snd_seq snd_seq_device rapl intel_cstate snd_pcm cfg80211 snd_timer mei_me snd think_lmi intel_uncore i2c_i801 intel_wmi_thunderbolt firmware_attributes_class wmi_bmof soundcore pcspkr rfkill i2c_smbus mei joydev intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel e1000e r8169 drm_buddy drm_display_helper cec ttm wmi video uas usb_storage scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
[  155.091522] kernel: ---[ end trace 0000000000000000 ]---
[  155.099238] kernel: RIP: 0010:release_pages+0x46/0x590
[  155.099246] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[  155.099248] kernel: RSP: 0018:ffffa9a901697e40 EFLAGS: 00010206
[  155.099251] kernel: RAX: 00000000ffff8c03 RBX: 0000000000000000 RCX: 0000000000000000
[  155.099253] kernel: RDX: fffffc6305e4dd48 RSI: ffffa9a901697e68 RDI: fffffc6304df1f48
[  155.099255] kernel: RBP: ffff8c03dea64b78 R08: fffffc6305e4dd48 R09: 0000000000000000
[  155.099257] kernel: R10: 0000000000000000 R11: fffffc63043ec208 R12: ffff8c03dea64b28
[  155.099259] kernel: R13: 0000000000000000 R14: fffffc6304df1f48 R15: 0017ffffc0000000
[  155.099261] kernel: FS:  0000000000000000(0000) GS:ffff8c0b0ddc0000(0000) knlGS:0000000000000000
[  155.099263] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.099265] kernel: CR2: 00007f58641e3020 CR3: 000000035d010001 CR4: 00000000003726e0
[  155.099267] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  155.099269] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  169.303848] kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#4] PREEMPT SMP PTI
[  169.303866] kernel: CPU: 5 PID: 2100 Comm: nfsd Tainted: G      D            6.0.10-300.fc37.x86_64 #1
[  169.303877] kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
[  169.303882] kernel: RIP: 0010:release_pages+0x46/0x590
[  169.303900] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[  169.303909] kernel: RSP: 0018:ffffa9a90167fe40 EFLAGS: 00010216
[  169.303919] kernel: RAX: 00000000ffff8c03 RBX: 0000000000000000 RCX: 0000000000000000
[  169.303926] kernel: RDX: fffffc6304b32408 RSI: ffffa9a90167fe68 RDI: fffffc6304ae0f08
[  169.303933] kernel: RBP: ffff8c03e5fecb70 R08: fffffc6304b32408 R09: 0000000000000000
[  169.303939] kernel: R10: 0000000000000000 R11: fffffc63040df008 R12: ffff8c03e5fecb28
[  169.303944] kernel: R13: 0000000000000000 R14: fffffc6304ae0f08 R15: 0017ffffc0000000
[  169.303951] kernel: FS:  0000000000000000(0000) GS:ffff8c0b0dd40000(0000) knlGS:0000000000000000
[  169.303959] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  169.303965] kernel: CR2: 00007fea7c5aa000 CR3: 000000035d010006 CR4: 00000000003726e0
[  169.303972] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  169.303978] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  169.303984] kernel: Call Trace:
[  169.303990] kernel:  <TASK>
[  169.304004] kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  169.304111] kernel:  __pagevec_release+0x1b/0x30
[  169.304131] kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
[  169.304338] kernel:  svc_send+0x59/0x160 [sunrpc]
[  169.304546] kernel:  nfsd+0xd5/0x190 [nfsd]
[  169.304673] kernel:  kthread+0xe6/0x110
[  169.304686] kernel:  ? kthread_complete_and_exit+0x20/0x20
[  169.304700] kernel:  ret_from_fork+0x1f/0x30
[  169.304723] kernel:  </TASK>
[  169.304727] kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc rpcrdma nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp rdma_cm nf_conntrack_amanda nf_nat iw_cm nf_conntrack_sane ib_cm nf_conntrack_tftp nf_conntrack_sip ib_core nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic iwlmvm ledtrig_audio intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp
[  169.304928] kernel:  snd_intel_dspcfg kvm_intel snd_intel_sdw_acpi mac80211 snd_hda_codec kvm libarc4 iTCO_wdt ee1004 mei_wdt intel_pmc_bxt mei_hdcp iTCO_vendor_support mei_pxp snd_hda_core iwlwifi snd_hwdep irqbypass snd_seq snd_seq_device rapl intel_cstate snd_pcm cfg80211 snd_timer mei_me snd think_lmi intel_uncore i2c_i801 intel_wmi_thunderbolt firmware_attributes_class wmi_bmof soundcore pcspkr rfkill i2c_smbus mei joydev intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel e1000e r8169 drm_buddy drm_display_helper cec ttm wmi video uas usb_storage scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
[  169.305177] kernel: ---[ end trace 0000000000000000 ]---
[  169.312087] kernel: RIP: 0010:release_pages+0x46/0x590
[  169.312094] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[  169.312098] kernel: RSP: 0018:ffffa9a901697e40 EFLAGS: 00010206
[  169.312102] kernel: RAX: 00000000ffff8c03 RBX: 0000000000000000 RCX: 0000000000000000
[  169.312104] kernel: RDX: fffffc6305e4dd48 RSI: ffffa9a901697e68 RDI: fffffc6304df1f48
[  169.312107] kernel: RBP: ffff8c03dea64b78 R08: fffffc6305e4dd48 R09: 0000000000000000
[  169.312109] kernel: R10: 0000000000000000 R11: fffffc63043ec208 R12: ffff8c03dea64b28
[  169.312111] kernel: R13: 0000000000000000 R14: fffffc6304df1f48 R15: 0017ffffc0000000
[  169.312113] kernel: FS:  0000000000000000(0000) GS:ffff8c0b0dd40000(0000) knlGS:0000000000000000
[  169.312115] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  169.312117] kernel: CR2: 00007fea7c5aa000 CR3: 000000035d010006 CR4: 00000000003726e0
[  169.312119] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  169.312120] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  183.416141] kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#5] PREEMPT SMP PTI
[  183.416158] kernel: CPU: 6 PID: 2099 Comm: nfsd Tainted: G      D            6.0.10-300.fc37.x86_64 #1
[  183.416169] kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
[  183.416175] kernel: RIP: 0010:release_pages+0x46/0x590
[  183.416194] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[  183.416202] kernel: RSP: 0018:ffffa9a901677e40 EFLAGS: 00010216
[  183.416212] kernel: RAX: 00000000ffff8c03 RBX: 0000000000000000 RCX: 0000000000000000
[  183.416219] kernel: RDX: fffffc6304b2e248 RSI: ffffa9a901677e68 RDI: fffffc6304294648
[  183.416226] kernel: RBP: ffff8c03e71f0b70 R08: fffffc6304b2e248 R09: 0000000000000000
[  183.416232] kernel: R10: 0000000000000000 R11: fffffc6304616008 R12: ffff8c03e71f0b28
[  183.416238] kernel: R13: 0000000000000000 R14: fffffc6304294648 R15: 0017ffffc0000000
[  183.416245] kernel: FS:  0000000000000000(0000) GS:ffff8c0b0dd80000(0000) knlGS:0000000000000000
[  183.416252] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  183.416259] kernel: CR2: 00007f42e2ffcfff CR3: 000000035d010002 CR4: 00000000003726e0
[  183.416266] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  183.416271] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  183.416278] kernel: Call Trace:
[  183.416283] kernel:  <TASK>
[  183.416296] kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  183.416403] kernel:  __pagevec_release+0x1b/0x30
[  183.416419] kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
[  183.416591] kernel:  svc_send+0x59/0x160 [sunrpc]
[  183.416747] kernel:  nfsd+0xd5/0x190 [nfsd]
[  183.416849] kernel:  kthread+0xe6/0x110
[  183.416861] kernel:  ? kthread_complete_and_exit+0x20/0x20
[  183.416875] kernel:  ret_from_fork+0x1f/0x30
[  183.416897] kernel:  </TASK>
[  183.416901] kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc rpcrdma nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp rdma_cm nf_conntrack_amanda nf_nat iw_cm nf_conntrack_sane ib_cm nf_conntrack_tftp nf_conntrack_sip ib_core nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic iwlmvm ledtrig_audio intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp
[  183.417086] kernel:  snd_intel_dspcfg kvm_intel snd_intel_sdw_acpi mac80211 snd_hda_codec kvm libarc4 iTCO_wdt ee1004 mei_wdt intel_pmc_bxt mei_hdcp iTCO_vendor_support mei_pxp snd_hda_core iwlwifi snd_hwdep irqbypass snd_seq snd_seq_device rapl intel_cstate snd_pcm cfg80211 snd_timer mei_me snd think_lmi intel_uncore i2c_i801 intel_wmi_thunderbolt firmware_attributes_class wmi_bmof soundcore pcspkr rfkill i2c_smbus mei joydev intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel e1000e r8169 drm_buddy drm_display_helper cec ttm wmi video uas usb_storage scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
[  183.417281] kernel: ---[ end trace 0000000000000000 ]---
[  183.423103] kernel: RIP: 0010:release_pages+0x46/0x590
[  183.423108] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[  183.423111] kernel: RSP: 0018:ffffa9a901697e40 EFLAGS: 00010206
[  183.423114] kernel: RAX: 00000000ffff8c03 RBX: 0000000000000000 RCX: 0000000000000000
[  183.423115] kernel: RDX: fffffc6305e4dd48 RSI: ffffa9a901697e68 RDI: fffffc6304df1f48
[  183.423117] kernel: RBP: ffff8c03dea64b78 R08: fffffc6305e4dd48 R09: 0000000000000000
[  183.423119] kernel: R10: 0000000000000000 R11: fffffc63043ec208 R12: ffff8c03dea64b28
[  183.423120] kernel: R13: 0000000000000000 R14: fffffc6304df1f48 R15: 0017ffffc0000000
[  183.423122] kernel: FS:  0000000000000000(0000) GS:ffff8c0b0dd80000(0000) knlGS:0000000000000000
[  183.423124] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  183.423126] kernel: CR2: 00007f42e2ffcfff CR3: 000000035d010002 CR4: 00000000003726e0
[  183.423128] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  183.423129] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Comment 1 Andy Lawrence 2022-12-17 00:29:15 UTC
*** Bug 2154477 has been marked as a duplicate of this bug. ***

Comment 2 Andy Lawrence 2022-12-17 00:46:32 UTC
This same issue started for me when I upgraded to F37 and continued with the 6.0.12-300.fc37.x86_64 update kernel.  I can also trigger this fault with NFS traffic and cure it by downgrading to 5.18.6-200.fc36.x86_64.
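
In case it helps anyone else, a rough sketch of how to keep booting the older, working kernel until this is fixed; the exact vmlinuz path depends on which old kernels are still installed on your system:

  # list the installed boot entries and their kernel paths
  sudo grubby --info=ALL

  # make the known-good 5.18 kernel the default boot entry
  sudo grubby --set-default /boot/vmlinuz-5.18.6-200.fc36.x86_64

  # optionally raise installonly_limit in /etc/dnf/dnf.conf so dnf does not
  # remove the old kernel on the next kernel update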

Comment 3 Dario Lesca 2023-01-02 09:17:00 UTC
I had been running kernel 5.18.6-200.fc36.x86_64 for 4 weeks without problems.
Yesterday I tried updating to the latest kernel, 6.0.15-300.fc37.x86_64, and rebooted the server.

After a few hours I started using an NFS share from another client and got the same error:

[211062.414524] kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#1] PREEMPT SMP PTI
[211062.414542] kernel: CPU: 6 PID: 2289 Comm: nfsd Not tainted 6.0.15-300.fc37.x86_64 #1
[211062.414553] kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
[211062.414559] kernel: RIP: 0010:release_pages+0x46/0x590
[211062.414577] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[211062.414587] kernel: RSP: 0018:ffffa7284169be40 EFLAGS: 00010206
[211062.414597] kernel: RAX: 00000000ffff93ad RBX: 0000000000000000 RCX: 0000000000000000
[211062.414604] kernel: RDX: ffffe90b1ec1d5c8 RSI: ffffa7284169be68 RDI: ffffe90b1ec1d588
[211062.414610] kernel: RBP: ffff93ad5ac70b78 R08: ffffe90b1ec1d5c8 R09: 00000000000000c6
[211062.414616] kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: ffff93ad5ac70b28
[211062.414623] kernel: R13: 0000000000000000 R14: ffffe90b1ec1d588 R15: 0017ffffc0000000
[211062.414630] kernel: FS:  0000000000000000(0000) GS:ffff93b48dd80000(0000) knlGS:0000000000000000
[211062.414638] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[211062.414645] kernel: CR2: 00007fc7fc188000 CR3: 00000004a5010002 CR4: 00000000003726e0
[211062.414652] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[211062.414658] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[211062.414664] kernel: Call Trace:
[211062.414671] kernel:  <TASK>
[211062.414685] kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[211062.414764] kernel:  __pagevec_release+0x1b/0x30
[211062.414780] kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
[211062.414916] kernel:  svc_send+0x59/0x160 [sunrpc]
[211062.415034] kernel:  nfsd+0xd5/0x190 [nfsd]
[211062.415107] kernel:  kthread+0xe6/0x110
[211062.415120] kernel:  ? kthread_complete_and_exit+0x20/0x20
[211062.415134] kernel:  ret_from_fork+0x1f/0x30
[211062.415157] kernel:  </TASK>
[211062.415161] kernel: Modules linked in: ntfs3 tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane rpcrdma nf_conntrack_tftp nf_conntrack_sip rdma_cm nf_conntrack_pptp iw_cm nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast ib_cm nf_conntrack_irc nf_conntrack_h323 ib_core nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_tcc_cooling mac80211 x86_pkg_temp_thermal intel_powerclamp
[211062.415343] kernel:  snd_hda_intel coretemp snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iTCO_wdt snd_hda_core intel_pmc_bxt ee1004 iTCO_vendor_support snd_hwdep snd_seq mei_hdcp mei_pxp mei_wdt libarc4 kvm_intel iwlwifi snd_seq_device kvm cfg80211 snd_pcm irqbypass rapl snd_timer intel_cstate i2c_i801 intel_uncore think_lmi intel_wmi_thunderbolt wmi_bmof pcspkr snd rfkill firmware_attributes_class i2c_smbus soundcore joydev mei_me mei acpi_pad intel_pch_thermal nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 raid1 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni drm_buddy polyval_generic drm_display_helper r8169 ghash_clmulni_intel e1000e cec ttm wmi video uas usb_storage scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
[211062.415547] kernel: ---[ end trace 0000000000000000 ]---
[211063.953237] kernel: RIP: 0010:release_pages+0x46/0x590
[211063.953247] kernel: Code: 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a2 00 00 00 48 63 f6 31 db 49 89 fc 45 31 ed 48 8d 2c f7 4d 8b 3c 24 <49> 8b 47 08 a8 01 0f 85 a9 01 00 00 0f 1f 44 00 00 4d 85 ed 74 0c
[211063.953250] kernel: RSP: 0018:ffffa7284169be40 EFLAGS: 00010206
[211063.953253] kernel: RAX: 00000000ffff93ad RBX: 0000000000000000 RCX: 0000000000000000
[211063.953256] kernel: RDX: ffffe90b1ec1d5c8 RSI: ffffa7284169be68 RDI: ffffe90b1ec1d588
[211063.953258] kernel: RBP: ffff93ad5ac70b78 R08: ffffe90b1ec1d5c8 R09: 00000000000000c6
[211063.953259] kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: ffff93ad5ac70b28
[211063.953261] kernel: R13: 0000000000000000 R14: ffffe90b1ec1d588 R15: 0017ffffc0000000
[211063.953263] kernel: FS:  0000000000000000(0000) GS:ffff93b48dd80000(0000) knlGS:0000000000000000
[211063.953266] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[211063.953267] kernel: CR2: 00007fc7fc188000 CR3: 00000004a5010002 CR4: 00000000003726e0
[211063.953270] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[211063.953271] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
....and so on

Now I have gone back to kernel 5 and will wait for a kernel 6 solution (I'm not a developer and do not know how to debug this).
Dario

Comment 4 Dario Lesca 2023-01-11 09:53:08 UTC
I have tried the latest released kernel, 6.0.18-300.fc37.x86_64, but nothing has changed.

Going back to 5.18.6-200.fc36.x86_64.

Comment 5 Dario Lesca 2023-01-16 18:11:36 UTC
I have tried the latest released kernel, 6.1.5-200.fc37.x86_64, but nothing has changed[1]

Going back to 5.18.6-200.fc36.x86_64.

[1]
gen 15 19:22:07 igloo.home.solinos.it rpc.mountd[2054]: authenticated mount request from 192.168.61.88:765 for /multimedia (/multimedia)
gen 15 19:22:12 igloo.home.solinos.it kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#1] PREEMPT SMP PTI
gen 15 19:22:12 igloo.home.solinos.it kernel: CPU: 1 PID: 2196 Comm: nfsd Not tainted 6.1.5-200.fc37.x86_64 #1
gen 15 19:22:12 igloo.home.solinos.it kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 15 19:22:12 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:12 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:12 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:22:12 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:12 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:22:12 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:22:12 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:12 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:22:12 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc40000(0000) knlGS:0000000000000000
gen 15 19:22:12 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:12 igloo.home.solinos.it kernel: CR2: 00007f0cf288f000 CR3: 0000000363010003 CR4: 00000000003726e0
gen 15 19:22:12 igloo.home.solinos.it kernel: Call Trace:
gen 15 19:22:12 igloo.home.solinos.it kernel:  <TASK>
gen 15 19:22:12 igloo.home.solinos.it kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 15 19:22:12 igloo.home.solinos.it kernel:  __pagevec_release+0x1b/0x30
gen 15 19:22:12 igloo.home.solinos.it kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 15 19:22:12 igloo.home.solinos.it kernel:  svc_send+0x59/0x160 [sunrpc]
gen 15 19:22:12 igloo.home.solinos.it kernel:  nfsd+0xd5/0x190 [nfsd]
gen 15 19:22:12 igloo.home.solinos.it kernel:  kthread+0xe6/0x110
gen 15 19:22:12 igloo.home.solinos.it kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 15 19:22:12 igloo.home.solinos.it kernel:  ret_from_fork+0x1f/0x30
gen 15 19:22:12 igloo.home.solinos.it kernel:  </TASK>
gen 15 19:22:12 igloo.home.solinos.it kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log rpcrdma xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic rdma_cm nf_conntrack_snmp nf_nat_sip iw_cm nf_nat_pptp nf_nat_irc ib_cm nf_nat_h323 nf_nat_ftp ib_core nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio libarc4 iTCO_wdt intel_tcc_cooling mei_hdcp intel_pmc_bxt
gen 15 19:22:12 igloo.home.solinos.it kernel:  x86_pkg_temp_thermal snd_hda_intel mei_pxp intel_powerclamp snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi coretemp mei_wdt ee1004 snd_usb_audio snd_hda_codec iwlwifi snd_hda_core snd_usbmidi_lib kvm_intel snd_rawmidi snd_hwdep kvm snd_seq snd_seq_device uvcvideo snd_pcm cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 irqbypass rapl videobuf2_common snd_timer intel_cstate think_lmi mei_me videodev intel_wmi_thunderbolt intel_uncore wmi_bmof firmware_attributes_class pcspkr snd i2c_i801 rfkill i2c_smbus joydev mc mei soundcore intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 drm_buddy e1000e drm_display_helper cec uas ttm r8169 usb_storage video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 15 19:22:12 igloo.home.solinos.it kernel: ---[ end trace 0000000000000000 ]---
gen 15 19:22:13 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:13 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:13 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:22:13 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:22:13 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:22:13 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:13 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:22:13 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc40000(0000) knlGS:0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:13 igloo.home.solinos.it kernel: CR2: 00007f0cf288f000 CR3: 0000000363010003 CR4: 00000000003726e0
gen 15 19:22:13 igloo.home.solinos.it kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#2] PREEMPT SMP PTI
gen 15 19:22:13 igloo.home.solinos.it kernel: CPU: 0 PID: 2195 Comm: nfsd Tainted: G      D            6.1.5-200.fc37.x86_64 #1
gen 15 19:22:13 igloo.home.solinos.it kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 15 19:22:13 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:13 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:13 igloo.home.solinos.it kernel: RSP: 0018:ffffa74641423e40 EFLAGS: 00010206
gen 15 19:22:13 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfa988 RSI: ffffa74641423e68 RDI: ffffcf01c5e2a2c8
gen 15 19:22:13 igloo.home.solinos.it kernel: RBP: ffff8ab46021cb78 R08: ffffcf01c4bfa988 R09: 00000000000000c3
gen 15 19:22:13 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:13 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c5e2a2c8 R15: ffff8ab46021cb28
gen 15 19:22:13 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc00000(0000) knlGS:0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:13 igloo.home.solinos.it kernel: CR2: 00007f1b5c5f7002 CR3: 0000000363010006 CR4: 00000000003726f0
gen 15 19:22:13 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 15 19:22:13 igloo.home.solinos.it kernel: Call Trace:
gen 15 19:22:13 igloo.home.solinos.it kernel:  <TASK>
gen 15 19:22:13 igloo.home.solinos.it kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 15 19:22:13 igloo.home.solinos.it kernel:  __pagevec_release+0x1b/0x30
gen 15 19:22:13 igloo.home.solinos.it kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 15 19:22:13 igloo.home.solinos.it kernel:  svc_send+0x59/0x160 [sunrpc]
gen 15 19:22:13 igloo.home.solinos.it kernel:  nfsd+0xd5/0x190 [nfsd]
gen 15 19:22:13 igloo.home.solinos.it kernel:  kthread+0xe6/0x110
gen 15 19:22:13 igloo.home.solinos.it kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 15 19:22:13 igloo.home.solinos.it kernel:  ret_from_fork+0x1f/0x30
gen 15 19:22:13 igloo.home.solinos.it kernel:  </TASK>
gen 15 19:22:13 igloo.home.solinos.it kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log rpcrdma xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic rdma_cm nf_conntrack_snmp nf_nat_sip iw_cm nf_nat_pptp nf_nat_irc ib_cm nf_nat_h323 nf_nat_ftp ib_core nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio libarc4 iTCO_wdt intel_tcc_cooling mei_hdcp intel_pmc_bxt
gen 15 19:22:13 igloo.home.solinos.it kernel:  x86_pkg_temp_thermal snd_hda_intel mei_pxp intel_powerclamp snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi coretemp mei_wdt ee1004 snd_usb_audio snd_hda_codec iwlwifi snd_hda_core snd_usbmidi_lib kvm_intel snd_rawmidi snd_hwdep kvm snd_seq snd_seq_device uvcvideo snd_pcm cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 irqbypass rapl videobuf2_common snd_timer intel_cstate think_lmi mei_me videodev intel_wmi_thunderbolt intel_uncore wmi_bmof firmware_attributes_class pcspkr snd i2c_i801 rfkill i2c_smbus joydev mc mei soundcore intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 drm_buddy e1000e drm_display_helper cec uas ttm r8169 usb_storage video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 15 19:22:13 igloo.home.solinos.it kernel: ---[ end trace 0000000000000000 ]---
gen 15 19:22:13 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:13 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:13 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:22:13 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:22:13 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:22:13 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:13 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:22:13 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc00000(0000) knlGS:0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:13 igloo.home.solinos.it kernel: CR2: 00007f1b5c5f7002 CR3: 0000000363010006 CR4: 00000000003726f0
gen 15 19:22:13 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:22:13 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 15 19:22:20 igloo.home.solinos.it kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#3] PREEMPT SMP PTI
gen 15 19:22:20 igloo.home.solinos.it kernel: CPU: 2 PID: 2194 Comm: nfsd Tainted: G      D            6.1.5-200.fc37.x86_64 #1
gen 15 19:22:20 igloo.home.solinos.it kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 15 19:22:20 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:20 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:20 igloo.home.solinos.it kernel: RSP: 0018:ffffa7464141be40 EFLAGS: 00010206
gen 15 19:22:20 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: RDX: ffffcf01ce0b8708 RSI: ffffa7464141be68 RDI: ffffcf01c5e278c8
gen 15 19:22:20 igloo.home.solinos.it kernel: RBP: ffff8ab4607ecb78 R08: ffffcf01ce0b8708 R09: 00000000000000be
gen 15 19:22:20 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:20 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c5e278c8 R15: ffff8ab4607ecb28
gen 15 19:22:20 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc80000(0000) knlGS:0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:20 igloo.home.solinos.it kernel: CR2: 000055fd5e5d7000 CR3: 0000000363010003 CR4: 00000000003726e0
gen 15 19:22:20 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 15 19:22:20 igloo.home.solinos.it kernel: Call Trace:
gen 15 19:22:20 igloo.home.solinos.it kernel:  <TASK>
gen 15 19:22:20 igloo.home.solinos.it kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 15 19:22:20 igloo.home.solinos.it kernel:  __pagevec_release+0x1b/0x30
gen 15 19:22:20 igloo.home.solinos.it kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 15 19:22:20 igloo.home.solinos.it kernel:  svc_send+0x59/0x160 [sunrpc]
gen 15 19:22:20 igloo.home.solinos.it kernel:  nfsd+0xd5/0x190 [nfsd]
gen 15 19:22:20 igloo.home.solinos.it kernel:  kthread+0xe6/0x110
gen 15 19:22:20 igloo.home.solinos.it kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 15 19:22:20 igloo.home.solinos.it kernel:  ret_from_fork+0x1f/0x30
gen 15 19:22:20 igloo.home.solinos.it kernel:  </TASK>
gen 15 19:22:20 igloo.home.solinos.it kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log rpcrdma xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic rdma_cm nf_conntrack_snmp nf_nat_sip iw_cm nf_nat_pptp nf_nat_irc ib_cm nf_nat_h323 nf_nat_ftp ib_core nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio libarc4 iTCO_wdt intel_tcc_cooling mei_hdcp intel_pmc_bxt
gen 15 19:22:20 igloo.home.solinos.it kernel:  x86_pkg_temp_thermal snd_hda_intel mei_pxp intel_powerclamp snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi coretemp mei_wdt ee1004 snd_usb_audio snd_hda_codec iwlwifi snd_hda_core snd_usbmidi_lib kvm_intel snd_rawmidi snd_hwdep kvm snd_seq snd_seq_device uvcvideo snd_pcm cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 irqbypass rapl videobuf2_common snd_timer intel_cstate think_lmi mei_me videodev intel_wmi_thunderbolt intel_uncore wmi_bmof firmware_attributes_class pcspkr snd i2c_i801 rfkill i2c_smbus joydev mc mei soundcore intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 drm_buddy e1000e drm_display_helper cec uas ttm r8169 usb_storage video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 15 19:22:20 igloo.home.solinos.it kernel: ---[ end trace 0000000000000000 ]---
gen 15 19:22:20 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:20 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:20 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:22:20 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:22:20 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:22:20 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:20 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:22:20 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc80000(0000) knlGS:0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:20 igloo.home.solinos.it kernel: CR2: 000055fd5e5d7000 CR3: 0000000363010003 CR4: 00000000003726e0
gen 15 19:22:20 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 15 19:22:20 igloo.home.solinos.it kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#4] PREEMPT SMP PTI
gen 15 19:22:20 igloo.home.solinos.it kernel: CPU: 0 PID: 2193 Comm: nfsd Tainted: G      D            6.1.5-200.fc37.x86_64 #1
gen 15 19:22:20 igloo.home.solinos.it kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 15 19:22:20 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:20 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:20 igloo.home.solinos.it kernel: RSP: 0018:ffffa74641413e40 EFLAGS: 00010206
gen 15 19:22:20 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: RDX: ffffcf01c4bf2888 RSI: ffffa74641413e68 RDI: ffffcf01c466b508
gen 15 19:22:20 igloo.home.solinos.it kernel: RBP: ffff8ab448550b78 R08: ffffcf01c4bf2888 R09: 00000000000000be
gen 15 19:22:20 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:20 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c466b508 R15: ffff8ab448550b28
gen 15 19:22:20 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc00000(0000) knlGS:0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:20 igloo.home.solinos.it kernel: CR2: 00007f8b780013b0 CR3: 0000000363010006 CR4: 00000000003726f0
gen 15 19:22:20 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 15 19:22:20 igloo.home.solinos.it kernel: Call Trace:
gen 15 19:22:20 igloo.home.solinos.it kernel:  <TASK>
gen 15 19:22:20 igloo.home.solinos.it kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 15 19:22:20 igloo.home.solinos.it kernel:  __pagevec_release+0x1b/0x30
gen 15 19:22:20 igloo.home.solinos.it kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 15 19:22:20 igloo.home.solinos.it kernel:  svc_send+0x59/0x160 [sunrpc]
gen 15 19:22:20 igloo.home.solinos.it kernel:  nfsd+0xd5/0x190 [nfsd]
gen 15 19:22:20 igloo.home.solinos.it kernel:  kthread+0xe6/0x110
gen 15 19:22:20 igloo.home.solinos.it kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 15 19:22:20 igloo.home.solinos.it kernel:  ret_from_fork+0x1f/0x30
gen 15 19:22:20 igloo.home.solinos.it kernel:  </TASK>
gen 15 19:22:20 igloo.home.solinos.it kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log rpcrdma xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic rdma_cm nf_conntrack_snmp nf_nat_sip iw_cm nf_nat_pptp nf_nat_irc ib_cm nf_nat_h323 nf_nat_ftp ib_core nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio libarc4 iTCO_wdt intel_tcc_cooling mei_hdcp intel_pmc_bxt
gen 15 19:22:20 igloo.home.solinos.it kernel:  x86_pkg_temp_thermal snd_hda_intel mei_pxp intel_powerclamp snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi coretemp mei_wdt ee1004 snd_usb_audio snd_hda_codec iwlwifi snd_hda_core snd_usbmidi_lib kvm_intel snd_rawmidi snd_hwdep kvm snd_seq snd_seq_device uvcvideo snd_pcm cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 irqbypass rapl videobuf2_common snd_timer intel_cstate think_lmi mei_me videodev intel_wmi_thunderbolt intel_uncore wmi_bmof firmware_attributes_class pcspkr snd i2c_i801 rfkill i2c_smbus joydev mc mei soundcore intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 drm_buddy e1000e drm_display_helper cec uas ttm r8169 usb_storage video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 15 19:22:20 igloo.home.solinos.it kernel: ---[ end trace 0000000000000000 ]---
gen 15 19:22:20 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:20 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:20 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:22:20 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:22:20 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:22:20 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:20 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:22:20 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc00000(0000) knlGS:0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:20 igloo.home.solinos.it kernel: CR2: 00007f8b780013b0 CR3: 0000000363010006 CR4: 00000000003726f0
gen 15 19:22:20 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 15 19:22:20 igloo.home.solinos.it kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#5] PREEMPT SMP PTI
gen 15 19:22:20 igloo.home.solinos.it kernel: CPU: 4 PID: 2192 Comm: nfsd Tainted: G      D            6.1.5-200.fc37.x86_64 #1
gen 15 19:22:20 igloo.home.solinos.it kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 15 19:22:20 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:20 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:20 igloo.home.solinos.it kernel: RSP: 0018:ffffa7464140be40 EFLAGS: 00010216
gen 15 19:22:20 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: RDX: ffffcf01c4bde7c8 RSI: ffffa7464140be68 RDI: ffffcf01c4b9fb88
gen 15 19:22:20 igloo.home.solinos.it kernel: RBP: ffff8ab4607d0b70 R08: ffffcf01c4bde7c8 R09: 00000000000000be
gen 15 19:22:20 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:20 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4b9fb88 R15: ffff8ab4607d0b28
gen 15 19:22:20 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dd00000(0000) knlGS:0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:20 igloo.home.solinos.it kernel: CR2: 00007f99f8fdf000 CR3: 0000000363010004 CR4: 00000000003726e0
gen 15 19:22:20 igloo.home.solinos.it kernel: Call Trace:
gen 15 19:22:20 igloo.home.solinos.it kernel:  <TASK>
gen 15 19:22:20 igloo.home.solinos.it kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 15 19:22:20 igloo.home.solinos.it kernel:  __pagevec_release+0x1b/0x30
gen 15 19:22:20 igloo.home.solinos.it kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 15 19:22:20 igloo.home.solinos.it kernel:  svc_send+0x59/0x160 [sunrpc]
gen 15 19:22:20 igloo.home.solinos.it kernel:  nfsd+0xd5/0x190 [nfsd]
gen 15 19:22:20 igloo.home.solinos.it kernel:  kthread+0xe6/0x110
gen 15 19:22:20 igloo.home.solinos.it kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 15 19:22:20 igloo.home.solinos.it kernel:  ret_from_fork+0x1f/0x30
gen 15 19:22:20 igloo.home.solinos.it kernel:  </TASK>
gen 15 19:22:20 igloo.home.solinos.it kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log rpcrdma xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic rdma_cm nf_conntrack_snmp nf_nat_sip iw_cm nf_nat_pptp nf_nat_irc ib_cm nf_nat_h323 nf_nat_ftp ib_core nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio libarc4 iTCO_wdt intel_tcc_cooling mei_hdcp intel_pmc_bxt
gen 15 19:22:20 igloo.home.solinos.it kernel:  x86_pkg_temp_thermal snd_hda_intel mei_pxp intel_powerclamp snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi coretemp mei_wdt ee1004 snd_usb_audio snd_hda_codec iwlwifi snd_hda_core snd_usbmidi_lib kvm_intel snd_rawmidi snd_hwdep kvm snd_seq snd_seq_device uvcvideo snd_pcm cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 irqbypass rapl videobuf2_common snd_timer intel_cstate think_lmi mei_me videodev intel_wmi_thunderbolt intel_uncore wmi_bmof firmware_attributes_class pcspkr snd i2c_i801 rfkill i2c_smbus joydev mc mei soundcore intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 drm_buddy e1000e drm_display_helper cec uas ttm r8169 usb_storage video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 15 19:22:20 igloo.home.solinos.it kernel: ---[ end trace 0000000000000000 ]---
gen 15 19:22:20 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:22:20 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:22:20 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:22:20 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:22:20 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:22:20 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:22:20 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:22:20 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dd00000(0000) knlGS:0000000000000000
gen 15 19:22:20 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:22:20 igloo.home.solinos.it kernel: CR2: 00007f99f8fdf000 CR3: 0000000363010004 CR4: 00000000003726e0
gen 15 19:23:01 igloo.home.solinos.it kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#6] PREEMPT SMP PTI
gen 15 19:23:01 igloo.home.solinos.it kernel: CPU: 1 PID: 2191 Comm: nfsd Tainted: G      D            6.1.5-200.fc37.x86_64 #1
gen 15 19:23:01 igloo.home.solinos.it kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 15 19:23:01 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:23:01 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:23:01 igloo.home.solinos.it kernel: RSP: 0018:ffffa74641403e40 EFLAGS: 00010206
gen 15 19:23:01 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfde48 RSI: ffffa74641403e68 RDI: ffffcf01c46a92c8
gen 15 19:23:01 igloo.home.solinos.it kernel: RBP: ffff8ab46bd38b78 R08: ffffcf01c4bfde48 R09: 0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:23:01 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c46a92c8 R15: ffff8ab46bd38b28
gen 15 19:23:01 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc40000(0000) knlGS:0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:23:01 igloo.home.solinos.it kernel: CR2: 00007f92a3ffefff CR3: 0000000363010004 CR4: 00000000003726e0
gen 15 19:23:01 igloo.home.solinos.it kernel: Call Trace:
gen 15 19:23:01 igloo.home.solinos.it kernel:  <TASK>
gen 15 19:23:01 igloo.home.solinos.it kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 15 19:23:01 igloo.home.solinos.it kernel:  __pagevec_release+0x1b/0x30
gen 15 19:23:01 igloo.home.solinos.it kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 15 19:23:01 igloo.home.solinos.it kernel:  svc_send+0x59/0x160 [sunrpc]
gen 15 19:23:01 igloo.home.solinos.it kernel:  nfsd+0xd5/0x190 [nfsd]
gen 15 19:23:01 igloo.home.solinos.it kernel:  kthread+0xe6/0x110
gen 15 19:23:01 igloo.home.solinos.it kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 15 19:23:01 igloo.home.solinos.it kernel:  ret_from_fork+0x1f/0x30
gen 15 19:23:01 igloo.home.solinos.it kernel:  </TASK>
gen 15 19:23:01 igloo.home.solinos.it kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log rpcrdma xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic rdma_cm nf_conntrack_snmp nf_nat_sip iw_cm nf_nat_pptp nf_nat_irc ib_cm nf_nat_h323 nf_nat_ftp ib_core nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio libarc4 iTCO_wdt intel_tcc_cooling mei_hdcp intel_pmc_bxt
gen 15 19:23:01 igloo.home.solinos.it kernel:  x86_pkg_temp_thermal snd_hda_intel mei_pxp intel_powerclamp snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi coretemp mei_wdt ee1004 snd_usb_audio snd_hda_codec iwlwifi snd_hda_core snd_usbmidi_lib kvm_intel snd_rawmidi snd_hwdep kvm snd_seq snd_seq_device uvcvideo snd_pcm cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 irqbypass rapl videobuf2_common snd_timer intel_cstate think_lmi mei_me videodev intel_wmi_thunderbolt intel_uncore wmi_bmof firmware_attributes_class pcspkr snd i2c_i801 rfkill i2c_smbus joydev mc mei soundcore intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 drm_buddy e1000e drm_display_helper cec uas ttm r8169 usb_storage video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 15 19:23:01 igloo.home.solinos.it kernel: ---[ end trace 0000000000000000 ]---
gen 15 19:23:01 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:23:01 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:23:01 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:23:01 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:23:01 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:23:01 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:23:01 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:23:01 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc40000(0000) knlGS:0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:23:01 igloo.home.solinos.it kernel: CR2: 00007f92a3ffefff CR3: 0000000363010004 CR4: 00000000003726e0
gen 15 19:23:01 igloo.home.solinos.it kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#7] PREEMPT SMP PTI
gen 15 19:23:01 igloo.home.solinos.it kernel: CPU: 0 PID: 2190 Comm: nfsd Tainted: G      D            6.1.5-200.fc37.x86_64 #1
gen 15 19:23:01 igloo.home.solinos.it kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 15 19:23:01 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:23:01 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:23:01 igloo.home.solinos.it kernel: RSP: 0018:ffffa74640b17e40 EFLAGS: 00010206
gen 15 19:23:01 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: RDX: ffffcf01c4bd82c8 RSI: ffffa74640b17e68 RDI: ffffcf01c466b488
gen 15 19:23:01 igloo.home.solinos.it kernel: RBP: ffff8ab46bcd0b78 R08: ffffcf01c4bd82c8 R09: 00000000000000af
gen 15 19:23:01 igloo.home.solinos.it kernel: R10: 000000000000e7e8 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:23:01 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c466b488 R15: ffff8ab46bcd0b28
gen 15 19:23:01 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc00000(0000) knlGS:0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:23:01 igloo.home.solinos.it kernel: CR2: 00007f99f8fdf000 CR3: 0000000363010002 CR4: 00000000003726f0
gen 15 19:23:01 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 15 19:23:01 igloo.home.solinos.it kernel: Call Trace:
gen 15 19:23:01 igloo.home.solinos.it kernel:  <TASK>
gen 15 19:23:01 igloo.home.solinos.it kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 15 19:23:01 igloo.home.solinos.it kernel:  __pagevec_release+0x1b/0x30
gen 15 19:23:01 igloo.home.solinos.it kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 15 19:23:01 igloo.home.solinos.it kernel:  svc_send+0x59/0x160 [sunrpc]
gen 15 19:23:01 igloo.home.solinos.it kernel:  nfsd+0xd5/0x190 [nfsd]
gen 15 19:23:01 igloo.home.solinos.it kernel:  kthread+0xe6/0x110
gen 15 19:23:01 igloo.home.solinos.it kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 15 19:23:01 igloo.home.solinos.it kernel:  ret_from_fork+0x1f/0x30
gen 15 19:23:01 igloo.home.solinos.it kernel:  </TASK>
gen 15 19:23:01 igloo.home.solinos.it kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log rpcrdma xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic rdma_cm nf_conntrack_snmp nf_nat_sip iw_cm nf_nat_pptp nf_nat_irc ib_cm nf_nat_h323 nf_nat_ftp ib_core nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr iwlmvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio libarc4 iTCO_wdt intel_tcc_cooling mei_hdcp intel_pmc_bxt
gen 15 19:23:01 igloo.home.solinos.it kernel:  x86_pkg_temp_thermal snd_hda_intel mei_pxp intel_powerclamp snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi coretemp mei_wdt ee1004 snd_usb_audio snd_hda_codec iwlwifi snd_hda_core snd_usbmidi_lib kvm_intel snd_rawmidi snd_hwdep kvm snd_seq snd_seq_device uvcvideo snd_pcm cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 irqbypass rapl videobuf2_common snd_timer intel_cstate think_lmi mei_me videodev intel_wmi_thunderbolt intel_uncore wmi_bmof firmware_attributes_class pcspkr snd i2c_i801 rfkill i2c_smbus joydev mc mei soundcore intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 drm_buddy e1000e drm_display_helper cec uas ttm r8169 usb_storage video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 15 19:23:01 igloo.home.solinos.it kernel: ---[ end trace 0000000000000000 ]---
gen 15 19:23:01 igloo.home.solinos.it kernel: RIP: 0010:release_pages+0x45/0x580
gen 15 19:23:01 igloo.home.solinos.it kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 15 19:23:01 igloo.home.solinos.it kernel: RSP: 0018:ffffa746412d7e40 EFLAGS: 00010206
gen 15 19:23:01 igloo.home.solinos.it kernel: RAX: 00000000ffff8ab4 RBX: 0000000000000000 RCX: 0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: RDX: ffffcf01c4bfe948 RSI: ffffa746412d7e68 RDI: ffffcf01c4bfe988
gen 15 19:23:01 igloo.home.solinos.it kernel: RBP: ffff8ab448784b78 R08: ffffcf01c4bfe948 R09: 00000000000000a3
gen 15 19:23:01 igloo.home.solinos.it kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 15 19:23:01 igloo.home.solinos.it kernel: R13: 0000000000000000 R14: ffffcf01c4bfe988 R15: ffff8ab448784b28
gen 15 19:23:01 igloo.home.solinos.it kernel: FS:  0000000000000000(0000) GS:ffff8abb8dc00000(0000) knlGS:0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 15 19:23:01 igloo.home.solinos.it kernel: CR2: 00007f99f8fdf000 CR3: 0000000363010002 CR4: 00000000003726f0
gen 15 19:23:01 igloo.home.solinos.it kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 15 19:23:01 igloo.home.solinos.it kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Comment 6 Dario Lesca 2023-02-01 14:07:03 UTC
After the last update, with kernel 6.1.8-200.fc37.x86_64, nothing has changed[1]

As soon as I start using NFS, the server crashes.

I rolled back to 5.18.6-200.fc36.x86_64.

[1]
gen 31 21:34:00 kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#1] PREEMPT SMP PTI
gen 31 21:34:00 kernel: CPU: 3 PID: 2310 Comm: nfsd Not tainted 6.1.8-200.fc37.x86_64 #1
gen 31 21:34:00 kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 31 21:34:00 kernel: RIP: 0010:release_pages+0x45/0x580
gen 31 21:34:00 kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 31 21:34:00 kernel: RSP: 0018:ffffa77901473e40 EFLAGS: 00010206
gen 31 21:34:00 kernel: RAX: 00000000ffff8fc1 RBX: 0000000000000000 RCX: 0000000000000000
gen 31 21:34:00 kernel: RDX: ffffdc6b84c36788 RSI: ffffa77901473e68 RDI: ffffdc6b848145c8
gen 31 21:34:00 kernel: RBP: ffff8fc16d9c8b78 R08: ffffdc6b84c36788 R09: 000000008fd9a10c
gen 31 21:34:00 kernel: R10: 00000000000005a8 R11: 0000000000008550 R12: 0017ffffc0000000
gen 31 21:34:00 kernel: R13: 0000000000000000 R14: ffffdc6b848145c8 R15: ffff8fc16d9c8b28
gen 31 21:34:00 kernel: FS:  0000000000000000(0000) GS:ffff8fc88dcc0000(0000) knlGS:0000000000000000
gen 31 21:34:00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 31 21:34:00 kernel: CR2: 00007fb66b562002 CR3: 0000000361010002 CR4: 00000000003726e0
gen 31 21:34:00 kernel: Call Trace:
gen 31 21:34:00 kernel:  <TASK>
gen 31 21:34:00 kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 31 21:34:00 kernel:  __pagevec_release+0x1b/0x30
gen 31 21:34:00 kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 31 21:34:00 kernel:  svc_send+0x59/0x160 [sunrpc]
gen 31 21:34:00 kernel:  nfsd+0xd5/0x190 [nfsd]
gen 31 21:34:00 kernel:  kthread+0xe6/0x110
gen 31 21:34:00 kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 31 21:34:00 kernel:  ret_from_fork+0x1f/0x30
gen 31 21:34:00 kernel:  </TASK>
gen 31 21:34:00 kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rpcrdma rdma_cm iw_cm ib_cm ib_core tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi iwlmvm intel_tcc_cooling snd_hda_codec_realtek x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_generic ledtrig_audio mac80211 iTCO_wdt
gen 31 21:34:00 kernel:  kvm_intel mei_hdcp intel_pmc_bxt ee1004 mei_pxp iTCO_vendor_support mei_wdt snd_hda_intel libarc4 snd_intel_dspcfg kvm snd_intel_sdw_acpi snd_usb_audio snd_hda_codec snd_usbmidi_lib iwlwifi uvcvideo irqbypass snd_hda_core snd_rawmidi snd_hwdep rapl videobuf2_vmalloc videobuf2_memops snd_seq cfg80211 videobuf2_v4l2 snd_seq_device intel_cstate think_lmi snd_pcm videobuf2_common intel_uncore i2c_i801 firmware_attributes_class wmi_bmof intel_wmi_thunderbolt pcspkr i2c_smbus snd_timer videodev rfkill snd joydev mc soundcore mei_me acpi_pad mei intel_pch_thermal nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 e1000e drm_buddy drm_display_helper cec r8169 ttm video wmi uas usb_storage scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 31 21:34:00 kernel: ---[ end trace 0000000000000000 ]---
gen 31 21:34:00 kernel: RIP: 0010:release_pages+0x45/0x580
gen 31 21:34:00 kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 31 21:34:00 kernel: RSP: 0018:ffffa77901473e40 EFLAGS: 00010206
gen 31 21:34:00 kernel: RAX: 00000000ffff8fc1 RBX: 0000000000000000 RCX: 0000000000000000
gen 31 21:34:00 kernel: RDX: ffffdc6b84c36788 RSI: ffffa77901473e68 RDI: ffffdc6b848145c8
gen 31 21:34:00 kernel: RBP: ffff8fc16d9c8b78 R08: ffffdc6b84c36788 R09: 000000008fd9a10c
gen 31 21:34:00 kernel: R10: 00000000000005a8 R11: 0000000000008550 R12: 0017ffffc0000000
gen 31 21:34:00 kernel: R13: 0000000000000000 R14: ffffdc6b848145c8 R15: ffff8fc16d9c8b28
gen 31 21:34:00 kernel: FS:  0000000000000000(0000) GS:ffff8fc88dcc0000(0000) knlGS:0000000000000000
gen 31 21:34:00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 31 21:34:00 kernel: CR2: 00007fb66b562002 CR3: 0000000361010002 CR4: 00000000003726e0
gen 31 21:34:00 kernel: general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#2] PREEMPT SMP PTI
gen 31 21:34:00 kernel: CPU: 7 PID: 2309 Comm: nfsd Tainted: G      D            6.1.8-200.fc37.x86_64 #1
gen 31 21:34:00 kernel: Hardware name: LENOVO 30BGS1BV00/103D, BIOS S06KT40A 03/15/2019
gen 31 21:34:00 kernel: RIP: 0010:release_pages+0x45/0x580
gen 31 21:34:00 kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 31 21:34:00 kernel: RSP: 0018:ffffa7790145be40 EFLAGS: 00010206
gen 31 21:34:00 kernel: RAX: 00000000ffff8fc1 RBX: 0000000000000000 RCX: 0000000000000000
gen 31 21:34:00 kernel: RDX: ffffdc6b85cea608 RSI: ffffa7790145be68 RDI: ffffdc6b8481cc88
gen 31 21:34:00 kernel: RBP: ffff8fc16d9c4b78 R08: ffffdc6b85cea608 R09: 00000000000000b2
gen 31 21:34:00 kernel: R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
gen 31 21:34:00 kernel: R13: 0000000000000000 R14: ffffdc6b8481cc88 R15: ffff8fc16d9c4b28
gen 31 21:34:00 kernel: FS:  0000000000000000(0000) GS:ffff8fc88ddc0000(0000) knlGS:0000000000000000
gen 31 21:34:00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 31 21:34:00 kernel: CR2: 00007f48d8253000 CR3: 0000000361010006 CR4: 00000000003726e0
gen 31 21:34:00 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 31 21:34:00 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gen 31 21:34:00 kernel: Call Trace:
gen 31 21:34:00 kernel:  <TASK>
gen 31 21:34:00 kernel:  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
gen 31 21:34:00 kernel:  __pagevec_release+0x1b/0x30
gen 31 21:34:00 kernel:  svc_xprt_release+0x1a1/0x200 [sunrpc]
gen 31 21:34:00 kernel:  svc_send+0x59/0x160 [sunrpc]
gen 31 21:34:00 kernel:  nfsd+0xd5/0x190 [nfsd]
gen 31 21:34:00 kernel:  kthread+0xe6/0x110
gen 31 21:34:00 kernel:  ? kthread_complete_and_exit+0x20/0x20
gen 31 21:34:00 kernel:  ret_from_fork+0x1f/0x30
gen 31 21:34:00 kernel:  </TASK>
gen 31 21:34:00 kernel: Modules linked in: tls snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap xt_recent xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_TCPMSS nft_chain_nat xt_MASQUERADE xt_REDIRECT xt_multiport xt_nat xt_CT xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rpcrdma rdma_cm iw_cm ib_cm ib_core tun pppoe pppox ppp_generic slhc ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink bridge stp llc qrtr intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi iwlmvm intel_tcc_cooling snd_hda_codec_realtek x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_generic ledtrig_audio mac80211 iTCO_wdt
gen 31 21:34:00 kernel:  kvm_intel mei_hdcp intel_pmc_bxt ee1004 mei_pxp iTCO_vendor_support mei_wdt snd_hda_intel libarc4 snd_intel_dspcfg kvm snd_intel_sdw_acpi snd_usb_audio snd_hda_codec snd_usbmidi_lib iwlwifi uvcvideo irqbypass snd_hda_core snd_rawmidi snd_hwdep rapl videobuf2_vmalloc videobuf2_memops snd_seq cfg80211 videobuf2_v4l2 snd_seq_device intel_cstate think_lmi snd_pcm videobuf2_common intel_uncore i2c_i801 firmware_attributes_class wmi_bmof intel_wmi_thunderbolt pcspkr i2c_smbus snd_timer videodev rfkill snd joydev mc soundcore mei_me acpi_pad mei intel_pch_thermal nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel raid1 sha512_ssse3 e1000e drm_buddy drm_display_helper cec r8169 ttm video wmi uas usb_storage scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
gen 31 21:34:00 kernel: ---[ end trace 0000000000000000 ]---
gen 31 21:34:00 kernel: RIP: 0010:release_pages+0x45/0x580
gen 31 21:34:00 kernel: Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
gen 31 21:34:00 kernel: RSP: 0018:ffffa77901473e40 EFLAGS: 00010206
gen 31 21:34:00 kernel: RAX: 00000000ffff8fc1 RBX: 0000000000000000 RCX: 0000000000000000
gen 31 21:34:00 kernel: RDX: ffffdc6b84c36788 RSI: ffffa77901473e68 RDI: ffffdc6b848145c8
gen 31 21:34:00 kernel: RBP: ffff8fc16d9c8b78 R08: ffffdc6b84c36788 R09: 000000008fd9a10c
gen 31 21:34:00 kernel: R10: 00000000000005a8 R11: 0000000000008550 R12: 0017ffffc0000000
gen 31 21:34:00 kernel: R13: 0000000000000000 R14: ffffdc6b848145c8 R15: ffff8fc16d9c8b28
gen 31 21:34:00 kernel: FS:  0000000000000000(0000) GS:ffff8fc88ddc0000(0000) knlGS:0000000000000000
gen 31 21:34:00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gen 31 21:34:00 kernel: CR2: 00007f48d8253000 CR3: 0000000361010006 CR4: 00000000003726e0
gen 31 21:34:00 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gen 31 21:34:00 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Comment 7 David Critch 2023-02-03 14:36:28 UTC
I'm having the same problem w/ kernel 6+.

If the client is also Linux (Fedora or RHCOS at least), it can access and read/write the NFS share fine. However, I have a couple of Android-based devices, and as soon as they try to access the NFS server, I get the same crash/symptoms as in the original post.

Comment 8 Benjamin Coddington 2023-02-26 12:04:43 UTC
I believe the fix for this one is
0b3a551fa58b nfsd: fix handling of cached open files in nfsd4_open codepath

.. which is in the v6.2 upstream kernel release.  The problem affects kernels v5.19 through v6.1.

Comment 9 Jeff Layton 2023-02-26 13:17:09 UTC
(In reply to Benjamin Coddington from comment #8)
> I believe the fix for this one is
> 0b3a551fa58b nfsd: fix handling of cached open files in nfsd4_open codepath
> 

I'm not sure I agree here. We did have some problems around shutdown (particularly with containerized nfsd), but the stack traces here look quite different from the ones that prompted 0b3a551fa58b. v6.1.9 and up should have all of the relevant filecache fixes, so it'd be good to try those kernels first to rule that out. If that doesn't help, we may want to take a look at a vmcore.


(In reply to David Critch from comment #7)
> I'm having the same problem w/ kernel 6+.
> 
> If the client is also Linux (Fedora or RHCOS at least) it can access and r/w
> to the NFS share fine. However I have a couple of Android based devices and
> as soon as they try and access the NFS server, I get the same crash/symptoms
> in the original post.

Very interesting. It might be nice to see a capture of the NFS traffic that prompts this (but please try a more recent kernel first to rule out the known nfsd filecache bugs).

Comment 10 Dario Lesca 2023-02-26 19:34:23 UTC
As a client I use a Raspberry Pi 4 with OSMC and Kody on Debian. On my server I get this problem (NFS server freeze) when I mount the share from the OSMC user's control panel and browse the NFS share via an nfs://ip/share/ URL.

If I mount the folder manually or via fstab, the problem does not occur; everything works fine.

For now I have worked around this issue by mounting my server's share via /etc/fstab and removing all the shares previously set up via nfs:// URLs, which are managed by the applications (Kody?) and are not visible like a classic filesystem mount in the output of the mount command or df.

Hope this helps.
Dario

Comment 11 Dario Lesca 2023-02-28 10:30:42 UTC
(In reply to Dario Lesca from comment #10)
> As a client I use a Raspberry Pi 4 with OSMC and Kody on Debian

About Kodi (sorry, not Kody)

I have found this:
https://forum.kodi.tv/showthread.php?tid=370054&pid=3142258#pid3142258

and this:
https://community.ipfire.org/t/nfs-compatibility-kodi-libnfs/8342

Comment 12 Jeff Layton 2023-02-28 12:13:39 UTC
To be clear: if anyone is able to reproduce this bug on a v6.1.9 or later kernel, please speak up (and post a stack trace).

Comment 13 David Critch 2023-02-28 13:45:56 UTC
I can reproduce on 6.1.12-200.fc37.x86_64, and the other comments align with my case. The nodes that work (Fedora, RHCOS) access the shares via standard NFS mounts, while the other clients are running some kodi/weird NFS client thing.

I'll get a stack trace this evening and share.

Comment 14 David Critch 2023-03-02 00:55:56 UTC
Here's a stack trace on 6.1.13-200.fc37.x86_64:

[  301.866758] kernel BUG at include/linux/mm.h:1129!
[  301.868684] invalid opcode: 0000 [#5] PREEMPT SMP PTI
[  301.870469] CPU: 1 PID: 2104 Comm: nfsd Kdump: loaded Tainted: G      D            6.1.13-200.fc37.x86_64 #1
[  301.872257] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-D3H, BIOS F16 10/24/2012
[  301.874394] RIP: 0010:svc_rqst_replace_page+0xd4/0xe0 [sunrpc]
[  301.876236] Code: 32 42 43 d2 48 8b 83 f0 0a 00 00 48 8b 34 24 48 8b 10 0f b6 83 00 0b 00 00 e9 5f ff ff ff 48 c7 c6 30 e8 ec c0 e8 9c 32 46 d2 <0f> 0b 48 83 ef 01 e9 6c ff ff ff 90 0f 1f 44 00 00 41 56 49 89 ce
[  301.878103] RSP: 0018:ffffb57d00da7ca0 EFLAGS: 00010282
[  301.880032] RAX: 000000000000005c RBX: ffff92145f5a8000 RCX: 0000000000000000
[  301.881959] RDX: 0000000000000001 RSI: ffffffff94749b33 RDI: 00000000ffffffff
[  301.883870] RBP: ffffe3a605899dc0 R08: 0000000000000000 R09: ffffb57d00da7b00
[  301.885780] R10: 0000000000000003 R11: ffffffff95147448 R12: ffff92145f5a8000
[  301.887596] R13: ffffb57d00da7d90 R14: 0000000000007f0c R15: ffff92144ae0c9e0
[  301.889332] FS:  0000000000000000(0000) GS:ffff921b3f480000(0000) knlGS:0000000000000000
[  301.891071] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  301.892815] CR2: 00007fa2bff942d0 CR3: 0000000402010004 CR4: 00000000001706e0
[  301.894560] Call Trace:
[  301.896327]  <TASK>
[  301.898063]  nfsd_splice_actor+0x4e/0x90 [nfsd]
[  301.899855]  __splice_from_pipe+0x8a/0x1b0
[  301.901593]  ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[  301.903356]  nfsd_direct_splice_actor+0x11/0x20 [nfsd]
[  301.905107]  splice_direct_to_actor+0xc8/0x1d0
[  301.906836]  ? fsid_source+0x60/0x60 [nfsd]
[  301.908599]  nfsd_splice_read+0x6b/0xf0 [nfsd]
[  301.910371]  nfsd_read+0x11d/0x180 [nfsd]
[  301.912124]  nfsd3_proc_read+0x156/0x210 [nfsd]
[  301.913882]  nfsd_dispatch+0x16a/0x280 [nfsd]
[  301.915638]  svc_process_common+0x265/0x5c0 [sunrpc]
[  301.917414]  ? nfsd_svc+0x360/0x360 [nfsd]
[  301.919163]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  301.920923]  svc_process+0xad/0x100 [sunrpc]
[  301.922693]  nfsd+0xd5/0x190 [nfsd]
[  301.924439]  kthread+0xe9/0x110
[  301.926152]  ? kthread_complete_and_exit+0x20/0x20
[  301.927859]  ret_from_fork+0x22/0x30
[  301.929566]  </TASK>
[  301.931284] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl dm_crypt rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink sunrpc snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_via snd_hda_codec_generic coretemp ledtrig_audio kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm iTCO_wdt snd_hda_codec intel_pmc_bxt at24 iTCO_vendor_support mei_pxp mei_hdcp irqbypass rapl snd_hda_core intel_cstate snd_hwdep snd_pcsp i2c_i801 snd_seq mxm_wmi intel_uncore snd_seq_device snd_pcm i2c_smbus snd_timer lpc_ich joydev mei_me alx e1000e snd mei soundcore mdio fuse zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3
[  301.931370]  drm_buddy drm_display_helper video cec wmi ttm
[  301.942585] ---[ end trace 0000000000000000 ]---
[  301.945377] RIP: 0010:release_pages+0x45/0x580
[  301.947331] Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
[  301.949258] RSP: 0018:ffffb57d00de3e40 EFLAGS: 00010206
[  301.951248] RAX: 00000000ffff9214 RBX: 0000000000000000 RCX: 0000000000000000
[  301.953185] RDX: ffffe3a6057affc8 RSI: ffffb57d00de3e68 RDI: ffffe3a6048817c8
[  301.955164] RBP: ffff921453d6cb78 R08: ffffe3a6057affc8 R09: 000000000000016e
[  301.957131] R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
[  301.958978] R13: 0000000000000000 R14: ffffe3a6048817c8 R15: ffff921453d6cb28
[  301.960812] FS:  0000000000000000(0000) GS:ffff921b3f480000(0000) knlGS:0000000000000000
[  301.962576] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  301.964409] CR2: 00007fa2bff942d0 CR3: 0000000402010004 CR4: 00000000001706e0

Comment 15 David Critch 2023-03-02 00:59:47 UTC
Looks like that one was slightly cut off. Here's the whole thing:

[  271.810225] kernel BUG at include/linux/mm.h:1129!
[  271.812070] invalid opcode: 0000 [#4] PREEMPT SMP PTI
[  271.814011] CPU: 0 PID: 2105 Comm: nfsd Kdump: loaded Tainted: G      D            6.1.13-200.fc37.x86_64 #1
[  271.815824] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-D3H, BIOS F16 10/24/2012
[  271.817747] RIP: 0010:svc_rqst_replace_page+0xd4/0xe0 [sunrpc]
[  271.819664] Code: 32 42 43 d2 48 8b 83 f0 0a 00 00 48 8b 34 24 48 8b 10 0f b6 83 00 0b 00 00 e9 5f ff ff ff 48 c7 c6 30 e8 ec c0 e8 9c 32 46 d2 <0f> 0b 48 83 ef 01 e9 6c ff ff ff 90 0f 1f 44 00 00 41 56 49 89 ce
[  271.821784] RSP: 0018:ffffb57d00db7ca0 EFLAGS: 00010282
[  271.823726] RAX: 000000000000005c RBX: ffff921453774000 RCX: 0000000000000000
[  271.825663] RDX: 0000000000000001 RSI: ffffffff94749b33 RDI: 00000000ffffffff
[  271.827578] RBP: ffffe3a605899dc0 R08: 0000000000000000 R09: ffffb57d00db7b00
[  271.829488] R10: 0000000000000003 R11: ffffffff95147448 R12: ffff921453774000
[  271.831304] R13: ffffb57d00db7d90 R14: 0000000000007fc0 R15: ffff9214504369e0
[  271.833042] FS:  0000000000000000(0000) GS:ffff921b3f400000(0000) knlGS:0000000000000000
[  271.834789] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  271.836533] CR2: 000055dabc5c54f8 CR3: 0000000402010003 CR4: 00000000001706f0
[  271.838282] Call Trace:
[  271.840036]  <TASK>
[  271.841777]  nfsd_splice_actor+0x4e/0x90 [nfsd]
[  271.843579]  __splice_from_pipe+0x8a/0x1b0
[  271.845308]  ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[  271.847082]  nfsd_direct_splice_actor+0x11/0x20 [nfsd]
[  271.848845]  splice_direct_to_actor+0xc8/0x1d0
[  271.850578]  ? fsid_source+0x60/0x60 [nfsd]
[  271.852343]  nfsd_splice_read+0x6b/0xf0 [nfsd]
[  271.854116]  nfsd_read+0x11d/0x180 [nfsd]
[  271.855881]  nfsd3_proc_read+0x156/0x210 [nfsd]
[  271.857657]  nfsd_dispatch+0x16a/0x280 [nfsd]
[  271.859416]  svc_process_common+0x265/0x5c0 [sunrpc]
[  271.861192]  ? nfsd_svc+0x360/0x360 [nfsd]
[  271.862957]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  271.864715]  svc_process+0xad/0x100 [sunrpc]
[  271.866495]  nfsd+0xd5/0x190 [nfsd]
[  271.868245]  kthread+0xe9/0x110
[  271.869951]  ? kthread_complete_and_exit+0x20/0x20
[  271.871669]  ret_from_fork+0x22/0x30
[  271.873375]  </TASK>
[  271.875075] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl dm_crypt rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink sunrpc snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_via snd_hda_codec_generic coretemp ledtrig_audio kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm iTCO_wdt snd_hda_codec intel_pmc_bxt at24 iTCO_vendor_support mei_pxp mei_hdcp irqbypass rapl snd_hda_core intel_cstate snd_hwdep snd_pcsp i2c_i801 snd_seq mxm_wmi intel_uncore snd_seq_device snd_pcm i2c_smbus snd_timer lpc_ich joydev mei_me alx e1000e snd mei soundcore mdio fuse zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3
[  271.875162]  drm_buddy drm_display_helper video cec wmi ttm
[  271.886360] ---[ end trace 0000000000000000 ]---
[  271.889142] RIP: 0010:release_pages+0x45/0x580
[  271.890967] Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
[  271.892966] RSP: 0018:ffffb57d00de3e40 EFLAGS: 00010206
[  271.894900] RAX: 00000000ffff9214 RBX: 0000000000000000 RCX: 0000000000000000
[  271.896887] RDX: ffffe3a6057affc8 RSI: ffffb57d00de3e68 RDI: ffffe3a6048817c8
[  271.898822] RBP: ffff921453d6cb78 R08: ffffe3a6057affc8 R09: 000000000000016e
[  271.900814] R10: 000000000000fe88 R11: 0000000000000000 R12: 0017ffffc0000000
[  271.902666] R13: 0000000000000000 R14: ffffe3a6048817c8 R15: ffff921453d6cb28
[  271.904506] FS:  0000000000000000(0000) GS:ffff921b3f400000(0000) knlGS:0000000000000000
[  271.906268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  271.908096] CR2: 000055dabc5c54f8 CR3: 0000000402010003 CR4: 00000000001706f0
[  301.854063] page:000000008d2d287c refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x162675
[  301.855863] head:000000008d2d287c order:1 compound_mapcount:-559087615 compound_pincount:1024
[  301.857641] flags: 0xffffe3a605899f40(error|workingset|slab|owner_priv_1|arch_1|reserved|writeback|head|swapbacked|hwpoison|young|arch_2|node=1023|zone=7|lastcpupid=0x1f8e98)
[  301.859476] raw: ffffe3a605899f40 ffffe3a605899f80 ffffe3a605899fc0 ffffe3a60589a000
[  301.861304] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[  301.863103] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[  301.864938] ------------[ cut here ]------------

Comment 16 Jeff Layton 2023-03-02 10:56:01 UTC
Thanks:

[  301.863103] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))

Looks like a folio (page) refcount underflow on a read. Basically, the refcount on a folio during a read went below zero. This is probably unrelated to the filecache changes that @bcodding indicated. I'll have to do some investigation.
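
For reference, that VM_BUG_ON condition is plain unsigned arithmetic. A tiny userspace sketch (illustrative only, not the kernel helper itself) of when it fires:

#include <stdio.h>

/* Mimics the check quoted above: (ref + 127u <= 127u) is true exactly
 * when ref is 0 or has wrapped into the range UINT_MAX-126..UINT_MAX,
 * i.e. the refcount had already dropped to zero (or underflowed) before
 * folio_get() tried to take another reference. */
static int would_bug(unsigned int ref)
{
        return ref + 127u <= 127u;
}

int main(void)
{
        printf("ref=1  -> %d\n", would_bug(1));        /* 0: fine */
        printf("ref=0  -> %d\n", would_bug(0));        /* 1: BUG  */
        printf("ref=-1 -> %d\n", would_bug(0u - 1u));  /* 1: BUG (underflow) */
        return 0;
}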

You mentioned that activity from certain clients seems to trigger this. Would you be able to get a (binary) capture of the traffic between client and server that leads to the crash? If there is a certain sort of request that triggers this crash then that may help us reproduce it.

Comment 17 Jeff Layton 2023-03-02 13:06:18 UTC
Ran a capture between the client and server while running this test to get a better picture of what's happening on the wire. The stuck lock was (again) granted via a NLM_GRANTED_MSG callback:

    96840	2023-03-02 07:51:48.014619	192.168.1.137	192.168.1.136	NLM	234	V4 GRANTED_MSG Call (Reply In 96843) FH:0x67be0d7b svid:5698 pos:0-1

...at about the same time, there was a flurry of CANCEL calls to the server, including one from the same svid as in the GRANTED_MSG call:

    96855	2023-03-02 07:51:48.015595	192.168.1.136	192.168.1.137	NLM	786	V4 CANCEL Call (Reply In 96869) FH:0x67be0d7b svid:5698 pos:0-1  ; V4 CANCEL Call (Reply In 96871) FH:0x67be0d7b svid:5713 pos:0-1  ; V4 CANCEL Call (Reply In 96872) FH:0x67be0d7b svid:5716 pos:0-1  ; V4 CANCEL Call (Reply In 96874) FH:0x67be0d7b svid:5714 pos:0-1

I think that gives further credence to the idea that we have a race between a GRANTED_MSG callback from the server and the wait for the lock being cancelled. This bit has always been difficult to get right, unfortunately.

Comment 18 Jeff Layton 2023-03-02 13:07:18 UTC
(In reply to Jeff Layton from comment #17)
> Ran a capture between the client and server while running this test to get a
> better picture of what's happening on the wire. The stuck lock was (again)
> granted via a NLM_GRANTED_MSG callback:

Sorry for the noise. Wrong bz for this comment, please ignore!

Comment 19 Dario Lesca 2023-03-02 14:28:50 UTC
I have reproduced this error by installing and using kodi (dnf install kodi) on a freshly installed Fedora 37 VM.

Start kodi, add a new external NFS resource, point it to my usual Fedora 37 server via an nfs://IP/ URL, and browse the shared folder.
This works and does not cause any problem.

But when I play a video from this resource, the error on the server happens again and the server crashes.

If I share a local resource from the Fedora 37 VM where kodi runs and use that resource mounted in the same manner, the error does not occur when I play a video; everything works fine.

I have not been able to reproduce the error with a client/server VM setup, only against the usual external server.

Hope this can help you reproduce the error locally.

Thanks
Dario

Comment 21 Jeff Layton 2023-03-06 13:18:03 UTC
Thanks for the pcap, David. The first thing I should point out is that the stack traces that you posted are quite different from the ones that were originally posted to this bug. They may or may not be the same issue. Since yours seems to be reproducible, let's assume they're the same issue for now.

The last call to the server is this one:

 7551   1.182845   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 0 Len: 1048576

...which never gets a reply. I assume the server crashed at that point. This call seems unexceptional -- a 1M READ starting at the beginning of the file. Looking back at the other traffic involving this filehandle, we can see several other identical calls earlier:

[jlayton@tleilax sparse]$ tshark -r /tmp/nfs.pcap nfs | grep 0x1d6fc771
   83   0.094919   10.42.0.15 → 10.42.3.25   NFS 310 V3 LOOKUP Reply (Call In 82), FH: 0x1d6fc771
   84   0.097547   10.42.3.25 → 10.42.0.15   NFS 178 V3 ACCESS Call, FH: 0x1d6fc771, [Check: RD]
   97   0.112967   10.42.0.15 → 10.42.3.25   NFS 310 V3 LOOKUP Reply (Call In 96), FH: 0x1d6fc771
   98   0.115504   10.42.3.25 → 10.42.0.15   NFS 174 V3 GETATTR Call, FH: 0x1d6fc771
  111   0.131602   10.42.0.15 → 10.42.3.25   NFS 310 V3 LOOKUP Reply (Call In 110), FH: 0x1d6fc771
  112   0.133879   10.42.3.25 → 10.42.0.15   NFS 178 V3 ACCESS Call, FH: 0x1d6fc771, [Check: RD]
  125   0.149989   10.42.0.15 → 10.42.3.25   NFS 310 V3 LOOKUP Reply (Call In 124), FH: 0x1d6fc771
  126   0.152546   10.42.3.25 → 10.42.0.15   NFS 174 V3 GETATTR Call, FH: 0x1d6fc771
  128   0.155363   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 0 Len: 1048576
  982   0.295484   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 0 Len: 1048576
 1815   0.398019   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 1048576 Len: 1048576
 2628   0.507991   10.42.3.25 → 10.42.0.15   NFS 174 V3 GETATTR Call, FH: 0x1d6fc771
 2630   0.510608   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 2097152 Len: 1048576
 3446   0.618736   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 21787511 Len: 1048576
 3448   0.632199   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 3145728 Len: 1048576
 4263   0.739887   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 0 Len: 1048576
 5079   0.840607   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 4194304 Len: 1048576
 5906   0.953981   10.42.0.15 → 10.42.3.25   NFS 310 V3 LOOKUP Reply (Call In 5905), FH: 0x1d6fc771
 5907   0.956556   10.42.3.25 → 10.42.0.15   NFS 178 V3 ACCESS Call, FH: 0x1d6fc771, [Check: RD]
 5920   0.972509   10.42.0.15 → 10.42.3.25   NFS 310 V3 LOOKUP Reply (Call In 5919), FH: 0x1d6fc771
 5921   0.975082   10.42.3.25 → 10.42.0.15   NFS 174 V3 GETATTR Call, FH: 0x1d6fc771
 5923   0.977640   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 4 Len: 1048576
 6737   1.073346   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 5242880 Len: 1048576
 7551   1.182845   10.42.3.25 → 10.42.0.15   NFS 186 V3 READ Call, FH: 0x1d6fc771 Offset: 0 Len: 1048576


...one thing that is interesting is that the read offsets in frames 3446 and 5923 do not align on a page boundary. That's unusual in that buffered I/O will be aligned, and most DIO accesses are also aligned (since a lot of filesystems require that anyway). I may see if I can try to replicate this access pattern using libnfs.

Comment 22 Jeff Layton 2023-03-06 14:19:45 UTC
I rolled up a quick reproducer that issued reads at exactly those offsets and lengths, created a 21787639-byte file with random junk in it, and ran the reproducer against it. No crash, even on v6.1.13. What sort of filesystem are you exporting?  Also, can you paste the relevant /etc/exports line for this export here? This may be filesystem-dependent.
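
For anyone who wants to replay the same pattern, here is roughly what such a reproducer can look like. This is only a minimal sketch, not the actual program used above: it replays the offsets/lengths from the capture with pread(2) against a test file (paths are placeholders). Note that a normal kernel-client mount will re-align these reads on the wire, so to hit the server with the exact unaligned offsets you would likely need a userspace client such as libnfs; treat this purely as a sketch of the application-level access pattern.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* READ offsets seen in the capture above; each read is 1 MiB long. */
static const off_t offsets[] = {
        0, 0, 1048576, 2097152, 21787511, 3145728, 0, 4194304, 4, 5242880, 0,
};

int main(int argc, char **argv)
{
        /* argv[1]: path to a >= 21787639-byte test file (placeholder default) */
        const char *path = argc > 1 ? argv[1] : "/mnt/nfs/testfile";
        char *buf = malloc(1048576);
        int fd = open(path, O_RDONLY);
        size_t i;

        if (fd < 0 || buf == NULL) {
                perror("setup");
                return 1;
        }
        for (i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
                ssize_t n = pread(fd, buf, 1048576, offsets[i]);
                printf("offset %lld -> %zd bytes\n", (long long)offsets[i], n);
        }
        close(fd);
        free(buf);
        return 0;
}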

Comment 23 David Critch 2023-03-06 15:05:10 UTC
Just updated to 6.1.14, problem persists.

Filesystem is XFS.

Export is:
/nfs/hoard	f.q.d.n(ro,no_root_squash,no_all_squash)


(The stack trace might look a bit different because I've installed kernel-debuginfo since hitting the problem, and it seemed to show a bit more info.)

Comment 24 Jeff Layton 2023-03-06 16:00:46 UTC
(In reply to David Critch from comment #23)
> Just updated to 6.1.14, problem persists.
> 
> Filesystem is XFS.
> 
> Export is:
> /nfs/hoard	f.q.d.n(ro,no_root_squash,no_all_squash)
> 

Thanks. That's pretty close to what I'm using.

> 
> (The stack trace might be a bit different since I've installed
> kernel-debuginfo since hitting the problem and it seemed to show a bit more
> info)

I don't think so. Having debuginfo won't make the kernel print anything different to the ring buffer.

The first traces look pretty clearly like something went wrong when cleaning up after an RPC call. That call may have been a read, but it may have been something else as well. The more recent traces show it happening while it's still processing a splice read (before that teardown) to satisfy an NFSv3 READ call. It's possible the two are related and ultimately due to the same cause, but we shouldn't assume that at this point.

Do you have the ability to collect a vmcore?

Comment 25 Jeff Layton 2023-03-06 17:18:15 UTC
Looking more closely. David's oops happened because the page refcount was too low when we tried to take a reference:

/**
 * folio_get - Increment the reference count on a folio.
 * @folio: The folio.
 *
 * Context: May be called in any context, as long as you know that
 * you have a refcount on the folio.  If you do not already have one,
 * folio_try_get() may be the right interface for you to use.
 */
static inline void folio_get(struct folio *folio)
{
        VM_BUG_ON_FOLIO(folio_ref_zero_or_close_to_overflow(folio), folio);
        folio_ref_inc(folio);
}

That was called from svc_rqst_replace_page:

/**                                                                                            
 * svc_rqst_replace_page - Replace one page in rq_pages[]                                      
 * @rqstp: svc_rqst with pages to replace                                                      
 * @page: replacement page                                                                     
 *                                                                                             
 * When replacing a page in rq_pages, batch the release of the                                 
 * replaced pages to avoid hammering the page allocator.                                       
 */                                                                                            
void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)                          
{                                                                                              
        if (*rqstp->rq_next_page) {                                                            
                if (!pagevec_space(&rqstp->rq_pvec))                                           
                        __pagevec_release(&rqstp->rq_pvec);                                    
                pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page);                            
        }                                                                                      
                                                                                               
        get_page(page);                                                                        
        *(rqstp->rq_next_page++) = page;                                                       
}                                                                                              
EXPORT_SYMBOL_GPL(svc_rqst_replace_page);

Which was in turn called from nfsd_splice_actor:

static int
nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
                  struct splice_desc *sd)
{
        struct svc_rqst *rqstp = sd->u.data;
        struct page *page = buf->page;  // may be a compound one
        unsigned offset = buf->offset;
        struct page *last_page;

        last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
        for (page += offset / PAGE_SIZE; page <= last_page; page++)
                svc_rqst_replace_page(rqstp, page);
        if (rqstp->rq_res.page_len == 0)        // first call
                rqstp->rq_res.page_base = offset % PAGE_SIZE;
        rqstp->rq_res.page_len += sd->len;
        return sd->len;
}

...but looking at the definition of a pipe_buffer, I'm not at all certain that the page pointer in it is an array:

/**
 *      struct pipe_buffer - a linux kernel pipe buffer
 *      @page: the page containing the data for the pipe buffer
 *      @offset: offset of data inside the @page
 *      @len: length of data inside the @page
 *      @ops: operations associated with this buffer. See @pipe_buf_operations.
 *      @flags: pipe buffer flags. See above.
 *      @private: private data owned by the ops.
 **/
struct pipe_buffer {
        struct page *page;
        unsigned int offset, len;
        const struct pipe_buf_operations *ops;
        unsigned int flags;
        unsigned long private;
};

Comment 26 Jeff Layton 2023-03-06 18:11:55 UTC
David privately sent me a vmcore:

[ 9380.915824] general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#1] PREEMPT SMP PTI
[ 9380.915871] CPU: 3 PID: 2144 Comm: nfsd Kdump: loaded Not tainted 6.1.14-200.fc37.x86_64 #1
[ 9380.915898] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-D3H, BIOS F16 10/24/2012
[ 9380.915926] RIP: 0010:release_pages+0x45/0x580
[ 9380.915949] Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
[ 9380.915996] RSP: 0018:ffffbaa700d17e40 EFLAGS: 00010216
[ 9380.916016] RAX: 00000000ffff93fb RBX: 0000000000000000 RCX: 0000000000000000
[ 9380.916037] RDX: ffffe67c85732c88 RSI: ffffbaa700d17e68 RDI: ffffe67c848e7e48
[ 9380.916058] RBP: ffff93fb477b4b70 R08: ffffe67c85732c88 R09: 000000000000022b
[ 9380.916079] R10: 00000000000076c8 R11: 0000000000000000 R12: 0017ffffc0000000
[ 9380.916100] R13: 0000000000000000 R14: ffffe67c848e7e48 R15: ffff93fb477b4b28
[ 9380.916121] FS:  0000000000000000(0000) GS:ffff94023f580000(0000) knlGS:0000000000000000
[ 9380.916146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9380.916165] CR2: 00007f8e8e42d9d0 CR3: 000000069d010001 CR4: 00000000001706e0
[ 9380.916187] Call Trace:
[ 9380.916199]  <TASK>
[ 9380.916225]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[ 9380.916294]  __pagevec_release+0x1b/0x30
[ 9380.916314]  svc_xprt_release+0x1a1/0x200 [sunrpc]
[ 9380.916409]  svc_send+0x59/0x160 [sunrpc]
[ 9380.916491]  nfsd+0xd5/0x190 [nfsd]
[ 9380.916550]  kthread+0xe9/0x110
[ 9380.916569]  ? kthread_complete_and_exit+0x20/0x20
[ 9380.916589]  ret_from_fork+0x22/0x30
[ 9380.916612]  </TASK>
[ 9380.916623] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl dm_crypt rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm at24 mei_pxp mei_hdcp iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_via snd_hda_codec_generic ledtrig_audio irqbypass rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec mxm_wmi snd_hda_core i2c_i801 intel_uncore snd_pcsp i2c_smbus snd_hwdep snd_seq snd_seq_device joydev snd_pcm snd_timer mei_me snd mei soundcore lpc_ich fuse zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 drm_buddy
[ 9380.916709]  drm_display_helper alx e1000e cec mdio ttm video wmi

Some notes:


struct svc_rqst is at 0xffff93fb477b4000

This probably failed when calling this in svc_xprt_release:

       pagevec_release(&rqstp->rq_pvec);

From the core:

  rq_pvec = {
    nr = 13 '\r',
    percpu_pvec_drained = true,
    pages = {0xffffe67c85732c80, 0xffffe67c848e7e40, 0xffff93fb477b42d0, 0xffff93fa477b4af0, 0x17ffffc0000000, 0xffffe67c85aeac01, 0xffffe67c85aeaec8, 0xdead000000000400, 0xffffffff, 0x17ffffc0000000, 0xffffe67c85aeac01, 0xffffe67c85aeaf08, 0xdead000000000400, 0xffffe67c856d7040, 0xffffe67c84770980}
  },

Supposedly there are 13 pages in there, but many of those page pointers look bogus. In fact the GPF seems likely to have happened when we hit the 5th element in the array (0x17ffffc0000000).

Comment 27 Jeff Layton 2023-03-07 17:57:01 UTC
Actually, not quite. I think it failed on the second element in the array since that value is in %rdi.

All code
========
   0:	00 48 8d             	add    %cl,-0x73(%rax)
   3:	44 24 28             	rex.R and $0x28,%al
   6:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
   b:	48 89 44 24 30       	mov    %rax,0x30(%rsp)
  10:	85 f6                	test   %esi,%esi
  12:	0f 8e a9 00 00 00    	jle    0xc1
  18:	48 63 f6             	movslq %esi,%rsi
  1b:	31 db                	xor    %ebx,%ebx
  1d:	49 89 ff             	mov    %rdi,%r15
  20:	45 31 ed             	xor    %r13d,%r13d
  23:	48 8d 2c f7          	lea    (%rdi,%rsi,8),%rbp
  27:	4d 8b 27             	mov    (%r15),%r12
  2a:*	49 8b 44 24 08       	mov    0x8(%r12),%rax		<-- trapping instruction
  2f:	a8 01                	test   $0x1,%al
  31:	0f 85 bb 01 00 00    	jne    0x1f2
  37:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3c:	4d 85 ed             	test   %r13,%r13
  3f:	74                   	.byte 0x74

Regardless though, the problem seems to be that the rq_pvec has some garbage in it, but an elevated count. It's not yet clear to me how it ends up in that state.

Comment 28 Jeff Layton 2023-03-07 19:39:16 UTC
rq_pvec is only added to in svc_rqst_replace_page:

void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
{
        if (*rqstp->rq_next_page) {
                if (!pagevec_space(&rqstp->rq_pvec))
                        __pagevec_release(&rqstp->rq_pvec);
                pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page);
        }

        get_page(page);
        *(rqstp->rq_next_page++) = page;
}

...so I'm guessing that means that rq_next_page is sometimes pointing to something bogus?

  struct page **rq_next_page = 0xffffe67c85aeaf20

According to crash, the rq_pages is 260 element array of page pointers:

   [0x2c8] struct page *rq_pages[260];
   [0xae8] struct page **rq_respages;
   [0xaf0] struct page **rq_next_page;

I don't see that value in the rq_pages array, but crash (frustratingly) only shows the first 256 elements of the array. The last several elements are truncated:

    0xffffe67c85aeadc0,
    0xffffe67c85aeadc0,
    0xffffe67c85aeae00,
    0xffffe67c85aeae40
  },
  rq_respages = 0xffffe67c85aeae80,
  rq_next_page = 0xffffe67c85aeaf20,

Why is rq_respages so big? AFAICT, it's supposed to track the values in rq_pages, but it's clearly (just) larger than the last element. Did it walk off the end of the rq_pages array?

Chuck, any thoughts here?

Comment 29 Chuck Lever 2023-03-07 19:49:17 UTC
When reply processing begins, rq_respages is supposed to point to the first page in rq_pages that is available to write the reply in.

There are some extra elements in rq_pages because sometimes the number of pages in the Call message plus the number of pages needed to build the Reply are more than 1MB worth of pages. That is likely the case here, with a 1MB READ reply, you'll need at least 1 page for the Call, another page for the Reply header, and 256 pages for the READ payload.

If the client is sending a READ Call that has an RPC frame size larger than the actual Call message, for instance, it's possible that even more pages will be needed... we're missing a bounds check somewhere.
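
As a quick back-of-the-envelope check of that accounting (assuming 4 KiB pages and a 1 MiB maximum payload; the authoritative bound is RPCSVC_MAXPAGES in include/linux/sunrpc/svc.h):

#include <stdio.h>

int main(void)
{
        const unsigned long page_size   = 4096;            /* assumed PAGE_SIZE  */
        const unsigned long max_payload = 1024 * 1024;     /* 1 MiB READ payload */
        unsigned long payload_pages = (max_payload + page_size - 1) / page_size; /* 256 */
        unsigned long call_page     = 1;                    /* incoming Call message */
        unsigned long reply_hdr     = 1;                    /* Reply header */

        /* 258 pages minimum; rq_pages[] carries a little extra slack beyond this. */
        printf("%lu pages\n", payload_pages + call_page + reply_hdr);
        return 0;
}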

Comment 30 Jeff Layton 2023-03-07 20:09:57 UTC
(In reply to Chuck Lever from comment #29)
> When reply processing begins, rq_respages is supposed to point to the first
> page in rq_pages that is available to write the reply in.
> 
> There are some extra elements in rq_pages because sometimes the number of
> pages in the Call message plus the number of pages needed to build the Reply
> are more than 1MB worth of pages. That is likely the case here, with a 1MB
> READ reply, you'll need at least 1 page for the Call, another page for the
> Reply header, and 256 pages for the READ payload.
> 

Makes sense. I'm aware of how we split the call and reply. My suspicion here though is that the reply overran the provided number of pages.

> If the client is sending a READ Call that has an RPC frame size larger than
> the actual Call message, for instance, it's possible that even more pages
> will be needed... we're missing a bounds check somewhere.

It doesn't look like that's the case here. In the capture that David provided, all of the READ calls seem to be of normal size (~186 bytes for the whole frame), even the one that directly preceded the crash.

You're clearly right about the bounds check, but I'm not sure where we're missing it. The problems seem to be confined to splice reads though. I wonder if we ought to have a switch to turn those off (module parameter or something maybe?).
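
For what it's worth, a sketch of what such a knob could look like (names like nfsd_disable_splice_read are invented here, and this assumes the read path in fs/nfsd/vfs.c still chooses between the splice and readv variants; it's a fragment for discussion, not a tested patch):

/* Hypothetical module parameter forcing nfsd to skip splice reads. */
static bool nfsd_disable_splice_read;
module_param(nfsd_disable_splice_read, bool, 0644);
MODULE_PARM_DESC(nfsd_disable_splice_read,
                 "Never splice READ replies; always use vectored reads");

/* ...and in nfsd_read(), gate the existing splice branch on it: */
        if (!nfsd_disable_splice_read &&
            file->f_op->splice_read &&
            test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
                err = nfsd_splice_read(rqstp, fhp, file, offset, count, eof);
        else
                err = nfsd_readv(rqstp, fhp, file, offset, count, eof);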

Comment 31 Chuck Lever 2023-03-07 20:15:23 UTC
RQ_SPLICE_OK is the usual mechanism for disabling splice reads.

It's also possible that a recent change to the splice code broke things. Lots of churn there recently. See also ac8db824ead0 ("NFSD: Fix reads with a non-zero offset that don't end on a page boundary").

Comment 32 Jeff Layton 2023-03-08 12:53:34 UTC
(In reply to Chuck Lever from comment #31)
> RQ_SPLICE_OK is the usual mechanism for disabling splice reads.
> 

Yeah, but that flag isn't settable by users. 

> It's also possible that a recent change to the splice code broke things.
> Lots of churn there recently. See also ac8db824ead0 ("NFSD: Fix reads with a
> non-zero offset that don't end on a page boundary").

Certainly possible, but I don't see a bug there right offhand.

As far as bounds checks go, we don't really have much in the way of them. My thinking at this point is to add some BUG_ON checks like this and see if we can catch the problem earlier. Working on a test kernel with this now (I'm assuming for the moment that the problem is confined to splice reads and that most people are using the TCP transport):

---------------------------8<------------------------------

[PATCH] nfs/sunrpc: add some bounds-checking BUG_ON calls

Try to catch us overrunning the rq_pages array earlier.

Signed-off-by: Jeff Layton <jlayton>
---
 fs/nfsd/vfs.c        | 2 ++
 net/sunrpc/svcsock.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 0d49c6bb22eb..0d797c325dc8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -929,6 +929,8 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	trace_nfsd_read_splice(rqstp, fhp, offset, *count);
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
+	/* Make sure we're not overrunning the buffer */
+	BUG_ON(rqstp->rq_next_page > rqstp->rq_pages + RPCSVC_MAXPAGES);
 	host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
 	return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
 }
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 815baf308236..53284fb7270f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -258,6 +258,8 @@ static ssize_t svc_tcp_read_msg(struct svc_rqst *rqstp, size_t buflen,
 		bvec[i].bv_offset = 0;
 	}
 	rqstp->rq_respages = &rqstp->rq_pages[i];
+	/* Ensure we're not already past the end of the page array */
+	BUG_ON(rqstp->rq_respages > rqstp->rq_pages + RPCSVC_MAXPAGES);
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
 
 	iov_iter_bvec(&msg.msg_iter, ITER_DEST, bvec, i, buflen);
-- 
2.39.2

Comment 33 Jeff Layton 2023-03-08 14:53:09 UTC
I've built a kernel with the above patch. @dcritch, would you be able to run your testcase against this and collect a vmcore? Unfortunately it likely won't fix anything, but it may make the crash happen slightly sooner, and that might tell us a bit more about the nature of this bug. The kernel is brewing here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=98448808

Comment 35 Jeff Layton 2023-03-14 12:58:14 UTC
Looking at David's second vmcore with the debug kernel. Interestingly, it still crashed in the same place. The BUG_ONs were not triggered:

[  157.434761] general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#1] PREEMPT SMP PTI
[  157.434824] CPU: 1 PID: 4756 Comm: nfsd Kdump: loaded Not tainted 6.1.15-200.bz2150630.1.fc37.x86_64 #1
[  157.434881] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-D3H, BIOS F16 10/24/2012
[  157.434913] RIP: 0010:release_pages+0x45/0x580
[  157.434937] Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
[  157.435009] RSP: 0018:ffffb3b080d17e40 EFLAGS: 00010206
[  157.435043] RAX: 00000000ffff9b4e RBX: 0000000000000000 RCX: 0000000000000000
[  157.435086] RDX: ffffd86b447ef008 RSI: ffffb3b080d17e68 RDI: ffffd86b44a50d08
[  157.435114] RBP: ffff9b4e0e210b78 R08: ffffd86b447ef008 R09: 0000000000000221
[  157.435137] R10: 0000000000008d68 R11: 0000000000000000 R12: 0017ffffc0000000
[  157.435160] R13: 0000000000000000 R14: ffffd86b44a50d08 R15: ffff9b4e0e210b28
[  157.435183] FS:  0000000000000000(0000) GS:ffff9b54ff480000(0000) knlGS:0000000000000000
[  157.435208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  157.435239] CR2: 000055d625eb7030 CR3: 000000074d010002 CR4: 00000000001706e0
[  157.435263] Call Trace:
[  157.435276]  <TASK>
[  157.435291]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  157.435362]  __pagevec_release+0x1b/0x30
[  157.435381]  svc_xprt_release+0x1a1/0x200 [sunrpc]
[  157.435497]  svc_send+0x59/0x160 [sunrpc]
[  157.435579]  nfsd+0xd5/0x190 [nfsd]
[  157.435660]  kthread+0xe9/0x110
[  157.435692]  ? kthread_complete_and_exit+0x20/0x20
[  157.435715]  ret_from_fork+0x22/0x30
[  157.435738]  </TASK>
[  157.435749] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace dm_crypt nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_hdmi iTCO_wdt snd_hda_codec_via snd_hda_codec_generic mei_pxp mei_hdcp ledtrig_audio intel_pmc_bxt at24 iTCO_vendor_support irqbypass snd_hda_intel snd_intel_dspcfg rapl snd_intel_sdw_acpi snd_hda_codec intel_cstate snd_hda_core intel_uncore snd_hwdep mxm_wmi i2c_i801 joydev snd_pcsp i2c_smbus snd_seq snd_seq_device snd_pcm lpc_ich snd_timer mei_me snd mei soundcore fuse zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic e1000e ghash_clmulni_intel alx drm_buddy sha512_ssse3 drm_display_helper cec mdio ttm video wmi


crash> struct svc_rqst 0xffff9b4e0e210000
  rq_pages = {
    0xffffd86b44a54d80,
    0xffffd86b44a7bd00,
    0xffffd86b44989800,
...
    0xffffd86b45736300,
    0xffffd86b45736340,
    0xffffd86b45736380,
    0xffffd86b457363c0,
    0xffffd86b45736400,                <<<< duplicate values?
    0xffffd86b45736400,                <<<<
    0xffffd86b45736440,
    0xffffd86b45736480
  },
  rq_respages = 0xffffd86b457364c0,
  rq_next_page = 0xffffd86b45736560,
  rq_page_end = 0xffff9b4e0e210ae0,
  rq_pvec = {
    nr = 0xe,
    percpu_pvec_drained = 0x1,
    pages = {
      0xffffd86b447ef000,
      0xffffd86b44a50d00,
      0xffff9b4e0e2102d0,
      0xffff9b4d0e210af0,
      0x17ffffc0000000,
      0xffffd86b45736001,
      0xdead000000000122,
      0xdead000000000400,
      0x1,
      0xffffffff,
      0x17ffffc0000000,
      0xffffd86b45736001,
      0xdead000000000122,
      0xdead000000000400,
      0xffffd86b44a7d740
    }
  },

Again, the rq_pvec is filled with garbage while the nr value is 14 (0xe), so that's where the crash is occurring. The rq_pages array looks reasonable for the most part, but there are some duplicate values in there, which is very odd. Either way:

I see why the BUG_ON didn't fire:

RPCSVC_MAXPAGES is 259:

crash> p ((((struct svc_rqst *)0xffff9b4e0e210000)->rq_pages)+259)
$3 = (struct page **) 0xffff9b4e0e210ae0

That end-of-array address is in a completely different range from the 0xffffd86b457364c0 now sitting in rq_respages. I don't think I can do what I was hoping: the rq_respages and rq_next_page values in the core aren't pointers into the array itself but copies of the struct page addresses that the array holds, so we can't detect an overrun just by comparing them against the array bounds. I think I need to have it catch the array index going too high instead. We probably need such a guardrail anyway.

The only way I can see this happening is if we end up calling svc_tcp_read_msg with a large buflen. I'll rework the patch in a bit -- maybe we can turn this from an oops into something that just causes an error to be returned, and gather some info about the (potentially) offending request.

Comment 36 Jeff Layton 2023-03-14 18:27:12 UTC
New debug patch below. I've got a build (based on v6.1.18) going in koji with this now:

     https://koji.fedoraproject.org/koji/taskinfo?taskID=98695147

David, could you try this kernel and see how it does vs. your reproducer? This one might help prevent a crash. Either way, the pr_warns should hopefully give us a better indication of what's happening:

[PATCH] sunrpc: don't allow svc_tcp_read_msg to walk off end of rq_pages

We've had some reports of crashes with symptoms that are consistent with
walking off the end of the rq_pages array. Test for this in
svc_tcp_read_msg and return an error (with a pr_warn) if we're about to
overrun the rq_pages array.

Signed-off-by: Jeff Layton <jlayton>
---
 net/sunrpc/svcsock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 815baf308236..f82c074d88c0 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -253,6 +253,11 @@ static ssize_t svc_tcp_read_msg(struct svc_rqst *rqstp, size_t buflen,
 	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 
 	for (i = 0, t = 0; t < buflen; i++, t += PAGE_SIZE) {
+		if (i > RPCSVC_MAXPAGES - 1) {
+			pr_warn("%s overran rq_pages! (buflen=%zu)\n", __func__, buflen);
+			set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
+			return -ENOBUFS;
+		}
 		bvec[i].bv_page = rqstp->rq_pages[i];
 		bvec[i].bv_len = PAGE_SIZE;
 		bvec[i].bv_offset = 0;
@@ -991,6 +996,8 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		trace_svcsock_tcp_recv(&svsk->sk_xprt, len);
 		svsk->sk_tcplen += len;
 		svsk->sk_datalen += len;
+	} else if (len == -ENOBUFS) {
+		pr_warn("%s no space for reply! base=%zu want=%zu\n", __func__, base, want);
 	}
 	if (len != want || !svc_sock_final_rec(svsk))
 		goto err_incomplete;
-- 
2.39.2

Comment 38 Jeff Layton 2023-03-15 12:53:29 UTC
3rd vmcore looks exactly like the first ones. This one didn't trigger the new debug code at all either:

[  158.341201] NFSD: Using nfsdcld client tracking operations.
[  158.341207] NFSD: starting 90-second grace period (net f0000000)
[  194.795050] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000000)
[  194.844782] general protection fault, probably for non-canonical address 0x17ffffc0000008: 0000 [#1] PREEMPT SMP PTI
[  194.844833] CPU: 0 PID: 4839 Comm: nfsd Kdump: loaded Not tainted 6.1.18-200.bz2150630.1.fc37.x86_64 #1
[  194.844864] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-D3H, BIOS F16 10/24/2012
[  194.844894] RIP: 0010:release_pages+0x45/0x580
[  194.844918] Code: 00 48 8d 44 24 28 48 89 44 24 28 48 89 44 24 30 85 f6 0f 8e a9 00 00 00 48 63 f6 31 db 49 89 ff 45 31 ed 48 8d 2c f7 4d 8b 27 <49> 8b 44 24 08 a8 01 0f 85 bb 01 00 00 0f 1f 44 00 00 4d 85 ed 74
[  194.844978] RSP: 0018:ffffb42dc0e3be40 EFLAGS: 00010206
[  194.845004] RAX: 00000000ffff9517 RBX: 0000000000000000 RCX: 0000000000000000
[  194.845027] RDX: ffffed9805805208 RSI: ffffb42dc0e3be68 RDI: ffffed98048a9bc8
[  194.845050] RBP: ffff951747814b78 R08: ffffed9805805208 R09: 0000000000000172
[  194.845072] R10: 000000000000e240 R11: 0000000000000000 R12: 0017ffffc0000000
[  194.845095] R13: 0000000000000000 R14: ffffed98048a9bc8 R15: ffff951747814b28
[  194.845117] FS:  0000000000000000(0000) GS:ffff951e3f400000(0000) knlGS:0000000000000000
[  194.845142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  194.845164] CR2: 0000555e60acf720 CR3: 0000000431010006 CR4: 00000000001706f0
[  194.845206] Call Trace:
[  194.845228]  <TASK>
[  194.845252]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  194.845328]  __pagevec_release+0x1b/0x30
[  194.845349]  svc_xprt_release+0x1a1/0x200 [sunrpc]
[  194.845455]  svc_send+0x59/0x160 [sunrpc]
[  194.845541]  nfsd+0xd5/0x190 [nfsd]
[  194.845622]  kthread+0xe9/0x110
[  194.845652]  ? kthread_complete_and_exit+0x20/0x20
[  194.845683]  ret_from_fork+0x22/0x30
[  194.845706]  </TASK>
[  194.845717] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace dm_crypt binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp snd_hda_codec_via kvm_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi iTCO_wdt kvm intel_pmc_bxt at24 mei_hdcp mei_pxp snd_hda_codec iTCO_vendor_support irqbypass snd_hda_core snd_pcsp snd_hwdep rapl snd_seq intel_cstate snd_seq_device snd_pcm intel_uncore mxm_wmi i2c_i801 snd_timer i2c_smbus snd joydev soundcore lpc_ich mei_me mei fuse zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic drm_buddy drm_display_helper ghash_clmulni_intel e1000e cec alx sha512_ssse3 mdio ttm
[  194.845819]  video wmi

From the core, it's the same sort of pattern that we've seen with the others. rq_respages is sizeof(struct page) higher than the last element of the rq_pages array. 

    0xffffed98049c1c00,
    0xffffed98049c1c40,
    0xffffed98049c1c80               <<< Last element of rq_pages
  },
  rq_respages = 0xffffed98049c1cc0,   <<< same as last element of rq_pages + sizeof(struct page)
  rq_next_page = 0xffffed98049c1d60,  
  rq_page_end = 0xffff951747814ae0,
  rq_pvec = {
    nr = 0xe,
    percpu_pvec_drained = 0x1,
    pages = {
      0xffffed9805805200,
      0xffffed98048a9bc0,
      0xffff9517478142d0,
      0xffff951647814af0,
      0x17ffffc0000000,
      0xffffed98049c1801,
      0xdead000000000122,
      0xdead000000000400,
      0x1,
      0xffffffff,
      0x17ffffc0000000,
      0xffffed98049c1801,
      0xdead000000000122,
      0xdead000000000400,
      0xffffed9805805240
    }
  },

rq_respages is only set in very few places and it's never changed in place, AFAICT. I think this means that we're probably looking at some sort of memory scribble here.

Comment 39 Jeff Layton 2023-03-15 13:18:07 UTC
During a splice read, we replace the page in the rq_pages array and increment rq_next_page. I suspect we're overrunning the end of rq_pages here and the assignment to rq_next_page ends up overwriting rq_respages.

void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
{
        if (*rqstp->rq_next_page) {
                if (!pagevec_space(&rqstp->rq_pvec))
                        __pagevec_release(&rqstp->rq_pvec);
                pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page);
        }

        get_page(page);
        *(rqstp->rq_next_page++) = page;
}
EXPORT_SYMBOL_GPL(svc_rqst_replace_page);

That's why rq_respages ends up holding a pointer to a struct page rather than a pointer into the array. rq_next_page then gets corrupted later when it's reassigned, and the whole thing comes crashing down. My suspicion at this point is that we're issuing an nfsd_splice_read that goes beyond the end of rq_pages.
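For reference, the relevant part of struct svc_rqst looks roughly like this in kernels of this vintage (abridged and approximate; see include/linux/sunrpc/svc.h for the real definition), which is why a one-slot overrun of rq_pages lands exactly on rq_respages:

struct svc_rqst {		/* abridged, approximate field order */
	/* ... */
	struct page	*rq_pages[RPCSVC_MAXPAGES + 1];
	struct page	**rq_respages;	/* points into rq_pages */
	struct page	**rq_next_page;	/* next reply page to use */
	struct page	**rq_page_end;	/* one past the last page */
	/* ... rq_pvec and later fields ... */
};

/*
 * The store in svc_rqst_replace_page():
 *
 *	*(rqstp->rq_next_page++) = page;
 *
 * writes over the rq_respages field once rq_next_page has walked one slot
 * past the end of rq_pages, which matches the corrupted values in the cores.
 */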

Comment 40 Chuck Lever 2023-03-15 13:56:49 UTC
Test your theory by bumping RPCSVC_MAXPAGES?

Comment 41 Chuck Lever 2023-03-15 14:29:15 UTC
Also, perhaps you might enable KASAN in the test kernels.

Comment 42 Jeff Layton 2023-03-15 14:54:38 UTC
Poking around some more in the core: the read that triggered this was definitely a 1M read starting at offset 4. The arguments only seem to have been 48 bytes, so it doesn't seem like rq_respages would have started out far enough into the array that we'd overrun it.

crash> *svc_rqst.rq_arg 0xffff951747814000
  rq_arg = {
    head = {
      {
        iov_base = 0xffff95175eb0b044,
        iov_len = 0x30
      }
    },
    tail = {
      {
        iov_base = 0x0,
        iov_len = 0x0
      }
    },
    bvec = 0x0,
    pages = 0xffff9517478142d0,
    page_base = 0x0,
    page_len = 0x0,
    flags = 0x0,
    buflen = 0x0,
    len = 0x30
  },



(In reply to Chuck Lever from comment #40)
> Test your theory by bumping RPCSVC_MAXPAGES?

I may try that eventually, but it's hard to know how big to make it. What I have now is:

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 24577d1b9907..84685dd8e30f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -830,6 +830,13 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
  */
 void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
 {
+       struct page **begin, **end;
+
+       begin = rqstp->rq_pages;
+       end = &rqstp->rq_pages[RPCSVC_MAXPAGES + 1];
+
+       BUG_ON(rqstp->rq_next_page < begin || rqstp->rq_next_page > end);
+
        if (*rqstp->rq_next_page) {
                if (!pagevec_space(&rqstp->rq_pvec))
                        __pagevec_release(&rqstp->rq_pvec);

...and I'm building that in koji:

    https://koji.fedoraproject.org/koji/taskinfo?taskID=98722435

That should help us catch the case where rq_next_page has wandered outside of the array. If we get a vmcore at that point, we may be able to see what's happening with the splice read since it should still be ongoing.

David, thanks so far for your patience. If you can collect a vmcore with the kernel above, we'd very much appreciate it! Hopefully this one will prove more fruitful.

Comment 43 Jeff Layton 2023-03-15 15:48:41 UTC
(In reply to Chuck Lever from comment #41)
> Also, perhaps you might enable KASAN in the test kernels.

Worth a shot, but I doubt it would help catch the overrun of rq_pages. That array is embedded in svc_rqst, and is part of the svc_rqst allocation. KASAN would not blink at an overrun of rq_pages unless it went beyond the end of svc_rqst. It seems to be crashing long before that point now.
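A toy user-space example of the same point (not kernel code, purely illustrative): an overrun of an array embedded in the middle of a struct stays inside the same allocation, so redzone-based checkers like KASAN/ASan don't trip on it by default.

#include <stdio.h>
#include <stdlib.h>

struct toy_rqst {
	void	*pages[4];	/* stands in for rq_pages */
	void	**respages;	/* stands in for rq_respages */
};

int main(void)
{
	struct toy_rqst *r = calloc(1, sizeof(*r));

	r->respages = &r->pages[1];
	/* One slot past the array: silently smashes r->respages, no redzone hit. */
	r->pages[4] = (void *)0xdeadbeef;
	printf("respages is now %p\n", (void *)r->respages);
	free(r);
	return 0;
}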

Comment 44 David Critch 2023-03-15 21:47:23 UTC
Uploaded a fresh vmcore to the usual spot.

And I appreciate all your work to fix it, so I'll keep giving you all the info I can.

Comment 45 Jeff Layton 2023-03-15 22:15:40 UTC
Bingo. That did it:

[  738.133090] ------------[ cut here ]------------
[  738.133095] kernel BUG at net/sunrpc/svc.c:838!
[  738.133132] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  738.133154] CPU: 1 PID: 4913 Comm: nfsd Kdump: loaded Not tainted 6.1.18-200.bz2150630.2.fc37.x86_64 #1
[  738.133185] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-D3H, BIOS F16 10/24/2012
[  738.133215] RIP: 0010:svc_rqst_replace_page+0xe8/0x110 [sunrpc]
[  738.133314] Code: 48 8d bf 00 0b 00 00 48 89 34 24 e8 c2 d2 1b f4 48 8b 83 f0 0a 00 00 48 8b 34 24 48 8b 10 0f b6 83 00 0b 00 00 e9 5f ff ff ff <0f> 0b 48 c7 c6 f0 58 14 c1 e8 2a c3 1e f4 0f 0b 48 83 ef 01 e9 6a
[  738.133364] RSP: 0018:ffffb91040ad3c98 EFLAGS: 00010283
[  738.133385] RAX: ffff972d8b474af0 RBX: ffffdc60c58d7d40 RCX: 0000000000000003
[  738.133409] RDX: ffff972d8b474ae8 RSI: ffffdc60c58d7d00 RDI: ffff972d8b474000
[  738.133431] RBP: ffffdc60c58d7fc0 R08: ffffdc60c4879648 R09: ffffdc60c4879600
[  738.133453] R10: 0000000000000002 R11: ffff972ddc8e6c10 R12: ffff972d8b474000
[  738.133476] R13: ffffb91040ad3d90 R14: 0000000000010004 R15: ffffb91040ad3d90
[  738.133499] FS:  0000000000000000(0000) GS:ffff97347f480000(0000) knlGS:0000000000000000
[  738.133525] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  738.133545] CR2: 00007f25471abdd8 CR3: 00000002e2010006 CR4: 00000000001706e0
[  738.133568] Call Trace:
[  738.133581]  <TASK>
[  738.133594]  nfsd_splice_actor+0x4e/0x90 [nfsd]
[  738.133669]  __splice_from_pipe+0x93/0x1c0
[  738.133692]  ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[  738.133780]  nfsd_direct_splice_actor+0x11/0x20 [nfsd]
[  738.133848]  splice_direct_to_actor+0xc8/0x1d0
[  738.133869]  ? fsid_source+0x60/0x60 [nfsd]
[  738.133934]  nfsd_splice_read+0x6b/0xf0 [nfsd]
[  738.134000]  nfsd_read+0x11d/0x180 [nfsd]
[  738.134071]  nfsd3_proc_read+0x156/0x210 [nfsd]
[  738.134147]  nfsd_dispatch+0x16a/0x280 [nfsd]
[  738.134210]  svc_process_common+0x265/0x5c0 [sunrpc]
[  738.134298]  ? nfsd_svc+0x360/0x360 [nfsd]
[  738.134359]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  738.134421]  svc_process+0xad/0x100 [sunrpc]
[  738.134503]  nfsd+0xd5/0x190 [nfsd]
[  738.134595]  kthread+0xe9/0x110
[  738.134636]  ? kthread_complete_and_exit+0x20/0x20
[  738.134662]  ret_from_fork+0x22/0x30
[  738.134693]  </TASK>
[  738.134705] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace dm_crypt nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink intel_rapl_msr intel_rapl_common iTCO_wdt mei_pxp intel_pmc_bxt mei_hdcp x86_pkg_temp_thermal at24 intel_powerclamp iTCO_vendor_support coretemp kvm_intel sunrpc kvm snd_hda_codec_hdmi snd_hda_codec_via irqbypass snd_hda_codec_generic ledtrig_audio rapl mxm_wmi snd_hda_intel intel_cstate snd_intel_dspcfg snd_intel_sdw_acpi intel_uncore snd_hda_codec i2c_i801 joydev i2c_smbus snd_pcsp snd_hda_core snd_hwdep lpc_ich snd_seq snd_seq_device snd_pcm mei_me snd_timer mei snd soundcore fuse zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 drm_buddy e1000e drm_display_helper alx mdio cec ttm video wmi


Relevant part of svc_rqst:

    0xffffdc60c58d7bc0,
    0xffffdc60c58d7c00,
    0xffffdc60c58d7c00,
    0xffffdc60c58d7c40,
    0xffffdc60c58d7c80
  },
  rq_respages = 0xffffdc60c58d7cc0,   <<< BAD VALUE
  rq_next_page = 0xffff972d8b474af0,
  rq_page_end = 0xffff972d8b474ae0,
  rq_pvec = {
    nr = 0x3,
    percpu_pvec_drained = 0x1,
    pages = {
      0xffffdc60c48799c0,
      0xffffdc60c4333100,
      0xffff972d8b4742d0,


My hope was that this BUG_ON would catch it before we corrupted rq_respages... D'oh! There's an off-by-one bug in the debug patch:

+       end = &rqstp->rq_pages[RPCSVC_MAXPAGES + 1];

The last index should be RPCSVC_MAXPAGES. Mea culpa:

+       end = &rqstp->rq_pages[RPCSVC_MAXPAGES];

I think we'll need to do this again, unfortunately. Sorry, David!

    https://koji.fedoraproject.org/koji/taskinfo?taskID=98734229

Comment 46 Jeff Layton 2023-03-15 22:45:55 UTC
Hrm, we might still be able to glean something from this core though. There is a fair bit of redundant info in the svc_rqst (for better or worse). For instance, the rq_res.pages field gets set to point to rq_respages[1] in svc_process:

crash> struct svc_rqst.rq_res.pages 0xffff972d8b474000
  rq_res.pages = 0xffff972d8b4742d8,

crash> p &(((struct svc_rqst *)0xffff972d8b474000)->rq_pages[0])
$6 = (struct page **) 0xffff972d8b4742c8

rq_res.pages points to rq_pages[2], rq_respages was likely originally set to point to rq_pages[1] -- which makes sense. v3 read requests are small, and should never be longer than a single page (leaving the rest of rq_pages for the reply). Given that, there should have been 259 pages available to handle a 1M splice read (256 pages of data). But... the splice read was not aligned, so that means we could have an extra page on the front and back for the partial pages, so the data in the reply could span up to 257 pages.

That's cutting it close, but it seems like things should still fit with 2 pages to spare. Maybe the next core will give us a better indication.
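A quick back-of-the-envelope check of that span (values taken from the core, 4 KiB pages assumed):

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;
	unsigned long offset = 4;		/* READ offset seen in the core */
	unsigned long count = 1024 * 1024;	/* READ length */

	unsigned long first = offset / page_size;
	unsigned long last  = (offset + count - 1) / page_size;

	/* 257: 255 full pages plus a partial page at each end */
	printf("data spans %lu pages\n", last - first + 1);
	return 0;
}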

Comment 47 Trevor Hemsley 2023-03-16 00:00:12 UTC
*** Bug 2148276 has been marked as a duplicate of this bug. ***

Comment 48 exarkun 2023-03-16 00:59:02 UTC
*** Bug 2171185 has been marked as a duplicate of this bug. ***

Comment 49 David Critch 2023-03-16 15:02:28 UTC
New vmcore has been uploaded to the same spot

Comment 50 Jeff Layton 2023-03-16 16:46:29 UTC
Cool, we caught it before it overran the array this time:

crash> struct svc_rqst 0xffff8dd2c37d8000
...
  rq_pages = {
    ...
    0xffffe1f145855380,
    0xffffe1f1458553c0,
    0xffffe1f145855400,
    0xffffe1f145855400,
    0xffffe1f145855440,
    0xffffe1f145855480
  },
  rq_respages = 0xffff8dd2c37d82d0,    <<< at index 1
  rq_next_page = 0xffff8dd2c37d8ae8,   <<< at index 260
  rq_page_end = 0xffff8dd2c37d8ae0,

crash> p &(((struct svc_rqst *)0xffff8dd2c37d8000)->rq_pages)
$1 = (struct page *(*)[260]) 0xffff8dd2c37d82c8

(0xffff8dd2c37d8ae8-0xffff8dd2c37d82c8)/8 = 260

So yeah, rq_respages starts at rq_pages[1]. I think we're just straight up overrunning the array somehow.

I think it's possible to get incomplete pages back in the splice in some cases (e.g. maybe the fs gives back a short read?). The nfsd_splice_actor code doesn't seem to account for that however. I'll see if I can instrument a check for that and we can verify whether it's happening.

If that turns out to be the case, then maybe we'll need to copy partial pages instead of trying to swap them into place.

Comment 51 Chuck Lever 2023-03-16 16:55:59 UTC
The question in my mind is why this just started happening. Adding the svc_rqst_replace_page() helper was not supposed to cause any behavioral changes.

Comment 52 Jeff Layton 2023-03-16 19:32:33 UTC
Forgot to paste the oops:

[  253.521921] ------------[ cut here ]------------
[  253.521926] kernel BUG at net/sunrpc/svc.c:838!
[  253.521960] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  253.521982] CPU: 2 PID: 4817 Comm: nfsd Kdump: loaded Not tainted 6.1.18-200.bz2150630.3.fc37.x86_64 #1
[  253.522012] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-D3H, BIOS F16 10/24/2012
[  253.522042] RIP: 0010:svc_rqst_replace_page+0xe8/0x110 [sunrpc]
[  253.522136] Code: 48 8d bf 00 0b 00 00 48 89 34 24 e8 c2 62 3f df 48 8b 83 f0 0a 00 00 48 8b 34 24 48 8b 10 0f b6 83 00 0b 00 00 e9 5f ff ff ff <0f> 0b 48 c7 c6 f0 c8 f0 c0 e8 2a 53 42 df 0f 0b 48 83 ef 01 e9 6a
[  253.522185] RSP: 0018:ffffa57bc0d07c98 EFLAGS: 00010293
[  253.522206] RAX: ffff8dd2c37d8ae8 RBX: ffffe1f145855500 RCX: 0000000000000002
[  253.522228] RDX: ffff8dd2c37d8ae0 RSI: ffffe1f1458554c0 RDI: ffff8dd2c37d8000
[  253.522250] RBP: ffffe1f1458557c0 R08: ffffe1f1448164c8 R09: ffffe1f144816480
[  253.522272] R10: 0000000000000002 R11: ffff8dd2e0b5ce10 R12: ffff8dd2c37d8000
[  253.522295] R13: ffffa57bc0d07d90 R14: 0000000000010004 R15: ffffa57bc0d07d90
[  253.522317] FS:  0000000000000000(0000) GS:ffff8dd9bf500000(0000) knlGS:0000000000000000
[  253.522343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  253.522363] CR2: 00005593e27f0000 CR3: 0000000484010002 CR4: 00000000001706e0
[  253.522386] Call Trace:
[  253.522399]  <TASK>
[  253.522411]  nfsd_splice_actor+0x4e/0x90 [nfsd]
[  253.522483]  __splice_from_pipe+0x93/0x1c0
[  253.522505]  ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[  253.522568]  nfsd_direct_splice_actor+0x11/0x20 [nfsd]
[  253.522630]  splice_direct_to_actor+0xc8/0x1d0
[  253.522650]  ? fsid_source+0x60/0x60 [nfsd]
[  253.522710]  nfsd_splice_read+0x6b/0xf0 [nfsd]
[  253.522772]  nfsd_read+0x11d/0x180 [nfsd]
[  253.522833]  nfsd3_proc_read+0x156/0x210 [nfsd]
[  253.522898]  nfsd_dispatch+0x16a/0x280 [nfsd]
[  253.522958]  svc_process_common+0x265/0x5c0 [sunrpc]
[  253.523039]  ? nfsd_svc+0x360/0x360 [nfsd]
[  253.523098]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[  253.523159]  svc_process+0xad/0x100 [sunrpc]
[  253.523237]  nfsd+0xd5/0x190 [nfsd]
[  253.523295]  kthread+0xe9/0x110
[  253.523313]  ? kthread_complete_and_exit+0x20/0x20
[  253.523335]  ret_from_fork+0x22/0x30
[  253.523357]  </TASK>
[  253.523368] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace dm_crypt nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink sunrpc snd_hda_codec_hdmi intel_rapl_msr snd_hda_codec_via snd_hda_codec_generic intel_rapl_common ledtrig_audio x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hda_codec iTCO_wdt intel_pmc_bxt mei_pxp mei_hdcp at24 snd_hda_core iTCO_vendor_support kvm snd_pcsp snd_hwdep irqbypass rapl snd_seq intel_cstate snd_seq_device mxm_wmi snd_pcm i2c_i801 intel_uncore i2c_smbus snd_timer joydev snd lpc_ich soundcore mei_me mei fuse zram xfs i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 e1000e drm_buddy drm_display_helper cec alx mdio ttm video wmi

The splice_desc is in %r13:

crash> struct splice_desc ffffa57bc0d07d90
struct splice_desc {
  total_len = 0x10000,
  len = 0xfffc,
  flags = 0x0,
  u = {
    userptr = 0xffff8dd2c37d8000,
    file = 0xffff8dd2c37d8000,
    data = 0xffff8dd2c37d8000
  },
  pos = 0xf0004,
  opos = 0x0,
  num_spliced = 0x0,
  need_wakeup = 0x0
}

...unfortunately, I'm having trouble tracking down the pipe_buffer in this code.

(In reply to Chuck Lever from comment #51)
> The question in my mind is why this just started happening. Adding the
> svc_rqst_replace_page() helper was not supposed to cause any behavioral
> changes.

One theory is that we might be getting multipage folios in the pipe. Support for that was added to xfs a while back (see mapping_set_large_folios()). This code may not deal with those properly (though nfsd_splice_actor does have this in there, which seems to acknowledge the possibility):

        struct page *page = buf->page;  // may be a compound one

...and I don't see anything wrong in there right offhand. I'll need to think about the next debug steps.

Comment 53 Chuck Lever 2023-03-16 19:36:23 UTC
> (In reply to Chuck Lever from comment #51)
> > The question in my mind is why this just started happening. Adding the
> > svc_rqst_replace_page() helper was not supposed to cause any behavioral
> > changes.
> 
> One theory is that we might be getting multipage folios in the pipe.\

Ja, this is exactly what ac8db824ead0 ("NFSD: Fix reads with a non-zero offset that don't end on a page boundary") was supposed to fix ;-)

Comment 54 Jeff Layton 2023-03-16 19:49:23 UTC
Ok, I think I'm starting to sort of get it. Here's my current theory:

I suspect what's happening is that we're getting a partially filled out page in the middle of the splice. We end up stuffing that page into the array. Then the caller goes around again and the remainder of the page is filled out, so it shows up in the pipe again. Now we stuff it into the next slot in the array again. Do that enough times and we run out of slots in the array. That might explain why we see some duplicate entries in the rq_pages array as well:

    0xffffe1f145855400,    <<< dup
    0xffffe1f145855400,    <<< dup
    0xffffe1f145855440,
    0xffffe1f145855480
  },

Probably what we need to do is pay attention to whether we're dealing with the continuation of a partial page from the last splice and not replace the page in the array again. I'll have to think about how we can detect that and do it safely.

Comment 55 Chuck Lever 2023-03-16 20:05:26 UTC
(In reply to Jeff Layton from comment #54)
> Probably what we need to do is pay attention to whether we're dealing with
> the continuation of a partial page from the last splice and not replace the
> page in the array again. I'll have to think about how we can detect that and
> do it safely.

Does NFSD need to do this safely, or has the splice code or underlying filesystem broken us? I know the splice/pipe code has changed significantly in the 6.0 - 6.1 time frame.

Comment 56 Jeff Layton 2023-03-16 20:07:21 UTC
(In reply to Chuck Lever from comment #55)
> (In reply to Jeff Layton from comment #54)
> > Probably what we need to do is pay attention to whether we're dealing with
> > the continuation of a partial page from the last splice and not replace the
> > page in the array again. I'll have to think about how we can detect that and
> > do it safely.
> 
> Does NFSD need to do this safely, or has the splice code or underlying
> filesystem broken us? I know the splice/pipe code has changed significantly
> in the 6.0 - 6.1 time frame.

It's a good question and I don't know the answer to that yet. At this point, that's just a theory, and I need to validate that that is what's happening.

Comment 57 David Critch 2023-03-16 20:21:39 UTC
FWIW, this started happening after I went from kernel-5.17.13-200.fc35.x86_64 to kernel-6.0.9-200.fc36.x86_64. I removed the 6.0.9 kernel way back in November and hadn't updated in awhile. I did a system upgrade in February and had forgotten (or hoped the issue was fixed) which brought me to kernel-6.1.8-200.fc37.x86_64 and the current state.

Comment 58 Jeff Layton 2023-03-16 20:26:01 UTC
I put this patch in for now (delta on top of last patch):


diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 100c9399ef6d..45efc1fd8222 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -835,7 +835,15 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
        begin = rqstp->rq_pages;
        end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
 
-       BUG_ON(rqstp->rq_next_page < begin || rqstp->rq_next_page > end);
+       /*
+        * Bounds check: make sure rq_next_page points into the rq_respages
+        * part of the array.
+        */
+       BUG_ON(rqstp->rq_next_page <= begin || rqstp->rq_next_page > end);
+
+       /* If this is a continuation of the last page, don't replace it. */
+       if (*rqstp->rq_next_page == *(rqstp->rq_next_page - 1))
+               return;
 
        if (*rqstp->rq_next_page) {
                if (!pagevec_space(&rqstp->rq_pvec))


Basically, we don't allow duplicate pages in the array, and simply skip the replacement if the incoming page would be a duplicate of the last one. If the theory is correct, then this should fix it. I give it a 50% chance of working. David, if you're willing and able:

    https://koji.fedoraproject.org/koji/taskinfo?taskID=98777686

Comment 59 Jeff Layton 2023-03-16 20:48:01 UTC
Sorry, make that condition:

    if (page == *(rqstp->rq_next_page - 1))
           return;

I canceled the last kernel. This one has the correct patch:

    https://koji.fedoraproject.org/koji/taskinfo?taskID=98778395
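For clarity, here is roughly what svc_rqst_replace_page() looks like in this debug kernel once the delta from comment 58 and the corrected condition above are applied. This is a sketch of the test kernel only; the eventual upstream patch may differ.

void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
{
	struct page **begin = rqstp->rq_pages;
	struct page **end = &rqstp->rq_pages[RPCSVC_MAXPAGES];

	/* rq_next_page must point into the rq_respages part of the array */
	BUG_ON(rqstp->rq_next_page <= begin || rqstp->rq_next_page > end);

	/* If this page is a continuation of the last one, keep it in place */
	if (page == *(rqstp->rq_next_page - 1))
		return;

	if (*rqstp->rq_next_page) {
		if (!pagevec_space(&rqstp->rq_pvec))
			__pagevec_release(&rqstp->rq_pvec);
		pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page);
	}

	get_page(page);
	*(rqstp->rq_next_page++) = page;
}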

Comment 60 David Critch 2023-03-16 23:40:00 UTC
Issue is FIXED with latest test kernel!

Comment 61 Jeff Layton 2023-03-17 10:25:50 UTC
Great! Thanks for testing it. I think that means that we have an understanding of the problem now. The patch is not quite what we want upstream I think, but I'll respin it and send it out soon.

Comment 62 Jeff Layton 2023-03-17 11:00:55 UTC
Patch posted upstream:

https://lore.kernel.org/linux-nfs/20230317105608.19393-1-jlayton@kernel.org/T/#t

Comment 63 Trevor Hemsley 2023-03-17 14:25:30 UTC
This bug also affects Fedora 36 as well as 37. Does anything special need to happen to make sure that 36 gets the fix as well?

Comment 64 Jeff Layton 2023-03-17 14:41:09 UTC
(In reply to Trevor Hemsley from comment #63)
> This bug also affects Fedora 36 as well as 37. Does anything special need to
> happen to make sure that 36 gets the fix as well?

It shouldn't need anything special. The upstream stable kernels should pick this up fairly quickly once it goes in, and Fedora will get it after that.

Comment 65 Justin M. Forbes 2023-03-17 15:20:43 UTC
I can actually pull it into Fedora as soon as it is staged upstream. As it looks like there will be a v2, this won't make 6.2.7, but once it is ready it will go in. We use the same source tree for all Fedora stable releases, so all versions should get it.

Comment 66 Jeff Layton 2023-03-22 09:55:19 UTC
Patch is now merged upstream and should make it into stable releases very soon.

