Bug 2164887

Summary: [nfs] refcount_t: underflow; use-after-free lib/refcount.c:28 refcount_warn_saturat
Product: Red Hat Enterprise Linux 9 Reporter: daryl herzmann <akrherz>
Component: kernel Assignee: nfs-maint
kernel sub component: NFS QA Contact: Filesystem QE <fs-qe>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: bstinson, jlayton, jwboyer
Version: CentOS Stream   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-26 20:50:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description daryl herzmann 2023-01-26 20:38:06 UTC
Description of problem:

I have been fighting a reproducible NFS kernel panic on a CentOS Stream 9 (CS9) + ZFS host. I saw that a bunch of NFS4 fixes went into 5.14.0-239, so I got excited, but I was still able to reproduce the crash just by running `exportfs -a`.

Version-Release number of selected component (if applicable):

Current CentOS Stream 9
5.14.0-239.el9.x86_64

How reproducible:

Seemingly always. ;(

Steps to Reproduce:
1. exportfs -a

Actual results:

[  122.093687] ------------[ cut here ]------------
[  122.093723] refcount_t: underflow; use-after-free.
[  122.093749] WARNING: CPU: 18 PID: 5275 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[  122.093786] Modules linked in: rpcsec_gss_krb5 tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE nft_chain_nat nf_nat rpcrdma rdma_cm iw_cm ib_cm ib_core bridge stp llc nft_counter ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables dell_rbu nfnetlink ledtrig_audio rfkill intel_rapl_msr video dcdbas intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl pcspkr dell_smbios dell_wmi_descriptor wmi_bmof ipmi_ssif k10temp i2c_piix4 ptdma acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter vfat fat zfs(POE) zunicode(POE) zzstd(OE) ext4 zlua(OE) zavl(POE) icp(POE) mbcache zcommon(POE) jbd2 joydev znvpair(POE) spl(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse xfs libcrc32c sd_mod sg mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper nvme syscopyarea ahci sysfillrect nvme_core sysimgblt libahci crct10dif_pclmul fb_sys_fops crc32_pclmul crc32c_intel nvme_common drm libata ghash_clmulni_intel tg3
[  122.093859]  megaraid_sas ccp t10_pi sp5100_tco wmi dm_mirror dm_region_hash dm_log dm_mod
[  122.094095] CPU: 18 PID: 5275 Comm: nfsd Kdump: loaded Tainted: P           OE    --------- ---  5.14.0-239.el9.x86_64 #1
[  122.094140] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.8.4 06/23/2022
[  122.094164] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  122.094183] Code: 01 01 e8 49 a9 56 00 0f 0b e9 22 98 89 00 80 3d 4a 3e 9b 01 00 75 85 48 c7 c7 88 cb c4 a1 c6 05 3a 3e 9b 01 01 e8 26 a9 56 00 <0f> 0b e9 ff 97 89 00 80 3d 25 3e 9b 01 00 0f 85 5e ff ff ff 48 c7
[  122.094233] RSP: 0018:ffffa6e1cf28fcc8 EFLAGS: 00010282
[  122.094268] RAX: 0000000000000000 RBX: ffff94789aef8208 RCX: 0000000000000027
[  122.094291] RDX: ffff9493bf4998a8 RSI: 0000000000000001 RDI: ffff9493bf4998a0
[  122.094313] RBP: ffff947739115280 R08: 0000000000000000 R09: 00000000ffff7fff
[  122.094335] R10: ffffa6e1cf28fb68 R11: ffffffffa25e96c8 R12: ffff94789aefe024
[  122.094358] R13: ffff94789aef8208 R14: 0000000000000000 R15: ffff9476712e4fd0
[  122.094380] FS:  0000000000000000(0000) GS:ffff9493bf480000(0000) knlGS:0000000000000000
[  122.094406] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  122.094425] CR2: 00007f10fb4d6e10 CR3: 00000020b2c32000 CR4: 0000000000350ee0
[  122.094448] Call Trace:
[  122.094461]  <TASK>
[  122.094471]  nfsd_file_free+0x225/0x230 [nfsd]
[  122.094512]  nfs4_file_put_access+0x7e/0x130 [nfsd]
[  122.094550]  release_all_access+0x6a/0x80 [nfsd]
[  122.094584]  nfs4_free_ol_stateid+0x22/0x60 [nfsd]
[  122.094618]  nfs4_put_stid+0xb1/0x100 [nfsd]
[  122.094652]  nfsd4_close+0x1e3/0x3c0 [nfsd]
[  122.094687]  ? nfsd4_encode_getattr+0x28/0x30 [nfsd]
[  122.094723]  ? nfsd4_encode_operation+0xdc/0x270 [nfsd]
[  122.094759]  nfsd4_proc_compound+0x446/0x6f0 [nfsd]
[  122.094796]  nfsd_dispatch+0x15e/0x290 [nfsd]
[  122.094831]  svc_process_common+0x3bc/0x5e0 [sunrpc]
[  122.094877]  ? nfsd_svc+0x190/0x190 [nfsd]
[  122.094910]  ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd]
[  122.095603]  svc_process+0xb7/0xf0 [sunrpc]
[  122.096261]  nfsd+0xd5/0x190 [nfsd]
[  122.096904]  kthread+0xd9/0x100
[  122.097535]  ? kthread_complete_and_exit+0x20/0x20
[  122.098162]  ret_from_fork+0x22/0x30
[  122.098781]  </TASK>
[  122.099380] ---[ end trace 7a8b3c06a65fce64 ]---
[  122.122108] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  122.122923] #PF: supervisor instruction fetch in kernel mode
[  122.123543] #PF: error_code(0x0010) - not-present page
[  122.124136] PGD 208ac34067 P4D 208ac34067 PUD 208ac32067 PMD 0 
[  122.124720] Oops: 0010 [#1] PREEMPT SMP NOPTI
[  122.125286] CPU: 18 PID: 0 Comm: swapper/18 Kdump: loaded Tainted: P        W  OE    --------- ---  5.14.0-239.el9.x86_64 #1
[  122.125860] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.8.4 06/23/2022
[  122.126423] RIP: 0010:0x0
[  122.126971] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[  122.127514] RSP: 0018:ffffa6e1ccaacee8 EFLAGS: 00010202
[  122.128044] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00000000105ba012
[  122.128569] RDX: ffff94789aefe048 RSI: bbb4bbef1e6de0fa RDI: ffff94789aef8220
[  122.129085] RBP: ffff94744628b900 R08: ffffffffa2259364 R09: 0000000000000101
[  122.129593] R10: 0000000000000040 R11: ffffffffa2206100 R12: ffff9493bf4abb40
[  122.130096] R13: 0000000000000003 R14: 000000000000000a R15: 0000000000000000
[  122.130588] FS:  0000000000000000(0000) GS:ffff9493bf480000(0000) knlGS:0000000000000000
[  122.131074] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  122.131553] CR2: ffffffffffffffd6 CR3: 00000020b2c32000 CR4: 0000000000350ee0
[  122.132035] Call Trace:
[  122.132507]  <IRQ>
[  122.132975]  rcu_do_batch+0x1ae/0x4d0
[  122.133449]  rcu_core+0x26a/0x410
[  122.133911]  __do_softirq+0xca/0x2ac
[  122.134366]  __irq_exit_rcu+0xb5/0xe0
[  122.134814]  sysvec_apic_timer_interrupt+0x72/0x90
[  122.135258]  </IRQ>
[  122.135688]  <TASK>
[  122.136117]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[  122.136556] RIP: 0010:mwait_idle+0x51/0x80
[  122.136997] Code: 31 d2 48 89 d1 65 48 8b 04 25 40 8f 01 00 0f 01 c8 48 8b 00 a8 08 75 14 eb 07 0f 00 2d c4 8e 4d 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 01 fb 65 48 8b 04 25 40 8f 01 00 f0 80 60 02 df e9 79 fc 2c 00
[  122.137929] RSP: 0018:ffffa6e1c8217ed0 EFLAGS: 00000246
[  122.138401] RAX: 0000000000000000 RBX: ffff94744628b900 RCX: 0000000000000000
[  122.138881] RDX: 0000000000000000 RSI: 0000000000000012 RDI: 0000000000164ed2
[  122.139361] RBP: 0000000000000000 R08: 0000001c6eb41521 R09: ffff9474502e5600
[  122.139842] R10: 00000000000002e4 R11: ffff94744a8d0e10 R12: 0000000000000000
[  122.140323] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  122.140804]  default_idle_call+0x33/0xe0
[  122.141285]  cpuidle_idle_call+0x15d/0x1c0
[  122.141762]  ? ktime_get+0x38/0xa0
[  122.142238]  do_idle+0x7b/0xe0
[  122.142709]  cpu_startup_entry+0x19/0x20
[  122.143180]  start_secondary+0x116/0x140
[  122.143653]  secondary_startup_64_no_verify+0xe5/0xeb
[  122.144129]  </TASK>
[  122.144597] Modules linked in: rpcsec_gss_krb5 tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE nft_chain_nat nf_nat rpcrdma rdma_cm iw_cm ib_cm ib_core bridge stp llc nft_counter ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables dell_rbu nfnetlink ledtrig_audio rfkill intel_rapl_msr video dcdbas intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl pcspkr dell_smbios dell_wmi_descriptor wmi_bmof ipmi_ssif k10temp i2c_piix4 ptdma acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter vfat fat zfs(POE) zunicode(POE) zzstd(OE) ext4 zlua(OE) zavl(POE) icp(POE) mbcache zcommon(POE) jbd2 joydev znvpair(POE) spl(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse xfs libcrc32c sd_mod sg mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper nvme syscopyarea ahci sysfillrect nvme_core sysimgblt libahci crct10dif_pclmul fb_sys_fops crc32_pclmul crc32c_intel nvme_common drm libata ghash_clmulni_intel tg3
[  122.144637]  megaraid_sas ccp t10_pi sp5100_tco wmi dm_mirror dm_region_hash dm_log dm_mod
[  122.149557] CR2: 0000000000000000

I have a vmcore to share if anybody is interested in looking into this further.

Comment 1 Jeff Layton 2023-01-26 20:50:03 UTC

*** This bug has been marked as a duplicate of bug 2160443 ***