Bug 2127067

Summary: [RHEL 9] BUG: KASAN: use-after-free in nfsd4_cb_prepare+0x227/0x250 [nfsd]
Product: Red Hat Enterprise Linux 9
Component: kernel
kernel sub component: NFS
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED MIGRATED
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: ---
Keywords: MigratedToJIRA
Reporter: Zhi Li <yieli>
Assignee: Jeff Layton <jlayton>
QA Contact: Zhi Li <yieli>
CC: adscvr, bcodding, chuck.lever, jiyin, jlayton, smayhew, steved, xzhou, yoyang
Last Closed: 2023-09-23 11:15:48 UTC
Type: Bug
Bug Blocks: 2091421    

Description Zhi Li 2022-09-15 09:14:17 UTC
Description of problem:
When running the NFS regression tests against a debug kernel, we hit this BUG twice:
====
[  812.539117] BUG: KASAN: use-after-free in nfsd4_cb_prepare+0x227/0x250 [nfsd] 
[  812.546301] Read of size 8 at addr ff11000129b72cc0 by task kworker/u226:1/26363 
[  812.553704]  
[  812.555204] CPU: 85 PID: 26363 Comm: kworker/u226:1 Kdump: loaded Not tainted 5.14.0-163.el9.x86_64+debug #1 
[  812.565029] Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.5.4 12/17/2021 
[  812.572509] Workqueue: rpciod rpc_async_schedule [sunrpc] 
[  812.577969] Call Trace: 
[  812.580427]  dump_stack_lvl+0x57/0x81 
[  812.584098]  print_address_description.constprop.0+0x1f/0x140 
[  812.589856]  ? nfsd4_cb_prepare+0x227/0x250 [nfsd] 
[  812.594689]  __kasan_report.cold+0x7f/0x122 
[  812.598888]  ? nfsd4_cb_prepare+0x227/0x250 [nfsd] 
[  812.603715]  ? __rpc_atrun+0x60/0x60 [sunrpc] 
[  812.608119]  kasan_report+0x38/0x50 
[  812.611615]  nfsd4_cb_prepare+0x227/0x250 [nfsd] 
[  812.616274]  ? __rpc_atrun+0x60/0x60 [sunrpc] 
[  812.620681]  __rpc_execute+0x1a4/0xdf0 [sunrpc] 
[  812.625281]  rpc_async_schedule+0x9f/0x140 [sunrpc] 
[  812.630206]  process_one_work+0x8c8/0x1590 
[  812.634326]  ? __lock_acquired+0x209/0x890 
[  812.638429]  ? pwq_dec_nr_in_flight+0x230/0x230 
[  812.642973]  ? __lock_contended+0x980/0x980 
[  812.647177]  ? worker_thread+0x157/0x1010 
[  812.651204]  worker_thread+0x59b/0x1010 
[  812.655069]  ? process_one_work+0x1590/0x1590 
[  812.659434]  kthread+0x361/0x420 
[  812.662670]  ? _raw_spin_unlock_irq+0x24/0x50 
[  812.667036]  ? set_kthread_struct+0x110/0x110 
[  812.671403]  ret_from_fork+0x1f/0x30 
[  812.675017]  
[  812.676517] Allocated by task 19249: 
[  812.680096]  kasan_save_stack+0x1e/0x50 
[  812.683936]  __kasan_slab_alloc+0x66/0x80 
[  812.687948]  kmem_cache_alloc+0x161/0x310 
[  812.691964]  nfs4_alloc_stid+0x29/0x430 [nfsd] 
[  812.696575]  nfs4_set_delegation+0x260/0x1090 [nfsd] 
[  812.701706]  nfs4_open_delegation+0x29c/0x7a0 [nfsd] 
[  812.706837]  nfsd4_process_open2+0xeeb/0x1e70 [nfsd] 
[  812.711966]  nfsd4_open+0xc5d/0x11c0 [nfsd] 
[  812.716310]  nfsd4_proc_compound+0xdbc/0x25a0 [nfsd] 
[  812.721438]  nfsd_dispatch+0x4dc/0xcd0 [nfsd] 
[  812.725946]  svc_process_common+0x1140/0x1c40 [sunrpc] 
[  812.731321]  svc_process+0x38b/0x590 [sunrpc] 
[  812.735923]  nfsd+0x281/0x3f0 [nfsd] 
[  812.739650]  kthread+0x361/0x420 
[  812.742897]  ret_from_fork+0x1f/0x30 
[  812.746484]  
[  812.747983] Freed by task 19247: 
[  812.751216]  kasan_save_stack+0x1e/0x50 
[  812.755055]  kasan_set_track+0x21/0x30 
[  812.758808]  kasan_set_free_info+0x20/0x40 
[  812.762916]  __kasan_slab_free+0xec/0x120 
[  812.766929]  slab_free_freelist_hook+0xa3/0x1d0 
[  812.771468]  kmem_cache_free+0x118/0x4b0 
[  812.775397]  nfs4_free_deleg+0x14/0x40 [nfsd] 
[  812.779791]  nfs4_put_stid+0x29f/0x430 [nfsd] 
[  812.784184]  nfsd4_free_stateid+0x30a/0x570 [nfsd] 
[  812.789012]  nfsd4_proc_compound+0xdbc/0x25a0 [nfsd] 
[  812.794012]  nfsd_dispatch+0x4dc/0xcd0 [nfsd] 
[  812.798406]  svc_process_common+0x1140/0x1c40 [sunrpc] 
[  812.803596]  svc_process+0x38b/0x590 [sunrpc] 
[  812.807998]  nfsd+0x281/0x3f0 [nfsd] 
[  812.811605]  kthread+0x361/0x420 
[  812.814838]  ret_from_fork+0x1f/0x30 
[  812.818427]  
[  812.819925] Last potentially related work creation: 
[  812.824805]  kasan_save_stack+0x1e/0x50 
[  812.828645]  __kasan_record_aux_stack+0xb2/0xc0 
[  812.833185]  insert_work+0x47/0x310 
[  812.836678]  __queue_work+0x4dd/0xd60 
[  812.840342]  queue_work_on+0x7f/0x90 
[  812.843923]  nfsd4_run_cb+0x51/0x80 [nfsd] 
[  812.848057]  nfsd_break_deleg_cb+0x16d/0x390 [nfsd] 
[  812.852971]  __break_lease+0x331/0x10a0 
[  812.856810]  do_dentry_open+0x3c8/0xec0 
[  812.860649]  do_open+0x69c/0xec0 
[  812.863882]  path_openat+0x281/0x680 
[  812.867461]  do_filp_open+0x1aa/0x3f0 
[  812.871128]  do_sys_openat2+0x126/0x410 
[  812.874968]  __x64_sys_openat+0x11f/0x1e0 
[  812.878978]  do_syscall_64+0x59/0x90 
[  812.882559]  entry_SYSCALL_64_after_hwframe+0x63/0xcd 
[  812.887620]  
[  812.889119] The buggy address belongs to the object at ff11000129b72be0 
[  812.889119]  which belongs to the cache nfsd4_delegations of size 368 
[  812.902147] The buggy address is located 224 bytes inside of 
[  812.902147]  368-byte region [ff11000129b72be0, ff11000129b72d50) 
[  812.913880] The buggy address belongs to the page: 
[  812.918675] page:0000000014fe62d9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x129b70 
[  812.928068] head:0000000014fe62d9 order:2 compound_mapcount:0 compound_pincount:0 
[  812.935546] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) 
[  812.942951] raw: 0017ffffc0010200 0000000000000000 dead000000000122 ff110002d6e4de00 
[  812.950699] raw: 0000000000000000 0000000000250025 00000001ffffffff 0000000000000000 
[  812.958443] page dumped because: kasan: bad access detected 
[  812.964018]  
[  812.965517] Memory state around the buggy address: 
[  812.970308]  ff11000129b72b80: 00 00 00 00 fc fc fc fc fc fc fc fc fa fb fb fb 
[  812.977530]  ff11000129b72c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
[  812.984756] >ff11000129b72c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
[  812.991975]                                            ^ 
[  812.997290]  ff11000129b72d00: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc 
[  813.004507]  ff11000129b72d80: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
[  813.011726] ================================================================== 
[  813.018945] Disabling lock debugging due to kernel taint 

Version-Release number of selected component (if applicable):
5.14.0-163.el9.x86_64+debug

How reproducible:
Happens occasionally.

Steps to Reproduce:
1. Clone the beaker job: https://beaker.engineering.redhat.com/jobs/7010492
   console log: https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2022/09/70104/7010492/12605873/console.log

Beaker job:  https://beaker.engineering.redhat.com/recipes/12563945#task149870891
console log: https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2022/09/69839/6983964/12563945/console.log


Actual results:
KASAN BUG reported in the kernel log

Expected results:
no KASAN BUG in the kernel log

Comment 1 Zhi Li 2022-09-15 09:23:32 UTC
This issue can also be reproduced on RHEL 8, where it is tracked by bz1849472.

Comment 2 Jeff Layton 2022-09-22 10:36:09 UTC
Making this bug public, since I suspect this is related to an upstream bug Chuck has been hunting here:

    https://bugzilla.linux-nfs.org/show_bug.cgi?id=394

Comment 3 Jeff Layton 2022-09-23 12:04:55 UTC
All of the places where this has been reproduced so far seem to be running a regression test for bz1687360. Here's the script that runs in that test:

Server() {
rlPhaseStartSetup do-$role-Setup-
	rlFileBackup /etc/exports /etc/nfs.conf
	run "mkdir -p $expdir"
	run "echo \"$expdir *(rw,no_root_squash)\" >/etc/exports"
	run "echo -e \"[nfsd]\n threads=64\n lease-time=15\" > /etc/nfs.conf"
	run "service_nfs restart"
	run "tmux new -d \"tcpdump -U -q -i $(getDefaultNic) -w $PCAP_FILE -s 0 host $CLIENT\""
	run "ps aux | grep -v grep | grep -i tmux"
rlPhaseEnd

rlPhaseStartTest do-$role-Test-
	run "rhts-sync-set -s servReady"
	run "rhts-sync-block -s testDone $CLIENT"
	run 'for i in $(seq 1 10000); do date > $expdir/file_$i & done'
	run "sleep 200"
	run "pkill tcpdump && sleep 3"
	run "ps aux | grep -v grep | grep tcpdump" 1
	run "tshark -ntad -r ${PCAP_FILE} &> /tmp/test.log"
	run 'count=`cat /tmp/test.log | grep TEST_STATEID | wc -l`'
	run "test $count = 1" 0 "NFS client should only send TEST_STATEID one time, got $count"
	run "cat /tmp/test.log | grep TEST_STATEID | tail" -
	run "rlFileSubmit $PCAP_FILE"
	run "rhts-sync-set -s servtestDone"
	run "rhts-sync-block -s clicleanDone $CLIENT"
rlPhaseEnd

rlPhaseStartCleanup do-$role-Cleanup-
	rlFileRestore
	run "rm -rf $expdir"
rlPhaseEnd
}

Client() {
rlPhaseStartSetup do-$role-Setup-
	run "mkdir -p $nfsmp"
	Vers=`uname -r`
	[[ "$Vers" != *+debug ]] && {
		run "debuginfo-install -y kernel-${Vers}"
		run 'rpm -q kernel-debuginfo-$Vers || brewinstall.sh -onlydebuginfo kernel-${Vers%.*}'
	}
	run "rhts-sync-block -s servReady $SERVER"
rlPhaseEnd

rlPhaseStartTest do-$role-Test-
	run "mount -overs=4.2 $SERVER:$expdir $nfsmp"
	run 'for i in $(seq 1 10000); do touch $nfsmp/file_$i; done'
	run "stap -gve 'probe module(\"nfsv4\").function(\"nfs4_callback_recall\") { printf(\"%s: delaying\n\", ppfunc()); mdelay(10); }' &> $STAP_OUT &"
	run "while true; do grep -q \"Pass 5: starting run\" $STAP_OUT && break; sleep 1; done"
	run "cat $STAP_OUT"
	run 'for i in $(seq 1 10000); do sleep 2m < $nfsmp/file_$i & done'
	run "rhts-sync-set -s testDone"
	run "rhts-sync-block -s servtestDone $SERVER"
	while ps aux | grep -v grep | grep sleep &> /dev/null; do
		pkill sleep
		sleep 3
	done
	run "ps aux | grep -v grep | grep sleep" 1
	run "umount -l $nfsmp"
	run "rm -rf $nfsmp"
	run "rhts-sync-set -s clicleanDone"
rlPhaseEnd

rlPhaseStartCleanup do-$role-Cleanup-
	rlFileRestore
rlPhaseEnd
}

I don't have the test harness set up, but this gives you an idea of what this test is doing. It sets a very short lease lifetime, and then introduces some artificial delays on the client-side recall handling. I guess this is to set up a situation where the delegation gets fully revoked.

Hypothesis: the problem is in the revoked delegation handling

I'll try to turn this into something I can run by hand and see if I can reproduce it that way.

Comment 4 Jeff Layton 2022-09-23 12:27:43 UTC
I think I may have spotted it:

[jlayton@tleilax kernel-rhel9]$ git diff
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9069599c699f..e6384e350785 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5991,7 +5991,6 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
        if (!*s)
                return nfserr_bad_stateid;
        if (((*s)->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) {
-               nfs4_put_stid(*s);
                if (cstate->minorversion)
                        return nfserr_deleg_revoked;
                return nfserr_bad_stateid;


In the case where we find a REVOKED stateid, we are putting the reference but still filling out the return pointer.
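To make the concern concrete, here's a minimal user-space analogue of that pattern (all names here are invented for illustration; none of this is nfsd code). The lookup takes a reference but drops it again on the "revoked" path while still leaving the out-pointer set, so if the original holder then drops the last reference, any later access through that pointer reads freed memory:

#include <stdio.h>
#include <stdlib.h>

struct stid {
	int refcount;
	int revoked;
};

static void put_stid(struct stid *s)
{
	if (--s->refcount == 0)
		free(s);		/* the last put frees the object */
}

/*
 * Mimics the pattern in question: the lookup takes a reference, but on
 * the "revoked" path it drops that reference again while still leaving
 * *out pointing at the object.
 */
static int lookup(struct stid *found, struct stid **out)
{
	found->refcount++;		/* lookup takes a reference */
	*out = found;
	if (found->revoked) {
		put_stid(found);	/* ref dropped, *out still set */
		return -1;
	}
	return 0;
}

int main(void)
{
	struct stid *s = calloc(1, sizeof(*s));
	struct stid *out;
	int err;

	s->refcount = 1;		/* reference held by the "owner" */
	s->revoked = 1;

	err = lookup(s, &out);		/* fails, but *out is set anyway */
	put_stid(s);			/* owner drops the last ref: freed */

	/*
	 * A caller that ignored err and touched out here would read freed
	 * memory, exactly the class of access KASAN flags in the report
	 * above. The status check below is what keeps callers safe.
	 */
	if (!err)
		printf("revoked=%d\n", out->revoked);
	return 0;
}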

Comment 5 Jeff Layton 2022-09-23 12:41:29 UTC
No, spoke too soon. It's ugly, but the callers seem to handle that ok. I'll still plan to send a cleanup patch to make nfsd4_lookup_stateid only set the return pointer on success, since that's best practice.
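In the toy analogue from comment 4, that style of cleanup amounts to assigning the out-pointer only on the success path:

/*
 * The toy lookup() restructured so the caller's pointer is written only
 * on success; an error status can then never be paired with a pointer
 * whose reference has already been dropped.
 */
static int lookup_fixed(struct stid *found, struct stid **out)
{
	found->refcount++;
	if (found->revoked) {
		put_stid(found);	/* drop the ref; *out never set */
		return -1;
	}
	*out = found;			/* publish only on success */
	return 0;
}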

Comment 6 Jeff Layton 2022-09-26 11:21:21 UTC
I spent a bunch of time on Friday trying to reproduce this, but no joy. I also crawled through the delegation handling code (again) in the hopes I'd spot the refcounting issue, but I still don't see it. I did come away with a few cleanup patches that I'll post soon. I may have to table working on this until we can come up with a way to reliably reproduce it.

Comment 8 Facundo 2022-12-08 09:33:06 UTC
I'm running Fedora Server 37 and I hit the same BUG, making the system totally unstable. The only solution I found was to downgrade to libnfsidmap-1:2.6.2-1.rc3.fc37 and nfs-utils-1:2.6.2-1.rc3.fc37. Now the system is stable.
Hope this helps.

Comment 12 RHEL Program Management 2023-09-23 11:15:15 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 13 RHEL Program Management 2023-09-23 11:15:48 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.