Bug 1448750
| Summary: | BUG: unable to handle kernel paging request at 0; IP: [<ffffffffc05ae76b>] nfsd4_cb_done+0x2b/0x310 [nfsd] |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Reporter: | JianHong Yin <jiyin> |
| Component: | kernel |
| kernel sub component: | NFS |
| Assignee: | J. Bruce Fields <bfields> |
| QA Contact: | JianHong Yin <jiyin> |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| CC: | bcodding, bfields, chorn, cmaiolin, dwysocha, fsorenso, gsierohu, jiyin, jlayton, knakai, pvlasin, rbergant, redhat-bugzilla, revers, rmonther, smayhew, snishika, steved, swhiteho, tumeya, vvs, xzhou, yieli, yoguma, yoyang, yzheng, zlang |
| Version: | 7.4 |
| Target Milestone: | rc |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | kernel-3.10.0-1139.el7 |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Last Closed: | 2020-09-29 20:59:41 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
| Bug Blocks: | 1456511, 1469559, 1477664, 1546181 |
Description
JianHong Yin
2017-05-08 01:35:42 UTC
Additional info:

reproducer: https://bugzilla.redhat.com/show_bug.cgi?id=1342305#c16
https://git.hpdd.intel.com/fs/lustre-release.git/tree/refs/heads/master:/lustre/tests/racer

0x190 is the offset of cl_minorversion in struct nfs4_client, so I think nfsd4_cb_done got called with a task with (struct nfsd4_callback *)(task->tk_calldata)->cb_clp set to 0x100000000, for some reason. Probably the nfsd4_callback was freed and reused before nfsd4_cb_done was called.

It's unlikely there's any pnfs involved here, and I don't believe this test is something that frequently creates and destroys nfs clients, so it's most likely that this callback was a delegation recall. In that case the nfsd4_callback is embedded in a struct nfs4_delegation.

The code that creates delegations is unfortunately quite complicated, so it's believable there could be a race or other rare failure case there. So far I haven't found one, but haven't convinced myself it's impossible either; I'll study it some more.

I took a look at the cores from comment #2 and comment #9 and they were indeed delegation recalls. In both cores it looks like the nfsd4_callback and the nfs4_delegation that they were embedded in were both freed.

A few other interesting tidbits:

- in both cores the rpc_task.tk_status was -EIO
- in both cores the rpc_task.tk_flags had RPC_TASK_KILLED set
- in both cores nfsd was in the process of shutting down (there was one nfsd thread and it was somewhere in svc_shutdown_net)
- in the core from comment #2 the state manager thread was running, but it wasn't in the core from comment #9

(In reply to Scott Mayhew from comment #14)
> I took a look at the cores from comment #2 and comment #9 and they were
> indeed delegation recalls. In both cores it looks like the nfsd4_callback
> and the nfs4_delegation that they were embedded in were both freed.
> A few other interesting tidbits:
>
> - in both cores the rpc_task.tk_status was -EIO
> - in both cores the rpc_task.tk_flags had RPC_TASK_KILLED set
> - in both cores nfsd was in the process of shutting down (there was one nfsd
>   thread and it was somewhere in svc_shutdown_net)
> - in the core from comment #2 the state manager thread was running, but it
>   wasn't in the core from comment #9

Thanks, that's interesting! I hadn't thought to look at server shutdown. I haven't looked at the actual test--it'd be interesting to know how and when exactly the server's being shutdown in this test, and whether there might be any local or remote users of the filesystem active at the time of shutdown.

Looking at the server shutdown code I do have some delegation-related questions:

- in nfs4_state_shutdown, why do we first clear the del_recall_lru list? I can't see what prevents more delegations from being added to that list after this point, since the add is done by work items that as far as I can see might still be running.
- in __destroy_client, we do a list_add that assumes dp->dl_recall_lru is empty, is that a safe assumption?
- I'm worried about races between callbacks and delegation destruction in __destroy_client. Maybe the nfsd4_shutdown_callback() needs to come a lot earlier?

Anyway, I think a crash on shutdown is maybe a little less likely and a little lower impact than a crash during normal operation, so given the schedule I'm inclined to postpone.

(In reply to J. Bruce Fields from comment #15)
> Thanks, that's interesting! I hadn't thought to look at server shutdown. I
> haven't looked at the actual test--it'd be interesting to know how and when
> exactly the server's being shutdown in this test, and whether there might be
> any local or remote users of the filesystem active at the time of shutdown.
Huh, no, doesn't look like there's anything interesting going on there:

DURATION=3600 sh racer.sh $nfsmp/racer &
DURATION=3600 sh racer.sh $nfsmp2/racer &
DURATION=3600 sh racer.sh $nfsexpdir/racer &
wait %1 %2 %3
umount $nfsmp || exit 5
umount $nfsmp2 || exit 6
# stop nfs-server.service
service_nfs stop || exit 7
umount $nfsexpdir || exit 8

So somehow a delegation callback outlived the client mounts and is causing trouble on the "service nfs stop".

(In reply to J. Bruce Fields from comment #15)
> Looking at the server shutdown code I do have some delegation-related
> questions:

But, looking back at one of the cores, it's still in svc_close_net, which is before we start tearing down the NFSv4 state. So, there may still be a bug in that NFSv4 state shutdown, but if so it's likely not the problem here.

(In reply to J. Bruce Fields from comment #18)
> So, there may still be a bug in that NFSv4 state shutdown, but if so it's
> likely not the problem here.

Looks like what Bug 1312002 describes. Sorry, I'm not sure about it.

Apologies for the silence. I spent some more time studying the relevant code to see if I could spot the bug, but no luck so far. RPC_TASK_KILLED suggests the rpc client is being shut down. By that point the delegations have all been unhashed, but that should have put only one of the two references on the delegation we're recalling--the callback should hold the other reference. All I can think is that there's been some reference counting error. I should try to run the reproducer myself.

Apologies, I don't use our test infrastructure as often as I should, could you remind me how to get the recipe with the relevant scripts? I'd like to check that it's still reproducible upstream. (Probably it is--at least I can't see any relevant commits that we haven't backported.) And then see if I can patch in some useful instrumentation.

*** Bug 1542298 has been marked as a duplicate of this bug. ***
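The shutdown-ordering question raised earlier (whether a recall work item can still add a delegation to del_recall_lru after nfs4_state_shutdown has already cleared it) can be sketched as a toy sequential model. This is illustrative only, not kernel code; the function names mirror the kernel ones but the bodies are stand-ins:

```python
# Toy model of the suspected interleaving: shutdown clears the recall
# list first (step 1), then a recall work item that is still running
# re-adds a delegation (step 2).  Nothing walks the list after step 1,
# so the late entry is effectively leaked / left dangling.

del_recall_lru = []

def nfs4_state_shutdown_clear_lru():
    # step 1: shutdown empties the list
    del_recall_lru.clear()

def recall_work_item(delegation):
    # step 2: a work item that raced past shutdown re-adds an entry
    del_recall_lru.append(delegation)

nfs4_state_shutdown_clear_lru()
recall_work_item("dp")
# del_recall_lru now holds an entry that no one will ever process.
```

In the real kernel the two steps run on different threads, which is exactly why the ordering question matters; the sequential version just pins down the problematic order.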
I see one thing in the code that bothers me: if the vfs_setlease() in nfs4_setlease() succeeds but then hash_delegation_locked (or the fi_had_conflict check?) fails, then I don't see anything which removes that lease. It looks to me like the fp which the lease's fl_owner field points to can even be freed while the lease is still applied. Once nfsd_break_deleg_cb() is called with such a lease, anything could happen, including the observed symptoms--though I'm not sure why they'd be the most likely.

Agreed. That does look wrong, especially in the RHEL7 code. It looks a bit better in upstream master, but even that looks like it has a bug. In the event that hash_delegation_locked fails there, we'll end up returning the (presumably now freed) delegation pointer (dp). We should really return ERR_PTR(status) in that case.

With 20 jobs on 3.10.0-663.el7 I was able to reproduce twice. So my inability to reproduce upstream might be luck or might mean that it's fixed upstream. Running more jobs on upstream.

My backport passes our test--but so does the redhat kernel (kernel-3.10.0-916.el7) that I based the backport on. So either the problem was inadvertently fixed between -693 (the latest redhat kernel where we've seen this) and -916, or our reproducer is just not reliable enough. I'm running some more tests and doing some more work on the backport.

(In reply to J. Bruce Fields from comment #50)
> So either the problem was inadvertently fixed between -693 (the latest
> redhat kernel where we've seen this) and -916, or our reproducer is just not
> reliable enough.

Yes, I just can't reproduce it predictably enough to say for sure whether a particular patch has fixed the problem. And I'm not figuring it out from just analyzing the code either so far. Other eyes on the code might help. Or core analysis, or adding some instrumentation. A backport of possibly relevant upstream patches is in the 1448750-cb_done branch of git.engineering.redhat.com/srv/git/users/bfields/linux.git.
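The reference-counting hypothesis from the comments above (unhashing should drop only one of the delegation's two references, with the pending callback holding the other) can be modeled minimally. This is an illustrative sketch, not the real kernel code; `Delegation` and its fields are stand-ins:

```python
# Minimal refcount model: the delegation starts with two references,
# one for the server's state tables and one for the pending callback.
# A single spurious extra put() frees it while the RPC task's
# tk_calldata still points at it -- the use-after-free seen in the cores.

class Delegation:
    def __init__(self):
        self.refcount = 2       # state tables + pending callback
        self.freed = False

    def put(self):
        assert not self.freed, "put() after free"
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # models kfree() of the embedding struct

dp = Delegation()
dp.put()   # unhash drops the state-table reference -- expected
dp.put()   # hypothetical refcount bug drops the callback's reference too
# dp is now freed, but the done() callback would still dereference it.
```

If only the first put() happened, the callback's reference would keep the object alive until nfsd4_cb_done ran; the cores suggest the second put happened somewhere it shouldn't have.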
I poked around in my old vmcore (https://optimus.gsslab.rdu2.redhat.com/manager/897324140) some more and found some (maybe) interesting stuff I hadn't noticed before. I'll try not to rehash stuff that's already been mentioned before.

Here's the panic task:

crash> bt
PID: 4646  TASK: ffff98c3e75daf10  CPU: 0  COMMAND: "kworker/0:3"
 #0 [ffff98c3e74479e0] machine_kexec at ffffffff9545c44b
 #1 [ffff98c3e7447a40] __crash_kexec at ffffffff95504a82
 #2 [ffff98c3e7447b10] crash_kexec at ffffffff95504b70
 #3 [ffff98c3e7447b28] oops_end at ffffffff95aa9078
 #4 [ffff98c3e7447b50] no_context at ffffffff95a991f4
 #5 [ffff98c3e7447ba0] __bad_area_nosemaphore at ffffffff95a9928a
 #6 [ffff98c3e7447be8] bad_area_nosemaphore at ffffffff95a993f4
 #7 [ffff98c3e7447bf8] __do_page_fault at ffffffff95aabf3e
 #8 [ffff98c3e7447c58] trace_do_page_fault at ffffffff95aac196
 #9 [ffff98c3e7447c98] do_async_page_fault at ffffffff95aab82a
#10 [ffff98c3e7447cb0] async_page_fault at ffffffff95aa8338
    [exception RIP: nfsd4_cb_done+0x2b]
    RIP: ffffffffc06ba82b  RSP: ffff98c3e7447d60  RFLAGS: 00010246
    RAX: ffffffffc06ba800  RBX: ffff98c3d1995088  RCX: dead000000000200
    RDX: 0000000000000001  RSI: ffff98c3d1995088  RDI: ffff98c39b860400
    RBP: ffff98c3e7447d88  R8: ffff98c39b860448  R9: 9cf12aab8a860440
    R10: 9cf12aab8a860440  R11: fffffd17c19d1d00  R12: ffff98c39b860400
    R13: 0000000100000000  R14: 0000000100000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff98c3e7447d90] rpc_exit_task at ffffffffc06129ee [sunrpc]
#12 [ffff98c3e7447da8] __rpc_execute at ffffffffc0614007 [sunrpc]
#13 [ffff98c3e7447e10] rpc_async_schedule at ffffffffc0614392 [sunrpc]
#14 [ffff98c3e7447e20] process_one_work at ffffffff954a874a
#15 [ffff98c3e7447e68] worker_thread at ffffffff954a97e6
#16 [ffff98c3e7447ec8] kthread at ffffffff954b093f
#17 [ffff98c3e7447f50] ret_from_fork at ffffffff95ab0d18

Here's the rpc_task and the nfsd4_callback that caused the panic:

crash> rpc_task.tk_calldata ffff98c39b860400
  tk_calldata =
0xffff98c3d1995088
crash> nfsd4_callback 0xffff98c3d1995088
struct nfsd4_callback {
  cb_clp = 0x100000000,
  ...

Looking at the kworker threads, one of them was in nfsd4_process_cb_update():

crash> set 7038
    PID: 7038
COMMAND: "kworker/u4:2"
   TASK: ffff98c3dece2f10  [THREAD_INFO: ffff98c39ea0c000]
    CPU: 0
  STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 7038  TASK: ffff98c3dece2f10  CPU: 0  COMMAND: "kworker/u4:2"
 #0 [ffff98c39ea0fb20] __schedule at ffffffff95aa4d25
 #1 [ffff98c39ea0fb88] schedule at ffffffff95aa52a9
 #2 [ffff98c39ea0fb98] schedule_timeout at ffffffff95aa2cf4
 #3 [ffff98c39ea0fc40] rpc_shutdown_client at ffffffffc0608090 [sunrpc]
 #4 [ffff98c39ea0fca8] nfsd4_process_cb_update at ffffffffc06bab4e [nfsd]
 #5 [ffff98c39ea0fdf8] nfsd4_run_cb_work at ffffffffc06bad99 [nfsd]
 #6 [ffff98c39ea0fe20] process_one_work at ffffffff954a874a
 #7 [ffff98c39ea0fe68] worker_thread at ffffffff954a97e6
 #8 [ffff98c39ea0fec8] kthread at ffffffff954b093f
 #9 [ffff98c39ea0ff50] ret_from_fork at ffffffff95ab0d18

Let's find the nfsd4_callback it was operating on:

crash> dis -lr ffffffffc06bad99
...
/usr/src/debug/kernel-3.10.0-665.el7/linux-3.10.0-665.el7.x86_64/fs/nfsd/nfs4callback.c: 1113
0xffffffffc06bad91 <nfsd4_run_cb_work+0x91>:  mov    %r13,%rdi
0xffffffffc06bad94 <nfsd4_run_cb_work+0x94>:  callq  0xffffffffc06bab10 <nfsd4_process_cb_update>
/usr/src/debug/kernel-3.10.0-665.el7/linux-3.10.0-665.el7.x86_64/fs/nfsd/nfs4callback.c: 1115
0xffffffffc06bad99 <nfsd4_run_cb_work+0x99>:  mov    0x2c0(%r12),%rdi
crash> dis -l nfsd4_process_cb_update
/usr/src/debug/kernel-3.10.0-665.el7/linux-3.10.0-665.el7.x86_64/fs/nfsd/nfs4callback.c: 1050
0xffffffffc06bab10 <nfsd4_process_cb_update>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc06bab15 <nfsd4_process_cb_update+0x5>:  push   %rbp
0xffffffffc06bab16 <nfsd4_process_cb_update+0x6>:  mov    %rsp,%rbp
0xffffffffc06bab19 <nfsd4_process_cb_update+0x9>:  push   %r14
0xffffffffc06bab1b <nfsd4_process_cb_update+0xb>:  push   %r13
...
crash> bt -f
PID: 7038  TASK: ffff98c3dece2f10  CPU: 0  COMMAND: "kworker/u4:2"
...
    ffff98c39ea0fde0: ffff98c39f3f82d0 ffff98c3e74cf300
    ffff98c39ea0fdf0: ffff98c39ea0fe18 ffffffffc06bad99
 #5 [ffff98c39ea0fdf8] nfsd4_run_cb_work at ffffffffc06bad99 [nfsd]
...
crash> nfsd4_callback ffff98c39f3f82d0
struct nfsd4_callback {
  cb_clp = 0xffff98c39f3f8000,
  cb_minorversion = 0x0,
  cb_msg = {
    rpc_proc = 0xffffffffc06cd220 <nfs4_cb_procedures>,
    rpc_argp = 0xffff98c39f3f82d0,
    rpc_resp = 0xffff98c39f3f82d0,
    rpc_cred = 0x0
  },
  cb_ops = 0x0,
  cb_work = {
    data = {
      counter = 0x80
    },
    entry = {
      next = 0xffff98c39f3f8310,
      prev = 0xffff98c39f3f8310
    },
    func = 0xffffffffc06bad00 <nfsd4_run_cb_work>
  },
  cb_seq_status = 0x1,
  cb_status = 0x0,
  cb_need_restart = 0x0
}

That nfs4_client has 1 delegation on the dl_perclnt list and a slew of revoked delegations:

crash> px &((struct nfs4_client *)0xffff98c39f3f8000)->cl_delegations
$9 = (struct list_head *) 0xffff98c39f3f8068
crash> list -H 0xffff98c39f3f8068 -o nfs4_delegation.dl_perclnt
ffff98c3d1995d20
crash> px &((struct nfs4_client *)0xffff98c39f3f8000)->cl_revoked
$10 = (struct list_head *) 0xffff98c39f3f8078
crash> list -H 0xffff98c39f3f8078 -o nfs4_delegation.dl_recall_lru | wc -l
36

Since we were calling rpc_shutdown_client() (which calls rpc_killall_tasks()), let's have a look at the client's rpc_tasks:

crash> nfs4_client.cl_cb_client 0xffff98c39f3f8000
  cl_cb_client = 0xffff98c3fa14ee00
crash> px &((struct rpc_clnt *)0xffff98c3fa14ee00)->cl_tasks
$11 = (struct list_head *) 0xffff98c3fa14ee18
crash> list -H 0xffff98c3fa14ee18 -o rpc_task.tk_task
ffff98c39b860400
ffff98c3c8137100
ffff98c3c8137800
ffff98c3b5ed9f00
ffff98c39e915100
ffff98c3a01dd000
crash>

The first rpc_task on the list is the same as the one being executed by the panic task!
Also, it doesn't look like the client had expired:

crash> nfs4_client.cl_time 0xffff98c39f3f8000
  cl_time = 0x591d3adc
crash> p nfsd_net_id
nfsd_net_id = $14 = 0x4
crash> p &init_net
$15 = (struct net *) 0xffffffff95ed0940
crash> p ((struct net *) 0xffffffff95ed0940)->gen
$16 = (struct net_generic *) 0xffff98c3fc7b6500
crash> p ((struct net_generic *) 0xffff98c3fc7b6500)->ptr[3]
$18 = (void *) 0xffff98c3b65eb200
crash> nfsd_net.nfsd4_lease 0xffff98c3b65eb200
  nfsd4_lease = 0x5a
crash> p &timekeeper
$21 = (struct timekeeper *) 0xffffffff9628a7a0
crash> p ((struct timekeeper *) 0xffffffff9628a7a0)->xtime_sec
$22 = 0x591d3adc

In fact, it looks like the client had just been renewed. So maybe the key to reproducing this isn't to try to force a client with a bunch of delegations to expire, but instead just mess with the callback thread...

I looked at the other vmcores in this bug. Here's what I found (still no luck reproducing):
861164208
=========
- kworker pid 4923 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update)
- the nfs4_client was 0xffff880236208000
- NFSD4_CLIENT_CB_KILL was set in the client's cl_flags
- the first rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff88005299eb00,
which was the rpc_task being executed by the panic task (kworker pid 915)
- the client was being destroyed by the nfsd pid 1650 which was handling a CREATE_SESSION,
so it looks like a client reboot triggered this
687329560
=========
- kworker pid 19647 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update)
- the nfs4_client was 0xffff880065867000
- NFSD4_CLIENT_CB_KILL was set in the client's cl_flags
- the first rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff88003643ff00,
which was the rpc_task being executed by the panic task (kworker pid 23387)
- the client was being destroyed by the nfsd pid 1673 which was handling a CREATE_SESSION,
so it looks like a client reboot triggered this
604778343
=========
- kworker pid 16824 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update)
- the nfs4_client was 0xffff887e577d3000
- NFSD4_CLIENT_CB_KILL was *not* set in the client's cl_flags, but NFSD4_CLIENT_CB_UPDATE *was* set
- the first rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff885e77304800,
which was the rpc_task being executed by the panic task (kworker pid 9786)
- nfsd pid 31065 was performing a nfsd4_probe_callback_sync via nfsd4_destroy_session. The sessionid
passed to nfsd4_destroy_session matches the sessionid cached in the nfs4_client:
crash> nfsd4_op.u.destroy_session 0xffff8822d43e7080
u.destroy_session = {
sessionid = {
data = "\r\n\346Yw\000\000\000f\000\000\000\000\000\000"
}
},
crash> nfs4_client.cl_cs_slot.sl_cr_ses.sessionid 0xffff887e577d3000
cl_cs_slot.sl_cr_ses.sessionid = {
data = "\r\n\346Yw\000\000\000f\000\000\000\000\000\000"
},
644096292
=========
- kworker pid 16824 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update)
- the nfs4_client was 0xffff8801f331e800
- NFSD4_CLIENT_CB_KILL was *not* set in the client's cl_flags, but NFSD4_CLIENT_CB_UPDATE *was* set
- the first (actually, the only) rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff880167a28800,
which was the rpc_task being executed by the panic task (kworker pid 28481)
- this client was being destroyed by the laundromat thread (kworker pid 1819)
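The four vmcore summaries above share one timeline: a worker in nfsd4_process_cb_update reaches rpc_shutdown_client/rpc_killall_tasks while the panic task is executing the first rpc_task on cl_tasks, and that task's done() handler runs after the callback structure behind tk_calldata was released. A toy sketch of that ordering (illustrative names and stand-in bodies, not kernel code):

```python
# Event-ordering sketch of the pattern common to all four vmcores:
# kill the task, free its calldata, then let the done() step run --
# the done() step now operates on freed memory.

events = []

def rpc_killall_tasks():
    events.append("task killed: RPC_TASK_KILLED set, tk_status = -EIO")

def release_callback():
    events.append("nfsd4_callback / embedding delegation freed")

def rpc_exit_task():
    # stands in for nfsd4_cb_done() dereferencing tk_calldata->cb_clp
    events.append("done() runs on freed calldata: use-after-free")

rpc_killall_tasks()
release_callback()
rpc_exit_task()
```

The fix direction implied by the cores is to prevent the middle step from happening while the task can still reach the third, i.e. keep the callback's reference alive until done() has run.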
NOTE: Due to lack of reproducer, and this not being seen on recent kernels (we don't have any vmcore from a kernel past RHEL7.5 - 862*el7), this may not be looked at further, and may be closed INSUFFICIENT_DATA. If you have new information or feel differently please speak up.

Another customer affected; nfs server vmcore: https://optimus.gsslab.rdu2.redhat.com/manager/450212292/ #29:

        CPUS: 4
        DATE: Sat Jan  5 16:53:34 2019
      UPTIME: 116 days, 13:33:02
LOAD AVERAGE: 0.09, 0.05, 0.05
       TASKS: 469
    NODENAME: gtunxlvc03565
     RELEASE: 3.10.0-862.11.6.el7.x86_64
     VERSION: #1 SMP Fri Aug 10 16:55:11 UTC 2018
     MACHINE: x86_64  (2593 Mhz)
      MEMORY: 12 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at (null)"

 #8 [ffff9c7cfa72bc50] do_page_fault at ffffffffa0b208d5
 #9 [ffff9c7cfa72bc80] page_fault at ffffffffa0b1c758
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff9c7cfa72bd38  RFLAGS: 00010286
    RAX: ffffe59e4c56e1a0  RBX: 0000000000000000  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: 0000000000000000  RDI: ffffe59e4c56e1d0
    RBP: ffff9c7cfa72bd50  R8: 000000000001bb00  R9: ffffffffa0599d05
    R10: 8e0ca28c7f7c0fc0  R11: ffffe59e4c56e040  R12: ffffe59e4c56e1d0
    R13: ffff9c7cfa72bda0  R14: ffff9c7cf8d78f00  R15: 0000000000002000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff9c7cfa72bd38] xprt_iter_get_helper at ffffffffc051ad4d [sunrpc]
#11 [ffff9c7cfa72bd58] xprt_iter_get_next at ffffffffc051b56b [sunrpc]
#12 [ffff9c7cfa72bd68] rpc_run_task at ffffffffc04f57c5 [sunrpc]
#13 [ffff9c7cfa72bd98] rpc_call_async at ffffffffc04f5a3f [sunrpc]
#14 [ffff9c7cfa72bdf8] nfsd4_run_cb_work at ffffffffc058d67a [nfsd]
#15 [ffff9c7cfa72be20] process_one_work at ffffffffa04b613f
#16 [ffff9c7cfa72be68] worker_thread at ffffffffa04b71d6
#17 [ffff9c7cfa72bec8] kthread at ffffffffa04bdf21

static struct rpc_xprt *xprt_iter_get_helper(struct rpc_xprt_iter *xpi,
		struct rpc_xprt *(*fn)(struct rpc_xprt_iter *))
{
	struct rpc_xprt *ret;

	do {
		ret = fn(xpi);    <<--- panic: fn (xpi_ops->xpi_next) is NULL (RIP is 0)
[...]
crash> rpc_task_setup ffff9c7cfa72bda0
struct rpc_task_setup {
  task = 0x0,
  rpc_client = 0xffffe59e4c56e040,    <<--- this should be an rpc_clnt
  rpc_xprt = 0x0,
  rpc_message = 0xffff9c7cbe7c0f98,
  callback_ops = 0xffffffffc0595a20 <nfsd4_cb_ops>,
  callback_data = 0xffff9c7cbe7c0f88,
  workqueue = 0x0,
  flags = 1537,
  priority = 0 '\000'
}
crash> rpc_clnt.cl_xpi 0xffffe59e4c56e040
  cl_xpi = {
    xpi_xpswitch = 0xe5a,
    xpi_cursor = 0x1ffffffff,
    xpi_ops = 0xffffe59e4c56e1a0
  }
crash> rpc_xprt_iter_ops 0xffffe59e4c56e1a0
struct rpc_xprt_iter_ops {
  xpi_rewind = 0xffffe59e49422be0,
  xpi_xprt = 0xffffe59e4c56e1e0,
  xpi_next = 0x0    <<--- that's why we panic...
}

rpc_clnt comes from here:

crash> nfs4_client.cl_cb_client ffff9c7ea464a000
  cl_cb_client = 0xffffe59e4c56e040

Searching for it, we can see that it is not an rpc_clnt, but a page struct...

crash> buffer_head ffff9c7bc2c4c3a8
struct buffer_head {
  b_state = 33,
  b_this_page = 0xffff9c7bc2c4c3a8,
  b_page = 0xffffe59e4c56e040,    <<--- our problematic pointer
  b_blocknr = 7655,
  b_size = 4096,
  b_data = 0xffff9c7e95b81000,
  b_bdev = 0xffff9c7eae3d4440,
  b_end_io = 0xffffffffa0656c30,
  b_private = 0x0,
  b_assoc_buffers = {
    next = 0xffff9c7bc2c4c3f0,
    prev = 0xffff9c7bc2c4c3f0
  },
  b_assoc_map = 0x0,
  b_count = {
    counter = 0
  }
}
crash> page.mapping 0xffffe59e4c56e040
  mapping = 0xffff9c7d40452bf0
crash> address_space 0xffff9c7d40452bf0
struct address_space {
  host = 0xffff9c7d40452aa0,
  page_tree = {
    rh_reserved_height = 0,
    gfp_mask = 33554464,
    rnode = 0xffff9c7c81ab5221
[...]
  nrpages = 333,
  nrexceptional = 106,
  writeback_index = 0,
  a_ops = 0xffffffffc034b2c0 <xfs_address_space_operations>,
  flags = 131162,
[...]
}
crash> tree -t radix -s page.index -N 0xffff9c7c81ab5221 | paste - - | grep ffffe59e4c56e040
ffffe59e4c56e040  index = 406

If we look at the nfs4_client structure, it is not what it should be; it contains a set of page structures:

crash> rd ffff9c7ea464a000 0x100 | cut -f2 -d ":" | awk '{print $1}' | grep ffff > pages
crash> page.mapping,index < pages | paste - -
  mapping = 0x69656e6f7a2f6572  index = 8030610025564038766
  mapping = 0xffff9c7d40452bf0  index = 385
  mapping = 0xffff9c7d40452bf0  index = 386
  mapping = 0xffff9c7d40452bf0  index = 387
  mapping = 0xffff9c7d40452bf0  index = 388
  mapping = 0xffff9c7d40452bf0  index = 389
  mapping = 0xffff9c7d40452bf0  index = 390
[...]
  mapping = 0xffff9c7d40452bf0  index = 408
  mapping = 0xffff9c7d40452bf0  index = 409
  mapping = 0xffff9c7e8a2f5781  index = 34322322192
  mapping = 0xffff9c7d40452bf0  index = 411
  mapping = 0xffff9c7d40452bf0  index = 412
[...]
  mapping = 0xffff9c7d40452bf0  index = 436
  mapping = 0x1b5ffff9c7d4045  index = 18446462598732840960

What is clear is that this is not an nfs4_client struct, so this does indeed look like a use-after-free...
Looking also at the log, we can see:

crash> log | grep "idr_remove called for" | wc -l
449

This is https://bugzilla.redhat.com/show_bug.cgi?id=1448750

Another occurrence in Case 02444371. vmcore location: https://optimus.gsslab.rdu2.redhat.com/manager/558794125 (retrace-server-interact 558794125 crash)

crash> sys
      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/558794125/crash/vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Tue Aug 27 17:10:58 2019
      UPTIME: 20 days, 12:47:40
LOAD AVERAGE: 1.25, 1.26, 1.31
       TASKS: 1500
    NODENAME: xxxxxmasked
     RELEASE: 3.10.0-957.1.3.el7.x86_64
     VERSION: #1 SMP Thu Nov 15 17:36:42 UTC 2018
     MACHINE: x86_64  (2294 Mhz)
      MEMORY: 256 GB
       PANIC: "BUG: unable to handle kernel paging request at 0000000100000018"
crash> ps -S
  RU: 19
  IN: 1481
crash> sys -t
TAINTED_MASK: 0
crash> mod -t
no tainted modules

Crashed task backtrace:

crash> bt
PID: 7021  TASK: ffff979964472080  CPU: 11  COMMAND: "kworker/u256:11"
 #0 [ffff97b4d9a0b8d0] machine_kexec at ffffffffbe463674
 #1 [ffff97b4d9a0b930] __crash_kexec at ffffffffbe51cef2
 #2 [ffff97b4d9a0ba00] crash_kexec at ffffffffbe51cfe0
 #3 [ffff97b4d9a0ba18] oops_end at ffffffffbeb6c758
 #4 [ffff97b4d9a0ba40] no_context at ffffffffbeb5aafe
 #5 [ffff97b4d9a0ba90] __bad_area_nosemaphore at ffffffffbeb5ab95
 #6 [ffff97b4d9a0bae0] bad_area_nosemaphore at ffffffffbeb5ad06
 #7 [ffff97b4d9a0baf0] __do_page_fault at ffffffffbeb6f6b0
 #8 [ffff97b4d9a0bb60] do_page_fault at ffffffffbeb6f915
 #9 [ffff97b4d9a0bb90] page_fault at ffffffffbeb6b758
    [exception RIP: rpc_shutdown_client+0x4f]
    RIP: ffffffffc044105f  RSP: ffff97b4d9a0bc48  RFLAGS: 00010246
    RAX: ffff97b4d9a0bc48  RBX: 0000000100000000  RCX: ffff97b4d9a0bfd8
    RDX: 0000000000000000  RSI: 8d00000000000000  RDI: 0000000100000000
    RBP: ffff97b4d9a0bca0  R8: ffff97b94ce1d1a8  R9: a11a64320661d1a0
    R10: a11a64320661d1a0  R11: ffffd04cbb13f200  R12: 0000000100000018
    R13: ffff97b94ce1d168  R14: ffff97b4d9a0bc60  R15: ffff979964472080
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff97b4d9a0bca8] nfsd4_process_cb_update at ffffffffc04de9fe [nfsd]
#11 [ffff97b4d9a0bdf8] nfsd4_run_cb_work at ffffffffc04dec54 [nfsd]
#12 [ffff97b4d9a0be20] process_one_work at ffffffffbe4b9d4f
#13 [ffff97b4d9a0be68] worker_thread at ffffffffbe4bade6
#14 [ffff97b4d9a0bec8] kthread at ffffffffbe4c1c31

Disassembly of panic context:

crash> dis -lr rpc_shutdown_client+0x4f
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 850
0xffffffffc0441010 <rpc_shutdown_client>:       nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc0441015 <rpc_shutdown_client+0x5>:   push   %rbp
0xffffffffc0441016 <rpc_shutdown_client+0x6>:   mov    %rsp,%rbp
0xffffffffc0441019 <rpc_shutdown_client+0x9>:   push   %r15
0xffffffffc044101b <rpc_shutdown_client+0xb>:   push   %r14
0xffffffffc044101d <rpc_shutdown_client+0xd>:   push   %r13
0xffffffffc044101f <rpc_shutdown_client+0xf>:   push   %r12
0xffffffffc0441021 <rpc_shutdown_client+0x11>:  push   %rbx
0xffffffffc0441022 <rpc_shutdown_client+0x12>:  mov    %rdi,%rbx
0xffffffffc0441025 <rpc_shutdown_client+0x15>:  sub    $0x30,%rsp
0xffffffffc0441029 <rpc_shutdown_client+0x19>:  mov    %gs:0x28,%rax
0xffffffffc0441032 <rpc_shutdown_client+0x22>:  mov    %rax,-0x30(%rbp)
0xffffffffc0441036 <rpc_shutdown_client+0x26>:  xor    %eax,%eax
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 851
0xffffffffc0441038 <rpc_shutdown_client+0x28>:  callq  0xffffffffbeb67eb0 <_cond_resched>
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 853
0xffffffffc044103d <rpc_shutdown_client+0x2d>:  testb  $0x2,0x3bae8(%rip)    # 0xffffffffc047cb2c <rpc_debug>
0xffffffffc0441044 <rpc_shutdown_client+0x34>:  jne    0xffffffffc0441108 <rpc_shutdown_client+0xf8>
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 859
0xffffffffc044104a <rpc_shutdown_client+0x3a>:  lea    -0x58(%rbp),%rax
0xffffffffc044104e
<rpc_shutdown_client+0x3e>:  lea    0x18(%rbx),%r12    << struct rpc_clnt * was valid here I[1]
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/arch/x86/include/asm/current.h: 14
0xffffffffc0441052 <rpc_shutdown_client+0x42>:  mov    %gs:0x10e80,%r15
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 859
0xffffffffc044105b <rpc_shutdown_client+0x4b>:  lea    0x18(%rax),%r14
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 857
0xffffffffc044105f <rpc_shutdown_client+0x4f>:  cmp    0x18(%rbx),%r12    << rpc_clnt * was corrupted here I[2]

The rpc_clnt * was valid at I[1] and was an invalid dereference at I[2]:

rbx: 0000000100000000
rdi: 0000000100000000
r12: 0000000100000018

The kworker was executing this nfsd callback work item, in rpc_shutdown_client:

void rpc_shutdown_client(struct rpc_clnt *clnt)
{
	might_sleep();

	dprintk_rcu("RPC:       shutting down %s client for %s\n",
			clnt->cl_program->name,
			rcu_dereference(clnt->cl_xprt)->servername);

	while (!list_empty(&clnt->cl_tasks)) {    << Crashed here
		rpc_killall_tasks(clnt);
		wait_event_timeout(destroy_wait,
			list_empty(&clnt->cl_tasks), 1*HZ);
	}

	rpc_release_client(clnt);
}

crash> struct nfsd4_callback ffff97b94ce1d168
struct nfsd4_callback {
  cb_clp = 0xffff97b97c5ba800,
  cb_minorversion = 0x1,
  cb_msg = {
    rpc_proc = 0xffffffffc04f14b8 <nfs4_cb_procedures+56>,
    rpc_argp = 0xffff97b94ce1d168,
    rpc_resp = 0xffff97b94ce1d168,
    rpc_cred = 0xffff97b051657e00
  },
  cb_ops = 0xffffffffc04e6560 <nfsd4_cb_recall_ops>,
  cb_work = {
    data = {
      counter = 0x2000
    },
    entry = {
      next = 0xffff97b94ce1d1a8,
      prev = 0xffff97b94ce1d1a8
    },
    func = 0xffffffffc04debb0 <nfsd4_run_cb_work>
  },
  cb_seq_status = 0x1,
  cb_status = 0x0,
  cb_need_restart = 0x0
}
crash> work_struct ffff97b94ce1d1a0
struct work_struct {
  data = {
    counter = 0x2000
  },
  entry = {
    next = 0xffff97b94ce1d1a8,
    prev = 0xffff97b94ce1d1a8
  },
  func = 0xffffffffc04debb0 <nfsd4_run_cb_work>
}

Let's inspect the rpc_clnt from the struct
nfs4_client and nfsd4_callback *:

crash> nfs4_client.cl_cb_client 0xffff97b97c5ba800
  cl_cb_client = 0x100000000    << struct rpc_clnt * invalid

The rpc_clnt * is invalid and caused the panic.

crash> nfs4_client.cl_cb_client 0xffff97b97c5ba800 -ox
struct nfs4_client {
  [ffff97b97c5baac0] struct rpc_clnt *cl_cb_client;
}
crash> rd ffff97b97c5baac0
ffff97b97c5baac0:  0000000100000000                    ........

Forgot to mention a detail: there were numerous idr_remove() errors prior to the crash, and in the panic dumpstack as well:

< .. >
[1774061.570626] idr_remove called for id=1848429 which is not allocated.    <<
[1774061.571167] CPU: 11 PID: 7021 Comm: kworker/u256:11 Kdump: loaded Not tainted 3.10.0-957.1.3.el7.x86_64 #1
[1774061.571755] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[1774061.572328] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[1774061.572906] Call Trace:
[1774061.573460]  [<ffffffffbeb61e41>] dump_stack+0x19/0x1b
[1774061.574036]  [<ffffffffbe776790>] idr_remove+0x160/0x290
[1774061.574612]  [<ffffffffbeb6778f>] ? __schedule+0x3ff/0x890
[1774061.575207]  [<ffffffffc04d31ea>] nfs4_put_stid+0x3a/0x90 [nfsd]
[1774061.575791]  [<ffffffffc04d3415>] nfsd4_cb_recall_release+0x15/0x20 [nfsd]
[1774061.576358]  [<ffffffffc04dec7b>] nfsd4_run_cb_work+0xcb/0x110 [nfsd]
[1774061.576935]  [<ffffffffbe4b9d4f>] process_one_work+0x17f/0x440
[1774061.577492]  [<ffffffffbe4bade6>] worker_thread+0x126/0x3c0
[1774061.578061]  [<ffffffffbe4bacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[1774061.578632]  [<ffffffffbe4c1c31>] kthread+0xd1/0xe0
[1774061.579172]  [<ffffffffbe4c1b60>] ? insert_kthread_work+0x40/0x40
[1774061.579735]  [<ffffffffbeb74c37>] ret_from_fork_nospec_begin+0x21/0x21
[1774061.580255]  [<ffffffffbe4c1b60>] ?
insert_kthread_work+0x40/0x40
[1774061.580888] BUG: unable to handle kernel paging request at 0000000100000018    << panic
[1774061.581415] IP: [<ffffffffc044105f>] rpc_shutdown_client+0x4f/0x130 [sunrpc]
[1774061.582006] PGD 8000003ec4359067 PUD 0
[1774061.582533] Oops: 0000 [#1] SMP

The numerous errors in the idr_remove code path also appear to be a trigger for this scenario, within a time window of around 2 seconds (between timestamps 1774059.791250 and 1774061.570626 in the kernel buffer):

crash> dmesg | grep "idr_remove called" | wc -l
178
crash>
crash> dmesg | grep "idr_remove called" | tail -4
[1774061.542169] idr_remove called for id=1865728 which is not allocated.
[1774061.551619] idr_remove called for id=1864098 which is not allocated.
[1774061.561085] idr_remove called for id=1910621 which is not allocated.
[1774061.570626] idr_remove called for id=1848429 which is not allocated.

In this last case 02444371 (retrace-server-interact 558794125 crash) it's indeed an nfs4_client that is freed, not an nfs4_delegation:
crash> kmem -s 0xffff97b97c5ba800
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff977bbfc07400 2048 1635 2560 160 32k kmalloc-2048
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffd04cbcf16e00 ffff97b97c5b8000 0 16 3 13
FREE / [ALLOCATED]
ffff97b97c5ba800 (cpu 4 cache)
// callback/delegation we are dealing with when we hit the panic :
nfsd4_callback -> ffff97b94ce1d168
nfs4_delegation -> ffff97b94ce1d0e0
crash> kmem -s ffff97b94ce1d0e0
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff97b97ce20400 240 9121 18428 271 16k nfsd4_delegations
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffd04cbc338700 ffff97b94ce1c000 0 68 30 38
FREE / [ALLOCATED]
[ffff97b94ce1d0e0]
crash> nfs4_client.cl_stateids.layers 0xffff97b97c5ba800
cl_stateids.layers = 0,
crash> nfs4_delegation.dl_stid.sc_stateid.si_opaque.so_id ffff97b94ce1d0e0
dl_stid.sc_stateid.si_opaque.so_id = 1919585
// looks like that's why we get warnings on idr_remove
// however I'm not sure why we didn't panic on the previous warnings
// there's still a large number of delayed deleg. callbacks pending :
crash> crashinfo --workqueues
-----------------------WorkQueues - Active only-----------------------
-----nfsd4_callbacks----- <struct workqueue_struct 0xffff97b7f2f67c00>
<struct pool_workqueue 0xffff97b96e39e600> active=1 delayed=4675
<struct worker_pool 0xffff977b66d89c00> nr_workers=18 nr_idle=17
<struct worker 0xffff97b4f8f57f80> kworker/u256:11 nfsd4_run_cb_work
// all the delayed callbacks have the same freed cb_clp pointer :
crash> list -H ffff97b96e39e660 0x40 > callbacks
crash> nfsd4_callback.cb_clp < callbacks | sort | uniq -c
4675 cb_clp = 0xffff97b97c5ba800
Adding some details from an earlier issue (similar crash) I reported in BZ 1542298 (current status is duplicate of this bz).

In reference to the recent vmcore mentioned in Comment #81, Comment #82 and Comment #83 of the current BZ above, there was a very similar crash reported in BZ 1542298 (marked duplicate of this bz), and it occurred with a similar sequence: lots of idr_remove errors prior to the crash, and the panic context was during an nfsd callback. The struct rpc_clnt *cl_cb_client was found invalid. If anyone wants to have a look at that vmcore, its id is mentioned below for reference.

(Dup bz) Bug 1542298 - Crash during kworker task executing nfsd call back - exception RIP: xprt_iter_get_next+0x9

From BZ 1542298 vmcore location:
ssh optimus.gsslab.rdu2.redhat.com
vmcore id: 264393541 ( retrace-server-interact 264393541 crash )

crash> bt
PID: 27838  TASK: ffff8800350bdee0  CPU: 3  COMMAND: "kworker/u8:1"
 #0 [ffff88040d3879e0] machine_kexec at ffffffff8105c52b
 #1 [ffff88040d387a40] __crash_kexec at ffffffff81104a42
 #2 [ffff88040d387b10] crash_kexec at ffffffff81104b30
 #3 [ffff88040d387b28] oops_end at ffffffff816ad338
 #4 [ffff88040d387b50] no_context at ffffffff8169d36a
 #5 [ffff88040d387ba0] __bad_area_nosemaphore at ffffffff8169d400
 #6 [ffff88040d387be8] bad_area_nosemaphore at ffffffff8169d56a
 #7 [ffff88040d387bf8] __do_page_fault at ffffffff816b01fe
 #8 [ffff88040d387c58] trace_do_page_fault at ffffffff816b0456
 #9 [ffff88040d387c98] do_async_page_fault at ffffffff816afaea
#10 [ffff88040d387cb0] async_page_fault at ffffffff816ac5f8
    [exception RIP: xprt_iter_get_next+9]
    RIP: ffffffffc0063f49  RSP: ffff88040d387d60  RFLAGS: 00010246
    RAX: ffff880403bce100  RBX: ffff880403bce100  RCX: 0000000000000017
    RDX: 0000000225c17d03  RSI: 0000000000006280  RDI: ffffffff00000190
    RBP: ffff88040d387d60  R8:  00000000cd4b9b2d  R9:  ffffffff81185465
    R10: dfeef653caec1660  R11: ffffffff00000000  R12: ffffffff00000000
    R13: ffff88040d387da0  R14: ffff8803fd317400  R15: 0000000000000100
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff88040d387d68] rpc_run_task at ffffffffc003e7b5 [sunrpc]
#12 [ffff88040d387d98] rpc_call_async at ffffffffc003ea2f [sunrpc]
#13 [ffff88040d387df8] nfsd4_run_cb_work at ffffffffc02ac476 [nfsd]
#14 [ffff88040d387e20] process_one_work at ffffffff810a882a
#15 [ffff88040d387e68] worker_thread at ffffffff810a94f6
#16 [ffff88040d387ec8] kthread at ffffffff810b099f
#17 [ffff88040d387f50] ret_from_fork at ffffffff816b4fd8

crash> nfs4_client.cl_cb_client -ox
struct nfs4_client {
  [0x2c0] struct rpc_clnt *cl_cb_client;
}
crash> nfs4_client.cl_cb_client ffff8800095d0800
  cl_cb_client = 0xffffffff00000000   <<----------- rpc_clnt* call back

crash> pool_workqueue.pool,wq,refcnt,nr_in_flight,nr_active ffff8803fd317400
  pool = 0xffff88017a1d7800
  wq = 0xffff8803f3720600
  refcnt = 11247
  nr_in_flight = {0, 0, 0, 0, 0, 0, 0, 11245, 0, 0, 0, 0, 0, 0, 0}
  nr_active = 1

crash> dmesg | grep "idr_remove called for id=" | wc -l
690

[3795237.582805] idr_remove called for id=2570080 which is not allocated.   <<
[3795237.582805] CPU: 2 PID: 27838 Comm: kworker/u8:1 Not tainted 3.10.0-693.11.1.el7.x86_64 #1
[3795237.582806] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014
[3795237.582811] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[3795237.582812]  ffff88040d387d60 000000005bada75d ffff88040d387d48 ffffffff816a3e61
[3795237.582813]  ffff88040d387da0 ffffffff81323b9a 0000000000000000 ffff8800095d0848
[3795237.582814]  ffff880034acd7a0 ffff880034acd7c8 ffff8804246c1660 000000005bada75d
[3795237.582814] Call Trace:
[3795237.582816]  [<ffffffff816a3e61>] dump_stack+0x19/0x1b
[3795237.582818]  [<ffffffff81323b9a>] idr_remove.part.6+0x24a/0x260
[3795237.582820]  [<ffffffff81323bbd>] idr_remove+0xd/0x10
[3795237.582825]  [<ffffffffc02a0a4a>] nfs4_put_stid+0x3a/0x80 [nfsd]
[3795237.582831]  [<ffffffffc02a0c65>] nfsd4_cb_recall_release+0x15/0x20 [nfsd]
[3795237.582836]  [<ffffffffc02ac4cd>] nfsd4_run_cb_work+0xbd/0x100 [nfsd]
[3795237.582838]  [<ffffffff810a882a>] process_one_work+0x17a/0x440
[3795237.582839]  [<ffffffff810a94f6>] worker_thread+0x126/0x3c0
[3795237.582841]  [<ffffffff810a93d0>] ? manage_workers.isra.24+0x2a0/0x2a0
[3795237.582842]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[3795237.582844]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[3795237.582845]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[3795272.193338]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[3795272.195542] BUG: unable to handle kernel paging request at ffffffff000001a0
[3795272.199648] IP: [<ffffffffc0063f49>] xprt_iter_get_next+0x9/0x40 [sunrpc]
[3795272.201836] PGD 19f5067 PUD 0
[3795272.203514] Oops: 0000 [#1] SMP

When I asked the customer for any observations made prior to the panic:
Customer quoted text scenario:
------------------------------------------------------------
If you are looking for a way to reproduce, here is how it went down the last time:

This DataStage cluster consists of one main DataStage node that serves an nfs export to sixteen DataStage compute nodes. The nfs export is mounted on the compute node as follows:

masked-nfsserver:/data01 on /data01 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.xxx.yy.222,local_lock=none,addr=10.xxx.yy.195)

The compute nodes were rebooted one by one after each was patched. After six or seven of these reboots, the main DataStage node crashed. My suspicion is that the kernel panic occurred during either the mounting or unmounting of the export by the sixth or seventh compute node restart. It may be possible to reproduce by repeatedly mounting/unmounting an nfs export.

Hope that helps :)
------------------------------------------------------------

A couple of additional observations from the latest vmcore (558794125):
vast majority of delegations are revoked :
crash> nfs4_delegation.dl_stid.sc_type < delegations | sort | uniq -c
4201 dl_stid.sc_type = 16 '\020', ----------------------------> NFS4_REVOKED_DELEG_STID
474 dl_stid.sc_type = 32 ' ', ----------------------------> NFS4_CLOSED_DELEG_STID
Meaning that it was during nfs4_laundromat -> revoke_delegation (under state_lock)
It looks like enqueuing of callbacks raced with __destroy_client.
I can see state_lock is dropped in __destroy_client as soon as we've built the list of delegations
to reap, but is it possible they can be hashed after that? Because the laundromat finds
them in nn->del_recall_lru under state_lock a bit later.
BTW, it looks like nfsd4_shutdown_callback() must be called with state_lock held:
1079 /* must be called under the state lock */
1080 void nfsd4_shutdown_callback(struct nfs4_client *clp)
but it looks like we drop it in __destroy_client before calling it:
1886 static void
1887 __destroy_client(struct nfs4_client *clp)
1888 {
1889 int i;
1890 struct nfs4_openowner *oo;
1891 struct nfs4_delegation *dp;
1892 struct list_head reaplist;
1893
1894 INIT_LIST_HEAD(&reaplist);
1895 spin_lock(&state_lock);
...
1901 spin_unlock(&state_lock);
...
1930 nfsd4_shutdown_callback(clp);
I don't see anything re-taking it between lines 1901 and 1930; of course I might have missed something.
I'm still catching up with your work on comment 89, but I wonder if this is the same race that Trond spotted at bakeathon last week:

http://lkml.kernel.org/r/20191015154724.50988-1-trond.myklebust@hammerspace.com

Created attachment 1650471 [details]
PATCH to fix this bug - applies cleanly on kernel-3.10.0-1117.el7
Created attachment 1650472 [details]
PATCH to fix this bug - applies cleanly on kernel-3.10.0-1117.el7
Bruce, I looked at the patch from comment #90, which is now merged upstream. It needed one other prior patch to backport cleanly, and both apply on kernel-3.10.0-1117.el7.

Do you think these can be posted for 7.9, or do we need some other step, such as a reproducer or a customer hitting the problem being willing to validate the fix with a test kernel?

Others with more detailed knowledge of this bug and/or attached cases, feel free to comment.

(In reply to Dave Wysochanski from comment #94)
> Bruce I looked at the patch from comment #90 which is now merged upstream.
> It needed one other prior patch to backport cleanly, and both apply on
> kernel-3.10.0-1117.el7.
> Do you think these can be posted for 7.9 or do we need some other step such
> as a reproducer or customer hitting the problem being willing to validate
> the fix with a test kernel?

After applying that patch upstream I was seeing an intermittent failure to send callbacks; symptoms were a hang in xfstests due to the server failing to send an OFFLOAD_DONE callback when a copy finished.

I'd like to understand what was going on there before backporting this.

(In reply to J. Bruce Fields from comment #95)
> After applying that patch upstream I was seeing an intermittent failure to
> send callbacks; symptoms were a hang in xfstests due to the server failing
> to send an OFFLOAD_DONE callback when a copy finished.
>
> I'd like to understand what was going on there before backporting this.

OK, apologies, after further efforts, I'm not able to reproduce that intermittent failure; at this point I've got to admit I may have just been confused, and all that was needed was the fix from http://lore.kernel.org/r/20191107222712.GB10806@fieldses.org, which is upstream.
So, we should probably take:

12357f1b2c8e "nfsd: minor 4.1 callback cleanup"
2bbfed98a4d8 "nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()"

Patch(es) committed on kernel-3.10.0-1139.el7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4060