Bug 1448750 - BUG: unable to handle kernel paging request at 0; IP: [<ffffffffc05ae76b>] nfsd4_cb_done+0x2b/0x310 [nfsd]
Summary: BUG: unable to handle kernel paging request at 0; IP: [<ffffffffc05ae76b>] nf...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: J. Bruce Fields
QA Contact: JianHong Yin
URL:
Whiteboard:
Duplicates: 1542298
Depends On:
Blocks: 1456511 1477664 1469559 1546181
 
Reported: 2017-05-08 01:35 UTC by JianHong Yin
Modified: 2021-09-09 12:17 UTC (History)
27 users (show)

Fixed In Version: kernel-3.10.0-1139.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-29 20:59:41 UTC
Target Upstream Version:


Attachments (Terms of Use)
PATCH to fix this bug - applies cleanly on kernel-3.10.0-1117.el7 (2.74 KB, application/octet-stream)
2020-01-07 16:18 UTC, Dave Wysochanski
PATCH to fix this bug - applies cleanly on kernel-3.10.0-1117.el7 (2.82 KB, application/octet-stream)
2020-01-07 16:18 UTC, Dave Wysochanski


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5444681 0 None None None 2020-10-07 06:27:49 UTC
Red Hat Product Errata RHSA-2020:4060 0 None None None 2020-09-29 21:00:31 UTC

Description JianHong Yin 2017-05-08 01:35:42 UTC
Description of problem:

nfsd crash during lustre racer test:
'''
[ 7287.544734] BUG: unable to handle kernel paging request at 0000000100000190 
[ 7287.545792] IP: [<ffffffffc05ae76b>] nfsd4_cb_done+0x2b/0x310 [nfsd] 
[ 7287.546494] PGD 0  
[ 7287.546721] Oops: 0000 [#1] SMP  
[ 7287.547078] Modules linked in: nfsv3 ext4 mbcache jbd2 loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd nfsd virtio_balloon joydev i2c_piix4 pcspkr i2c_core parport_pc parport auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi ata_piix virtio_blk libata virtio_net virtio_console crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod 
[ 7287.552569] CPU: 1 PID: 20328 Comm: kworker/1:2 Not tainted 3.10.0-663.el7.x86_64 #1 
[ 7287.553347] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 
[ 7287.553936] Workqueue: rpciod rpc_async_schedule [sunrpc] 
[ 7287.554493] task: ffff9b7287bdaf10 ti: ffff9b72557f8000 task.ti: ffff9b72557f8000 
[ 7287.555230] RIP: 0010:[<ffffffffc05ae76b>]  [<ffffffffc05ae76b>] nfsd4_cb_done+0x2b/0x310 [nfsd] 
[ 7287.556223] RSP: 0018:ffff9b72557fbd60  EFLAGS: 00010246 
[ 7287.556759] RAX: ffffffffc05ae740 RBX: ffff9b72eb251448 RCX: dead000000000200 
[ 7287.557477] RDX: 0000000000000001 RSI: ffff9b72eb251448 RDI: ffff9b7169d35300 
[ 7287.558190] RBP: ffff9b72557fbd88 R08: ffff9b7169d35348 R09: 9239f4241cd35340 
[ 7287.558903] R10: 9239f4241cd35340 R11: fffff5c388ac8a80 R12: ffff9b7169d35300 
[ 7287.559620] R13: 0000000100000000 R14: 0000000100000000 R15: 0000000000000040 
[ 7287.560338] FS:  0000000000000000(0000) GS:ffff9b72ffd00000(0000) knlGS:0000000000000000 
[ 7287.561142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 7287.561720] CR2: 0000000100000190 CR3: 0000000234aa7000 CR4: 00000000001406e0 
[ 7287.562449] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[ 7287.563162] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[ 7287.563875] Stack: 
[ 7287.564083]  ffff9b7169d35300 ffff9b72e8b75200 ffffffffc05029c0 ffffffffc05029c0 
[ 7287.564951]  0000000000000040 ffff9b72557fbda0 ffffffffc05029ee ffff9b7169d35300 
[ 7287.565741]  ffff9b72557fbe08 ffffffffc0504007 000000000001a656 ffff9b7169d35330 
[ 7287.566528] Call Trace: 
[ 7287.566794]  [<ffffffffc05029c0>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] 
[ 7287.567509]  [<ffffffffc05029c0>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] 
[ 7287.568219]  [<ffffffffc05029ee>] rpc_exit_task+0x2e/0x90 [sunrpc] 
[ 7287.568846]  [<ffffffffc0504007>] __rpc_execute+0x97/0x410 [sunrpc] 
[ 7287.569482]  [<ffffffffc0504392>] rpc_async_schedule+0x12/0x20 [sunrpc] 
[ 7287.570162]  [<ffffffff8b0a874a>] process_one_work+0x17a/0x440 
[ 7287.570746]  [<ffffffff8b0a97e6>] worker_thread+0x126/0x3c0 
[ 7287.571301]  [<ffffffff8b0a96c0>] ? manage_workers.isra.24+0x2a0/0x2a0 
[ 7287.571954]  [<ffffffff8b0b093f>] kthread+0xcf/0xe0 
[ 7287.572444]  [<ffffffff8b0b0870>] ? insert_kthread_work+0x40/0x40 
[ 7287.573068]  [<ffffffff8b6b0d18>] ret_from_fork+0x58/0x90 
[ 7287.573612]  [<ffffffff8b0b0870>] ? insert_kthread_work+0x40/0x40 
[ 7287.574217] Code: 0f 1f 44 00 00 55 f6 05 f7 45 f8 ff 10 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 89 f3 4c 8b 36 0f 85 40 02 00 00 4d 89 f5 <41> 8b 8d 90 01 00 00 49 8b 85 38 03 00 00 85 c9 0f 85 0f 01 00  
[ 7287.576887] RIP  [<ffffffffc05ae76b>] nfsd4_cb_done+0x2b/0x310 [nfsd] 
[ 7287.577551]  RSP <ffff9b72557fbd60> 
[ 7287.577904] CR2: 0000000100000190 
[    0.000000] Initializing cgroup subsys cpuset 
[    0.000000] Initializing cgroup subsys cpu 
'''

Version-Release number of selected component (if applicable):
kernel-3.10.0-663.el7.x86_64

How reproducible:
TBD

Steps to Reproduce:
1. run lustre racer test on nfs

Actual results:
got "Oops:" on nfs server

Expected results:
works fine

Additional info:

Comment 10 J. Bruce Fields 2017-05-24 21:45:27 UTC
0x190 is the offset of cl_minorversion in struct nfs4_client, so I think nfsd4_cb_done got called with a task with (struct nfsd4_callback *)(task->tk_calldata)->cb_clp set to 0x100000000, for some reason.

Probably the nfsd4_callback was freed and reused before nfsd4_cb_done was called.

Comment 11 J. Bruce Fields 2017-06-02 20:32:00 UTC
It's unlikely there's any pNFS involved here, and I don't believe this test is something that frequently creates and destroys NFS clients, so it's most likely that this callback was a delegation recall.  In that case the nfsd4_callback is embedded in a struct nfs4_delegation.  The code that creates delegations is unfortunately quite complicated, so it's believable there could be a race or other rare failure case there.  So far I haven't found one, but I haven't convinced myself it's impossible either; I'll study it some more.

Comment 14 Scott Mayhew 2017-06-20 20:03:48 UTC
I took a look at the cores from comment #2 and comment #9 and they were indeed delegation recalls.  In both cores it looks like the nfsd4_callback and the nfs4_delegation that they were embedded in were both freed.  A few other interesting tidbits:

- in both cores the rpc_task.tk_status was -EIO
- in both cores the rpc_task.tk_flags had RPC_TASK_KILLED set
- in both cores nfsd was in the process of shutting down (there was one nfsd thread and it was somewhere in svc_shutdown_net)
- in the core from comment #2 the state manager thread was running, but it wasn't in the core from comment #9

Comment 15 J. Bruce Fields 2017-06-21 20:26:39 UTC
(In reply to Scott Mayhew from comment #14)
> I took a look at the cores from comment #2 and comment #9 and they were
> indeed delegation recalls.  In both cores it looks like the nfsd4_callback
> and the nfs4_delegation that they were embedded in were both freed.  A few
> other interesting tidbits:
> 
> - in both cores the rpc_task.tk_status was -EIO
> - in both cores the rpc_task.tk_flags had RPC_TASK_KILLED set
> - in both cores nfsd was in the process of shutting down (there was one nfsd
> thread and it was somewhere in svc_shutdown_net)
> - in the core from comment #2 the state manager thread was running, but it
> wasn't in the core from comment #9


Thanks, that's interesting!  I hadn't thought to look at server shutdown.  I haven't looked at the actual test--it'd be interesting to know how and when exactly the server's being shutdown in this test, and whether there might be any local or remote users of the filesystem active at the time of shutdown.

Looking at the server shutdown code I do have some delegation-related questions:

- in nfs4_state_shutdown, why do we first clear the del_recall_lru list?  I can't see what prevents more delegations from being added to that list after this point, since the add is done by work items that as far as I can see might still be running.
- in __destroy_client, we do a list_add that assumes dp->dl_recall_lru is empty; is that a safe assumption?
- I'm worried about races between callbacks and delegation destruction in __destroy_client.  Maybe the nfsd4_shutdown_callback() needs to come a lot earlier?

Anyway, I think a crash on shutdown is maybe a little less likely and a little lower impact than a crash during normal operation, so given the schedule I'm inclined to postpone.

Comment 16 J. Bruce Fields 2017-06-22 19:37:49 UTC
(In reply to J. Bruce Fields from comment #15)
> Thanks, that's interesting!  I hadn't thought to look at server shutdown.  I
> haven't looked at the actual test--it'd be interesting to know how and when
> exactly the server's being shutdown in this test, and whether there might be
> any local or remote users of the filesystem active at the time of shutdown.

Huh, no, doesn't look like there's anything interesting going on there:

  DURATION=3600 sh racer.sh $nfsmp/racer &
  DURATION=3600 sh racer.sh $nfsmp2/racer &
  DURATION=3600 sh racer.sh $nfsexpdir/racer &
  wait %1 %2 %3
  umount $nfsmp || exit 5
  umount $nfsmp2 || exit 6
  # stop nfs-server.service
  service_nfs stop || exit 7
  umount $nfsexpdir || exit 8

So somehow a delegation callback outlived the client mounts and is causing trouble on the "service nfs stop".

Comment 18 J. Bruce Fields 2017-06-23 20:00:27 UTC
(In reply to J. Bruce Fields from comment #15)
> Looking at the server shutdown code I do have some delegation-related
> questions:

But, looking back at one of the cores, it's still in svc_close_net, which is before we start tearing down the NFSv4 state.

So, there may still be a bug in that NFSv4 state shutdown, but if so it's likely not the problem here.

Comment 19 Yongcheng Yang 2017-06-26 07:12:21 UTC
(In reply to J. Bruce Fields from comment #18)
> So, there may still be a bug in that NFSv4 state shutdown, but if so it's
> likely not the problem here.

Looks like what Bug 1312002 describes.  Sorry I'm not sure about it.

Comment 27 J. Bruce Fields 2018-04-19 21:00:34 UTC
Apologies for the silence.

I spent some more time studying the relevant code to see if I could spot the bug, but no luck so far.  RPC_TASK_KILLED suggests the rpc client is being shut down.  By that point the delegations have all been unhashed, but that should have put only one of two references on the delegation we're recalling--the callback should hold the other reference.  All I can think is that there's been some reference counting error.

Comment 28 J. Bruce Fields 2018-04-19 21:03:30 UTC
I should try to run the reproducer myself.  Apologies, I don't use our test infrastructure as often as I should, could you remind me how to get the recipe with the relevant scripts?

I'd like to check that it's still reproducible upstream.  (Probably it is--at least I can't see any relevant commits that we haven't backported.)  And then see if I can patch in some useful instrumentation.

Comment 30 J. Bruce Fields 2018-04-20 20:16:17 UTC
*** Bug 1542298 has been marked as a duplicate of this bug. ***

Comment 32 J. Bruce Fields 2018-05-02 20:05:54 UTC
I see one thing in the code that bothers me:

If the vfs_setlease() in nfs4_setlease() succeeds but then hash_delegation_locked (or the fi_had_conflict check?) fails, then I don't see anything which removes that lease.  It looks to me like the fp which the lease's fl_owner field points to can even be freed while the lease is still applied.  Once nfsd_break_deleg_cb() is called with such a lease, anything could happen, including the observed symptoms--though I'm not sure why they'd be the most likely.

Comment 33 Jeff Layton 2018-05-02 20:50:32 UTC
Agreed. That does look wrong, especially in the RHEL7 code.

It looks a bit better in upstream master, but even that looks like it has a bug. In the event that hash_delegation_locked fails there, we'll end up returning the (presumably now freed) delegation pointer (dp). We should really return ERR_PTR(status) in that case.

Comment 45 J. Bruce Fields 2018-06-20 15:27:35 UTC
With 20 jobs on 3.10.0-663.el7 I was able to reproduce twice.

So my inability to reproduce upstream might be luck or might mean that it's fixed upstream.  Running more jobs on upstream.

Comment 50 J. Bruce Fields 2018-07-10 12:31:44 UTC
My backport passes our test--but so does the redhat kernel (kernel-3.10.0-916.el7) that I based the backport on.

So either the problem was inadvertently fixed between -693 (the latest redhat kernel where we've seen this) and -916, or our reproducer is just not reliable enough.  I'm running some more tests and doing some more work on the backport.

Comment 51 J. Bruce Fields 2018-07-12 20:24:45 UTC
(In reply to J. Bruce Fields from comment #50)
> So either the problem was inadvertently fixed between -693 (the latest
> redhat kernel where we've seen this) and -916, or our reproducer is just not
> reliable enough.

Yes, I just can't reproduce it predictably enough to say for sure whether a particular patch has fixed the problem.

And I'm not figuring it out from just analyzing the code either so far.

Other eyes on the code might help.  Or core analysis, or adding some instrumentation.

A backport of possibly relevant upstream patches is in the 1448750-cb_done branch of git.engineering.redhat.com/srv/git/users/bfields/linux.git.

Comment 57 Scott Mayhew 2018-07-18 13:34:41 UTC
I poked around in my old vmcore (https://optimus.gsslab.rdu2.redhat.com/manager/897324140)
some more and found some (maybe) interesting stuff I hadn't noticed before.  I'll try not
to rehash stuff that's already been mentioned before.

Here's the panic task:

crash> bt
PID: 4646   TASK: ffff98c3e75daf10  CPU: 0   COMMAND: "kworker/0:3"
 #0 [ffff98c3e74479e0] machine_kexec at ffffffff9545c44b
 #1 [ffff98c3e7447a40] __crash_kexec at ffffffff95504a82
 #2 [ffff98c3e7447b10] crash_kexec at ffffffff95504b70
 #3 [ffff98c3e7447b28] oops_end at ffffffff95aa9078
 #4 [ffff98c3e7447b50] no_context at ffffffff95a991f4
 #5 [ffff98c3e7447ba0] __bad_area_nosemaphore at ffffffff95a9928a
 #6 [ffff98c3e7447be8] bad_area_nosemaphore at ffffffff95a993f4
 #7 [ffff98c3e7447bf8] __do_page_fault at ffffffff95aabf3e
 #8 [ffff98c3e7447c58] trace_do_page_fault at ffffffff95aac196
 #9 [ffff98c3e7447c98] do_async_page_fault at ffffffff95aab82a
#10 [ffff98c3e7447cb0] async_page_fault at ffffffff95aa8338
    [exception RIP: nfsd4_cb_done+0x2b]
    RIP: ffffffffc06ba82b  RSP: ffff98c3e7447d60  RFLAGS: 00010246
    RAX: ffffffffc06ba800  RBX: ffff98c3d1995088  RCX: dead000000000200
    RDX: 0000000000000001  RSI: ffff98c3d1995088  RDI: ffff98c39b860400
    RBP: ffff98c3e7447d88   R8: ffff98c39b860448   R9: 9cf12aab8a860440
    R10: 9cf12aab8a860440  R11: fffffd17c19d1d00  R12: ffff98c39b860400
    R13: 0000000100000000  R14: 0000000100000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff98c3e7447d90] rpc_exit_task at ffffffffc06129ee [sunrpc]
#12 [ffff98c3e7447da8] __rpc_execute at ffffffffc0614007 [sunrpc]
#13 [ffff98c3e7447e10] rpc_async_schedule at ffffffffc0614392 [sunrpc]
#14 [ffff98c3e7447e20] process_one_work at ffffffff954a874a
#15 [ffff98c3e7447e68] worker_thread at ffffffff954a97e6
#16 [ffff98c3e7447ec8] kthread at ffffffff954b093f
#17 [ffff98c3e7447f50] ret_from_fork at ffffffff95ab0d18

Here's the rpc_task and the nfsd4_callback that caused the panic:

crash> rpc_task.tk_calldata ffff98c39b860400
  tk_calldata = 0xffff98c3d1995088
crash> nfsd4_callback 0xffff98c3d1995088
struct nfsd4_callback {
  cb_clp = 0x100000000, 
...

Looking at the kworker threads, one of them was in nfsd4_process_cb_update():

crash> set 7038
    PID: 7038
COMMAND: "kworker/u4:2"
   TASK: ffff98c3dece2f10  [THREAD_INFO: ffff98c39ea0c000]
    CPU: 0
  STATE: TASK_UNINTERRUPTIBLE 
crash> bt
PID: 7038   TASK: ffff98c3dece2f10  CPU: 0   COMMAND: "kworker/u4:2"
 #0 [ffff98c39ea0fb20] __schedule at ffffffff95aa4d25
 #1 [ffff98c39ea0fb88] schedule at ffffffff95aa52a9
 #2 [ffff98c39ea0fb98] schedule_timeout at ffffffff95aa2cf4
 #3 [ffff98c39ea0fc40] rpc_shutdown_client at ffffffffc0608090 [sunrpc]
 #4 [ffff98c39ea0fca8] nfsd4_process_cb_update at ffffffffc06bab4e [nfsd]
 #5 [ffff98c39ea0fdf8] nfsd4_run_cb_work at ffffffffc06bad99 [nfsd]
 #6 [ffff98c39ea0fe20] process_one_work at ffffffff954a874a
 #7 [ffff98c39ea0fe68] worker_thread at ffffffff954a97e6
 #8 [ffff98c39ea0fec8] kthread at ffffffff954b093f
 #9 [ffff98c39ea0ff50] ret_from_fork at ffffffff95ab0d18

Let's find the nfsd4_callback it was operating on:

crash> dis -lr ffffffffc06bad99
...
/usr/src/debug/kernel-3.10.0-665.el7/linux-3.10.0-665.el7.x86_64/fs/nfsd/nfs4callback.c: 1113
0xffffffffc06bad91 <nfsd4_run_cb_work+0x91>:    mov    %r13,%rdi
0xffffffffc06bad94 <nfsd4_run_cb_work+0x94>:    callq  0xffffffffc06bab10 <nfsd4_process_cb_update>
/usr/src/debug/kernel-3.10.0-665.el7/linux-3.10.0-665.el7.x86_64/fs/nfsd/nfs4callback.c: 1115
0xffffffffc06bad99 <nfsd4_run_cb_work+0x99>:    mov    0x2c0(%r12),%rdi

crash> dis -l nfsd4_process_cb_update
/usr/src/debug/kernel-3.10.0-665.el7/linux-3.10.0-665.el7.x86_64/fs/nfsd/nfs4callback.c: 1050
0xffffffffc06bab10 <nfsd4_process_cb_update>:   nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc06bab15 <nfsd4_process_cb_update+0x5>:       push   %rbp
0xffffffffc06bab16 <nfsd4_process_cb_update+0x6>:       mov    %rsp,%rbp
0xffffffffc06bab19 <nfsd4_process_cb_update+0x9>:       push   %r14
0xffffffffc06bab1b <nfsd4_process_cb_update+0xb>:       push   %r13
...

crash> bt -f
PID: 7038   TASK: ffff98c3dece2f10  CPU: 0   COMMAND: "kworker/u4:2"
...
    ffff98c39ea0fde0: ffff98c39f3f82d0 ffff98c3e74cf300 
    ffff98c39ea0fdf0: ffff98c39ea0fe18 ffffffffc06bad99 
 #5 [ffff98c39ea0fdf8] nfsd4_run_cb_work at ffffffffc06bad99 [nfsd]
...

crash> nfsd4_callback ffff98c39f3f82d0
struct nfsd4_callback {
  cb_clp = 0xffff98c39f3f8000, 
  cb_minorversion = 0x0, 
  cb_msg = {
    rpc_proc = 0xffffffffc06cd220 <nfs4_cb_procedures>, 
    rpc_argp = 0xffff98c39f3f82d0, 
    rpc_resp = 0xffff98c39f3f82d0, 
    rpc_cred = 0x0
  }, 
  cb_ops = 0x0, 
  cb_work = {
    data = {
      counter = 0x80
    }, 
    entry = {
      next = 0xffff98c39f3f8310, 
      prev = 0xffff98c39f3f8310
    }, 
    func = 0xffffffffc06bad00 <nfsd4_run_cb_work>
  }, 
  cb_seq_status = 0x1, 
  cb_status = 0x0, 
  cb_need_restart = 0x0
}

That nfs4_client has 1 delegation on the dl_perclnt list and a slew of revoked delegations:

crash> px &((struct nfs4_client *)0xffff98c39f3f8000)->cl_delegations
$9 = (struct list_head *) 0xffff98c39f3f8068
crash> list -H 0xffff98c39f3f8068 -o nfs4_delegation.dl_perclnt
ffff98c3d1995d20
crash> px &((struct nfs4_client *)0xffff98c39f3f8000)->cl_revoked
$10 = (struct list_head *) 0xffff98c39f3f8078
crash> list -H 0xffff98c39f3f8078 -o nfs4_delegation.dl_recall_lru | wc -l
36

Since we were calling rpc_shutdown_client() (which calls rpc_killall_tasks()), let's have
a look at the client's rpc_tasks:

crash> nfs4_client.cl_cb_client 0xffff98c39f3f8000
  cl_cb_client = 0xffff98c3fa14ee00
crash> px &((struct rpc_clnt *)0xffff98c3fa14ee00)->cl_tasks
$11 = (struct list_head *) 0xffff98c3fa14ee18
crash> list -H 0xffff98c3fa14ee18 -o rpc_task.tk_task
ffff98c39b860400
ffff98c3c8137100
ffff98c3c8137800
ffff98c3b5ed9f00
ffff98c39e915100
ffff98c3a01dd000
crash> 

The first rpc_task on the list is the same as the one being executed by the panic task!

Also, it doesn't look like the client had expired:

crash> nfs4_client.cl_time 0xffff98c39f3f8000
  cl_time = 0x591d3adc
crash> p nfsd_net_id
nfsd_net_id = $14 = 0x4
crash> p &init_net
$15 = (struct net *) 0xffffffff95ed0940
crash> p ((struct net *) 0xffffffff95ed0940)->gen
$16 = (struct net_generic *) 0xffff98c3fc7b6500
crash> p ((struct net_generic *) 0xffff98c3fc7b6500)->ptr[3]
$18 = (void *) 0xffff98c3b65eb200
crash> nfsd_net.nfsd4_lease 0xffff98c3b65eb200
  nfsd4_lease = 0x5a

crash> p &timekeeper
$21 = (struct timekeeper *) 0xffffffff9628a7a0
crash> p ((struct timekeeper *) 0xffffffff9628a7a0)->xtime_sec
$22 = 0x591d3adc

In fact, it looks like the client had just been renewed.

So maybe the key to reproducing this isn't to try to force a client with a bunch of delegations
to expire, but instead just mess with the callback thread...

Comment 58 Scott Mayhew 2018-07-20 16:05:40 UTC
I looked at the other vmcores in this bug.  Here's what I found (still no luck reproducing):

861164208
=========

- kworker pid 4923 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update) 
- the nfs4_client was 0xffff880236208000
- NFSD4_CLIENT_CB_KILL was set in the client's cl_flags
- the first rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff88005299eb00, 
  which was the rpc_task being executed by the panic task (kworker pid 915)
- the client was being destroyed by the nfsd pid 1650 which was handling a CREATE_SESSION, 
  so it looks like a client reboot triggered this 

687329560
=========

- kworker pid 19647 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update) 
- the nfs4_client was 0xffff880065867000
- NFSD4_CLIENT_CB_KILL was set in the client's cl_flags
- the first rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff88003643ff00, 
  which was the rpc_task being executed by the panic task (kworker pid 23387)
- the client was being destroyed by the nfsd pid 1673 which was handling a CREATE_SESSION, 
  so it looks like a client reboot triggered this

604778343
=========

- kworker pid 16824 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update) 
- the nfs4_client was 0xffff887e577d3000
- NFSD4_CLIENT_CB_KILL was *not* set in the client's cl_flags, but NFSD4_CLIENT_CB_UPDATE *was* set
- the first rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff885e77304800, 
  which was the rpc_task being executed by the panic task (kworker pid 9786)
- nfsd pid 31065 was performing a nfsd4_probe_callback_sync via nfsd4_destroy_session.  The sessionid 
  passed to nfsd4_destroy_session matches the sessionid cached in the nfs4_client:

crash> nfsd4_op.u.destroy_session 0xffff8822d43e7080
  u.destroy_session = {
    sessionid = {
      data = "\r\n\346Yw\000\000\000f\000\000\000\000\000\000"
    }
  },

crash> nfs4_client.cl_cs_slot.sl_cr_ses.sessionid 0xffff887e577d3000
  cl_cs_slot.sl_cr_ses.sessionid = {
    data = "\r\n\346Yw\000\000\000f\000\000\000\000\000\000"
  },


644096292
=========

- kworker pid 16824 was in rpc_shutdown_client/rpc_killall_tasks (via nfsd4_process_cb_update) 
- the nfs4_client was 0xffff8801f331e800
- NFSD4_CLIENT_CB_KILL was *not* set in the client's cl_flags, but NFSD4_CLIENT_CB_UPDATE *was* set
- the first (actually, the only) rpc_task on the nfs4_client->cl_cb_client->cl_tasks list was 0xffff880167a28800,
  which was the rpc_task being executed by the panic task (kworker pid 28481)
- this client was being destroyed by the laundromat thread (kworker pid 1819)

Comment 71 Dave Wysochanski 2018-11-16 21:19:37 UTC
NOTE: Due to the lack of a reproducer and this not being seen on recent kernels (we don't have any vmcore from a kernel past RHEL7.5 - 862*el7), this may not be looked at further and may be closed INSUFFICIENT_DATA.  If you have new information or feel differently, please speak up.

Comment 75 Gonzalo Siero 2019-01-15 15:08:05 UTC
Another customer affected.

nfs server vmcore: https://optimus.gsslab.rdu2.redhat.com/manager/450212292/

#29:

        CPUS: 4
        DATE: Sat Jan  5 16:53:34 2019
      UPTIME: 116 days, 13:33:02
LOAD AVERAGE: 0.09, 0.05, 0.05
       TASKS: 469
    NODENAME: gtunxlvc03565
     RELEASE: 3.10.0-862.11.6.el7.x86_64
     VERSION: #1 SMP Fri Aug 10 16:55:11 UTC 2018
     MACHINE: x86_64  (2593 Mhz)
      MEMORY: 12 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at           (null)"



 #8 [ffff9c7cfa72bc50] do_page_fault at ffffffffa0b208d5
 #9 [ffff9c7cfa72bc80] page_fault at ffffffffa0b1c758
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff9c7cfa72bd38  RFLAGS: 00010286
    RAX: ffffe59e4c56e1a0  RBX: 0000000000000000  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: 0000000000000000  RDI: ffffe59e4c56e1d0
    RBP: ffff9c7cfa72bd50   R8: 000000000001bb00   R9: ffffffffa0599d05
    R10: 8e0ca28c7f7c0fc0  R11: ffffe59e4c56e040  R12: ffffe59e4c56e1d0
    R13: ffff9c7cfa72bda0  R14: ffff9c7cf8d78f00  R15: 0000000000002000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff9c7cfa72bd38] xprt_iter_get_helper at ffffffffc051ad4d [sunrpc]
#11 [ffff9c7cfa72bd58] xprt_iter_get_next at ffffffffc051b56b [sunrpc]
#12 [ffff9c7cfa72bd68] rpc_run_task at ffffffffc04f57c5 [sunrpc]
#13 [ffff9c7cfa72bd98] rpc_call_async at ffffffffc04f5a3f [sunrpc]
#14 [ffff9c7cfa72bdf8] nfsd4_run_cb_work at ffffffffc058d67a [nfsd]
#15 [ffff9c7cfa72be20] process_one_work at ffffffffa04b613f
#16 [ffff9c7cfa72be68] worker_thread at ffffffffa04b71d6
#17 [ffff9c7cfa72bec8] kthread at ffffffffa04bdf21

static
struct rpc_xprt *xprt_iter_get_helper(struct rpc_xprt_iter *xpi,
                struct rpc_xprt *(*fn)(struct rpc_xprt_iter *))
{
        struct rpc_xprt *ret;

        do {
                ret = fn(xpi); <<--- panic: fn (xpi_ops->xpi_next) is NULL
[...]

crash> rpc_task_setup ffff9c7cfa72bda0
struct rpc_task_setup {
  task = 0x0, 
  rpc_client = 0xffffe59e4c56e040, <<--- this should be an rpc_clnt
  rpc_xprt = 0x0, 
  rpc_message = 0xffff9c7cbe7c0f98, 
  callback_ops = 0xffffffffc0595a20 <nfsd4_cb_ops>, 
  callback_data = 0xffff9c7cbe7c0f88, 
  workqueue = 0x0, 
  flags = 1537, 
  priority = 0 '\000'
}

crash> rpc_clnt.cl_xpi 0xffffe59e4c56e040
  cl_xpi = {
    xpi_xpswitch = 0xe5a, 
    xpi_cursor = 0x1ffffffff, 
    xpi_ops = 0xffffe59e4c56e1a0
  }

crash> rpc_xprt_iter_ops 0xffffe59e4c56e1a0
struct rpc_xprt_iter_ops {
  xpi_rewind = 0xffffe59e49422be0, 
  xpi_xprt = 0xffffe59e4c56e1e0, 
  xpi_next = 0x0 <<--- that's why we panic...
}

rpc_clnt comes from here:

crash> nfs4_client.cl_cb_client ffff9c7ea464a000
  cl_cb_client = 0xffffe59e4c56e040

Searching for it, we can see that it is not an rpc_clnt but a page struct...

crash> buffer_head ffff9c7bc2c4c3a8
struct buffer_head {
  b_state = 33, 
  b_this_page = 0xffff9c7bc2c4c3a8, 
  b_page = 0xffffe59e4c56e040,  <<--- our problematic pointer
  b_blocknr = 7655, 
  b_size = 4096, 
  b_data = 0xffff9c7e95b81000 
  b_bdev = 0xffff9c7eae3d4440, 
  b_end_io = 0xffffffffa0656c30, 
  b_private = 0x0, 
  b_assoc_buffers = {
    next = 0xffff9c7bc2c4c3f0, 
    prev = 0xffff9c7bc2c4c3f0
  }, 
  b_assoc_map = 0x0, 
  b_count = {
    counter = 0
  }
}

crash> page.mapping 0xffffe59e4c56e040
  mapping = 0xffff9c7d40452bf0
crash> address_space 0xffff9c7d40452bf0
struct address_space {
  host = 0xffff9c7d40452aa0, 
  page_tree = {
    rh_reserved_height = 0, 
    gfp_mask = 33554464, 
    rnode = 0xffff9c7c81ab5221
[...]
  nrpages = 333, 
  nrexceptional = 106, 
  writeback_index = 0, 
  a_ops = 0xffffffffc034b2c0 <xfs_address_space_operations>, 
  flags = 131162, 
[...]
}

crash> tree -t radix -s page.index -N 0xffff9c7c81ab5221|paste - -|grep ffffe59e4c56e040
ffffe59e4c56e040	      index = 406

If we look at the nfs4_client structure, it is not what it should be; it contains a set of page structures:


crash> rd ffff9c7ea464a000 0x100|cut -f2 -d ":"|awk '{print $1}'|grep ffff > pages

crash>  page.mapping,index  < pages|paste - -
  mapping = 0x69656e6f7a2f6572	      index = 8030610025564038766
  mapping = 0xffff9c7d40452bf0	      index = 385
  mapping = 0xffff9c7d40452bf0	      index = 386
  mapping = 0xffff9c7d40452bf0	      index = 387
  mapping = 0xffff9c7d40452bf0	      index = 388
  mapping = 0xffff9c7d40452bf0	      index = 389
  mapping = 0xffff9c7d40452bf0	      index = 390
[...]
  mapping = 0xffff9c7d40452bf0	      index = 408
  mapping = 0xffff9c7d40452bf0	      index = 409
  mapping = 0xffff9c7e8a2f5781	      index = 34322322192
  mapping = 0xffff9c7d40452bf0	      index = 411
  mapping = 0xffff9c7d40452bf0	      index = 412
[...]
  mapping = 0xffff9c7d40452bf0	      index = 436
  mapping = 0x1b5ffff9c7d4045	      index = 18446462598732840960

What is clear is that this is not an nfs4_client struct, so this does indeed look like a use-after-free...


Looking also at log, we can see:

crash> log|grep "idr_remove called for"|wc -l
449


This is https://bugzilla.redhat.com/show_bug.cgi?id=1448750

Comment 81 Ronald Monthero 2019-09-03 08:20:44 UTC
Another occurrence in Case 02444371 

vmcore location https://optimus.gsslab.rdu2.redhat.com/manager/558794125
retrace-server-interact 558794125 crash 

crash> sys 
      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/558794125/crash/vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Tue Aug 27 17:10:58 2019
      UPTIME: 20 days, 12:47:40
LOAD AVERAGE: 1.25, 1.26, 1.31
       TASKS: 1500
    NODENAME: xxxxxmasked
     RELEASE: 3.10.0-957.1.3.el7.x86_64
     VERSION: #1 SMP Thu Nov 15 17:36:42 UTC 2018
     MACHINE: x86_64  (2294 Mhz)
      MEMORY: 256 GB
       PANIC: "BUG: unable to handle kernel paging request at 0000000100000018"

crash> ps -S 
  RU: 19
  IN: 1481

crash> sys -t 
TAINTED_MASK: 0  
crash> mod -t 
no tainted modules
crash>

Crashed task backtrace:
crash> bt 
PID: 7021   TASK: ffff979964472080  CPU: 11  COMMAND: "kworker/u256:11"
 #0 [ffff97b4d9a0b8d0] machine_kexec at ffffffffbe463674
 #1 [ffff97b4d9a0b930] __crash_kexec at ffffffffbe51cef2
 #2 [ffff97b4d9a0ba00] crash_kexec at ffffffffbe51cfe0
 #3 [ffff97b4d9a0ba18] oops_end at ffffffffbeb6c758
 #4 [ffff97b4d9a0ba40] no_context at ffffffffbeb5aafe
 #5 [ffff97b4d9a0ba90] __bad_area_nosemaphore at ffffffffbeb5ab95
 #6 [ffff97b4d9a0bae0] bad_area_nosemaphore at ffffffffbeb5ad06
 #7 [ffff97b4d9a0baf0] __do_page_fault at ffffffffbeb6f6b0
 #8 [ffff97b4d9a0bb60] do_page_fault at ffffffffbeb6f915
 #9 [ffff97b4d9a0bb90] page_fault at ffffffffbeb6b758
    [exception RIP: rpc_shutdown_client+0x4f]
    RIP: ffffffffc044105f  RSP: ffff97b4d9a0bc48  RFLAGS: 00010246
    RAX: ffff97b4d9a0bc48  RBX: 0000000100000000  RCX: ffff97b4d9a0bfd8
    RDX: 0000000000000000  RSI: 8d00000000000000  RDI: 0000000100000000
    RBP: ffff97b4d9a0bca0   R8: ffff97b94ce1d1a8   R9: a11a64320661d1a0
    R10: a11a64320661d1a0  R11: ffffd04cbb13f200  R12: 0000000100000018
    R13: ffff97b94ce1d168  R14: ffff97b4d9a0bc60  R15: ffff979964472080
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff97b4d9a0bca8] nfsd4_process_cb_update at ffffffffc04de9fe [nfsd]
#11 [ffff97b4d9a0bdf8] nfsd4_run_cb_work at ffffffffc04dec54 [nfsd]
#12 [ffff97b4d9a0be20] process_one_work at ffffffffbe4b9d4f
#13 [ffff97b4d9a0be68] worker_thread at ffffffffbe4bade6
#14 [ffff97b4d9a0bec8] kthread at ffffffffbe4c1c31
crash> 

Disassembly of panic context:
crash> dis -lr rpc_shutdown_client+0x4f 
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 850
0xffffffffc0441010 <rpc_shutdown_client>:	nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc0441015 <rpc_shutdown_client+0x5>:	push   %rbp
0xffffffffc0441016 <rpc_shutdown_client+0x6>:	mov    %rsp,%rbp
0xffffffffc0441019 <rpc_shutdown_client+0x9>:	push   %r15
0xffffffffc044101b <rpc_shutdown_client+0xb>:	push   %r14
0xffffffffc044101d <rpc_shutdown_client+0xd>:	push   %r13
0xffffffffc044101f <rpc_shutdown_client+0xf>:	push   %r12
0xffffffffc0441021 <rpc_shutdown_client+0x11>:	push   %rbx
0xffffffffc0441022 <rpc_shutdown_client+0x12>:	mov    %rdi,%rbx
0xffffffffc0441025 <rpc_shutdown_client+0x15>:	sub    $0x30,%rsp
0xffffffffc0441029 <rpc_shutdown_client+0x19>:	mov    %gs:0x28,%rax
0xffffffffc0441032 <rpc_shutdown_client+0x22>:	mov    %rax,-0x30(%rbp)
0xffffffffc0441036 <rpc_shutdown_client+0x26>:	xor    %eax,%eax
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 851
0xffffffffc0441038 <rpc_shutdown_client+0x28>:	callq  0xffffffffbeb67eb0 <_cond_resched>
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 853
0xffffffffc044103d <rpc_shutdown_client+0x2d>:	testb  $0x2,0x3bae8(%rip)        # 0xffffffffc047cb2c <rpc_debug>
0xffffffffc0441044 <rpc_shutdown_client+0x34>:	jne    0xffffffffc0441108 <rpc_shutdown_client+0xf8>
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 859
0xffffffffc044104a <rpc_shutdown_client+0x3a>:	lea    -0x58(%rbp),%rax
0xffffffffc044104e <rpc_shutdown_client+0x3e>:	lea    0x18(%rbx),%r12          << struct rpc_clnt*  was valid here    I[1]
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/arch/x86/include/asm/current.h: 14
0xffffffffc0441052 <rpc_shutdown_client+0x42>:	mov    %gs:0x10e80,%r15
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 859
0xffffffffc044105b <rpc_shutdown_client+0x4b>:	lea    0x18(%rax),%r14 
/usr/src/debug/kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/net/sunrpc/clnt.c: 857
0xffffffffc044105f <rpc_shutdown_client+0x4f>:	cmp    0x18(%rbx),%r12             << rpc_clnt* was corrupted here      I[2]


The rpc_clnt* was valid at I[1], but was an invalid pointer by the time it was dereferenced at I[2]:

rbx: 0000000100000000
rdi: 0000000100000000
r12: 0000000100000018 

The kworker was executing the nfs callback workqueue item, which called rpc_shutdown_client:

void rpc_shutdown_client(struct rpc_clnt *clnt)
{
        might_sleep();

        dprintk_rcu("RPC:       shutting down %s client for %s\n",
                        clnt->cl_program->name,
                        rcu_dereference(clnt->cl_xprt)->servername);

        while (!list_empty(&clnt->cl_tasks)) {              << Crashed here 
                rpc_killall_tasks(clnt);
                wait_event_timeout(destroy_wait,
                        list_empty(&clnt->cl_tasks), 1*HZ);
        }

        rpc_release_client(clnt);
}

crash> struct nfsd4_callback ffff97b94ce1d168 
struct nfsd4_callback {
  cb_clp = 0xffff97b97c5ba800, 
  cb_minorversion = 0x1, 
  cb_msg = {
    rpc_proc = 0xffffffffc04f14b8 <nfs4_cb_procedures+56>, 
    rpc_argp = 0xffff97b94ce1d168, 
    rpc_resp = 0xffff97b94ce1d168, 
    rpc_cred = 0xffff97b051657e00
  }, 
  cb_ops = 0xffffffffc04e6560 <nfsd4_cb_recall_ops>, 
  cb_work = {
    data = {
      counter = 0x2000
    }, 
    entry = {
      next = 0xffff97b94ce1d1a8, 
      prev = 0xffff97b94ce1d1a8
    }, 
    func = 0xffffffffc04debb0 <nfsd4_run_cb_work>
  }, 
  cb_seq_status = 0x1, 
  cb_status = 0x0, 
  cb_need_restart = 0x0
}

crash> work_struct ffff97b94ce1d1a0 
struct work_struct {
  data = {
    counter = 0x2000
  }, 
  entry = {
    next = 0xffff97b94ce1d1a8, 
    prev = 0xffff97b94ce1d1a8
  }, 
  func = 0xffffffffc04debb0 <nfsd4_run_cb_work>
}


Let's inspect the rpc_clnt reachable from the struct nfs4_client (via the nfsd4_callback's cb_clp):

crash> nfs4_client.cl_cb_client 0xffff97b97c5ba800
  cl_cb_client = 0x100000000                                 << struct rpc_clnt* invalid

The rpc_clnt* is invalid, and dereferencing it caused the panic.

crash> nfs4_client.cl_cb_client 0xffff97b97c5ba800 -ox 
struct nfs4_client {
  [ffff97b97c5baac0] struct rpc_clnt *cl_cb_client;
}

crash> rd ffff97b97c5baac0 
ffff97b97c5baac0:  0000000100000000                    ........

Comment 82 Ronald Monthero 2019-09-03 08:23:32 UTC
Forgot to mention one detail: there were numerous idr_remove() errors prior to the crash, and in the panic dump stack as well.

< .. > 
[1774061.570626] idr_remove called for id=1848429 which is not allocated.                          <<  
[1774061.571167] CPU: 11 PID: 7021 Comm: kworker/u256:11 Kdump: loaded Not tainted 3.10.0-957.1.3.el7.x86_64 #1
[1774061.571755] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[1774061.572328] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[1774061.572906] Call Trace:
[1774061.573460]  [<ffffffffbeb61e41>] dump_stack+0x19/0x1b
[1774061.574036]  [<ffffffffbe776790>] idr_remove+0x160/0x290
[1774061.574612]  [<ffffffffbeb6778f>] ? __schedule+0x3ff/0x890
[1774061.575207]  [<ffffffffc04d31ea>] nfs4_put_stid+0x3a/0x90 [nfsd]
[1774061.575791]  [<ffffffffc04d3415>] nfsd4_cb_recall_release+0x15/0x20 [nfsd]
[1774061.576358]  [<ffffffffc04dec7b>] nfsd4_run_cb_work+0xcb/0x110 [nfsd]
[1774061.576935]  [<ffffffffbe4b9d4f>] process_one_work+0x17f/0x440
[1774061.577492]  [<ffffffffbe4bade6>] worker_thread+0x126/0x3c0
[1774061.578061]  [<ffffffffbe4bacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[1774061.578632]  [<ffffffffbe4c1c31>] kthread+0xd1/0xe0
[1774061.579172]  [<ffffffffbe4c1b60>] ? insert_kthread_work+0x40/0x40
[1774061.579735]  [<ffffffffbeb74c37>] ret_from_fork_nospec_begin+0x21/0x21
[1774061.580255]  [<ffffffffbe4c1b60>] ? insert_kthread_work+0x40/0x40
[1774061.580888] BUG: unable to handle kernel paging request at 0000000100000018  << panic
[1774061.581415] IP: [<ffffffffc044105f>] rpc_shutdown_client+0x4f/0x130 [sunrpc]
[1774061.582006] PGD 8000003ec4359067 PUD 0 
[1774061.582533] Oops: 0000 [#1] SMP 

The numerous errors in the idr_remove code path also appear to be a trigger for this scenario; they all occurred within a window of roughly 2 seconds, between timestamps 1774059.791250 and 1774061.570626 in the kernel buffer.

crash> dmesg | grep "idr_remove called" | wc -l 
178
crash>

crash> dmesg | grep "idr_remove called" | tail -4 
[1774061.542169] idr_remove called for id=1865728 which is not allocated.
[1774061.551619] idr_remove called for id=1864098 which is not allocated.
[1774061.561085] idr_remove called for id=1910621 which is not allocated.
[1774061.570626] idr_remove called for id=1848429 which is not allocated.

Comment 83 Roberto Bergantinos 2019-09-03 09:25:30 UTC
In this latest case (02444371, retrace-server-interact 558794125 crash), it is indeed an nfs4_client that is freed, not an nfs4_delegation:

crash> kmem -s 0xffff97b97c5ba800
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff977bbfc07400     2048       1635      2560    160    32k  kmalloc-2048
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffd04cbcf16e00  ffff97b97c5b8000     0     16          3    13
  FREE / [ALLOCATED]
   ffff97b97c5ba800  (cpu 4 cache)

// callback/delegation we are dealing with when we hit the panic :

nfsd4_callback  -> ffff97b94ce1d168
nfs4_delegation -> ffff97b94ce1d0e0

crash> kmem -s ffff97b94ce1d0e0
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff97b97ce20400      240       9121     18428    271    16k  nfsd4_delegations
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffd04cbc338700  ffff97b94ce1c000     0     68         30    38
  FREE / [ALLOCATED]
  [ffff97b94ce1d0e0]


crash> nfs4_client.cl_stateids.layers 0xffff97b97c5ba800
  cl_stateids.layers = 0,
crash> nfs4_delegation.dl_stid.sc_stateid.si_opaque.so_id ffff97b94ce1d0e0
  dl_stid.sc_stateid.si_opaque.so_id = 1919585

// looks like that's why we get warnings from idr_remove

// however not sure why we didn't panic on previous warnings

// there's still a large number of delayed delegation callbacks pending:

crash> crashinfo --workqueues
 -----------------------WorkQueues - Active only-----------------------
   -----nfsd4_callbacks-----   <struct workqueue_struct 0xffff97b7f2f67c00>
  <struct pool_workqueue 0xffff97b96e39e600> active=1 delayed=4675
   <struct worker_pool 0xffff977b66d89c00> nr_workers=18 nr_idle=17
      <struct worker 0xffff97b4f8f57f80>  kworker/u256:11 nfsd4_run_cb_work

// all the delayed callbacks point to the same freed cb_clp:

crash> list -H ffff97b96e39e660 0x40 > callbacks
crash> nfsd4_callback.cb_clp < callbacks | sort | uniq -c
   4675   cb_clp = 0xffff97b97c5ba800

Comment 84 Ronald Monthero 2019-09-04 00:24:37 UTC
Adding some details from an earlier, similar crash I reported in BZ 1542298 (currently marked as a duplicate of this bz).
In reference to the recent vmcore discussed in Comment #81, Comment #82 and Comment #83 above: BZ 1542298 reported a very similar crash, preceded by the same flood of idr_remove errors, with the panic context again in an nfsd callback, and the struct rpc_clnt *cl_cb_client was likewise found invalid. The vmcore id is listed below for anyone who wants to take a look.

(Dup bz) Bug 1542298 - Crash during kworker task executing nfsd call back - exception RIP: xprt_iter_get_next+0x9 

From BZ 1542298
vmcore location: ssh optimus.gsslab.rdu2.redhat.com
vmcore id:  264393541   ( retrace-server-interact 264393541 crash )

crash> bt 
PID: 27838  TASK: ffff8800350bdee0  CPU: 3   COMMAND: "kworker/u8:1"
 #0 [ffff88040d3879e0] machine_kexec at ffffffff8105c52b
 #1 [ffff88040d387a40] __crash_kexec at ffffffff81104a42
 #2 [ffff88040d387b10] crash_kexec at ffffffff81104b30
 #3 [ffff88040d387b28] oops_end at ffffffff816ad338
 #4 [ffff88040d387b50] no_context at ffffffff8169d36a
 #5 [ffff88040d387ba0] __bad_area_nosemaphore at ffffffff8169d400
 #6 [ffff88040d387be8] bad_area_nosemaphore at ffffffff8169d56a
 #7 [ffff88040d387bf8] __do_page_fault at ffffffff816b01fe
 #8 [ffff88040d387c58] trace_do_page_fault at ffffffff816b0456
 #9 [ffff88040d387c98] do_async_page_fault at ffffffff816afaea
#10 [ffff88040d387cb0] async_page_fault at ffffffff816ac5f8
    [exception RIP: xprt_iter_get_next+9]
    RIP: ffffffffc0063f49  RSP: ffff88040d387d60  RFLAGS: 00010246
    RAX: ffff880403bce100  RBX: ffff880403bce100  RCX: 0000000000000017
    RDX: 0000000225c17d03  RSI: 0000000000006280  RDI: ffffffff00000190
    RBP: ffff88040d387d60   R8: 00000000cd4b9b2d   R9: ffffffff81185465
    R10: dfeef653caec1660  R11: ffffffff00000000  R12: ffffffff00000000
    R13: ffff88040d387da0  R14: ffff8803fd317400  R15: 0000000000000100
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff88040d387d68] rpc_run_task at ffffffffc003e7b5 [sunrpc]
#12 [ffff88040d387d98] rpc_call_async at ffffffffc003ea2f [sunrpc]
#13 [ffff88040d387df8] nfsd4_run_cb_work at ffffffffc02ac476 [nfsd]
#14 [ffff88040d387e20] process_one_work at ffffffff810a882a
#15 [ffff88040d387e68] worker_thread at ffffffff810a94f6
#16 [ffff88040d387ec8] kthread at ffffffff810b099f
#17 [ffff88040d387f50] ret_from_fork at ffffffff816b4fd8
crash> 

crash> nfs4_client.cl_cb_client -ox 
struct nfs4_client {
  [0x2c0] struct rpc_clnt *cl_cb_client;

crash> nfs4_client.cl_cb_client ffff8800095d0800
 cl_cb_client = 0xffffffff00000000           <<----------- invalid rpc_clnt* callback pointer

crash> pool_workqueue.pool,wq,refcnt,nr_in_flight,nr_active ffff8803fd317400
  pool = 0xffff88017a1d7800
  wq = 0xffff8803f3720600
  refcnt = 11247
  nr_in_flight = {0, 0, 0, 0, 0, 0, 0, 11245, 0, 0, 0, 0, 0, 0, 0} 
  nr_active = 1


crash> dmesg | grep "idr_remove called for id=" | wc -l
690
crash> 

[3795237.582805] idr_remove called for id=2570080 which is not allocated.                                 << 
[3795237.582805] CPU: 2 PID: 27838 Comm: kworker/u8:1 Not tainted 3.10.0-693.11.1.el7.x86_64 #1
[3795237.582806] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014
[3795237.582811] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[3795237.582812]  ffff88040d387d60 000000005bada75d ffff88040d387d48 ffffffff816a3e61
[3795237.582813]  ffff88040d387da0 ffffffff81323b9a 0000000000000000 ffff8800095d0848
[3795237.582814]  ffff880034acd7a0 ffff880034acd7c8 ffff8804246c1660 000000005bada75d
[3795237.582814] Call Trace:
[3795237.582816]  [<ffffffff816a3e61>] dump_stack+0x19/0x1b
[3795237.582818]  [<ffffffff81323b9a>] idr_remove.part.6+0x24a/0x260
[3795237.582820]  [<ffffffff81323bbd>] idr_remove+0xd/0x10
[3795237.582825]  [<ffffffffc02a0a4a>] nfs4_put_stid+0x3a/0x80 [nfsd]
[3795237.582831]  [<ffffffffc02a0c65>] nfsd4_cb_recall_release+0x15/0x20 [nfsd]
[3795237.582836]  [<ffffffffc02ac4cd>] nfsd4_run_cb_work+0xbd/0x100 [nfsd]
[3795237.582838]  [<ffffffff810a882a>] process_one_work+0x17a/0x440
[3795237.582839]  [<ffffffff810a94f6>] worker_thread+0x126/0x3c0
[3795237.582841]  [<ffffffff810a93d0>] ? manage_workers.isra.24+0x2a0/0x2a0
[3795237.582842]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[3795237.582844]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[3795237.582845]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[3795272.193338]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[3795272.195542] BUG: unable to handle kernel paging request at ffffffff000001a0
[3795272.199648] IP: [<ffffffffc0063f49>] xprt_iter_get_next+0x9/0x40 [sunrpc]
[3795272.201836] PGD 19f5067 PUD 0 
[3795272.203514] Oops: 0000 [#1] SMP

Comment 86 Ronald Monthero 2019-09-04 01:53:45 UTC
I asked the customer for any observations made prior to the panic.

The customer's response, quoted:
------------------------------------------------------------

If you are looking for a way to reproduce, here is how it went down the last time:

This DataStage cluster consists of one main DataStage node that serves an nfs export to sixteen DataStage compute nodes. The nfs export is mounted on the compute node as follows:

masked-nfsserver:/data01 on /data01 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.xxx.yy.222,local_lock=none,addr=10.xxx.yy.195)

The compute nodes were rebooted one by one after each was patched. After six or seven of these reboots, the main DataStage node crashed.

My suspicion is that the kernel panic occurred during either the mounting or unmounting of the export by the sixth or seventh compute node restart. It may be possible to reproduce by repeatedly mounting/unmounting an nfs export.

Hope that helps :)
-----------------------------------------------------------

Comment 87 Roberto Bergantinos 2019-09-18 15:12:08 UTC
couple of additional observations from latest vmcore 558794125

vast majority of delegations are revoked :

crash> nfs4_delegation.dl_stid.sc_type < delegations | sort | uniq -c
   4201   dl_stid.sc_type = 16 '\020',    ----------------------------> NFS4_REVOKED_DELEG_STID
    474   dl_stid.sc_type = 32 ' ',       ----------------------------> NFS4_CLOSED_DELEG_STID

Meaning this happened during nfs4_laundromat -> revoke_delegation (which runs under state_lock).

It looks like the enqueuing of callbacks raced with __destroy_client.

I can see state_lock is dropped in __destroy_client as soon as we have built the list of delegations
to reap, but is it possible that delegations can be hashed after that? Because the laundromat finds
them in nn->del_recall_lru under state_lock a bit later.

BTW, it looks like nfsd4_shutdown_callback() must be called with state_lock held:

1079 /* must be called under the state lock */
1080 void nfsd4_shutdown_callback(struct nfs4_client *clp)

but it looks like we drop it in __destroy_client before calling it:

1886 static void
1887 __destroy_client(struct nfs4_client *clp)
1888 {
1889         int i;
1890         struct nfs4_openowner *oo;
1891         struct nfs4_delegation *dp;
1892         struct list_head reaplist;
1893 
1894         INIT_LIST_HEAD(&reaplist);
1895         spin_lock(&state_lock);
...
1901         spin_unlock(&state_lock);
...
1930         nfsd4_shutdown_callback(clp);

I don't see anything re-taking it between lines 1901 and 1930; of course I might have missed something.

Comment 90 J. Bruce Fields 2019-10-22 19:12:56 UTC
I'm still catching up with your work on comment 89, but I wonder if this is the same race that Trond spotted at bakeathon last week:

http://lkml.kernel.org/r/20191015154724.50988-1-trond.myklebust@hammerspace.com

Comment 92 Dave Wysochanski 2020-01-07 16:18:51 UTC
Created attachment 1650471 [details]
PATCH to fix this bug - applies cleanly on kernel-3.10.0-1117.el7

Comment 93 Dave Wysochanski 2020-01-07 16:18:54 UTC
Created attachment 1650472 [details]
PATCH to fix this bug - applies cleanly on kernel-3.10.0-1117.el7

Comment 94 Dave Wysochanski 2020-01-07 16:22:38 UTC
Bruce I looked at the patch from comment #90 which is now merged upstream.  It needed one other prior patch to backport cleanly, and both apply on kernel-3.10.0-1117.el7.
Do you think these can be posted for 7.9 or do we need some other step such as a reproducer or customer hitting the problem being willing to validate the fix with a test kernel?
Others with more detailed knowledge of this bug and/or attached cases feel free to comment.

Comment 95 J. Bruce Fields 2020-01-10 20:20:40 UTC
(In reply to Dave Wysochanski from comment #94)
> Bruce I looked at the patch from comment #90 which is now merged upstream. 
> It needed one other prior patch to backport cleanly, and both apply on
> kernel-3.10.0-1117.el7.
> Do you think these can be posted for 7.9 or do we need some other step such
> as a reproducer or customer hitting the problem being willing to validate
> the fix with a test kernel?

After applying that patch upstream I was seeing an intermittent failure to send callbacks; symptoms were a hang in xfstests due to the server failing to send an OFFLOAD_DONE callback when a copy finished.

I'd like to understand what was going on there before backporting this.

Comment 97 J. Bruce Fields 2020-02-21 17:20:36 UTC
(In reply to J. Bruce Fields from comment #95)
> After applying that patch upstream I was seeing an intermittent failure to
> send callbacks; symptoms were a hang in xfstests due to the server failing
> to send an OFFLOAD_DONE callback when a copy finished.
> 
> I'd like to understand what was going on there before backporting this.

OK, apologies, after further efforts, I'm not able to reproduce that intermittent failure; at this point I've got to admit I may have just been confused, and all that was needed was the fix from http://lore.kernel.org/r/20191107222712.GB10806@fieldses.org, which is upstream.

So, we should probably take:
12357f1b2c8e "nfsd: minor 4.1 callback cleanup"
2bbfed98a4d8 "nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()"

Comment 108 Jan Stancek 2020-05-05 19:07:27 UTC
Patch(es) committed on kernel-3.10.0-1139.el7

Comment 114 errata-xmlrpc 2020-09-29 20:59:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4060
