Bug 1504058
Field | Value
---|---
Summary | kernel panic with nfsd while removing locks on file close
Product | Red Hat Enterprise Linux 7
Component | kernel
Kernel sub component | NFS
Status | CLOSED ERRATA
Severity | urgent
Priority | urgent
Version | 7.4
Keywords | ZStream
Target Milestone | rc
Hardware | Unspecified
OS | Unspecified
Fixed In Version | kernel-3.10.0-868.el7
Doc Type | If docs needed, set a value
Reporter | Frank Sorenson <fsorenso>
Assignee | Scott Mayhew <smayhew>
QA Contact | Yongcheng Yang <yoyang>
CC | bfields, bsingh, ccaulfie, dhoward, dwysocha, glamb, jaeshin, jiyin, jlayton, pdhamdhe, plambri, rbost, sbarcomb, smayhew, snishika, tthakur, xzhou, yoyang, zlang
Clone(s) | 1628562 (view as bug list)
Last Closed | 2018-10-30 08:19:58 UTC
Type | Bug
Bug Blocks | 1477664, 1546181, 1628562
Description
Frank Sorenson
2017-10-19 12:16:26 UTC
initial vmcore analysis from support engineer

[ 939.900297] BUG: unable to handle kernel paging request at 0000000000001ce8
[ 939.900401] IP: [<ffffffffc03dc6b8>] nfsd4_lm_notify+0x28/0xb0 [nfsd]
[ 939.900455] PGD 0
[ 939.900467] Oops: 0000 [#1] SMP
[ 939.900483] Modules linked in: ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vmw_vsock_vmci_transport vsock ext4 mbcache jbd2 sb_edac edac_core coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev vmw_balloon joydev pcspkr sg vmw_vmci shpchp i2c_piix4 parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic
[ 939.900822] pata_acpi vmwgfx drm_kms_helper sd_mod crc_t10dif syscopyarea crct10dif_generic sysfillrect sysimgblt fb_sys_fops ttm ata_piix ahci drm libahci libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw vmw_pvscsi vmxnet3 i2c_core dm_mirror dm_region_hash dm_log dm_mod
[ 939.900951] CPU: 6 PID: 1659 Comm: nfsd Not tainted 3.10.0-693.1.1.el7.x86_64 #1
[ 939.900978] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[ 939.901015] task: ffff88022abd8000 ti: ffff88022abd4000 task.ti: ffff88022abd4000
[ 939.901040] RIP: 0010:[<ffffffffc03dc6b8>] [<ffffffffc03dc6b8>] nfsd4_lm_notify+0x28/0xb0 [nfsd]
[ 939.901081] RSP: 0018:ffff88022abd7a08 EFLAGS: 00010202
[ 939.901101] RAX: 0000000000000004 RBX: ffff880003d29628 RCX: dead000000000200
[ 939.901125] RDX: 0000000000001000 RSI: 0000000000000005 RDI: ffff880003d29628
[ 939.901149] RBP: ffff88022abd7a20 R08: ffff880003d29640 R09: 0000000100170002
[ 939.901174] R10: 0000000003226d01 R11: ffffea00080c8980 R12: ffff880003d29628
[ 939.901198] R13: ffff88021e7e9540 R14: ffff88021e7e9558 R15: 0000000000000001
[ 939.901224] FS: 0000000000000000(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
[ 939.901284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 939.901312] CR2: 0000000000001ce8 CR3: 00000000bab03000 CR4: 00000000001407e0
[ 939.901384] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 939.901441] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 939.901467] Stack:
[ 939.901477]  ffff880003d29640 ffff880003d29628 ffff88021e7e9540 ffff88022abd7a50
[ 939.901509]  ffffffff81256f58 ffff88021e7e9540 ffff880004c14618 0000000000000001
[ 939.901538]  ffff88021e7e9540 ffff88022abd7a70 ffffffff8125702c ffff88022abd7ac0
[ 939.901567] Call Trace:
[ 939.901586]  [<ffffffff81256f58>] locks_wake_up_blocks+0x58/0xa0
[ 939.901609]  [<ffffffff8125702c>] locks_unlink_lock+0x8c/0xa0
[ 939.901631]  [<ffffffff812578c7>] locks_delete_lock+0x17/0x50
[ 939.901653]  [<ffffffff81258207>] __posix_lock_file+0x3a7/0x510
[ 939.901676]  [<ffffffff8125870b>] vfs_lock_file+0x4b/0x50
[ 939.901696]  [<ffffffff81258799>] locks_remove_posix.part.27+0x89/0xc0
[ 939.902703]  [<ffffffff811de581>] ? __slab_free+0x81/0x2f0
[ 939.903652]  [<ffffffff812587f0>] locks_remove_posix+0x20/0x30
[ 939.904600]  [<ffffffff811fe333>] filp_close+0x53/0x80
[ 939.905550]  [<ffffffffc03dd94c>] nfs4_free_lock_stateid+0x2c/0x40 [nfsd]
[ 939.906489]  [<ffffffffc03dd7d6>] free_ol_stateid_reaplist+0x36/0x60 [nfsd]
[ 939.907404]  [<ffffffffc03ddbfb>] release_openowner+0xbb/0x110 [nfsd]
[ 939.908311]  [<ffffffffc03ddd9b>] __destroy_client+0x12b/0x1a0 [nfsd]
[ 939.909217]  [<ffffffffc03dde32>] expire_client+0x22/0x30 [nfsd]
[ 939.910157]  [<ffffffffc03deb92>] nfsd4_create_session+0x742/0x8e0 [nfsd]
[ 939.911074]  [<ffffffffc03cf845>] nfsd4_proc_compound+0x3d5/0x770 [nfsd]
[ 939.911982]  [<ffffffffc03ba593>] nfsd_dispatch+0xd3/0x280 [nfsd]
[ 939.912938]  [<ffffffffc0376453>] svc_process_common+0x453/0x6f0 [sunrpc]
[ 939.913832]  [<ffffffffc03767f3>] svc_process+0x103/0x190 [sunrpc]
[ 939.914737]  [<ffffffffc03b9eff>] nfsd+0xdf/0x150 [nfsd]
[ 939.915617]  [<ffffffffc03b9e20>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 939.916490]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[ 939.917307]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 939.918114]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[ 939.918876]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 939.919618] Code: 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 47 28 48 89 fb 48 8b 40 20 48 8b 90 a8 04 00 00 8b 05 ca 17 02 00 85 c0 <48> 8b 92 e8 0c 00 00 74 78 3b 02 77 74 83 e8 01 48 98 4c 8b 6c
[ 939.921153] RIP  [<ffffffffc03dc6b8>] nfsd4_lm_notify+0x28/0xb0 [nfsd]
[ 939.921881]  RSP <ffff88022abd7a08>
[ 939.922573] CR2: 0000000000001ce8

fs/nfsd/nfs4state.c

5417 static void
5418 nfsd4_lm_notify(struct file_lock *fl)
5419 {
5420         struct nfs4_lockowner   *lo = (struct nfs4_lockowner *)fl->fl_owner;
5421         struct net              *net = lo->lo_owner.so_client->net;
5422         struct nfsd_net         *nn = net_generic(net, nfsd_net_id);

                     /* include/net/netns/generic.h */
                     34 static inline void *net_generic(const struct net *net, int id)
                     35 {
                     36         struct net_generic *ng;
                     37         void *ptr;
                     38
                     39         rcu_read_lock();
                     40         ng = rcu_dereference(net->gen);    <<<<<<<<<<<<<<<<<<<< Crashed.
                     41         BUG_ON(id == 0 || id > ng->len);
                     42         ptr = ng->ptr[id - 1];
                     43         rcu_read_unlock();
                     44
                     45         BUG_ON(!ptr);
                     46         return ptr;
                     47 }

5423         struct nfsd4_blocked_lock       *nbl = container_of(fl,
5424                                                 struct nfsd4_blocked_lock, nbl_lock);
5425         bool queue = false;
5426
5427         /* An empty list means that something else is going to be using it */
5428         spin_lock(&nn->blocked_locks_lock);
5429         if (!list_empty(&nbl->nbl_list)) {
5430                 list_del_init(&nbl->nbl_list);
5431                 list_del_init(&nbl->nbl_lru);
5432                 queue = true;
5433         }
5434         spin_unlock(&nn->blocked_locks_lock);
5435
5436         if (queue)
5437                 nfsd4_run_cb(&nbl->nbl_cb);
5438 }

40         ng = rcu_dereference(net->gen);    <<<<<<<<<<<<<<<<<<<< Crashed.

crash> struct net.gen -ox
struct net {
  [0xce8] struct net_generic *gen;
}

/usr/src/debug/kernel-3.10.0-693.1.1.el7/linux-3.10.0-693.1.1.el7.x86_64/include/net/netns/generic.h: 40
0xffffffffc03dc6b8 <nfsd4_lm_notify+40>: mov 0xce8(%rdx),%rdx

struct net *net is in %rdx. The code is trying to copy net->gen from 0xce8(%rdx) to %rdx, but %rdx is 0x1000, which is an invalid kernel address, so the system crashed.
crash> px (0xce8 + 0x1000)
$1 = 0x1ce8

    [exception RIP: nfsd4_lm_notify+40]
    RIP: ffffffffc03dc6b8  RSP: ffff88022abd7a08  RFLAGS: 00010202
    RAX: 0000000000000004  RBX: ffff880003d29628  RCX: dead000000000200
    RDX: 0000000000001000  RSI: 0000000000000005  RDI: ffff880003d29628
    RBP: ffff88022abd7a20   R8: ffff880003d29640   R9: 0000000100170002
    R10: 0000000003226d01  R11: ffffea00080c8980  R12: ffff880003d29628
    R13: ffff88021e7e9540  R14: ffff88021e7e9558  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff88022abd7a28] locks_wake_up_blocks at ffffffff81256f58

crash> log | grep CR2
[ 939.901312] CR2: 0000000000001ce8 CR3: 00000000bab03000 CR4: 00000000001407e0
[ 939.922573] CR2: 0000000000001ce8

Checking the 1st arg passed from the previous function locks_wake_up_blocks():

crash> dis -r ffffffff81256f58|tail
0xffffffff81256f39 <locks_wake_up_blocks+57>: callq 0xffffffff81256ea0 <__locks_delete_block>
0xffffffff81256f3e <locks_wake_up_blocks+62>: mov 0x80(%rbx),%rax
0xffffffff81256f45 <locks_wake_up_blocks+69>: test %rax,%rax
0xffffffff81256f48 <locks_wake_up_blocks+72>: je 0xffffffff81256f80 <locks_wake_up_blocks+128>
0xffffffff81256f4a <locks_wake_up_blocks+74>: mov 0x10(%rax),%rax
0xffffffff81256f4e <locks_wake_up_blocks+78>: test %rax,%rax
0xffffffff81256f51 <locks_wake_up_blocks+81>: je 0xffffffff81256f80 <locks_wake_up_blocks+128>
0xffffffff81256f53 <locks_wake_up_blocks+83>: mov %r12,%rdi          <<<<<<<<< 1st arg
0xffffffff81256f56 <locks_wake_up_blocks+86>: callq *%rax
0xffffffff81256f58 <locks_wake_up_blocks+88>: mov 0x18(%r13),%rbx

crash> dis -r ffffffffc03dc6b8|head
0xffffffffc03dc690 <nfsd4_lm_notify>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc03dc695 <nfsd4_lm_notify+5>: push %rbp
0xffffffffc03dc696 <nfsd4_lm_notify+6>: mov %rsp,%rbp
0xffffffffc03dc699 <nfsd4_lm_notify+9>: push %r13
0xffffffffc03dc69b <nfsd4_lm_notify+11>: push %r12                   <<<<<<<<<<< 1st arg, 3rd push
0xffffffffc03dc69d <nfsd4_lm_notify+13>: push %rbx
0xffffffffc03dc69e <nfsd4_lm_notify+14>: mov 0x28(%rdi),%rax
0xffffffffc03dc6a2 <nfsd4_lm_notify+18>: mov %rdi,%rbx
0xffffffffc03dc6a5 <nfsd4_lm_notify+21>: mov 0x20(%rax),%rax
0xffffffffc03dc6a9 <nfsd4_lm_notify+25>: mov 0x4a8(%rax),%rdx

    [exception RIP: nfsd4_lm_notify+40]
    RIP: ffffffffc03dc6b8  RSP: ffff88022abd7a08  RFLAGS: 00010202
    RAX: 0000000000000004  RBX: ffff880003d29628  RCX: dead000000000200
    RDX: 0000000000001000  RSI: 0000000000000005  RDI: ffff880003d29628
    RBP: ffff88022abd7a20   R8: ffff880003d29640   R9: 0000000100170002
    R10: 0000000003226d01  R11: ffffea00080c8980  R12: ffff880003d29628
    R13: ffff88021e7e9540  R14: ffff88021e7e9558  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    ffff88022abd7958: 0000000000000001 ffff88021e7e9558
    ffff88022abd7968: ffff88021e7e9540 ffff880003d29628
    ffff88022abd7978: ffff88022abd7a20 ffff880003d29628
    ffff88022abd7988: ffffea00080c8980 0000000003226d01
    ffff88022abd7998: 0000000100170002 ffff880003d29640
    ffff88022abd79a8: 0000000000000004 dead000000000200
    ffff88022abd79b8: 0000000000001000 0000000000000005
    ffff88022abd79c8: ffff880003d29628 ffffffffffffffff
    ffff88022abd79d8: ffffffffc03dc6b8 0000000000000010
    ffff88022abd79e8: 0000000000010202 ffff88022abd7a08
    ffff88022abd79f8: 0000000000000018 ffff88022abd7a20
    ffff88022abd7a08: ffff880003d29640 ffff880003d29628
                                       ^ %r12
    ffff88022abd7a18: ffff88021e7e9540 ffff88022abd7a50
    ffff88022abd7a28: ffffffff81256f58
#10 [ffff88022abd7a28] locks_wake_up_blocks at ffffffff81256f58

The 1st arg struct file_lock *fl is 0xffff880003d29628.
crash> px ((struct nfs4_lockowner *)((struct file_lock *)0xffff880003d29628)->fl_owner)->lo_owner.so_client->net
$10 = (struct net *) 0x1000

So a client is getting destroyed. As part of that, its locks are released. The lock code notices that there are other lockers waiting on the lock that was released. One of those is from an NFS client (probably a different one), so it calls the nfsd4_lm_notify callback to give nfsd a chance to notify that client. But the client structure representing that client is corrupted somehow. So possibly that waiting lock wasn't properly initialized, or perhaps it belonged to a client that was destroyed but for some reason wasn't properly cleaned up. Checking with jlayton, who wrote the lock notification code, in case this is something he recognizes.

No, I haven't seen this before. I wonder if this is related to bug #1448750? There was another nfsd bug out there too that looked like a double-free a while back. That might also be related. If this is reproducible, it might be nice to run the test with memory poisoning enabled on the server. It'll be slow, but it might help catch this earlier. Running it under KASAN might also be a candidate if it's in RHEL 7 too. If we have a reliable reproducer of some sort, it'd also be good to verify that this is not a problem upstream.

New detail... this NFS server crash occurred when the customer initiated a crash on a client that was stuck in the state manager. Working on getting more details on this.

I don't know if this helps, but I can reliably reproduce this on my systems with a 'flock make install' run on 8 VM nodes accessing the same NFS-mounted source tree. I'm happy to try test kernels or other things that might be helpful. For some reason it only started this morning, even though I've downgraded the server kernel. That bit I can't explain!

Seems to me this is probably a duplicate of bug 1539866, where the laundromat thread was deleting blocked locks from the wrong list. Christine, what kernel version were you running before you downgraded? Was it -847 or higher?

I downgraded from 851 to 845, but I still get the problem. I set up a VM server (so it didn't regularly crash my desktop) and that's on 693 and still crashes. It just reboots a lot faster! I'm trying to roll things back bit by bit to isolate it, but not having much success, annoyingly.

Just to clarify for anyone watching the bug, this does not appear to be a dup of bug 1539866, because Christine was able to reproduce on -851, which has the fix for that bug. I've been unable to reproduce this myself. I used 8 clients all doing 'flock <foo> make install' of various git repos (lz4, kronosnet, corosync). I tried Fedora clients, RHEL clients, all v4.1 clients, a mix of v4.0 and v4.1 clients (so we'd have clients polling as well as waiting for callbacks), and various kernel versions on the server (693, < 847, > 847). Christine, any chance you can try reproducing this with slab poisoning on the server? I think if you boot with "slub_debug=P,kmalloc-2048" on your kernel command line, that'll poison just the slab the nfs4_client structs are being allocated from. If you can get a core with that, then maybe we'll get a clue where to look next.
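(Not from the bug itself, but for anyone wanting to try that suggestion: a minimal sketch of one way to add the option on a RHEL 7 server, assuming the stock grub2/grubby setup. Adjust to your environment before using.)

```sh
# Hedged example: add the suggested slab-poisoning option to the boot
# entries of all installed kernels, then reboot so it takes effect.
grubby --update-kernel=ALL --args="slub_debug=P,kmalloc-2048"
reboot
```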
> crash> px ((struct nfs4_lockowner *)((struct file_lock *)0xffff880003d29628)->fl_owner)->lo_owner.so_client->net
> $10 = (struct net *) 0x1000
You may have already done this, but...
It might be good to walk that chain of pointers individually, and see if you can figure out which struct in there is corrupt. Are you sure it's just so_client that is bad? Or does the problem extend farther back?
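For reference, a sketch of how that walk might look in crash, reusing the file_lock address recovered from the stack above. The values in angle brackets are placeholders for whatever the preceding command prints; none of this is output from the actual vmcore.

```
crash> px ((struct file_lock *)0xffff880003d29628)->fl_owner
crash> struct nfs4_lockowner <fl_owner>
crash> px ((struct nfs4_lockowner *)<fl_owner>)->lo_owner.so_client
crash> struct nfs4_client <so_client>
crash> kmem <so_client>
crash> px ((struct nfs4_client *)<so_client>)->net
```

The struct dumps should show whether the lockowner or the client looks poisoned or reused, and kmem should say which slab cache (if any) the so_client pointer lands in, which would help distinguish a stale-but-recognizable client from outright corruption farther back in the chain.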
Looking over the code, I think we might have a situation where we set a blocking lock for a client and then have it expire. While we hold a reference to the stateowner, I think that can end up outliving the so_client, so by the time we go to access the client there, it has been freed (and possibly reused). I need to look over the code more closely, as it has been a while since I did this work, but I suspect that could be what's happening here.

Created attachment 1408596 [details]
nfsd: remove blocked locks on client teardown
Possible patch. Needs testing at this point.
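For anyone following along, here is a rough, untested sketch of the kind of cleanup the patch title suggests, based on Jeff's description above and the nfsd4_lm_notify() code quoted earlier: before the client goes away, unhook any nfsd4_blocked_lock entries that still point at it, under blocked_locks_lock, so the lm_notify callback can never find them afterwards. The helper name, the per-lockowner lo_blocked list, and free_blocked_lock() are assumptions for illustration; the real change is the attached patch.

```c
/* Illustrative sketch only -- not the attached patch. */
static void
remove_blocked_locks(struct nfs4_lockowner *lo)
{
	struct nfs4_client *clp = lo->lo_owner.so_client;
	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
	struct nfsd4_blocked_lock *nbl;
	LIST_HEAD(reaplist);

	/* Dequeue every blocked lock owned by this lockowner while the
	 * client is still valid, so nfsd4_lm_notify() can no longer
	 * reach a freed nfs4_client through it. */
	spin_lock(&nn->blocked_locks_lock);
	while (!list_empty(&lo->lo_blocked)) {	/* hypothetical per-owner list */
		nbl = list_first_entry(&lo->lo_blocked,
				       struct nfsd4_blocked_lock, nbl_list);
		list_del_init(&nbl->nbl_list);
		list_move(&nbl->nbl_lru, &reaplist);
	}
	spin_unlock(&nn->blocked_locks_lock);

	/* Free them outside the spinlock. */
	while (!list_empty(&reaplist)) {
		nbl = list_first_entry(&reaplist,
				       struct nfsd4_blocked_lock, nbl_lru);
		list_del_init(&nbl->nbl_lru);
		posix_unblock_lock(&nbl->nbl_lock);
		free_blocked_lock(nbl);	/* hypothetical helper: release lock private data, kfree */
	}
}
```

A call like this would then have to happen on the client-teardown path (somewhere around __destroy_client() in the trace above), before the lockowners themselves are released, which is what the "remove blocked locks on client teardown" title describes.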
Created attachment 1408603 [details]
nfsd: remove blocked locks on client teardown
v2 -- fixed silly list handling error
Created attachment 1408809 [details]
nfsd: remove blocked locks on client teardown
Final patch attached. Seems to work correctly and posted upstream for review.
Works for me, thanks.

Created attachment 1411692 [details]
pynfs test patch 1/2
Adds basic support for CB_NOTIFY_LOCK
Created attachment 1411693 [details]
pynfs test patch 2/2
Tests CB_NOTIFY_LOCK with an expired client.
The reproducer isn't perfect, but it does panic the server a lot of the time. I ran it in a loop overnight and it panicked my server around 50 times between midnight and 7 AM.
I did have a bit where it creates a bunch of extra clients after the one client expires, but that wasn't exactly producing the desired result. Instead of a panic, I'd get a list corruption warning and a CB_NOTIFY_LOCK being sent with the wrong session id, so I have that part commented out.
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing.

Patch(es) available on kernel-3.10.0-868.el7.

Moving to VERIFIED according to comment #38. Will include this case as a regression test.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083