Bug 2138605
Summary: | [Regression] kernel BUG at lib/list_debug.c:26! RIP: 0010:__list_add_valid.cold+0x3a/0x3c | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | Zhi Li <yieli>
Component: | kernel | Assignee: | Jeff Layton <jlayton>
kernel sub component: | NFS | QA Contact: | Zhi Li <yieli>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | unspecified | |
Priority: | unspecified | CC: | bxue, chorn, chuck.lever, jiyin, jlayton, xzhou, yoyang
Version: | 9.2 | Keywords: | Regression, Triaged
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | kernel-5.14.0-239.el9 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2023-05-09 08:05:48 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Zhi Li
2022-10-30 04:19:40 UTC
This is one of several bugs that have cropped up around the filecache handling. I'm pursuing a set of fixes upstream that I'm hoping will fix these.

Some notes: I backported a pile of patches from Chuck Lever's upstream tree that I hoped would help this, but hit a different-looking list corruption when testing with NFSv3:

[ 4330.196752] list_del corruption. next->prev should be ffff96a4bedc2bd0, but was ffff96a46e7aa480

crash> bt
PID: 4893   TASK: ffff96a48e570000   CPU: 4   COMMAND: "nfsd"
 #0 [ffffba4c81c6b898] machine_kexec at ffffffff9b069d87
 #1 [ffffba4c81c6b8f0] __crash_kexec at ffffffff9b1c1e0d
 #2 [ffffba4c81c6b9b8] crash_kexec at ffffffff9b1c2ff8
 #3 [ffffba4c81c6b9c0] oops_end at ffffffff9b02827b
 #4 [ffffba4c81c6b9e0] do_trap at ffffffff9b024abe
 #5 [ffffba4c81c6ba30] do_error_trap at ffffffff9b024b75
 #6 [ffffba4c81c6ba70] exc_invalid_op at ffffffff9bb116be
 #7 [ffffba4c81c6ba90] asm_exc_invalid_op at ffffffff9bc00af6
    [exception RIP: __list_del_entry_valid.cold+0x1d]
    RIP: ffffffff9badbb86  RSP: ffffba4c81c6bb48  RFLAGS: 00010246
    RAX: 0000000000000054  RBX: ffff96a4bedc2bd0  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff96a5efb198a0  RDI: ffff96a5efb198a0
    RBP: ffffffffc0d70b60   R8: 0000000000000000   R9: 00000000ffff7fff
    R10: ffffba4c81c6b9e8  R11: ffffffff9cde9748  R12: ffff96a4d0e34100
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff96a4d0e34108
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffba4c81c6bb40] __list_del_entry_valid.cold at ffffffff9badbb86
 #9 [ffffba4c81c6bb48] list_lru_del at ffffffff9b307325
#10 [ffffba4c81c6bb80] nfsd_file_lru_remove at ffffffffc0cfab72 [nfsd]
#11 [ffffba4c81c6bb98] nfsd_file_unhash_and_queue at ffffffffc0cfb947 [nfsd]
#12 [ffffba4c81c6bbc8] __nfsd_file_close_inode at ffffffffc0cfba1b [nfsd]
#13 [ffffba4c81c6bc50] nfsd_file_close_inode at ffffffffc0cfbb48 [nfsd]
#14 [ffffba4c81c6bc98] nfsd_file_fsnotify_handle_event at ffffffffc0cfbd28 [nfsd]
#15 [ffffba4c81c6bcb0] send_to_group at ffffffff9b40dbd1
#16 [ffffba4c81c6bd10] fsnotify at ffffffff9b40deb3
#17 [ffffba4c81c6bdb0] vfs_unlink at ffffffff9b3cad50
#18 [ffffba4c81c6bdf0] nfsd_unlink at ffffffffc0cf3a1d [nfsd]
#19 [ffffba4c81c6be38] nfsd3_proc_remove at ffffffffc0cfdaff [nfsd]
#20 [ffffba4c81c6be68] nfsd_dispatch at ffffffffc0cec419 [nfsd]
#21 [ffffba4c81c6be90] svc_process_common at ffffffffc0b649ac [sunrpc]
#22 [ffffba4c81c6bee0] svc_process at ffffffffc0b64c87 [sunrpc]
#23 [ffffba4c81c6bef8] nfsd at ffffffffc0cebe95 [nfsd]
#24 [ffffba4c81c6bf18] kthread at ffffffff9b11de29
#25 [ffffba4c81c6bf50] ret_from_fork at ffffffff9b001f82

...the nfsd_file that it complained about is at 0xffff96a4bedc2bc8:

crash> struct nfsd_file ffff96a4bedc2bc8
struct nfsd_file {
  nf_rhash = {
    next = 0xffff96a282e02321
  },
  nf_lru = {
    next = 0xffff96a46e7aa480,
    prev = 0xffff96a47f849d70
  },
  nf_rcu = {
    next = 0xffff96a282807400,
    func = 0x0
  },
  nf_file = 0xffff96a291704500,
  nf_cred = 0xffff96a4bf23c180,
  nf_net = 0xffffffff9da0ad40 <init_net>,
  nf_flags = 0xc,
  nf_inode = 0xffff96a546051500,
  nf_ref = {
    refs = {
      counter = 0x1
    }
  },
  nf_may = 0x2,
  nf_mark = 0xffff96a4bedc13c0,
  nf_birthtime = 0x3ef5d89b78d
}

...which looks fine, but the next one on the LRU has its nf_lru list pointing to itself.
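As a side note on what "pointing to itself" implies: list_del_init() and INIT_LIST_HEAD() leave a list_head with next == prev == its own address (the "empty" state that list_empty() tests for), and the CONFIG_DEBUG_LIST unlink check fires precisely when a neighbour's back-pointer no longer matches. Below is a minimal standalone sketch of both ideas; it is plain userspace C with illustrative helper names, not the kernel's lib/list.h or lib/list_debug.c.

-----------------8<--------------------
/* Standalone illustration, not kernel code: self-pointing list_heads and
 * the next->prev consistency check behind the splat above. */
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;		/* "empty": both pointers reference h itself */
}

/* roughly what CONFIG_DEBUG_LIST verifies before unlinking an entry */
static bool list_del_entry_valid(struct list_head *entry)
{
	if (entry->next->prev != entry) {
		printf("list_del corruption. next->prev should be %p, but was %p\n",
		       (void *)entry, (void *)entry->next->prev);
		return false;
	}
	if (entry->prev->next != entry) {
		printf("list_del corruption. prev->next should be %p, but was %p\n",
		       (void *)entry, (void *)entry->prev->next);
		return false;
	}
	return true;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);	/* entry now points back at itself */
}

int main(void)
{
	struct list_head head, a, b;

	/* hand-build head <-> a <-> b <-> head */
	head.next = &a;   a.prev = &head;
	a.next   = &b;    b.prev = &a;
	b.next   = &head; head.prev = &b;

	/* normal removal: b ends up "empty", pointing back at itself */
	list_del_init(&b);

	/*
	 * Simulate the state in the crash dump: a still (or again) names b
	 * as its next neighbour (e.g. via a stale pointer or racy reuse of
	 * the list_head), while b's own pointers reference only itself.
	 */
	a.next = &b;

	/* deleting a now trips the same check that fired in the oops */
	if (!list_del_entry_valid(&a))
		printf("with CONFIG_DEBUG_LIST=y the kernel would BUG() here\n");
	return 0;
}
-----------------8<--------------------

Built and run, this prints the same style of "next->prev should be ..., but was ..." complaint, with the "was" address being the self-pointing entry, which is the same pattern as the entry dumped next.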
crash> struct -l nfsd_file.nf_lru nfsd_file 0xffff96a46e7aa480
struct nfsd_file {
  nf_rhash = {
    next = 0xffff96a522c0e410
  },
  nf_lru = {
    next = 0xffff96a46e7aa480,
    prev = 0xffff96a46e7aa480
  },
  nf_rcu = {
    next = 0xffff96a47e20d700,
    func = 0x0
  },
  nf_file = 0xffff96a2820c7300,
  nf_cred = 0xffff96a500d0e3c0,
  nf_net = 0xffffffff9da0ad40 <init_net>,
  nf_flags = 0xc,
  nf_inode = 0xffff96a546050d00,
  nf_ref = {
    refs = {
      counter = 0x1
    }
  },
  nf_may = 0x4,
  nf_mark = 0xffff96a4bf162960,
  nf_birthtime = 0x3f01c5477f8
}

...and thus the list corruption. Both entries have the same flags set (GC and REFERENCED). The HASHED bit is set when the thing is newly allocated, so this entry was in the hash and then removed at some point.

Looking at the LRU itself:

crash> struct list_lru_node 0xffff96a4d0e34100
struct list_lru_node {
  lock = {
    {
      rlock = {
        raw_lock = {
          {
            val = {
              counter = 0x1
            },
            {
              locked = 0x1,
              pending = 0x0
            },
            {
              locked_pending = 0x1,
              tail = 0x0
            }
          }
        }
      }
    }
  },
  lru = {
    list = {
      next = 0xffff96a47f849d70,
      prev = 0xffff96a47fbe2210
    },
    nr_items = 0x4
  },
  nr_items = 0x4
}

The list is clearly corrupt, but walking from the prev pointer shows all 4 entries:

crash> list 0xffff96a47fbe2210
ffff96a47fbe2210
ffff96a4d0e34108   << head of the list
ffff96a47f849d70
ffff96a4bedc2bd0
ffff96a46e7aa480

So in summary, we have 4 nfsd_file entries on the LRU list, two of them with NFSD_FILE_LRU clear (indicating that they are just waiting to be removed). One of those entries has an empty nf_lru list_head. I don't see how that can happen given the way the list_lru spinlocking works, so this looks a lot like a memory scribble or use-after-free.

I might see one way this could occur. In nfsd_file_close_inode, we do this after queueing the thing to the dispose list:

-----------------8<--------------------
	list_for_each_entry_safe(nf, tmp, &dispose, nf_lru) {
		trace_nfsd_file_closing(nf);
		if (!refcount_dec_and_test(&nf->nf_ref))
			list_del_init(&nf->nf_lru);
	}
-----------------8<--------------------

...but, while it's on the dispose list, we don't necessarily hold the only reference to it. Something else could race in and try to put this object on the LRU just before we add it to the dispose list. I suspect that our use of nf_lru for both the LRU and the dispose list is racy. I'm looking at ways to shore that up now. The simplest may be to just add a new nf_dispose list_head to struct nfsd_file.

In any case, I think the crash I hit yesterday is different from the originally reported bug here, as the backported patches I'm testing have changed this code significantly.

My current theory is that the problem is in nfsd_file_unhash_and_queue, specifically when called from __nfsd_file_close_inode. That code attempts to take a reference to the nf and then queues it to a dispose list_head using the nf_lru list_head in the struct. Once it's on a dispose list, the nf_lru is manipulated locklessly. The problem is that we don't know for certain that we hold the last reference in this case; someone else could be attempting to manipulate the nf_lru at the same time.

I think the solution is to ensure that we don't attempt to repurpose the nf_lru for anything but the actual LRU unless the last reference has been put. I'm working on a patch for that now.
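To make the "separate dispose list_head" idea above concrete, here is a minimal userspace sketch. The struct and field names (fake_nfsd_file, nf_dispose) are hypothetical stand-ins and this is not the actual nfsd patch; the point is only that a dedicated dispose linkage leaves nf_lru alone, so a racing path that still holds a reference has nothing to collide with.

-----------------8<--------------------
/* Hypothetical stand-ins for illustration only; not the actual nfsd patch. */
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* stand-in for struct nfsd_file; nf_dispose is the hypothetical new field */
struct fake_nfsd_file {
	struct list_head nf_lru;	/* owned by the list_lru, under its lock */
	struct list_head nf_dispose;	/* private to the teardown path */
	int nf_ref;
};

int main(void)
{
	struct fake_nfsd_file nf = { .nf_ref = 2 };
	struct list_head lru, dispose;

	INIT_LIST_HEAD(&lru);
	INIT_LIST_HEAD(&dispose);
	INIT_LIST_HEAD(&nf.nf_lru);
	INIT_LIST_HEAD(&nf.nf_dispose);

	/* LRU side: the object sits on the LRU via nf_lru */
	list_add_tail(&nf.nf_lru, &lru);

	/*
	 * Teardown side: queue the object for disposal via nf_dispose.
	 * nf_lru is untouched, so a racing LRU user that still holds a
	 * reference has nothing to corrupt; nf_lru is only unlinked later,
	 * once the final reference is put.
	 */
	list_add_tail(&nf.nf_dispose, &dispose);

	printf("on LRU: %s, on dispose list: %s, refs: %d\n",
	       lru.next == &nf.nf_lru ? "yes" : "no",
	       dispose.next == &nf.nf_dispose ? "yes" : "no",
	       nf.nf_ref);
	return 0;
}
-----------------8<--------------------

This just restates the conclusion above in code form: don't repurpose nf_lru for anything but the actual LRU unless the last reference has been put. The real patch may well look different.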
> How reproducible:
> 100%
Is this really 100% reproducible with this reproducer? I'd be thrilled if so, but I'd like to confirm that, as this problem has been very elusive so far.
*** Bug 2154740 has been marked as a duplicate of this bug. ***

Moving to VERIFIED according to comment#34.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2458