Bug 1699402
Summary: | smallfile caused kernel CephFS crash in RHOCS (OpenShift-on-Ceph) | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Ben England <bengland> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
kernel sub component: | CephFS | QA Contact: | Hemanth Kumar <hyelloji> |
Status: | CLOSED ERRATA | Docs Contact: | John Wilkins <jowilkin> |
Severity: | high | ||
Priority: | high | CC: | ceph-eng-bugs, cmaiolin, jlayton, pdonnell, rhandlin, sweil, tmuthami, vakulkar, vereddy, zguo, zyan |
Version: | 7.6 | ||
Target Milestone: | rc | ||
Target Release: | 7.7 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | kernel-3.10.0-1128.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-09-29 21:00:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Ben England
2019-04-12 15:42:00 UTC
The BUG is at fs/ceph/mds_client.c:2100, which is `BUG_ON(p > end);`:

```
[125150.911742] kernel BUG at fs/ceph/mds_client.c:2100!
[125150.912549] [<ffffffffc0cf5352>] __do_request+0x342/0x430 [ceph]
[125150.912569] [<ffffffffc0cf6a8d>] ceph_mdsc_do_request+0x9d/0x280 [ceph]
[125150.912589] [<ffffffffc0cd6901>] ceph_d_revalidate+0x221/0x520 [ceph]
[125150.912608] [<ffffffff8925c43a>] ? __d_lookup+0x7a/0x160
[125150.912623] [<ffffffff8924d2ea>] lookup_fast+0x1da/0x230
[125150.912637] [<ffffffff8925156d>] path_lookupat+0x16d/0x8b0
[125150.912653] [<ffffffff8921bcb5>] ? kmem_cache_alloc+0x35/0x1f0
[125150.912668] [<ffffffff89252b2f>] ? getname_flags+0x4f/0x1a0
[125150.912682] [<ffffffff89251cdb>] filename_lookup+0x2b/0xc0
[125150.912696] [<ffffffff89253cc7>] user_path_at_empty+0x67/0xc0
[125150.912712] [<ffffffff891ebc6d>] ? handle_mm_fault+0x39d/0x9b0
[125150.913402] [<ffffffff89253d31>] user_path_at+0x11/0x20
[125150.914045] [<ffffffff89246bc3>] vfs_fstatat+0x63/0xc0
[125150.914681] [<ffffffff89246f7e>] SYSC_newstat+0x2e/0x60
[125150.915314] [<ffffffff89138b36>] ? __audit_syscall_exit+0x1e6/0x280
[125150.915943] [<ffffffff8924743e>] SyS_newstat+0xe/0x10
[125150.916567] [<ffffffff89774ddb>] system_call_fastpath+0x22/0x27
```

Encoding a lookup request is simple; there is no req->r_xxx_drop. No idea what happened.

I'm poking around in this code at the moment, so I'll go ahead and grab this one.

Notes from the vmcore: I believe the struct ceph_mds_request is at 0xffff982ba7a69000, and the ceph_msg is at 0xffff983a82aebc00. This is the "front":

```
crash> struct ceph_msg.front ffff983a82aebc00
  front = {
    iov_base = 0xffff98396bf36180,
    iov_len = 0x8b
  }
```

So, "end" should be 0xffff98396bf3620b. After encoding the filepaths:

```
crash> struct ceph_mds_request.r_request_release_offset 0xffff982ba7a69000
  r_request_release_offset = 0x8a
```

So it looks like encoding the filepaths ended up taking almost all of the buffer, before we even got to encoding the timespec, which is probably what pushed it over the limit.
We're dealing with a dentry revalidation here, so this is just a lookup and r_dentry is set in the request. We calculate the frontlen of the request like this:

```c
len = sizeof(*head) +
      pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
      sizeof(struct ceph_timespec);
```

pathlen2 should be 0, and the rest of the fields are fixed-length, so pathlen1 should be 17 bytes (0x11). Looking at r_dentry though:

```
d_name = {
  {
    {
      hash = 0xb7f43245,
      len = 0x18
    },
    hash_len = 0x18b7f43245
  },
  name = 0xffff98445a522ff8 "starting_gate.tmp.notyet"
},
```

Just this component is 0x18 bytes long, and it's not at the root of the fs.

set_request_path_attr is what sets the pointers to the path strings and the length. In the case of a nosnap dentry, we just return a pointer to the d_name string:

```c
rcu_read_lock();
if (!dir)
        dir = d_inode_rcu(dentry->d_parent);
if (dir && ceph_snap(dir) == CEPH_NOSNAP) {
        *pino = ceph_ino(dir);
        rcu_read_unlock();
        *ppath = dentry->d_name.name;
        *ppathlen = dentry->d_name.len;
        return 0;
}
rcu_read_unlock();
```

But when we go to actually encode the thing, we don't pass in the ppathlen:

```c
ceph_encode_filepath(&p, end, ino1, path1);
```

I think we hit a race of some sort here, where the length of the string changed after we did the initial set_request_path_attr but before the encoding. I'm not yet sure how that could happen, however.

Ok, I think the dentry probably got involved in a rename while we were working with it. __d_move will call switch_names to swap the dentry names. I suspect that's what happened here. It could be other things too, but basically I think build_dentry_path is likely not safe.

I think we'll probably need to stop build_dentry_path from passing back pointers to dentry->d_name.name. We either need to hold d_lock in order to stabilize the pointers (probably not feasible), or make a copy of the name before returning. Still looking over the code to figure out the right thing to do here.

Good job. Thanks.
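The "make a copy of the name" option can be sketched in standalone userspace C (hypothetical names; this is an illustration of the pattern, not the actual kernel patch): instead of handing back a pointer into mutable name storage, which a concurrent rename can swap out between sizing and encoding, snapshot the string and its length together under the lock and let the encoder work from the private copy.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a dentry's name field;
 * the mutex plays the role of d_lock. */
struct name {
    pthread_mutex_t lock;
    char str[32];
    size_t len;
};

/* Unsafe pattern (roughly what build_dentry_path did): the returned
 * pointer and the cached length can go stale if a rename swaps the
 * name between this call and the later encoding step. */
static const char *name_peek(struct name *n, size_t *lenp)
{
    *lenp = n->len;
    return n->str;
}

/* Safe pattern: copy the string and its length atomically under the
 * lock, so the encoder always sees a matching (pointer, length) pair. */
static char *name_snapshot(struct name *n, size_t *lenp)
{
    pthread_mutex_lock(&n->lock);
    size_t len = n->len;
    char *copy = malloc(len + 1);
    if (copy) {
        memcpy(copy, n->str, len);
        copy[len] = '\0';
    }
    *lenp = len;
    pthread_mutex_unlock(&n->lock);
    return copy;            /* caller frees */
}
```

The cost is one allocation per request, but it removes the window in which __d_move/switch_names can invalidate the pointer that the encoder is still using.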
Upstream code is pretty much the same here too, so I'm working on a patch for that first. We can discuss/fix it there and then backport to RHEL 7.

Patch posted upstream: https://marc.info/?l=ceph-devel&m=155534672111970&w=2

Suggestion: run this with a RHEL 8.1 CephFS client and servers running OCP 4.3 and OCS 4.2 GA. If it doesn't happen there, then we mark it fixed.

I can't really do that until I have the same hardware config. It doesn't say this here (sigh), but I was running this test on a bare-metal cluster. OCP 4.3 will support bare-metal deployments with UPI, and OCS should run on that.

Patch(es) committed on kernel-3.10.0-1128.el7

Ceph QE is running a sanity test. Will update the BZ once tests are complete.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4060