Bug 2218985
| Summary: | [5.3z3 ceph cluster] MDS - ceph-16.2.10/src/mds/Locker.cc: 2357: FAILED ceph_assert(!cap->is_new()) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Brett Hull <bhull> |
| Component: | CephFS | Assignee: | Venky Shankar <vshankar> |
| Status: | CLOSED UPSTREAM | QA Contact: | Hemanth Kumar <hyelloji> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.3 | CC: | bhull, ceph-eng-bugs, cephqe-warriors, gfarnum, mcaldeir, olim, vshankar |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | Backlog | Flags: | mcaldeir: needinfo- |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2026-03-04 08:51:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Shoot, this bz got missed when it came in — Venky, please triage.

(In reply to Greg Farnum from comment #1)
> Shoot, this bz got missed when it came in — Venky, please triage.

ACK. On it today.

Another customer encountered the same crash:
-1> 2024-01-04T20:08:44.458+0000 7feefe3ab700 -1 /builddir/build/BUILD/ceph-16.2.10/src/mds/Locker.cc: In function 'int Locker::issue_caps(CInode*, Capability*)' thread 7feefe3ab700 time 2024-01-04T20:08:44.457978+0000
/builddir/build/BUILD/ceph-16.2.10/src/mds/Locker.cc: 2357: FAILED ceph_assert(!cap->is_new())
ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fef0d0017b8]
2: /usr/lib64/ceph/libceph-common.so.2(+0x2799d2) [0x7fef0d0019d2]
3: (Locker::issue_caps(CInode*, Capability*)+0x1682) [0x559790c06ca2]
4: (Locker::issue_caps_set(std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >&)+0x2e) [0x559790c06fde]
5: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0xcb) [0x559790c233fb]
6: (MDCache::request_cleanup(boost::intrusive_ptr<MDRequestImpl>&)+0xfa) [0x559790b5575a]
7: (MDCache::request_finish(boost::intrusive_ptr<MDRequestImpl>&)+0x17b) [0x559790b55b5b]
8: (Server::reply_client_request(boost::intrusive_ptr<MDRequestImpl>&, boost::intrusive_ptr<MClientReply> const&)+0x6fc) [0x559790a91b4c]
9: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl>&, int)+0x44b) [0x559790a9267b]
10: (C_MDS_inode_update_finish::finish(int)+0xa9) [0x559790b1f4d9]
11: (MDSContext::complete(int)+0x203) [0x559790d49383]
12: (MDSIOContextBase::complete(int)+0x6ac) [0x559790d49b2c]
13: (MDSLogContextBase::complete(int)+0x44) [0x559790d49db4]
14: (Finisher::finisher_thread_entry()+0x1a5) [0x7fef0d0a3735]
15: /lib64/libpthread.so.0(+0x81ca) [0x7fef0bfe01ca]
16: clone()
Hello Venky,

I created KCS #7091056 (https://access.redhat.com/solutions/7091056), which will prompt people to change their system so it can handle a large core image.

Best regards,
Manny

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.
Created attachment 1973454 [details]
stack dump in the mds log.

Description of problem:

The v5.3z3 cluster has seen many MDS issues; this most recent one is a ceph_assert in the Locker.cc code.

2023-06-16T09:16:13.555+0000 7f926e8b1700 -1 /builddir/build/BUILD/ceph-16.2.10/src/mds/Locker.cc: In function 'int Locker::issue_caps(CInode*, Capability*)' thread 7f926e8b1700 time 2023-06-16T09:16:13.554458+0000
/builddir/build/BUILD/ceph-16.2.10/src/mds/Locker.cc: 2357: FAILED ceph_assert(!cap->is_new())

ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f92774fb7b8]
2: /usr/lib64/ceph/libceph-common.so.2(+0x2799d2) [0x7f92774fb9d2]
3: (Locker::issue_caps(CInode*, Capability*)+0x1682) [0x5592893d0ca2] <- int Locker::issue_caps(CInode *in, Capability *only_cap)
4: (Locker::simple_sync(SimpleLock*, bool*)+0x4bf) [0x5592893d98ef]
5: (Locker::_rdlock_kick(SimpleLock*, bool)+0x22f) [0x5592893ef35f]
6: (Locker::rdlock_start(SimpleLock*, boost::intrusive_ptr<MDRequestImpl>&, bool)+0xdf) [0x5592893efe5f]
7: (Locker::acquire_locks(boost::intrusive_ptr<MDRequestImpl>&, MutationImpl::LockOpVec&, CInode*, bool)+0x28d6) [0x5592893f2d26]
8: (Server::handle_client_getattr(boost::intrusive_ptr<MDRequestImpl>&, bool)+0x329) [0x559289260269]
9: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0x677) [0x5592892abe57]
10: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0x403) [0x5592892ace53]
11: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x12b) [0x5592892b165b]
12: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0xbb4) [0x5592892069f4]
13: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7bb) [0x5592892093ab]
14: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x55) [0x5592892099a5]
15: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x108) [0x5592891f9598]
16: (DispatchQueue::entry()+0x126a) [0x7f927774435a]
17: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f92777f77b1]
18: /lib64/libpthread.so.0(+0x81ca) [0x7f92764da1ca]
19: clone()

2023-06-16T09:16:13.557+0000 7f926e8b1700 -1 *** Caught signal (Aborted) ** in thread 7f926e8b1700 thread_name:ms_dispatch

ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)
1: /lib64/libpthread.so.0(+0x12cf0) [0x7f92764e4cf0]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f92774fb809]
5: /usr/lib64/ceph/libceph-common.so.2(+0x2799d2) [0x7f92774fb9d2]
6: (Locker::issue_caps(CInode*, Capability*)+0x1682) [0x5592893d0ca2]
7: (Locker::simple_sync(SimpleLock*, bool*)+0x4bf) [0x5592893d98ef]
8: (Locker::_rdlock_kick(SimpleLock*, bool)+0x22f) [0x5592893ef35f]
9: (Locker::rdlock_start(SimpleLock*, boost::intrusive_ptr<MDRequestImpl>&, bool)+0xdf) [0x5592893efe5f]
10: (Locker::acquire_locks(boost::intrusive_ptr<MDRequestImpl>&, MutationImpl::LockOpVec&, CInode*, bool)+0x28d6) [0x5592893f2d26]
11: (Server::handle_client_getattr(boost::intrusive_ptr<MDRequestImpl>&, bool)+0x329) [0x559289260269]
12: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0x677) [0x5592892abe57]
13: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0x403) [0x5592892ace53]
14: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x12b) [0x5592892b165b]
15: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0xbb4) [0x5592892069f4]
16: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7bb) [0x5592892093ab]
17: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x55) [0x5592892099a5]
18: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x108) [0x5592891f9598]
19: (DispatchQueue::entry()+0x126a) [0x7f927774435a]
20: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f92777f77b1]
21: /lib64/libpthread.so.0(+0x81ca) [0x7f92764da1ca]
22: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Version-Release number of selected component (if applicable):
16.2.10-172.el8cp Storage <- 5.3.z3 - 5.3.3

How reproducible:
Has not been reproduced.

Steps to Reproduce:
1.
2.
3.

Actual results:
The MDS aborts.

Expected results:
The MDS continues running.

Additional info:

--- Locker.cc

int Locker::issue_caps(CInode *in, Capability *only_cap)
{
  // count conflicts with
  int nissued = 0;
  int all_allowed = -1, loner_allowed = -1, xlocker_allowed = -1;

  ceph_assert(in->is_head());

  // client caps
  . . .
    if (!(pending & ~allowed)) {
      // skip if suppress or new, and not revocation
      if (cap->is_new() || cap->is_suppress() || cap->is_stale()) {
        dout(20) << " !revoke and new|suppressed|stale, skipping client." << it->first << dendl;
        continue;
      }
    } else {
      ceph_assert(!cap->is_new());   <- ptr from dump_stack.
      if (cap->is_stale()) {
        dout(20) << " revoke stale cap from client." << it->first << dendl;
        ceph_assert(!cap->is_valid());
        cap->issue(allowed & pending, false);
        mds->queue_waiter_front(new C_Locker_RevokeStaleCap(this, in, it->first));
        continue;
      }

I will attach the ceph-mds log.