Bug 2228251
| Summary: | [GSS][CephFS] MDS crashing with ceph_assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2) | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Anton Mark <amark> |
| Component: | CephFS | Assignee: | Xiubo Li <xiubli> |
| Status: | ASSIGNED --- | QA Contact: | Hemanth Kumar <hyelloji> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.0 | CC: | bkunal, ceph-eng-bugs, cephqe-warriors, vshankar, xiubli |
| Target Milestone: | --- | Flags: | xiubli: needinfo? (amark) |
| Target Release: | 7.1 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description of problem:
Slow/blocked ops led to MDS failures, with MDS daemons crashing. After multiple MDS ranks failed, the cluster returned to normal operation.

Version-Release number of selected component (if applicable):
{
  "mon": {
    "ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)": 5
  },
  "mgr": {
    "ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)": 2
  },
  "osd": {
    "ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)": 312
  },
  "mds": {
    "ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)": 8
  },
  "overall": {
    "ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)": 327
  }
}

How reproducible:
Unknown.

Steps to Reproduce:
Unknown.

Actual results:
Client IO interruption and MDS daemon crash.

Expected results:
No crash or IO interruption for clients.

Additional info:
{
  "assert_condition": "lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2",
  "assert_file": "/builddir/build/BUILD/ceph-17.2.6/src/mds/Locker.cc",
  "assert_func": "void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&)",
  "assert_line": 5767,
  "assert_msg": "/builddir/build/BUILD/ceph-17.2.6/src/mds/Locker.cc: In function 'void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&)' thread 7f21a0425640 time 2023-07-31T19:16:21.053778+0000\n/builddir/build/BUILD/ceph-17.2.6/src/mds/Locker.cc: 5767: FAILED ceph_assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2)\n",
  "assert_thread_name": "ms_dispatch",
  "backtrace": [
    "/lib64/libc.so.6(+0x54df0) [0x7f21a52c0df0]",
    "/lib64/libc.so.6(+0xa154c) [0x7f21a530d54c]",
    "raise()",
    "abort()",
    "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7f21a591dae1]",
    "/usr/lib64/ceph/libceph-common.so.2(+0x143c45) [0x7f21a591dc45]",
    "(Locker::handle_file_lock(ScatterLock*, boost::intrusive_ptr<MLock const> const&)+0x14ac) [0x5571d93300ac]",
    "(Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0x164) [0x5571d9304d34]",
    "(MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x4f3) [0x5571d919a443]",
    "(MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x5c) [0x5571d919a99c]",
    "(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x195) [0x5571d9184815]",
    "(DispatchQueue::entry()+0x53a) [0x7f21a5b0975a]",
    "/usr/lib64/ceph/libceph-common.so.2(+0x3bb6b1) [0x7f21a5b956b1]",
    "/lib64/libc.so.6(+0x9f802) [0x7f21a530b802]",
    "clone()"
  ],
  "ceph_version": "17.2.6-70.el9cp",
  "crash_id": "2023-07-31T19:16:21.079357Z_5e8c4120-46ce-4e59-9a0f-3bd35449dfcb",
  "entity_name": "mds.root.host10.fckajv",
  "os_id": "rhel",
  "os_name": "Red Hat Enterprise Linux",
  "os_version": "9.2 (Plow)",
  "os_version_id": "9.2",
  "process_name": "ceph-mds",
  "stack_sig": "62a9400fe4f62ee81c69ab8eb327515c3a3ab2affea4ab8fd8f190aad70bfeba",
  "timestamp": "2023-07-31T19:16:21.079357Z",
  "utsname_hostname": "host10",
  "utsname_machine": "x86_64",
  "utsname_release": "4.18.0-477.13.1.el8_8.x86_64",
  "utsname_sysname": "Linux",
  "utsname_version": "#1 SMP Thu May 18 10:27:05 EDT 2023"