Description of problem:
/builddir/build/BUILD/ceph-12.2.4/src/mds/Locker.cc: 3793: FAILED assert(mds->is_rejoin() || mds->is_clientreplay() || mds->is_active() || mds->is_stopping())

 ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55bba8bda2a0]
 2: (Locker::handle_lock(MLock*)+0x57) [0x55bba8a4a7c7]
 3: (Locker::dispatch(Message*)+0x85) [0x55bba8a565e5]
 4: (MDSRank::handle_deferrable_message(Message*)+0xbb4) [0x55bba88bd264]
 5: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x55bba88cab33]
 6: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55bba88cb975]
 7: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55bba88b4593]
 8: (DispatchQueue::entry()+0x792) [0x55bba8ec3d32]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x55bba8c5fefd]
 10: (()+0x7dd5) [0x7f2098147dd5]
 11: (clone()+0x6d) [0x7f2097227b3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Version-Release number of selected component (if applicable):
ceph version 12.2.4-6.el7cp

How reproducible:
1/1

Steps to Reproduce:
1. Configure a cluster with 3 MDS daemons (2 active and 1 standby).
2. Reduce max_mds to 1.
3. Deactivate rank 1 and wait until its MDS stops.
4. Stop the standby MDS daemon.
5. Restart the active MDS and wait until it becomes active.
6. Start the standby MDS.
7. Increase max_mds to 2. The MDS enters the starting state.

The MDS stayed in the starting state for more than 2 hours. I restarted the MDS that was stuck in starting; after the service restart it came back in the starting state again and then moved to the resolve state. The MDS is still in the resolve state, and the assert above was observed in the MDS log.

Actual results:
The MDS is stuck in the resolve state and the file system is degraded.

Expected results:
The MDS should become active and the file system should be OK.

Additional info:
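For anyone re-running the reproduction steps, the Python sketch below maps steps 2 to 7 onto Ceph and systemd commands. It is a minimal sketch under several assumptions: the file system name `cephfs` and the daemon IDs `mds-a` and `mds-c` are hypothetical placeholders; step 1 (2 active MDS plus 1 standby) is already in place; the `ceph-mds@<id>` systemd units live on the host running the script (otherwise run those commands on each MDS host); and the cluster runs Luminous, where rank 1 must be deactivated manually with `ceph mds deactivate <fs_name>:<rank>` after lowering `max_mds`. It defaults to a dry run that only prints the commands, and the "wait until" steps are left as manual checks.

```python
import shlex
import subprocess

# Hypothetical placeholders: substitute your own file system name and MDS daemon IDs.
FS_NAME = "cephfs"
ACTIVE_ID = "mds-a"   # daemon holding rank 0
STANDBY_ID = "mds-c"  # standby daemon


def repro_steps(fs_name, active_id, standby_id):
    """Return the reproduction steps as (description, argv) pairs.

    argv is None for steps that require a manual wait or check.
    Assumes a Luminous cluster already running 2 active MDS and 1 standby.
    """
    return [
        ("Reduce max_mds to 1", ["ceph", "fs", "set", fs_name, "max_mds", "1"]),
        # On Luminous, lowering max_mds does not stop rank 1 by itself.
        ("Deactivate rank 1", ["ceph", "mds", "deactivate", f"{fs_name}:1"]),
        ("Wait until rank 1 has stopped (check `ceph mds stat`)", None),
        ("Stop the standby MDS daemon", ["systemctl", "stop", f"ceph-mds@{standby_id}"]),
        ("Restart the active MDS daemon", ["systemctl", "restart", f"ceph-mds@{active_id}"]),
        ("Wait until rank 0 is active again", None),
        ("Start the standby MDS daemon", ["systemctl", "start", f"ceph-mds@{standby_id}"]),
        ("Increase max_mds to 2", ["ceph", "fs", "set", fs_name, "max_mds", "2"]),
    ]


def run_steps(steps, dry_run=True):
    """Print every step; execute commands only when dry_run is False.

    Returns the shell-quoted command strings in order (manual steps excluded).
    """
    commands = []
    for number, (description, argv) in enumerate(steps, start=1):
        if argv is None:
            print(f"{number}. {description} [manual]")
            if not dry_run:
                input("Press Enter once this step is done... ")
            continue
        command = shlex.join(argv)
        print(f"{number}. {description}: {command}")
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)
        commands.append(command)
    return commands


if __name__ == "__main__":
    run_steps(repro_steps(FS_NAME, ACTIVE_ID, STANDBY_ID), dry_run=True)
```

Keeping the steps as data separate from execution makes it easy to review the exact commands before passing `dry_run=False` on a test cluster.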
Sorry, please ignore the previous comment; it was posted to the wrong bug.
Added doc text
This one should be fixed by https://github.com/ceph/ceph/pull/21601.
*** Bug 1567030 has been marked as a duplicate of this bug. ***
Moving this bug to the verified state. No MDS assert was observed during testing. Verified in ceph version 12.2.4-27.el7cp; CI automation regression runs passed without any issues.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2177