Description of problem:

A standby MDS transitioning to active rank can crash during the transition phase (state) with the following backtrace:

    -1> 2021-07-08 15:14:13.283 7f3804255700 -1 /builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)' thread 7f3804255700 time 2021-07-08 15:14:13.283719
    /builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: 288: FAILED ceph_assert(!segments.empty())

     ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)
     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f380d72cfe7]
     2: (()+0x25d1af) [0x7f380d72d1af]
     3: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x599) [0x557471ec5959]
     4: (Server::journal_close_session(Session*, int, Context*)+0x9ed) [0x557471c7e02d]
     5: (Server::kill_session(Session*, Context*)+0x234) [0x557471c81914]
     6: (Server::apply_blacklist(std::set<entity_addr_t, std::less<entity_addr_t>, std::allocator<entity_addr_t> > const&)+0x14d) [0x557471c8449d]
     7: (MDSRank::reconnect_start()+0xcf) [0x557471c49c5f]
     8: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x1c29) [0x557471c57979]
     9: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xa9b) [0x557471c3091b]
     10: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0xed) [0x557471c3216d]
     11: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xc3) [0x557471c32983]
     12: (DispatchQueue::entry()+0x1699) [0x7f380d952b79]
     13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f380da008ed]
     14: (()+0x7ea5) [0x7f380b5eeea5]
     15: (clone()+0x6d) [0x7f380a29e96d]

(The above backtrace is from a Nautilus install; the bug still exists in other releases.)
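For illustration only, the small standalone C++ sketch below models the ordering problem behind the failed assertion: a log event is submitted before any journal segment has been opened, which is the condition ceph_assert(!segments.empty()) in MDLog::_submit_entry() guards against. This is not Ceph source; the MiniLog class, start_new_segment(), and the client names are made up for the example.

    // Minimal standalone sketch (not Ceph code) of the assertion pattern seen
    // in the backtrace: submitting a journal entry while no segment is open.
    #include <cassert>
    #include <deque>
    #include <iostream>
    #include <string>

    struct LogEvent {                 // stand-in for Ceph's LogEvent
      std::string description;
    };

    class MiniLog {                   // hypothetical stand-in for MDLog
      std::deque<std::string> segments;   // journal segments currently open
    public:
      void start_new_segment(const std::string& name) { segments.push_back(name); }

      void submit_entry(const LogEvent& e) {
        // Mirrors "FAILED ceph_assert(!segments.empty())" from the backtrace.
        assert(!segments.empty() && "no journal segment open yet");
        std::cout << "journaled: " << e.description << '\n';
      }
    };

    int main() {
      // Ordering that works: open a segment first, then journal events.
      MiniLog log;
      log.start_new_segment("seg0");
      log.submit_entry({"close_session client.4242"});

      // Ordering that matches the crash pattern: journaling a session close
      // (e.g. from a blacklist-driven kill during the reconnect phase of the
      // standby-to-active transition) before any segment exists aborts here.
      MiniLog early_log;
      early_log.submit_entry({"close_session client.4243"});  // assert fires
      return 0;
    }

In the real crash the "early" submit comes from Server::apply_blacklist -> kill_session -> journal_close_session, called out of MDSRank::reconnect_start() as shown in the backtrace above.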
Clearing NI - doc text provided.
minor reword - instead of "the Ceph MDS would crash after being promoted..." change to: "the Ceph MDS might crash in some circumstances after being promoted..."
Looks good!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1174