Bug 2016380 - mds: crash when journaling during replay
Summary: mds: crash when journaling during replay
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 5.1
Assignee: Venky Shankar
QA Contact: Amarnath
Docs Contact: Ranjini M N
URL:
Whiteboard:
Depends On:
Blocks: 2031073
 
Reported: 2021-10-21 12:45 UTC by Venky Shankar
Modified: 2022-04-04 10:22 UTC
CC List: 5 users

Fixed In Version: ceph-16.2.6-19.el8cp
Doc Type: Bug Fix
Doc Text:
.The Ceph Metadata Server (MDS) no longer crashes after being promoted to an active rank
Previously, the Ceph MDS might crash in some circumstances after being promoted to an active rank and then remain unavailable, resulting in downtime for clients accessing the file system after a failover. With this update, the MDS failover results in the file system becoming available once the MDS transitions to an active rank.
Clone Of:
Environment:
Last Closed: 2022-04-04 10:22:04 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 51589 0 None None None 2021-10-21 12:45:52 UTC
Red Hat Issue Tracker RHCEPH-2122 0 None None None 2021-10-28 10:16:36 UTC
Red Hat Product Errata RHSA-2022:1174 0 None None None 2022-04-04 10:22:28 UTC

Description Venky Shankar 2021-10-21 12:45:52 UTC
Description of problem:
A standby MDS transitioning to an active rank can crash during a transition phase (state) with the following backtrace:

  -1> 2021-07-08 15:14:13.283 7f3804255700 -1 /builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)' thread 7f3804255700 time 2021-07-08 15:14:13.283719
/builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: 288: FAILED ceph_assert(!segments.empty())

 ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f380d72cfe7]
 2: (()+0x25d1af) [0x7f380d72d1af]
 3: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x599) [0x557471ec5959]
 4: (Server::journal_close_session(Session*, int, Context*)+0x9ed) [0x557471c7e02d]
 5: (Server::kill_session(Session*, Context*)+0x234) [0x557471c81914]
 6: (Server::apply_blacklist(std::set<entity_addr_t, std::less<entity_addr_t>, std::allocator<entity_addr_t> > const&)+0x14d) [0x557471c8449d]
 7: (MDSRank::reconnect_start()+0xcf) [0x557471c49c5f]
 8: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x1c29) [0x557471c57979]
 9: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xa9b) [0x557471c3091b]
 10: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0xed) [0x557471c3216d]
 11: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xc3) [0x557471c32983]
 12: (DispatchQueue::entry()+0x1699) [0x7f380d952b79]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f380da008ed]
 14: (()+0x7ea5) [0x7f380b5eeea5]
 15: (clone()+0x6d) [0x7f380a29e96d]

(The above backtrace is from a Nautilus install; the bug still exists in other releases.)
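
To illustrate the failure mode: the assert fires because MDLog::_submit_entry() requires at least one open log segment, but the session-kill path (blocklisted client handled during reconnect) reaches it while the rank is still in a pre-active state, before the journal is writeable. The standalone C++ sketch below is not Ceph code; MiniLog, submit_or_defer(), and the deferral queue are hypothetical names used only to show the shape of the problem and one defensive approach of queuing the journal work until the log has an open segment.

  // Standalone illustration (not Ceph source): why submitting a log entry
  // before any segment is open trips an assert, and one way to defer it.
  #include <cassert>
  #include <deque>
  #include <functional>
  #include <iostream>
  #include <string>
  #include <vector>

  struct LogSegment { std::vector<std::string> events; };

  class MiniLog {
    std::deque<LogSegment> segments;            // analogous to MDLog::segments
    std::vector<std::function<void()>> waiting; // work deferred until writeable
  public:
    bool writeable() const { return !segments.empty(); }

    void start_new_segment() {                  // happens once replay finishes
      segments.emplace_back();
      for (auto &fn : waiting) fn();            // flush deferred submissions
      waiting.clear();
    }

    void submit_entry(const std::string &ev) {  // analogous to _submit_entry()
      assert(!segments.empty());                // the FAILED ceph_assert in this bug
      segments.back().events.push_back(ev);
    }

    // Defensive wrapper: queue the entry instead of asserting mid-replay.
    void submit_or_defer(const std::string &ev) {
      if (writeable()) submit_entry(ev);
      else waiting.push_back([this, ev] { submit_entry(ev); });
    }
  };

  int main() {
    MiniLog log;
    // A blocklisted client's session is killed while still in replay/reconnect:
    log.submit_or_defer("ESession close client.4242"); // deferred, no assert
    // Later, the MDS becomes active and opens its first segment:
    log.start_new_segment();                           // deferred entry lands here
    std::cout << "journaled safely after replay\n";
  }

The real fix in Ceph may differ in detail; the point of the sketch is only that journal submissions issued before the log is writeable must be avoided or deferred rather than asserted on.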

Comment 10 Venky Shankar 2022-02-01 03:46:39 UTC
Clearing NI - doc text provided.

Comment 12 Venky Shankar 2022-02-21 04:55:44 UTC
minor reword - instead of

        "the Ceph MDS would crash after being promoted..."

change to:

        "the Ceph MDS might crash in some circumstances after being promoted..."

Comment 13 Venky Shankar 2022-02-21 07:09:43 UTC
Looks good!

Comment 15 errata-xmlrpc 2022-04-04 10:22:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

