Bug 2230067 - [GSS] ceph crash rocksdb::port::Mutex::Mutex [NEEDINFO]
Summary: [GSS] ceph crash rocksdb::port::Mutex::Mutex
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.11
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Radoslaw Zarzynski
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-08 15:44 UTC by kelwhite
Modified: 2023-08-14 15:49 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
kelwhite: needinfo? (rzarzyns)




Links
Ceph Project Bug Tracker 60268 (last updated 2023-08-09 19:26:04 UTC)

Description kelwhite 2023-08-08 15:44:12 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

ceph mon-b is crashing with the following assert:

    "archived": "2023-08-03 19:41:43.584662",
    "backtrace": [
        "[0x3ffc9df8fde]",
        "gsignal()",
        "abort()",
        "(rocksdb::port::Mutex::Mutex(bool)+0) [0x2aa1715ac68]",
        "ceph-mon(+0x75adba) [0x2aa1715adba]",
        "(rocksdb::InstrumentedMutex::Lock()+0xda) [0x2aa1709ddba]",
        "ceph-mon(+0x55e748) [0x2aa16f5e748]",
        "(rocksdb::Cleanable::~Cleanable()+0x2a) [0x2aa17108552]",
        "(rocksdb::DBIter::~DBIter()+0x520) [0x2aa16fd6120]",
        "(rocksdb::ArenaWrappedDBIter::~ArenaWrappedDBIter()+0x30) [0x2aa171718a0]",
        "(std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x5a) [0x2aa16c4e122]",
        "(std::_Sp_counted_ptr<MonitorDBStore::WholeStoreIteratorImpl*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x62) [0x2aa16cabe0a]",
        "(std::_Rb_tree<unsigned long, std::pair<unsigned long const, Monitor::SyncProvider>, std::_Select1st<std::pair<unsigned long const, Monitor::SyncProvider> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Monitor::SyncProvider> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, Monitor::SyncProvider> >*)+0xe6) [0x2aa16cb3646]",
        "(Monitor::~Monitor()+0x394) [0x2aa16c971a4]",
        "(Monitor::~Monitor()+0x16) [0x2aa16c979ce]",
        "main()",
        "__libc_start_main()",
        "ceph-mon(+0x247404) [0x2aa16c47404]",
        "[(nil)]"
    "ceph_version": "16.2.8-84.el8cp",
    "crash_id": "2023-07-28T20:42:49.295902Z_db4ed366-a88e-47f8-bfef-de0eb3c90660",
    "entity_name": "mon.b",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "8.6 (Ootpa)",
    "os_version_id": "8.6",
    "process_name": "ceph-mon",
    "stack_sig": "2a56dfb3ea296f126d13a277cc531950fd2f183e2c4a986b67436b8cbea6dba7",
    "timestamp": "2023-07-28T20:42:49.295902Z",
    "utsname_hostname": "rook-ceph-mon-b-bfbc4c9fd-xhtv8",
    "utsname_machine": "s390x",
    "utsname_release": "4.18.0-372.52.1.el8_6.s390x",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Fri Mar 31 06:14:27 EDT 2023"

Is there a way to prevent this crash?
What does this crash mean? It seems like we're asking for another mutex when we already have one?
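
Looking at the backtrace, it doesn't read like recursive locking: Monitor::~Monitor() is erasing its SyncProvider map, which drops the last reference to a MonitorDBStore::WholeStoreIteratorImpl, and the RocksDB iterator teardown (~ArenaWrappedDBIter -> ~DBIter -> ~Cleanable -> InstrumentedMutex::Lock) then locks DB state that may already be gone (the rocksdb::port::Mutex::Mutex frame at +0 looks like imprecise symbolization of the abort path). A minimal C++ sketch of that destruction-order pattern, with made-up names standing in for the real ceph-mon/RocksDB classes:

    #include <memory>
    #include <mutex>

    struct Db {
        std::mutex cleanup_mu;  // owned by the DB, destroyed with it
    };

    struct Iter {  // stand-in for the RocksDB iterator
        Db* db;
        explicit Iter(Db* d) : db(d) {}
        ~Iter() {
            // DBIter's teardown similarly takes a DB mutex while running
            // cleanup callbacks; if the DB is already destroyed, this can
            // abort inside the mutex code, as in the frames above.
            std::lock_guard<std::mutex> g(db->cleanup_mu);
        }
    };

    int main() {
        std::shared_ptr<Iter> held;  // e.g. kept alive by a SyncProvider entry
        {
            Db db;
            held = std::make_shared<Iter>(&db);
        }  // db (and its mutex) destroyed here
        held.reset();  // ~Iter locks a dead mutex: undefined behavior/abort
    }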

Version of all relevant components (if applicable):
ODF 4.11
Ceph 5.2

Is there any workaround available to the best of your knowledge?
No

Additional info:
Found an upstream tracker that seems to be the same issue: https://tracker.ceph.com/issues/60268

