Bug 2233622 - [cee/sd][ceph-mon]one monitor got crashed with thread_name:safe_timer
Summary: [cee/sd][ceph-mon]one monitor got crashed with thread_name:safe_timer
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.11
Hardware: x86_64
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Brad Hubbard
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-22 18:21 UTC by Janmejay Singh
Modified: 2023-09-12 06:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description Janmejay Singh 2023-08-22 18:21:01 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

One monitor pod crashed with the following trace:

{
    "crash_id": "2023-08-04T21:01:11.604927Z_aeede7d0-05ed-4104-91a9-d8d846e80497",
    "timestamp": "2023-08-04T21:01:11.604927Z",
    "process_name": "ceph-mon",
    "entity_name": "mon.a",
    "ceph_version": "16.2.10-138.el8cp",
    "utsname_hostname": "rook-ceph-mon-a-66d458c45b-74npw",
    "utsname_sysname": "Linux",
    "utsname_release": "4.18.0-372.52.1.el8_6.x86_64",
    "utsname_version": "#1 SMP Fri Mar 31 06:22:44 EDT 2023",
    "utsname_machine": "x86_64",
    "os_name": "Red Hat Enterprise Linux",
    "os_id": "rhel",
    "os_version_id": "8.7",
    "os_version": "8.7 (Ootpa)",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7fe72ba73cf0]",
        "(MonitorDBStore::get_synchronizer(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&)+0x41) [0x562ce5913671]",
        "(Monitor::_scrub(ScrubResult*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >*, int*)+0x20f) [0x562ce58cd22f]",
        "(Monitor::scrub()+0x3b8) [0x562ce58d20c8]",
        "(Monitor::scrub_start()+0x148) [0x562ce58d22d8]",
        "(Context::complete(int)+0xd) [0x562ce58f5a7d]",
        "(CommonSafeTimer<std::mutex>::timer_thread()+0x10f) [0x7fe72de2ed0f]",
        "(CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x7fe72de300a1]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7fe72ba691ca]",
        "clone()"
    ]
}
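For triage, it can help to reduce the backtrace to bare function names so the call path (timer thread, then scrub, then the store synchronizer) is easier to compare against other reports. The following is a minimal sketch using only the Python standard library. The helper name `frame_symbols` is hypothetical, the regular expression assumes frames formatted like the ones in this report, and the sample input contains only the shorter frames copied from the trace above.

```python
import json
import re

# A subset of frames copied verbatim from the crash report above; the long
# templated frames are left out here purely to keep the example short.
crash_json = """
{
  "backtrace": [
    "/lib64/libpthread.so.0(+0x12cf0) [0x7fe72ba73cf0]",
    "(Monitor::scrub()+0x3b8) [0x562ce58d20c8]",
    "(Monitor::scrub_start()+0x148) [0x562ce58d22d8]",
    "(Context::complete(int)+0xd) [0x562ce58f5a7d]",
    "clone()"
  ]
}
"""

# Assumed frame shape: "(<symbol>+0x<offset>) [0x<address>]".
# Frames without a symbol (library offsets, "clone()") do not match.
FRAME_RE = re.compile(r"^\((?P<symbol>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def frame_symbols(report):
    """Return the function names of symbolized frames, without argument lists."""
    symbols = []
    for frame in report.get("backtrace", []):
        match = FRAME_RE.match(frame)
        if match:
            # Keep everything before the first "(" to drop the argument list.
            symbols.append(match.group("symbol").split("(", 1)[0])
    return symbols


for name in frame_symbols(json.loads(crash_json)):
    print(name)
```

Run against the full report, the same helper would also list `MonitorDBStore::get_synchronizer` and `Monitor::_scrub` at the top, since the split on the first `(` discards their long template argument lists.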

Version of all relevant components (if applicable):

OCP: 4.11
ODF: 4.11
Ceph: 16.2.10-138.el8cp

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No




Is this issue reproducible?
No

Can this issue be reproduced from the UI?
NA




Actual results:
- Monitor pod crashed.

Expected results:
- The monitor pod crashed and restarted; however, the customer is looking for an RCA. I couldn't find any downstream BZ that explains this specific trace, so I need help from engineering.

- There is an upstream tracker reporting a similar issue:
https://tracker.ceph.com/issues/49809

Additional info:

