Description of problem:

The OSD daemon crashed once on ceph version 16.2.0-117.el8cp (RHCS 5.0), in thread 7f79a8cc0700 (thread_name:msgr-worker-2).

The crash info:

{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f79adcbbb20]",
        "(std::_Rb_tree<boost::intrusive_ptr<AsyncConnection>, boost::intrusive_ptr<AsyncConnection>, std::_Identity<boost::intrusive_ptr<AsyncConnection> >, std::less<boost::intrusive_ptr<AsyncConnection> >, std::allocator<boost::intrusive_ptr<AsyncConnection> > >::find(boost::intrusive_ptr<AsyncConnection> const&) const+0x2c) [0x5557c7d6cacc]",
        "(AsyncConnection::_stop()+0xab) [0x5557c7d66c7b]",
        "(ProtocolV2::stop()+0x8f) [0x5557c7d91d5f]",
        "(ProtocolV2::handle_existing_connection(boost::intrusive_ptr<AsyncConnection> const&)+0x742) [0x5557c7da74a2]",
        "(ProtocolV2::handle_client_ident(ceph::buffer::v15_2_0::list&)+0xeef) [0x5557c7da8d3f]",
        "(ProtocolV2::handle_frame_payload()+0x20b) [0x5557c7da934b]",
        "(ProtocolV2::handle_read_frame_dispatch()+0x160) [0x5557c7da95d0]",
        "(ProtocolV2::_handle_read_frame_epilogue_main()+0x95) [0x5557c7da97c5]",
        "(ProtocolV2::_handle_read_frame_segment()+0x92) [0x5557c7da9872]",
        "(ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x201) [0x5557c7daa9c1]",
        "(ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3c) [0x5557c7d92bfc]",
        "(AsyncConnection::process()+0x789) [0x5557c7d69d19]",
        "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x5557c7bb8797]",
        "/usr/bin/ceph-osd(+0xe8f2bc) [0x5557c7bbc2bc]",
        "/lib64/libstdc++.so.6(+0xc2ba3) [0x7f79ad306ba3]",
        "/lib64/libpthread.so.0(+0x814a) [0x7f79adcb114a]",
        "clone()"
    ],
    "ceph_version": "16.2.0-117.el8cp",
    "crash_id": "2021-11-29T23:28:04.044560Z_c336b554-0ea2-4adc-88e7-a595182acb5e",
    "entity_name": "osd.1",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "8.4 (Ootpa)",
    "os_version_id": "8.4",
    "process_name": "ceph-osd",
    "stack_sig": "f0a30000aaf2ae26cfc68aa3a57d6101fd483063ed20b285a94140185b036bff",
    "timestamp": "2021-11-29T23:28:04.044560Z",
    "utsname_hostname": "ceph-1",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-240.el8.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Wed Sep 23 05:13:10 EDT 2020"
}

The same details appear in the journalctl output of the crashed OSD daemon:

Nov 30 00:28:04 ceph-1 conmon[375101]: *** Caught signal (Segmentation fault) **
Nov 30 00:28:04 ceph-1 conmon[375101]:  in thread 7f79a8cc0700 thread_name:msgr-worker-2
Nov 30 00:28:04 ceph-1 conmon[375101]:  ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)
Nov 30 00:28:04 ceph-1 conmon[375101]:  1: /lib64/libpthread.so.0(+0x12b20) [0x7f79adcbbb20]
Nov 30 00:28:04 ceph-1 conmon[375101]:  2: (std::_Rb_tree<boost::intrusive_ptr<AsyncConnection>, boost::intrusive_ptr<AsyncConnection>, std::_Identity<boost::intrusive_ptr<AsyncConnection> >, std::less<boost::intrusive_ptr<AsyncConnection> >, std::allocator<boost::intrusive_ptr<AsyncConnection> > >::find(boost::intrusive_ptr<AsyncConnection> const&) const+0x2c) [0x5557c7d6cacc]
Nov 30 00:28:04 ceph-1 conmon[375101]:  3: (AsyncConnection::_stop()+0xab) [0x5557c7d66c7b]
Nov 30 00:28:04 ceph-1 conmon[375101]:  4: (ProtocolV2::stop()+0x8f) [0x5557c7d91d5f]
Nov 30 00:28:04 ceph-1 conmon[375101]:  5: (ProtocolV2::handle_existing_connection(boost::intrusive_ptr<AsyncConnection> const&)+0x742) [0x5557c7da74a2]
Nov 30 00:28:04 ceph-1 conmon[375101]:  6: (ProtocolV2::handle_client_ident(ceph::buffer::v15_2_0::list&)+0xeef) [0x5557c7da8d3f]
Nov 30 00:28:04 ceph-1 conmon[375101]:  7: (ProtocolV2::handle_frame_payload()+0x20b) [0x5557c7da934b]
Nov 30 00:28:04 ceph-1 conmon[375101]:  8: (ProtocolV2::handle_read_frame_dispatch()+0x160) [0x5557c7da95d0]
Nov 30 00:28:04 ceph-1 conmon[375101]:  9: (ProtocolV2::_handle_read_frame_epilogue_main()+0x95) [0x5557c7da97c5]
Nov 30 00:28:04 ceph-1 conmon[375101]:  10: (ProtocolV2::_handle_read_frame_segment()+0x92) [0x5557c7da9872]
Nov 30 00:28:04 ceph-1 conmon[375101]:  11: (ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x201) [0x5557c7daa9c1]
Nov 30 00:28:04 ceph-1 conmon[375101]:  12: (ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3c) [0x5557c7d92bfc]
Nov 30 00:28:04 ceph-1 conmon[375101]:  13: (AsyncConnection::process()+0x789) [0x5557c7d69d19]
Nov 30 00:28:04 ceph-1 conmon[375101]:  14: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x5557c7bb8797]
Nov 30 00:28:04 ceph-1 conmon[375101]:  15: /usr/bin/ceph-osd(+0xe8f2bc) [0x5557c7bbc2bc]
Nov 30 00:28:04 ceph-1 conmon[375101]:  16: /lib64/libstdc++.so.6(+0xc2ba3) [0x7f79ad306ba3]
Nov 30 00:28:04 ceph-1 conmon[375101]:  17: /lib64/libpthread.so.0(+0x814a) [0x7f79adcb114a]
Nov 30 00:28:04 ceph-1 conmon[375101]:  18: clone()
Nov 30 00:28:04 ceph-1 conmon[375101]: debug 2021-11-29T23:28:04.045+0000 7f79a8cc0700 -1 *** Caught signal (Segmentation fault) **
Nov 30 00:28:04 ceph-1 conmon[375101]:  in thread 7f79a8cc0700 thread_name:msgr-worker-2
[... the same 18-frame backtrace is logged a second time ...]
Nov 30 00:28:04 ceph-1 conmon[375101]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
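Crash records in the format shown above are kept by the mgr crash module and can be listed with `ceph crash ls` and dumped with `ceph crash info <crash_id>` on a live cluster. For offline triage of such a JSON dump, a short script can reduce it to the interesting bits; a minimal sketch (the `summarize_crash` helper and the truncated sample record are illustrative, with field names taken from the report above):

```python
def summarize_crash(record: dict) -> str:
    """Return a one-line summary plus the demangled frames of a ceph crash record."""
    lines = [
        f"{record['entity_name']} ({record['process_name']}) "
        f"crashed at {record['timestamp']} on ceph {record['ceph_version']}"
    ]
    # Keep only frames with demangled symbols; skip bare library offsets
    # such as "/lib64/libpthread.so.0(+0x12b20) [0x7f79adcbbb20]".
    for frame in record.get("backtrace", []):
        if frame.startswith("("):
            lines.append("  " + frame.split("+0x")[0].lstrip("("))
    return "\n".join(lines)


# Truncated sample taken from the crash report above.
record = {
    "entity_name": "osd.1",
    "process_name": "ceph-osd",
    "timestamp": "2021-11-29T23:28:04.044560Z",
    "ceph_version": "16.2.0-117.el8cp",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f79adcbbb20]",
        "(AsyncConnection::_stop()+0xab) [0x5557c7d66c7b]",
        "(ProtocolV2::stop()+0x8f) [0x5557c7d91d5f]",
    ],
}
print(summarize_crash(record))
```

This makes the crash signature (entity, version, and the named frames) easy to compare against known trackers without wading through raw addresses.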
Further investigation of this case shows that a similar issue has been reported upstream [1].

[1] https://tracker.ceph.com/issues/49237

Version-Release number of selected component (if applicable):
ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)
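For context on the crash shape: the faulting frame is std::_Rb_tree::find invoked from AsyncConnection::_stop(), i.e. a lookup in a std::set of connections, reached while handle_existing_connection() tears down an old connection during a client_ident exchange. Crashes of this class typically point to a shared connection set being accessed without the owning lock held on every path. As a hedged illustration of the safe pattern only (Python, with hypothetical names; this is not Ceph's actual code), both the stop path and the replace-existing path take the same lock:

```python
import threading


class ConnRegistry:
    """Toy stand-in for a messenger's set of accepted connections.

    Illustrative only: in the real crash, _stop() searches a std::set of
    AsyncConnection while another thread may be mutating it. Guarding
    every access with one lock avoids that class of bug.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._conns = set()

    def accept(self, conn_id):
        with self._lock:
            self._conns.add(conn_id)

    def stop(self, conn_id):
        # Analogue of _stop(): find-and-erase under the owning lock.
        with self._lock:
            self._conns.discard(conn_id)

    def replace_existing(self, old_id, new_id):
        # Analogue of handle_existing_connection(): the old connection
        # is dropped and the new one registered as one atomic step.
        with self._lock:
            self._conns.discard(old_id)
            self._conns.add(new_id)

    def size(self):
        with self._lock:
            return len(self._conns)


reg = ConnRegistry()
reg.accept("conn-A")
threads = [
    threading.Thread(target=reg.replace_existing, args=("conn-A", "conn-B")),
    threading.Thread(target=reg.stop, args=("conn-A",)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(reg.size())  # exactly one connection survives, regardless of interleaving
```

Either interleaving leaves only "conn-B" registered; without the shared lock, the equivalent C++ find/erase race can dereference a connection that is concurrently being destroyed, which matches the segfault in the backtrace.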
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174