Seen in this teuthology run, which thrashes the mirror daemon for the active/active HA test: https://pulpito.ceph.com/vshankar-2021-08-05_02:19:15-fs-wip-cephfs-mirror-ha-active-active-20210802-054956-distro-basic-smithi/

2021-08-05T02:39:35.991+0000 7fddebb3c700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fddebb3c700 thread_name:msgr-worker-1

 ceph version 17.0.0-6593-gede67e63 (ede67e630d11e5f6758fa1e18b166b29d499c421) quincy (dev)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fddf032db20]
 2: (ProtocolV2::send_message(Message*)+0xa1) [0x7fddf14e0f37]
 3: (AsyncConnection::send_message(Message*)+0x813) [0x7fddf14ad0db]
 4: (Connection::send_message2(boost::intrusive_ptr<Message>)+0x1e) [0x7fddf14ade22]
 5: (MonClient::_send_mon_message(boost::intrusive_ptr<Message>)+0x8a) [0x7fddf1588568]
 6: (MonClient::_finish_hunting(int)+0x5f9) [0x7fddf1593eb5]
 7: (MonClient::handle_auth_done(Connection*, AuthConnectionMeta*, unsigned long, unsigned int, ceph::buffer::v15_2_0::list const&, CryptoKey*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x2ce) [0x7fddf1595400]
 8: (ProtocolV2::handle_auth_done(ceph::buffer::v15_2_0::list&)+0x4b2) [0x7fddf14f15b4]
 9: (ProtocolV2::handle_frame_payload()+0x1f6) [0x7fddf1500130]
 10: (ProtocolV2::handle_read_frame_dispatch()+0x179) [0x7fddf15003cf]
 11: (ProtocolV2::_handle_read_frame_epilogue_main()+0xc2) [0x7fddf15005e8]
 12: (ProtocolV2::_handle_read_frame_segment()+0xa6) [0x7fddf1500938]
 13: (ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0xc7) [0x7fddf1501f13]
 14: (CtRxNode<ProtocolV2>::call(ProtocolV2*) const+0x31) [0x7fddf1502621]
 15: (ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3b) [0x7fddf14e7eaf]
 16: /usr/lib64/ceph/libceph-common.so.2(+0x65d495) [0x7fddf14e8495]
 17: (std::function<void (char*, long)>::operator()(char*, long) const+0x23) [0x7fddf14ae307]
 18: (AsyncConnection::process()+0xeb5) [0x7fddf14ac099]
 19: (C_handle_read::do_request(unsigned long)+0x16) [0x7fddf14aee24]
 20: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x594) [0x7fddf150a7a0]
 21: /usr/lib64/ceph/libceph-common.so.2(+0x68a4e7) [0x7fddf15154e7]
 22: (std::function<void ()>::operator()() const+0x12) [0x7fddf1513ba6]
 23: (std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void ()> > > >::_M_run()+0x11) [0x7fddf1513bc1]
 24: /lib64/libstdc++.so.6(+0xc2ba3) [0x7fddef562ba3]
 25: /lib64/libpthread.so.0(+0x814a) [0x7fddf032314a]
 26: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Basically, there is a race between the mirror daemon shutting down and the mirror daemon receiving an fs_map update:

2021-08-05T02:39:05.977+0000 7fddf302af80 10 cephfs::mirror::Mirror run: canceling timer task=0x55e35491d4e0
2021-08-05T02:39:05.977+0000 7fddf302af80 10 cephfs::mirror::Mirror run: trying to shutdown filesystem={fscid=2, fs_name=cephfs}
2021-08-05T02:39:05.977+0000 7fddf302af80 20 cephfs::mirror::FSMirror shutdown
2021-08-05T02:39:05.977+0000 7fddf302af80 20 cephfs::mirror::FSMirror shutdown_peer_replayers
2021-08-05T02:39:05.977+0000 7fddf302af80 5 cephfs::mirror::FSMirror shutdown_peer_replayers: shutting down replayer for peer={uuid=3aeddb3f-3d31-4db5-9da0-aeed11538b3c, remote_cluster={client_name=client.mirror_remote, cluster_name=ceph, fs_name=backup_fs}}
2021-08-05T02:39:05.977+0000 7fddf302af80 20 cephfs::mirror::PeerReplayer(3aeddb3f-3d31-4db5-9da0-aeed11538b3c) shutdown
2021-08-05T02:39:05.977+0000 7fddcbafc700 5 cephfs::mirror::PeerReplayer(3aeddb3f-3d31-4db5-9da0-aeed11538b3c) run: exiting
2021-08-05T02:39:05.977+0000 7fddcb2fb700 5 cephfs::mirror::PeerReplayer(3aeddb3f-3d31-4db5-9da0-aeed11538b3c) run: exiting
2021-08-05T02:39:05.977+0000 7fddcc2fd700 5 cephfs::mirror::PeerReplayer(3aeddb3f-3d31-4db5-9da0-aeed11538b3c) run: exiting
2021-08-05T02:39:05.980+0000 7fddf302af80 20 cephfs::mirror::FSMirror shutdown_mirror_watcher
2021-08-05T02:39:05.980+0000 7fddf302af80 20 cephfs::mirror::MirrorWatcher shutdown
2021-08-05T02:39:05.980+0000 7fddf302af80 20 cephfs::mirror::MirrorWatcher unregister_watcher
2021-08-05T02:39:05.980+0000 7fddf302af80 20 cephfs::mirror::Watcher unregister_watch
2021-08-05T02:39:05.982+0000 7fdde9337700 20 cephfs::mirror::MirrorWatcher handle_unregister_watcher: r=0
2021-08-05T02:39:05.982+0000 7fdde9337700 20 cephfs::mirror::FSMirror handle_shutdown_mirror_watcher: r=0
2021-08-05T02:39:05.982+0000 7fdde9337700 20 cephfs::mirror::FSMirror shutdown_instance_watcher
2021-08-05T02:39:05.982+0000 7fdde9337700 20 cephfs::mirror::InstanceWatcher shutdown
2021-08-05T02:39:05.982+0000 7fdde9337700 20 cephfs::mirror::InstanceWatcher unregister_watcher
2021-08-05T02:39:05.982+0000 7fdde9337700 20 cephfs::mirror::Watcher unregister_watch
2021-08-05T02:39:05.983+0000 7fdde9337700 20 cephfs::mirror::InstanceWatcher handle_unregister_watcher: r=0
2021-08-05T02:39:05.983+0000 7fdde9337700 20 cephfs::mirror::InstanceWatcher remove_instance
2021-08-05T02:39:05.985+0000 7fdde0325700 20 cephfs::mirror::InstanceWatcher handle_remove_instance: r=0
2021-08-05T02:39:05.985+0000 7fdde9337700 20 cephfs::mirror::FSMirror handle_shutdown_instance_watcher: r=0
2021-08-05T02:39:05.985+0000 7fdde9337700 20 cephfs::mirror::FSMirror cleanup
2021-08-05T02:39:06.435+0000 7fddebb3c700 20 cephfs::mirror::ClusterWatcher handle_fsmap
2021-08-05T02:39:06.435+0000 7fddebb3c700 5 cephfs::mirror::ClusterWatcher handle_fsmap: mirroring enabled=[], mirroring_disabled=[{fscid=2, fs_name=cephfs}]
2021-08-05T02:39:06.435+0000 7fddebb3c700 10 cephfs::mirror::ServiceDaemon: 0x55e35494c4e0 remove_filesystem: fscid=2
2021-08-05T02:39:06.435+0000 7fddebb3c700 10 cephfs::mirror::ServiceDaemon: 0x55e35494c4e0 schedule_update_status
2021-08-05T02:39:06.435+0000 7fddebb3c700 10 cephfs::mirror::Mirror mirroring_disabled: filesystem={fscid=2, fs_name=cephfs}
2021-08-05T02:39:07.435+0000 7fdde4b2e700 20 cephfs::mirror::ServiceDaemon: 0x55e35494c4e0 update_status: 0 filesystem(s)
2021-08-05T02:39:35.986+0000 7fddf302af80 10 cephfs::mirror::Mirror run: shutdown filesystem={fscid=2, fs_name=cephfs}, r=0
2021-08-05T02:39:35.986+0000 7fddf302af80 20 cephfs::mirror::FSMirror ~FSMirror
2021-08-05T02:39:35.986+0000 7fddf302af80 10 cephfs::mirror::Mirror ~Mirror
2021-08-05T02:39:35.986+0000 7fddebb3c700 5 cephfs::mirror::Mirror mirroring_disabledshutting down
2021-08-05T02:39:35.986+0000 7fddebb3c700 5 cephfs::mirror::ClusterWatcher handle_fsmap: peers added={}, peers removed={}
2021-08-05T02:39:35.986+0000 7fddf302af80 10 cephfs::mirror::ServiceDaemon: 0x55e35494c4e0 ~ServiceDaemon
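The log shows the two sides of the race: the main thread (7fddf302af80) runs the shutdown sequence and starts destroying Mirror/FSMirror/ServiceDaemon, while handle_fsmap is still being delivered on the messenger worker thread (7fddebb3c700, the same thread that later segfaults in the MonClient/ProtocolV2 send path). Below is a minimal C++ sketch of this shutdown-vs-callback pattern and the usual mitigation, a stopping flag checked under the mutex that serializes the callback. All class and member names here are illustrative, not the actual cephfs-mirror code:

// race_sketch.cc -- illustrative only; not the cephfs-mirror source.
// Models a daemon whose watcher callback can arrive on another thread
// while shutdown is tearing the daemon down.
#include <iostream>
#include <mutex>
#include <thread>

class MirrorDaemon {
public:
  // Callback delivered asynchronously (in ceph, on a msgr-worker thread).
  void handle_fsmap_update() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (stopping_) {
      // Without this check the callback would keep using members that
      // shutdown() has already torn down -- the use-after-free pattern
      // seen in the backtrace above.
      std::cout << "handle_fsmap_update: shutting down, ignoring update\n";
      return;
    }
    std::cout << "handle_fsmap_update: processing fsmap\n";
  }

  // Called from the main thread when the daemon exits.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;  // publish "no more work" before tearing down
    }
    // Only now is it safe to release watchers/connections: any callback
    // that races past this point observes stopping_ and returns early.
    std::cout << "shutdown: done\n";
  }

private:
  std::mutex mutex_;
  bool stopping_ = false;
};

int main() {
  MirrorDaemon daemon;
  // A late fsmap update racing with shutdown, as in the log above.
  std::thread updater([&] { daemon.handle_fsmap_update(); });
  daemon.shutdown();
  updater.join();
  return 0;
}

Note that the flag alone only closes the window for new work; in the real daemon the shutdown path also has to unregister the watcher and wait for in-flight callbacks (and the monitor client they may touch) to drain before destroying those objects, otherwise a callback already past the flag check can still race the destructors.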
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update) and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4105