Created attachment 1697829 [details]
ceph crash info details

Description of problem:
Observed that OSDs are crashing with an abort in thread_name:ms_dispatch.

Version-Release number of selected component (if applicable):
ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)

How reproducible:

Steps to Reproduce:
1. Create the pool:
   sudo ceph osd pool create test_pool514 128
2. Run the rados bench write command with 4 MB objects for 360 seconds against the pool:
   sudo rados --no-log-to-stderr -b 4194304 -p test_pool58 bench 360 write
3. Notice that the OSDs crash. Observed that 180 GB of memory is occupied; even after the memory was cleaned, the OSDs stayed down.

Actual results:
OSDs crash

Expected results:
OSDs should not crash

Additional info:

[root@ceph-bharat-1592319447950-node13-osd ceph]# ceph crash ls
ID                                                                ENTITY  NEW
2020-06-16_17:08:55.746313Z_bd255010-4ad7-4ff0-81d9-75bb30d52dbc  osd.16   *
2020-06-16_17:09:00.627586Z_5e9e7d9d-c665-4163-a4b5-0548ab47b7d9  osd.16   *
2020-06-16_17:09:04.894484Z_aa19e655-0780-4d86-a3a3-d64f38b3f8fc  osd.16   *
2020-06-16_17:09:09.290174Z_5184c0b4-6f65-4ec3-9c4e-19630ddbac3a  osd.16   *
2020-06-16_17:09:17.732047Z_0a989691-a0b9-4217-81d8-5296ae53104d  osd.15   *
2020-06-16_17:09:21.039004Z_4b682265-7be4-4dc6-959e-70c423f66bc6  osd.15   *
2020-06-16_17:09:24.214661Z_285d46bd-aa97-4628-ac4b-693f8f9abf2a  osd.15   *
2020-06-16_17:09:27.630156Z_d04f9f74-ea52-444a-9e6e-f4bb76440e89  osd.15   *
2020-06-16_17:09:58.889344Z_1cf66286-c37c-4cb0-8aea-693fe7e6436b  osd.1    *
2020-06-16_17:10:02.426099Z_ed19570a-0740-4bc4-96f9-95e25b79dafc  osd.1    *
2020-06-16_17:10:05.867627Z_50887ca6-90cd-4852-856f-cfb7ef44c40a  osd.1    *
2020-06-16_17:10:09.141085Z_ec537be0-3430-46de-9dc5-c6108a61696f  osd.1    *
2020-06-16_17:10:56.610033Z_b950383d-c2b0-4e0f-a7e5-f1b191af0d8c  osd.14   *
2020-06-16_17:10:56.704379Z_711bc216-4c42-45e7-bfec-9b71ecda11da  osd.9    *
2020-06-16_17:11:00.868746Z_f07d8336-905b-43af-9f30-0a6ded42767f  osd.9    *
2020-06-16_17:11:01.881965Z_20232102-e43a-43a3-977f-202262abd5c1  osd.14   *
2020-06-16_17:11:03.928712Z_dd3f7a8e-72f4-4e5d-bbac-57e6fe6b76cb  osd.9    *
2020-06-16_17:11:05.078065Z_a1575c56-7f12-45bd-81e4-c077650fe6e5  osd.14   *
2020-06-16_17:11:06.136871Z_629689db-d771-48d8-aed6-21cbf749a0cc  osd.9    *
2020-06-16_17:11:08.344013Z_b6b4e950-411c-424d-adbd-9d70d30ea028  osd.14   *
2020-06-16_17:22:38.781667Z_b5889638-e75e-48e9-bee9-a2398e717282  osd.6    *
2020-06-16_17:23:40.076285Z_cf01ab90-8047-4ff6-9e98-e8adbd4da1ff  osd.6    *
2020-06-16_17:23:42.767327Z_71ef7898-a443-40a1-b3ea-e6dcd4eebb18  osd.6    *
2020-06-16_17:23:44.804608Z_ca932fc7-9943-4182-a9d4-409186f5ad5c  osd.6    *
[root@ceph-bharat-1592319447950-node13-osd ceph]#

===============================================================================

[cephuser@ceph-bharat-1592319447950-node9-clientnfs ~]$ sudo ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL      USED        RAW USED     %RAW USED
    hdd       282 GiB     80 GiB     180 GiB     202 GiB          71.60
    TOTAL     282 GiB     80 GiB     180 GiB     202 GiB          71.60

POOLS:
    POOL                    ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    cephfs_data              1         0 B           0         0 B         0           0 B
    cephfs_metadata          2     2.3 KiB          22     1.6 MiB    100.00           0 B
    .rgw.root                3     2.4 KiB           6     1.1 MiB    100.00           0 B
    default.rgw.control      4         0 B           8         0 B         0           0 B
    default.rgw.meta         5       493 B           2     512 KiB    100.00           0 B
    default.rgw.log          6     3.8 KiB         206     6.6 MiB    100.00           0 B
    rbd                      7         0 B           0         0 B         0           0 B
    test_pool135             8      64 GiB      14.82k     191 GiB    100.00           0 B
[cephuser@ceph-bharat-1592319447950-node9-clientnfs ~]$

===============================================================================
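Each entry in the ceph crash ls output above can be inspected individually. A minimal sketch (the crash ID is the first osd.16 entry from the list; archive-all assumes it is available in this 14.2.8 build):

    # Show the full metadata and backtrace for one of the osd.16 crash entries listed above
    sudo ceph crash info 2020-06-16_17:08:55.746313Z_bd255010-4ad7-4ff0-81d9-75bb30d52dbc

    # After the reports have been collected, acknowledge them so the NEW markers are cleared
    sudo ceph crash archive-all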
[cephuser@ceph-bharat-1592319447950-node9-clientnfs ~]$ sudo ceph versions
{
    "mon": {
        "ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)": 22
    },
    "mds": {
        "ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)": 2
    },
    "rgw": {
        "ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)": 2
    },
    "rgw-nfs": {
        "ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)": 31
    }
}
[cephuser@ceph-bharat-1592319447950-node9-clientnfs ~]$

=============================================================================

Crash dump snippet:

2020-06-17 08:33:22.638 7f2dbf4f3700 -1 *** Caught signal (Aborted) **
 in thread 7f2dbf4f3700 thread_name:ms_dispatch

 ceph version 14.2.8-68.el7cp (c3d1f04bd7aa9ccc99ffd545ff2c5431b2df316e) nautilus (stable)
 1: (()+0xf630) [0x7f2dd475a630]
 2: (gsignal()+0x37) [0x7f2dd354e387]
 3: (abort()+0x148) [0x7f2dd354fa78]
 4: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x1a5) [0x5573629b8b60]
 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0xc8d) [0x557362f0e3ad]
 6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x557362f230d0]
 7: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x557362b19d1f]
 8: (OSD::handle_osd_map(MOSDMap*)+0x3234) [0x557362aadd44]
 9: (OSD::_dispatch(Message*)+0xa1) [0x557362abc411]
 10: (OSD::ms_dispatch(Message*)+0x69) [0x557362abc779]
 11: (DispatchQueue::entry()+0x129c) [0x55736336a28c]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x5573631cd47d]
 13: (()+0x7ea5) [0x7f2dd4752ea5]
 14: (clone()+0x6d) [0x7f2dd36168dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
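The NOTE at the end of the backtrace asks for the executable or an objdump of it. A minimal sketch of producing that disassembly on the OSD node, assuming the daemon binary is at the default /usr/bin/ceph-osd path and the matching debuginfo package is installed:

    # Disassemble the OSD binary with interleaved source (path and debuginfo availability are assumptions)
    objdump -rdS /usr/bin/ceph-osd > /tmp/ceph-osd.dis

    # Locate the faulting frame from the backtrace, e.g. BlueStore::_txc_add_transaction
    grep -n '_txc_add_transaction' /tmp/ceph-osd.dis | head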
--- begin dump of recent events ---
 -1128> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command assert hook 0x55736d426500
 -1127> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command abort hook 0x55736d426500
 -1126> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command perfcounters_dump hook 0x55736d426500
 -1125> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command 1 hook 0x55736d426500
 -1124> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command perf dump hook 0x55736d426500
 -1123> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command perfcounters_schema hook 0x55736d426500
 -1122> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command perf histogram dump hook 0x55736d426500
 -1121> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command 2 hook 0x55736d426500
 -1120> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command perf schema hook 0x55736d426500
 -1119> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command perf histogram schema hook 0x55736d426500
 -1118> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command perf reset hook 0x55736d426500
 -1117> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command config show hook 0x55736d426500
 -1116> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command config help hook 0x55736d426500
 -1115> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command config set hook 0x55736d426500
 -1114> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command config unset hook 0x55736d426500
 -1113> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command config get hook 0x55736d426500
 -1112> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command config diff hook 0x55736d426500
 -1111> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command config diff get hook 0x55736d426500
 -1110> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command log flush hook 0x55736d426500
 -1109> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command log dump hook 0x55736d426500
 -1108> 2020-06-17 08:33:20.123 7f2dd770ca80  5 asok(0x55736d4c6000) register_command log reopen hook 0x55736d426500
 -1107> 2020-06-17 08:33:20.124 7f2dd770ca80  5 asok(0x55736d4c6000) register_command dump_mempools hook 0x55736d4747c8
 -1106> 2020-06-17 08:33:20.128 7f2dd770ca80 10 monclient: get_monmap_and_config

Log files and ceph crash info details files are attached.
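The register_command lines in the event dump correspond to admin-socket commands that can be issued against a running OSD daemon. A sketch, using osd.16 from the crash list above as an example and assuming the daemon is up and the commands are run on its host:

    # Dump memory pool usage via the admin socket registered above
    sudo ceph daemon osd.16 dump_mempools

    # Dump the daemon's performance counters via the same socket
    sudo ceph daemon osd.16 perf dump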
Created attachment 1697832 [details]
Log files
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144