+++ This bug was initially created as a clone of Bug #2186234 +++

Description of problem (please be as detailed as possible and provide log snippets):

ceph osd.1 running on an OCP/ODF 4.10.10 cluster crashed and restarted automatically. ceph status reports a WARNING about one recently crashed daemon, but it looks like a one-shot event.

  cluster:
    id:     7ec4a1a3-5211-4e4a-9bf6-c9e01f39a474
    health: HEALTH_WARN
            1 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 7w)
    mgr: a(active, since 5h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 5h), 3 in (since 2M)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 177 pgs
    objects: 643.32k objects, 318 GiB
    usage:   960 GiB used, 540 GiB / 1.5 TiB avail
    pgs:     177 active+clean

  io:
    client:   9.4 MiB/s rd, 13 MiB/s wr, 14 op/s rd, 122 op/s wr

Version of all relevant components (if applicable):

    "osd": {
        "ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)": 3
    },

The ceph container runs the following image (tag 5-235):
registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:fc25524ccb0ea78526257778ab54bfb1a25772b75fcc97df98eb06a0e67e1bf6

which can be downloaded from:
https://catalog.redhat.com/software/containers/rhceph/rhceph-5-rhel8/60ec72a74a6a2c7844abe5fb?tag=5-235&push_date=1656624434000

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No impact on production so far.

Is there any workaround available to the best of your knowledge?
No workaround for now, because the cluster keeps working.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
No idea.

Can this issue be reproduced?
Not so far.

Can this issue be reproduced from the UI?
No.

If this is a regression, please provide more details to justify this:
No regression.
Steps to Reproduce:
Not reproducible so far.

Actual results:
N/A

Expected results:

Additional info:

Here is the full stack trace of the crash we observed:

 ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f705ce60ce0]
 2: (BlueStore::Onode::put()+0x2cb) [0x5634a77de74b]
 3: (BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x1a5) [0x5634a7825c15]
 4: (MapCacher::MapCacher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list>::get_keys(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x3e3) [0x5634a7506393]
 5: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0xec) [0x5634a74fc80c]
 6: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list>*)+0xcb) [0x5634a750015b]
 7: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ceph::os::Transaction&)+0x814) [0x5634a73b2394]
 8: (non-virtual thunk to PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, eversion_t const&, bool, ceph::os::Transaction&, bool)+0x290) [0x5634a74c73f0]
 9: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xd0e) [0x5634a769a48e]
 10: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x267) [0x5634a76aa997]
 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x52) [0x5634a74dd192]
 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5de) [0x5634a74802ee]
 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x5634a7308aa9]
 14: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x5634a7567218]
 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x5634a7328f18]
 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x5634a799c1d4]
 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5634a799ee74]
 18: /lib64/libpthread.so.0(+0x81cf) [0x7f705ce561cf]
 19: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This stack trace was captured by ceph-crash running on ODF. I attach the whole file captured by the OCS must-gather.

--- Additional comment from RHEL Program Management on 2023-04-12 14:02:23 UTC ---

This bug, having no release flag set previously, now has the release flag 'odf-4.13.0' set to '?', and so is being proposed to be fixed in the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from on 2023-04-12 14:21:00 UTC ---

I suspect the following patch, which does not appear to be in 16.2.7-126:

~~~~~
commit 0960f981f72009086317710f51c5acefe1bd7232
Author: Igor Fedotov <ifedotov>
Date:   Tue Dec 14 17:56:37 2021 +0300

    os/bluestore: get rid of fake onode nref increment for pinned entry

    Looks like this isn't necessary any more after fixing
    https://tracker.ceph.com/issues/53002

    Signed-off-by: Igor Fedotov <igor.fedotov>

diff --git a/src/os/bluestore/BlueStore.cc b/src/os/bluestore/BlueStore.cc
index 4d8b7f3f3e1..78709eb5dc8 100644
--- a/src/os/bluestore/BlueStore.cc
+++ b/src/os/bluestore/BlueStore.cc
@@ -3623,14 +3623,7 @@ void BlueStore::Onode::calc_omap_tail(
   _key_encode_u64(o->onode.nid, out);
   out->push_back('~');
 }
-//
-// A tricky thing about Onode's ref counter is that we do an additional
-// increment when newly pinned instance is detected. And -1 on unpin.
-// This prevents from a conflict with a delete call (when nref == 0).
-// The latter might happen while the thread is in unpin() function
-// (and e.g. waiting for lock acquisition) since nref is already
-// decremented. And another 'putting' thread on the instance will release it.
-//
+
 void BlueStore::Onode::get() {
   if (++nref >= 2 && !pinned) {
     OnodeCacheShard* ocs = c->get_onode_cache();
@@ -3643,11 +3636,7 @@ void BlueStore::Onode::get() {
   }
   bool was_pinned = pinned;
   pinned = nref >= 2;
-  // additional increment for newly pinned instance
   bool r = !was_pinned && pinned;
-  if (r) {
-    ++nref;
-  }
   if (cached && r) {
     ocs->_pin(this);
   }
@@ -3657,7 +3646,7 @@ void BlueStore::Onode::put() {
 void BlueStore::Onode::put() {
   ++put_nref;
   int n = --nref;
-  if (n == 2) {
+  if (n == 1) {
     OnodeCacheShard* ocs = c->get_onode_cache();
     ocs->lock.lock();
     // It is possible that during waiting split_cache moved us to different OnodeCacheShard.
@@ -3667,8 +3656,7 @@ void BlueStore::Onode::put() {
       ocs->lock.lock();
     }
     bool need_unpin = pinned;
-    pinned = pinned && nref > 2; // intentionally use > not >= as we have
-                                 // +1 due to pinned state
+    pinned = pinned && nref >= 2;
     need_unpin = need_unpin && !pinned;
     if (cached && need_unpin) {
       if (exists) {
@@ -3679,10 +3667,6 @@ void BlueStore::Onode::put() {
         c->onode_map._remove(oid);
       }
     }
-    // additional decrement for newly unpinned instance
-    if (need_unpin) {
-      --nref;
-    }
     ocs->lock.unlock();
   }
   auto pn = --put_nref;
~~~~~

--- Additional comment from Sunil Kumar Acharya on 2023-04-17 15:43:34 UTC ---

Moving this out of 4.13 as the issue still needs to be picked up for investigation. Please feel free to propose it back to 4.13, with a comment, if this turns out to be a blocker BZ.
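For readers following the suspected patch, below is a minimal, self-contained C++ sketch (NOT the real BlueStore code) of the two reference-counting conventions shown in the diff. The class FakeOnode and the get_old()/put_old()/get_new()/put_new() helpers are invented for illustration; only the thresholds (n == 2 vs n == 1, nref > 2 vs nref >= 2) and the extra increment/decrement tied to the pinned state mirror the diff. Whether the absence of this patch actually explains the crash remains the suspicion stated above, not a confirmed root cause.

~~~~~
// Simplified sketch, assuming nothing beyond what the diff itself shows.
#include <atomic>
#include <cassert>
#include <iostream>
#include <mutex>

struct FakeOnode {
  std::atomic<int> nref{0};
  bool pinned = false;
  std::mutex cache_lock;  // stands in for the onode cache shard lock

  // Old scheme: a newly pinned entry gets one extra "fake" reference.
  void get_old() {
    int n = ++nref;
    std::lock_guard<std::mutex> l(cache_lock);
    bool was_pinned = pinned;
    pinned = n >= 2;
    if (!was_pinned && pinned)
      ++nref;  // additional increment for the newly pinned instance
  }

  // Old scheme: with the fake reference in place, "one real reference left"
  // shows up as nref == 2, and unpinning must drop the fake reference too.
  bool put_old() {
    int n = --nref;
    if (n == 2) {
      std::lock_guard<std::mutex> l(cache_lock);
      bool need_unpin = pinned;
      pinned = pinned && nref > 2;  // > rather than >= because of the +1
      if (need_unpin && !pinned)
        --nref;                     // additional decrement on unpin
    }
    return nref == 0;  // caller would free the onode when this is true
  }

  // New scheme (what the patch introduces): plain counting, no fake ref.
  void get_new() {
    int n = ++nref;
    std::lock_guard<std::mutex> l(cache_lock);
    pinned = n >= 2;
  }

  bool put_new() {
    int n = --nref;
    if (n == 1) {
      std::lock_guard<std::mutex> l(cache_lock);
      pinned = pinned && nref >= 2;
    }
    return nref == 0;
  }
};

int main() {
  // Both schemes drain to zero after matched get/put pairs; the old one just
  // keeps the counter one higher while the entry is pinned.
  FakeOnode a;
  a.get_old(); a.get_old();  // nref: 1, then 2 plus the fake ref -> 3
  assert(!a.put_old());      // 3 -> 2, unpin drops the fake ref -> 1
  assert(a.put_old());       // 1 -> 0

  FakeOnode b;
  b.get_new(); b.get_new();  // nref: 1, then 2 (pinned)
  assert(!b.put_new());      // 2 -> 1, unpinned
  assert(b.put_new());       // 1 -> 0
  std::cout << "both schemes drain to zero" << std::endl;
  return 0;
}
~~~~~

In short, the old scheme shifts every threshold in put() by one because a pinned entry holds an extra reference; the patch removes that extra reference so put() can compare against the plain counts, which is why its presence or absence changes how Onode::put() behaves on the last real reference.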