Bug 2175257

Summary: Bluestore OSD crash with bstore_kv_final Segmentation Fault
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Scott Nipp <snipp>
Component: RADOS Assignee: Adam Kupczyk <akupczyk>
Status: CLOSED NEXTRELEASE QA Contact: Pawan <pdhiran>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.2 CC: bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, roemerso, vumrao
Target Milestone: --- Flags: snipp: needinfo-
Target Release: 7.1   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-07-10 13:10:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Scott Nipp 2023-03-03 16:57:44 UTC
Description of problem:
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]: *** Caught signal (Segmentation fault) **
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  in thread 7f5d76557700 thread_name:bstore_kv_final
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  ceph version 16.2.8-85.el8cp (0bdc6db9a80af40dd496b05674a938d406a9f6f5) pacific (stable)
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  1: /lib64/libpthread.so.0(+0x12ce0) [0x7f5d85ff4ce0]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  2: (ceph::buffer::v15_2_0::ptr::release()+0x13) [0x56427437a773]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  3: (BlueStore::Onode::put()+0x1a9) [0x5642740068a9]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  4: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x31) [0x5642740bd121]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  5: (BlueStore::TransContext::~TransContext()+0x12f) [0x5642740bd44f]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  6: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x23e) [0x5642740688ee]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  7: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x257) [0x564274074e57]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  8: (BlueStore::_kv_finalize_thread()+0x54e) [0x5642740763fe]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  9: (BlueStore::KVFinalizeThread::entry()+0x11) [0x5642740c1271]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  10: /lib64/libpthread.so.0(+0x81cf) [0x7f5d85fea1cf]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  11: clone()
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]: debug 2023-03-02T04:56:22.226+0000 7f5d76557700 -1 *** Caught signal (Segmentation fault) **
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  in thread 7f5d76557700 thread_name:bstore_kv_final


Version-Release number of selected component (if applicable):
5.2

How reproducible:
Single occurrence

Steps to Reproduce:
1.
2.
3.

Actual results:
The OSD crashed with a segmentation fault and restarted within 1 minute.

Expected results:
No seg fault

Additional info:
ONode ref counting is broken - BACKPORT #58676
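
Note on the failure class named above: in the backtrace, `BlueStore::Onode::put()` is reached while `~TransContext()` erases its set of `boost::intrusive_ptr<BlueStore::Onode>`. The sketch below is a generic model of intrusive reference counting, written only to illustrate why an unbalanced release ends up touching an already-freed object. It is not BlueStore code; the class `RefCounted` and the function `demo_unbalanced_put` are hypothetical names, and the actual defect is whatever the referenced backport describes.

```python
class RefCounted:
    """Hypothetical stand-in for an intrusively ref-counted object."""

    def __init__(self, name):
        self.name = name
        self.nref = 0       # number of outstanding references
        self.freed = False  # models "memory has been released"

    def get(self):
        # Taking a reference on a freed object is a use-after-free.
        if self.freed:
            raise RuntimeError(f"use after free: get() on {self.name}")
        self.nref += 1

    def put(self):
        # Dropping a reference on a freed object is a use-after-free.
        if self.freed:
            raise RuntimeError(f"use after free: put() on {self.name}")
        self.nref -= 1
        if self.nref == 0:
            self.freed = True  # last reference gone, object is destroyed


def demo_unbalanced_put():
    """Two holders take references, but three releases happen."""
    obj = RefCounted("onode-like")
    obj.get()  # holder A
    obj.get()  # holder B
    obj.put()  # A releases, count is 1
    obj.put()  # B releases, count is 0, object is freed
    try:
        obj.put()  # one release too many: the object is already gone
    except RuntimeError as exc:
        return str(exc)
    return None
```

In this model the extra release is caught and reported. In C++ the same access is undefined behavior, and a segmentation fault is one way it can surface.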

Comment 1 Scott Nipp 2023-03-08 20:58:38 UTC
The customer encountered another abrupt crash and restart, this time of a different OSD on a different node in the cluster.  Both affected OSDs are backed by SSDs, whereas the bulk of the customer's environment is HDD-based.


2023-03-07T16:04:03.859+0000 7f23fcaed700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f23fcaed700 thread_name:bstore_kv_final

 ceph version 16.2.8-85.el8cp (0bdc6db9a80af40dd496b05674a938d406a9f6f5) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f240c58ace0]
 2: (BlueStore::Onode::put()+0x193) [0x55868061b893]
 3: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x31) [0x5586806d2121]
 4: (BlueStore::TransContext::~TransContext()+0x122) [0x5586806d2442]
 5: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x23e) [0x55868067d8ee]
 6: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x257) [0x558680689e57]
 7: (BlueStore::_kv_finalize_thread()+0x54e) [0x55868068b3fe]
 8: (BlueStore::KVFinalizeThread::entry()+0x11) [0x5586806d6271]
 9: /lib64/libpthread.so.0(+0x81cf) [0x7f240c5801cf]
 10: clone()
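
For triage, the two backtraces in this report can be compared frame by frame. The following is a minimal sketch, not an existing Ceph tool: `parse_frames` and `common_tail` are hypothetical helper names, the journald prefix pattern is an assumption based on the log lines in the description, and each frame is assumed to sit on a single line. The two samples are abridged from the traces above.

```python
import re

# Assumed journald prefix, e.g. "Mar 01 23:56:22 host1 <unit>[12816]: "
PREFIX_RE = re.compile(r"^\w{3} \d{2} [\d:]{8} \S+ \S+\[\d+\]: ")
# A frame line: "<n>: <body> [<address>]", where the address is optional
FRAME_RE = re.compile(r"^\s*(\d+): (.+?)(?: \[(0x[0-9a-f]+)\])?\s*$")


def parse_frames(text):
    """Return frame symbols in order, without offsets or addresses."""
    symbols = []
    for line in text.splitlines():
        line = PREFIX_RE.sub("", line)
        match = FRAME_RE.match(line)
        if not match:
            continue  # not a numbered frame line
        body = match.group(2)
        if body.startswith("(") and body.endswith(")"):
            # "(symbol+0xOFFSET)" -> "symbol"
            body = re.sub(r"\+0x[0-9a-f]+$", "", body[1:-1])
        symbols.append(body)
    return symbols


def common_tail(a, b):
    """Return the frames shared by both traces, counted from the bottom."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


# Abridged from the description (journald-prefixed lines).
TRACE_A = """\
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  1: /lib64/libpthread.so.0(+0x12ce0) [0x7f5d85ff4ce0]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  2: (ceph::buffer::v15_2_0::ptr::release()+0x13) [0x56427437a773]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  3: (BlueStore::Onode::put()+0x1a9) [0x5642740068a9]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  5: (BlueStore::TransContext::~TransContext()+0x12f) [0x5642740bd44f]
"""

# Abridged from comment 1 (no journald prefix).
TRACE_B = """\
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f240c58ace0]
 2: (BlueStore::Onode::put()+0x193) [0x55868061b893]
 4: (BlueStore::TransContext::~TransContext()+0x122) [0x5586806d2442]
"""

if __name__ == "__main__":
    shared = common_tail(parse_frames(TRACE_A), parse_frames(TRACE_B))
    print("\n".join(shared))
```

On these samples the shared tail starts at `BlueStore::Onode::put()`, and the first trace has one extra frame above it, `ceph::buffer::v15_2_0::ptr::release()`.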