Bug 2175257 - Bluestore OSD crash with bstore_kv_final Segmentation Fault
Summary: Bluestore OSD crash with bstore_kv_final Segmentation Fault
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 5.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.1
Assignee: Adam Kupczyk
QA Contact: Pawan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-03-03 16:57 UTC by Scott Nipp
Modified: 2023-07-14 15:01 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-10 13:10:47 UTC
Embargoed:
snipp: needinfo-




Links
- Github ceph/ceph pull 50072 (open): pacific: os/bluestore: fix onode ref counting (last updated 2023-03-03 16:57:43 UTC)
- Red Hat Issue Tracker RHCEPH-6227 (last updated 2023-03-03 16:58:32 UTC)

Description Scott Nipp 2023-03-03 16:57:44 UTC
Description of problem:
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]: *** Caught signal (Segmentation fault) **
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  in thread 7f5d76557700 thread_name:bstore_kv_final
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  ceph version 16.2.8-85.el8cp (0bdc6db9a80af40dd496b05674a938d406a9f6f5) pacific (stable)
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  1: /lib64/libpthread.so.0(+0x12ce0) [0x7f5d85ff4ce0]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  2: (ceph::buffer::v15_2_0::ptr::release()+0x13) [0x56427437a773]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  3: (BlueStore::Onode::put()+0x1a9) [0x5642740068a9]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  4: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x31) [0x5642740bd121]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  5: (BlueStore::TransContext::~TransContext()+0x12f) [0x5642740bd44f]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  6: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x23e) [0x5642740688ee]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  7: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x257) [0x564274074e57]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  8: (BlueStore::_kv_finalize_thread()+0x54e) [0x5642740763fe]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  9: (BlueStore::KVFinalizeThread::entry()+0x11) [0x5642740c1271]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  10: /lib64/libpthread.so.0(+0x81cf) [0x7f5d85fea1cf]
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  11: clone()
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]: debug 2023-03-02T04:56:22.226+0000 7f5d76557700 -1 *** Caught signal (Segmentation fault) **
Mar 01 23:56:22 host1 ceph-b613d998-5548-11ed-a037-88e9a4423a75-osd-85[12816]:  in thread 7f5d76557700 thread_name:bstore_kv_final


Version-Release number of selected component (if applicable):
5.2

How reproducible:
Single occurrence

Steps to Reproduce:
No reproduction steps are known; the crash was observed only once (see "How reproducible" above).

Actual results:
The OSD crashed with a segmentation fault and restarted within 1 minute.

Expected results:
No segmentation fault.

Additional info:
Onode ref counting is broken; tracked upstream as BACKPORT #58676 (the linked pull request 50072 carries the fix).

Comment 1 Scott Nipp 2023-03-08 20:58:38 UTC
The customer encountered another abrupt crash and restart of a second OSD, on a different node in the cluster.  Both affected OSDs are backed by SSDs, whereas HDDs make up the bulk of their environment.


2023-03-07T16:04:03.859+0000 7f23fcaed700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f23fcaed700 thread_name:bstore_kv_final

 ceph version 16.2.8-85.el8cp (0bdc6db9a80af40dd496b05674a938d406a9f6f5) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f240c58ace0]
 2: (BlueStore::Onode::put()+0x193) [0x55868061b893]
 3: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x31) [0x5586806d2121]
 4: (BlueStore::TransContext::~TransContext()+0x122) [0x5586806d2442]
 5: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x23e) [0x55868067d8ee]
 6: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x257) [0x558680689e57]
 7: (BlueStore::_kv_finalize_thread()+0x54e) [0x55868068b3fe]
 8: (BlueStore::KVFinalizeThread::entry()+0x11) [0x5586806d6271]
 9: /lib64/libpthread.so.0(+0x81cf) [0x7f240c5801cf]
 10: clone()

