Bug 1678470 - BlueStore OSD crashes in _do_read - BlueStore::_do_read
Summary: BlueStore OSD crashes in _do_read - BlueStore::_do_read
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.1
Hardware: x86_64
OS: Linux
Target Milestone: z2
Target Release: 3.2
Assignee: Neha Ojha
QA Contact: Manohar Murthy
Docs Contact: John Brier
Depends On:
Blocks: 1629656
Reported: 2019-02-18 20:55 UTC by Vikhyat Umrao
Modified: 2019-04-30 16:22 UTC

Fixed In Version: RHEL: ceph-12.2.8-113.el7cp Ubuntu: ceph_12.2.8-96redhat1xenial
Doc Type: Bug Fix
Doc Text:
.An OSD daemon no longer crashes when a block device has read errors
Previously, an OSD daemon would crash when a block device had read errors, because the daemon expected only a general EIO error code, not the more specific errors the kernel generates. With this release, low-level errors are mapped to EIO, so an OSD daemon no longer crashes because of an unrecognized error code.
Clone Of:
Last Closed: 2019-04-30 15:56:46 UTC
Target Upstream Version:

System | ID | Priority | Status | Summary | Last Updated
Github | ceph ceph pull 25855 | None | closed | luminous: os/bluestore: KernelDevice::read() does the EIO mapping now. | 2020-06-12 02:15:31 UTC
Red Hat Product Errata | RHSA-2019:0911 | None | None | None | 2019-04-30 15:57:00 UTC

Description Vikhyat Umrao 2019-02-18 20:55:48 UTC
Description of problem:
BlueStore OSD crashes in _do_read - BlueStore::_do_read

2019-02-15 21:42:51.930852 7fbc3ea6d700 -1 *** Caught signal (Aborted) **
 in thread 7fbc3ea6d700 thread_name:tp_osd_tp

 ceph version 12.2.5-45redhat1xenial (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (()+0xa7e5d4) [0x55dc7ecda5d4]
 2: (()+0x11390) [0x7fbc57fd4390]
 3: (pread64()+0x33) [0x7fbc57fd3d43]
 4: (KernelDevice::read(unsigned long, unsigned long, ceph::buffer::list*, IOContext*, bool)+0x31d) [0x55dc7ecbb91d]
 5: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x1b4a) [0x55dc7ebc85aa]
 6: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x4d7) [0x55dc7ebca377]
 7: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&, ScrubMap*)+0x204) [0x55dc7ea2b314]
 8: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x3ec) [0x55dc7e937b8c]
 9: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x239) [0x55dc7e7d1759]
 10: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x65d) [0x55dc7e7d213d]
 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x82d) [0x55dc7e89bc8d]
 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x55dc7e711fa9]
 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55dc7e9bc477]
 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x55dc7e740357]
 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x55dc7ed220a4]
 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55dc7ed250e0]
 17: (()+0x76ba) [0x7fbc57fca6ba]
 18: (clone()+0x6d) [0x7fbc5704141d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 3.1
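
The trace above passes through pread64() called from KernelDevice::read(), and the linked upstream change is summarized as "KernelDevice::read() does the EIO mapping now." The following is only a minimal sketch of that idea, not the actual Ceph patch. The helper names map_read_error() and read_at() are hypothetical, and the set of error codes treated as device errors is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not Ceph's code): collapse specific device-level
// error codes into the general EIO that callers know how to handle.
// The list of codes here is an illustrative assumption.
static int map_read_error(int err) {
  switch (err) {
    case ETIMEDOUT:
    case ENODATA:
    case EILSEQ:
      return EIO;
    default:
      return err;  // anything else is passed through unchanged
  }
}

// Hypothetical wrapper: read exactly `len` bytes at offset `off`.
// Returns 0 on success or a negative errno value on failure.
static int read_at(int fd, void* buf, size_t len, off_t off) {
  assert(buf != nullptr);
  char* p = static_cast<char*>(buf);
  while (len > 0) {
    ssize_t r = ::pread(fd, p, len, off);
    if (r < 0) {
      if (errno == EINTR) {
        continue;  // interrupted by a signal, retry
      }
      return -map_read_error(errno);
    }
    if (r == 0) {
      return -EIO;  // assumption: a short read at EOF is reported as EIO
    }
    p += r;
    len -= static_cast<size_t>(r);
    off += r;
  }
  return 0;
}
```

With a mapping like this at the lowest layer, a caller only has to handle -EIO to cover device read failures, rather than aborting on a code it does not recognize.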

Comment 19 Neha Ojha 2019-04-15 17:50:59 UTC
That sounds right.

Comment 21 John Brier 2019-04-29 19:39:42 UTC

Can you fill out the Doc Text? I will use it to write a description of the bug and fix for the 3.2z2 Release Notes. The Doc Text field is on the right side of the bug details at the top. Please fill out the "Cause: Consequence: Fix: Result:" template.

Comment 24 errata-xmlrpc 2019-04-30 15:56:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

