Bug 1678470

Summary: BlueStore OSD crashes in BlueStore::_do_read
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vikhyat Umrao <vumrao>
Component: RADOS
Assignee: Neha Ojha <nojha>
Status: CLOSED ERRATA
QA Contact: Manohar Murthy <mmurthy>
Severity: medium
Docs Contact: John Brier <jbrier>
Priority: medium
Version: 3.1
CC: agunn, akupczyk, anharris, ceph-eng-bugs, dzafman, jbrier, kchai, nojha, rzarzyns, tchandra, tpetr, tserlin, vumrao
Target Milestone: z2
Keywords: CodeChange
Target Release: 3.2   
Hardware: x86_64   
OS: Linux   
Fixed In Version: RHEL: ceph-12.2.8-113.el7cp Ubuntu: ceph_12.2.8-96redhat1xenial
Doc Type: Bug Fix
Doc Text:
.An OSD daemon no longer crashes when a block device has read errors
Previously, an OSD daemon would crash when a block device had read errors because the daemon expected only the general EIO error code, not the more specific errors the kernel can generate. With this release, low-level errors are mapped to EIO, so an unrecognized error code no longer crashes the OSD daemon.
Last Closed: 2019-04-30 15:56:46 UTC
Type: Bug
Bug Blocks: 1629656    

Description Vikhyat Umrao 2019-02-18 20:55:48 UTC
Description of problem:
BlueStore OSD crashes in BlueStore::_do_read

2019-02-15 21:42:51.930852 7fbc3ea6d700 -1 *** Caught signal (Aborted) **
 in thread 7fbc3ea6d700 thread_name:tp_osd_tp

 ceph version 12.2.5-45redhat1xenial (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (()+0xa7e5d4) [0x55dc7ecda5d4]
 2: (()+0x11390) [0x7fbc57fd4390]
 3: (pread64()+0x33) [0x7fbc57fd3d43]
 4: (KernelDevice::read(unsigned long, unsigned long, ceph::buffer::list*, IOContext*, bool)+0x31d) [0x55dc7ecbb91d]
 5: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x1b4a) [0x55dc7ebc85aa]
 6: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x4d7) [0x55dc7ebca377]
 7: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&, ScrubMap*)+0x204) [0x55dc7ea2b314]
 8: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x3ec) [0x55dc7e937b8c]
 9: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x239) [0x55dc7e7d1759]
 10: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x65d) [0x55dc7e7d213d]
 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x82d) [0x55dc7e89bc8d]
 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x55dc7e711fa9]
 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55dc7e9bc477]
 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x55dc7e740357]
 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x55dc7ed220a4]
 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55dc7ed250e0]
 17: (()+0x76ba) [0x7fbc57fca6ba]
 18: (clone()+0x6d) [0x7fbc5704141d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 3.1
12.2.5-45redhat1xenial
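
The Doc Text above describes the shape of the fix: whatever low-level error the kernel reports on a failed read is collapsed into the single EIO code the OSD already knows how to handle, instead of aborting the OSD as in the backtrace above. A minimal sketch of that pattern, assuming a hypothetical pread()-based helper (read_block is invented for illustration and is not the actual Ceph code):

 #include <cerrno>
 #include <cstdint>
 #include <unistd.h>

 // Hypothetical sketch of the error mapping described in the Doc Text;
 // illustrative only, not the actual patch shipped in ceph-12.2.8-113.el7cp.
 ssize_t read_block(int fd, void* buf, size_t len, uint64_t offset)
 {
   ssize_t r = ::pread(fd, buf, len, offset);
   if (r < 0) {
     int err = errno;  // the kernel may report something more specific than EIO
     (void)err;        // a real implementation would log err before discarding it
     // Mapping every low-level failure to -EIO keeps the error on the one
     // path callers are prepared to handle, rather than aborting the OSD
     // on an unrecognized code.
     return -EIO;
   }
   return r;  // number of bytes read
 }

The trade-off is that the caller loses the specific error code, which is why logging the original errno before mapping it matters for diagnosability.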

Comment 19 Neha Ojha 2019-04-15 17:50:59 UTC
That sounds right.

Comment 21 John Brier 2019-04-29 19:39:42 UTC
Neha,

Can you fill out the Doc Text? I will use it to write a description of the bug and fix for the 3.2z2 release notes. The Doc Text field is in the bug details at the top right. Please fill out the "Cause: Consequence: Fix: Result:" template.

Comment 24 errata-xmlrpc 2019-04-30 15:56:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911