Bug 1687039 - osd/PG.cc: account for missing set irrespective of last_complete
Summary: osd/PG.cc: account for missing set irrespective of last_complete
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: z2
: 3.2
Assignee: Neha Ojha
QA Contact: Manohar Murthy
Depends On:
Reported: 2019-03-09 00:42 UTC by Neha Ojha
Modified: 2019-05-01 17:24 UTC

Fixed In Version: RHEL: ceph-12.2.8-113.el7cp Ubuntu: ceph_12.2.8-96redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-04-30 15:57:07 UTC
Target Upstream Version:


System ID                             | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker 37919        | None     | None   | None    | 2019-03-09 00:42:10 UTC
Github ceph ceph pull 26236           | None     | None   | None    | 2019-03-09 00:43:55 UTC
Red Hat Product Errata RHSA-2019:0911 | None     | None   | None    | 2019-04-30 15:57:22 UTC

Description Neha Ojha 2019-03-09 00:42:10 UTC
Description of problem:

When adding source information from another OSD, check whether an object
that needs recovery is present in that OSD's missing set. If it is, do not
include the OSD as a missing loc; otherwise we may end up with the
following failure:

/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-2590-gea1c8ca/rpm/el7/BUILD/ceph-14.0.1-2590-gea1c8ca/src/osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))

 ceph version 14.0.1-2590-gea1c8ca (ea1c8caf95758ce122b97d7d708086b9eff3187f) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55bc80abc070]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55bc80abc23e]
 3: (ECBackend::get_all_avail_shards(hobject_t const&, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > const&, std::set<int, std::less<int>, std::allocator<int> >&, std::map<shard_id_t, pg_shard_t, std::less<shard_id_t>, std::allocator<std::pair<shard_id_t const, pg_shard_t> > >&, bool)+0xc1c) [0x55bc80f0a5cc]
 4: (ECBackend::get_min_avail_to_read_shards(hobject_t const&, std::set<int, std::less<int>, std::allocator<int> > const&, bool, bool, std::map<pg_shard_t, std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >, std::less<pg_shard_t>, std::allocator<std::pair<pg_shard_t const, std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > > > > >*)+0x104) [0x55bc80f0a734]
 5: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)+0x313) [0x55bc80f0f433]
 6: (ECBackend::run_recovery_op(PGBackend::RecoveryHandle*, int)+0x1457) [0x55bc80f137e7]
 7: (PrimaryLogPG::maybe_kick_recovery(hobject_t const&)+0x27d) [0x55bc80d66fad]
 8: (PrimaryLogPG::wait_for_degraded_object(hobject_t const&, boost::intrusive_ptr<OpRequest>)+0x48) [0x55bc80d673a8]
 9: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x1949) [0x55bc80da7169]
 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xbd4) [0x55bc80daac14]
 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1a9) [0x55bc80bf61d9]
 12: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55bc80e80832]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa0c) [0x55bc80c0f7cc]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55bc8120ab53]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55bc8120dbf0]
 16: (()+0x7e25) [0x7f61fea2ee25]
 17: (clone()+0x6d) [0x7f61fd8f7bad]

Comment 10 errata-xmlrpc 2019-04-30 15:57:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

