Bug 1343235
Summary: | [RH Ceph 2.0 / 10.2.1-7.el7cp ] tcmalloc/libc deadlock causes rados thrash ec profile k=2,m=1 osd restart failed/timeout | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vasu Kulkarni <vakulkar> |
Component: | RADOS | Assignee: | Samuel Just <sjust> |
Status: | CLOSED ERRATA | QA Contact: | Vasu Kulkarni <vakulkar> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.0 | CC: | ceph-eng-bugs, ceph-qe-bugs, dzafman, gmeno, hnallurv, kchai, kdreyer, sjust, vakulkar |
Target Milestone: | rc | ||
Target Release: | 2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | gperftools-2.4-8.el7 ceph-10.2.2-19.el7cp | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-08-23 19:40:57 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Vasu Kulkarni
2016-06-06 21:43:02 UTC
In both cases, it looks like the osd started for a bit and then the process ended or hung, no evidence of a crash. I also notice that in both cases, it looks like the class loading thread is hung. That probably makes this a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1334636, except that the workaround won't help here. Vasu: can you confirm by confirming that 1) while stuck like this the process is running and 2) attaching gdb while stuck like this and getting a dump of all threads You'll probably have to increase the timeout to give yourself time to get to the process. A rerun of the tests has actually passed, i am going to retry few more times and try to attach gdb when the process is stuck rerun logs: http://pulpito.ceph.redhat.com/vasu-2016-06-09_13:43:05-rados-jewel---basic-clara/ The plan is for vasu to do a bunch more runs over the weekend so we can get an idea of the frequency, and if it recurs, hopefully get the gdb backtraces so we can confirm. Unlike with the hung ceph-disk version of this bug, we probably don't have a good workaround (although I guess we could hack up the systemd start script to automatically restart the osd process if the admin socket doesn't appear after a set amount of time?). The "fix" would be to ship a version of tcmalloc without this problem. Marking needs-info on vasu until we get more frequency information. Over the weekend I think I hit this 3 times in around 60+ runs, so the frequency is quite good enough to get the logs, I think the image is stripped of debug symbols, but regardless the backtrace is (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_replica_log.so Reading symbols from /usr/lib64/rados-classes/libcls_log.so...Reading symbols from /usr/lib64/rados-classes/libcls_log.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_log.so Reading symbols from /usr/lib64/rados-classes/libcls_cephfs.so...Reading symbols from /usr/lib64/rados-classes/libcls_cephfs.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_cephfs.so Reading symbols from /usr/lib64/rados-classes/libcls_rbd.so...Reading symbols from /usr/lib64/rados-classes/libcls_rbd.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_rbd.so Reading symbols from /usr/lib64/rados-classes/libcls_timeindex.so...Reading symbols from /usr/lib64/rados-classes/libcls_timeindex.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_timeindex.so Reading symbols from /usr/lib64/rados-classes/libcls_version.so...Reading symbols from /usr/lib64/rados-classes/libcls_version.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_version.so Reading symbols from /usr/lib64/rados-classes/libcls_lock.so...Reading symbols from /usr/lib64/rados-classes/libcls_lock.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_lock.so Reading symbols from /usr/lib64/rados-classes/libcls_hello.so...Reading symbols from /usr/lib64/rados-classes/libcls_hello.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_hello.so Reading symbols from /usr/lib64/rados-classes/libcls_statelog.so...Reading symbols from /usr/lib64/rados-classes/libcls_statelog.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_statelog.so Reading symbols from /usr/lib64/rados-classes/libcls_user.so...Reading symbols from /usr/lib64/rados-classes/libcls_user.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_user.so Reading symbols from /usr/lib64/rados-classes/libcls_journal.so...Reading symbols from /usr/lib64/rados-classes/libcls_journal.so...(no debugging symbols found)...done. (no debugging symbols found)...done. Loaded symbols for /usr/lib64/rados-classes/libcls_journal.so 0x00007f1ed6b02573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4 Missing separate debuginfos, use: debuginfo-install ceph-osd-10.2.1-7.el7cp.x86_64 (gdb) thread apply all bt Thread 28 (Thread 0x7f1ed29ac700 (LWP 1016329)): #0 0x00007f1ed58ab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1ed78bd20b in ceph::log::Log::entry() () #2 0x00007f1ed58a7dc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f1ed3f32ced in clone () from /lib64/libc.so.6 Thread 27 (Thread 0x7f1ed1a38700 (LWP 1016330)): #0 0x00007f1ed6b02573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4 #1 0x00007f1ed6b02447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4 #2 0x00007f1ed6af6958 in tcmalloc::ThreadCache::CreateCacheIfNecessary() () from /lib64/libtcmalloc.so.4 #3 0x00007f1ed6ae4f15 in (anonymous namespace)::do_malloc_no_errno(unsigned long) [clone .part.15] () from /lib64/libtcmalloc.so.4 #4 0x00007f1ed6b067b0 in tc_new () from /lib64/libtcmalloc.so.4 #5 0x00007f1ed78bc6cd in ceph::log::Log::create_entry(int, int, unsigned long*) () #6 0x00007f1ed78b4a0f in ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long) () #7 0x00007f1ed78b544e in ceph::HeartbeatMap::is_healthy() () #8 0x00007f1ed78b5c2c in ceph::HeartbeatMap::check_touch_file() () #9 0x00007f1ed79903cb in CephContextServiceThread::entry() () #10 0x00007f1ed58a7dc5 in start_thread () from /lib64/libpthread.so.0 #11 0x00007f1ed3f32ced in clone () from /lib64/libc.so.6 Thread 26 (Thread 0x7f1ed1237700 (LWP 1016331)): #0 0x00007f1ed6b02573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4 #1 0x00007f1ed6b02447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4 #2 0x00007f1ed6af3078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4 #3 0x00007f1ed6af3148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4 #4 0x00007f1ed6af31dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4 #5 0x00007f1ed6af6235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4 #6 0x00007f1ed6b06718 in tc_new () from /lib64/libtcmalloc.so.4 #7 0x00007f1ed79c3cd6 in boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>& boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>::operator=<boost::spirit::classic::sequence<boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::kleene_star<boost::spirit::classic::alternative<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::rule<boost::spirit::classic::scanner<__gnu_cxx::__normal_iterator<char const*, std::string>, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t> >, boost::spirit::classic::chlit<char> > > > >(boost::spirit::classic::sequence<boost::spirit::classic::rule<boost::spiri---Type <return> to continue, or q <return> to quit--- more logs: http://magna002.ceph.redhat.com/vasu-2016-06-11_14:40:22-rados-jewel---basic-clara/238562/teuthology.log [ubuntu@clara010 ~]$ sudo ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 6.00000 root default -3 6.00000 rack localrack -2 6.00000 host localhost 0 1.00000 osd.0 up 1.00000 0.33659 1 1.00000 osd.1 up 1.00000 1.00000 2 1.00000 osd.2 up 1.00000 1.00000 3 1.00000 osd.3 down 1.00000 0.88971 4 1.00000 osd.4 up 1.00000 0 5 1.00000 osd.5 up 1.00000 1.00000 This only includes the first 3 threads... Can you include all of them? At the bottom, it mentions that you need to hit <return> to continue to the next page. It also indicates how to get the debug symbols: Missing separate debuginfos, use: debuginfo-install ceph-osd-10.2.1-7.el7cp.x86_64 Threads 27 and 28 linked above, however, do appear to be hung on a tcmalloc lock, which would be consistent with the tcmalloc bug. Vasu: to confirm, please reproduce again and include *all* of the thread backtraces. I think that this data suggests we can responsibly exclude this issue from blocking 2.0GA Linking the tcmalloc fixup bz. From clara003 for OSD.4 ( This is a different node ) #8 0x00007f5640b4b8ab in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /lib64/libstdc++.so.6 #9 0x00007f5640b4b954 in std::string::reserve(unsigned long) () from /lib64/libstdc++.so.6 #10 0x00007f5640b4bbbf in std::string::append(char const*, unsigned long) () from /lib64/libstdc++.so.6 #11 0x00007f5640b03864 in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /lib64/libstdc++.so.6 #12 0x00007f5643d0bce4 in dump_cmd_to_json(ceph::Formatter*, std::string const&) () #13 0x00007f5643d0cb4c in dump_cmd_and_help_to_json(ceph::Formatter*, std::string const&, std::string const&, std::string const&) () #14 0x00007f5643cead04 in GetdescsHook::call(std::string, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::string, ceph::buffer::list&) () #15 0x00007f5643ce2f91 in AdminSocket::do_accept() () #16 0x00007f5643ce4948 in AdminSocket::entry() () #17 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #18 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 25 (Thread 0x7f563c241700 (LWP 361636)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5640b3e9ec in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6 #2 0x00007f56436d1a09 in ceph::timer_detail::timer<ceph::time_detail::mono_clock>::timer_thread() () #3 0x00007f5640b42220 in ?? () from /lib64/libstdc++.so.6 #4 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 24 (Thread 0x7f563ba40700 (LWP 361637)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cc609f in SimpleMessenger::reaper_entry() () #2 0x00007f5643ccb9dd in SimpleMessenger::ReaperThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 23 (Thread 0x7f563b23f700 (LWP 361638)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cc609f in SimpleMessenger::reaper_entry() () #2 0x00007f5643ccb9dd in SimpleMessenger::ReaperThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 22 (Thread 0x7f563aa3e700 (LWP 361639)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cc609f in SimpleMessenger::reaper_entry() () #2 0x00007f5643ccb9dd in SimpleMessenger::ReaperThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 21 (Thread 0x7f563a23d700 (LWP 361640)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cc609f in SimpleMessenger::reaper_entry() () #2 0x00007f5643ccb9dd in SimpleMessenger::ReaperThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 20 (Thread 0x7f5639a3c700 (LWP 361641)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #1 0x00007f5643cc609f in SimpleMessenger::reaper_entry() () #2 0x00007f5643ccb9dd in SimpleMessenger::ReaperThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 19 (Thread 0x7f563923b700 (LWP 361642)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cc609f in SimpleMessenger::reaper_entry() () #2 0x00007f5643ccb9dd in SimpleMessenger::ReaperThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 18 (Thread 0x7f5638a3a700 (LWP 361643)): #0 0x00007f5641c24f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f5641c20d02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007f5641c20c08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f5643c9cff8 in Mutex::Lock(bool) () #4 0x00007f5643cd8152 in SafeTimer::timer_thread() () #5 0x00007f5643cd9c5d in SafeTimerThread::entry() () #6 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #7 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 17 (Thread 0x7f5638239700 (LWP 361644)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cd861f in SafeTimer::timer_thread() () #2 0x00007f5643cd9c5d in SafeTimerThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 16 (Thread 0x7f5637a38700 (LWP 361645)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cd861f in SafeTimer::timer_thread() () #2 0x00007f5643cd9c5d in SafeTimerThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 15 (Thread 0x7f5637237700 (LWP 361646)): #0 0x00007f5641c24f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f5641c20d1d in _L_lock_840 () from /lib64/libpthread.so.0 #2 0x00007f5641c20c3a in pthread_mutex_lock () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #3 0x00007f56430ba029 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2 #4 0x00007f5642e79849 in GetStackTrace_libunwind(void**, int, int) () from /lib64/libtcmalloc.so.4 #5 0x00007f5642e7a0be in GetStackTrace(void**, int, int) () from /lib64/libtcmalloc.so.4 #6 0x00007f5642e6b314 in tcmalloc::PageHeap::GrowHeap(unsigned long) () from /lib64/libtcmalloc.so.4 #7 0x00007f5642e6b633 in tcmalloc::PageHeap::New(unsigned long) () from /lib64/libtcmalloc.so.4 #8 0x00007f5642e69f64 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4 #9 0x00007f5642e6a148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4 #10 0x00007f5642e6a1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4 #11 0x00007f5642e6d235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4 #12 0x00007f5642e7df58 in tc_newarray () from /lib64/libtcmalloc.so.4 #13 0x00007f56425d6878 in snappy::Compress(snappy::Source*, snappy::Sink*) () from /lib64/libsnappy.so.1 #14 0x00007f56425d7271 in snappy::RawCompress(char const*, unsigned long, char*, unsigned long*) () from /lib64/libsnappy.so.1 #15 0x00007f56428158d4 in leveldb::TableBuilder::WriteBlock(leveldb::BlockBuilder*, leveldb::BlockHandle*) () from /lib64/libleveldb.so.1 #16 0x00007f5642815a64 in leveldb::TableBuilder::Flush() () from /lib64/libleveldb.so.1 #17 0x00007f5642815bfb in leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&) () from /lib64/libleveldb.so.1 #18 0x00007f56427f710b in leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from /lib64/libleveldb.so.1 #19 0x00007f56427f78c3 in leveldb::DBImpl::BackgroundCompaction() () from /lib64/libleveldb.so.1 #20 0x00007f56427f84b0 in leveldb::DBImpl::BackgroundCall() () from /lib64/libleveldb.so.1 #21 0x00007f56428195cb in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /lib64/libleveldb.so.1 #22 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #23 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 14 (Thread 0x7f56365a8700 (LWP 361647)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f56439f0aff in WBThrottle::get_next_should_flush(boost::tuples::tuple<ghobject_t, std::shared_ptr<FDCache::FD>, WBThrottle::PendingWB, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>*) () #2 0x00007f56439f1182 in WBThrottle::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 13 (Thread 0x7f563505a700 (LWP 361648)): #0 0x00007f5642e79573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4 #1 0x00007f5642e79447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4 #2 0x00007f5642e6a078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4 #3 0x00007f5642e6a148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4 #4 0x00007f5642e6a1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4 #5 0x00007f5642e6d235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4 #6 0x00007f5642e7d718 in tc_new () from /lib64/libtcmalloc.so.4 #7 0x00007f5643c336cd in ceph::log::Log::create_entry(int, int, unsigned long*) () #8 0x00007f56439b042d in FileStore::sync_entry() () ---Type <return> to continue, or q <return> to quit--- #9 0x00007f56439d977d in FileStore::SyncThread::entry() () #10 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #11 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 12 (Thread 0x7f5634859700 (LWP 361649)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f564369b95e in Cond::Wait(Mutex&) () #2 0x00007f5643a96604 in FileJournal::write_thread_entry() () #3 0x00007f56439d18dd in FileJournal::Writer::entry() () #4 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 11 (Thread 0x7f5634058700 (LWP 361650)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643c19e38 in Finisher::finisher_thread_entry() () #2 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 10 (Thread 0x7f5633857700 (LWP 361651)): #0 0x00007f5641c22a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643ce02fa in ThreadPool::worker(ThreadPool::WorkThread*) () #2 0x00007f5643ce1590 in ThreadPool::WorkThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 9 (Thread 0x7f5633056700 (LWP 361652)): #0 0x00007f5641c22a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643ce02fa in ThreadPool::worker(ThreadPool::WorkThread*) () #2 0x00007f5643ce1590 in ThreadPool::WorkThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 8 (Thread 0x7f5632855700 (LWP 361653)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643c19e38 in Finisher::finisher_thread_entry() () #2 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 7 (Thread 0x7f5632054700 (LWP 361654)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #1 0x00007f5643c19e38 in Finisher::finisher_thread_entry() () #2 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x7f5631853700 (LWP 361655)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643c19e38 in Finisher::finisher_thread_entry() () #2 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7f5631052700 (LWP 361656)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643c19e38 in Finisher::finisher_thread_entry() () #2 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7f5630851700 (LWP 361657)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643c19e38 in Finisher::finisher_thread_entry() () #2 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7f5630050700 (LWP 361658)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643c19e38 in Finisher::finisher_thread_entry() () #2 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f562f84f700 (LWP 361659)): #0 0x00007f5641c226d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5643cd861f in SafeTimer::timer_thread() () #2 0x00007f5643cd9c5d in SafeTimerThread::entry() () #3 0x00007f5641c1edc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007f56402a9ced in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f56432be800 (LWP 361628)): #0 0x00007f5642e79573 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4 #1 0x00007f5642e79447 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4 #2 0x00007f5642e6a078 in tcmalloc::CentralFreeList::Populate() () from /lib64/libtcmalloc.so.4 #3 0x00007f5642e6a148 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4 ---Type <return> to continue, or q <return> to quit--- #4 0x00007f5642e6a1dd in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4 #5 0x00007f5642e6d235 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4 #6 0x00007f5642e7d718 in tc_new () from /lib64/libtcmalloc.so.4 #7 0x00007f5640b4ac69 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6 #8 0x00007f5640b4c521 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /lib64/libstdc++.so.6 #9 0x00007f5640b4c958 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /lib64/libstdc++.so.6 #10 0x00007f562ec0e1ea in ?? () from /usr/lib64/rados-classes/libcls_rbd.so #11 0x00007f56430c83a3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2 #12 0x00007f56430ccab6 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2 #13 0x00007f56430c81b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2 #14 0x00007f56430cc1ab in _dl_open () from /lib64/ld-linux-x86-64.so.2 #15 0x00007f5641a1402b in dlopen_doit () from /lib64/libdl.so.2 #16 0x00007f56430c81b4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2 #17 0x00007f5641a1462d in _dlerror_run () from /lib64/libdl.so.2 #18 0x00007f5641a140c1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2 #19 0x00007f56436eb891 in ClassHandler::_load_class(ClassHandler::ClassData*) () #20 0x00007f56436ebc9f in ClassHandler::open_class(std::string const&, ClassHandler::ClassData**) () #21 0x00007f56436ebf79 in ClassHandler::open_all_classes() () #22 0x00007f5643681dbc in OSD::init() () #23 0x00007f56435e5c15 in main () (gdb) (gdb) t1: dlopen CentralFreeList::Populate() t15: CentralFreeList::Populate() tls_get_addr_tail() So yeah, this appears to be exactly the same as the linked bugs. The bug is in tcmalloc and the right fix would be to ship a patched tcmalloc (detailed in the upstream bug). Verified in 10.2.2-21 and 22. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html |