| Summary: | 0.34-17 segfault under load in HA and lot of QMF schema queries | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Pavel Moravec <pmoravec> |
| Component: | qpid-cpp | Assignee: | messaging-bugs <messaging-bugs> |
| Status: | POST --- | QA Contact: | Messaging QE <messaging-qe-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.2 | CC: | jross, mcressma |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Mine analysis: - many threads (# 2-4,6-10 and 15) are processing QMF "get list of bindings" request - thread 14 routing a message via TopicExchange - thread 10 doing periodicProcessing - thread 5 creates a subscription for HA backup broker - thread 1 segfaults when calculating encodedSize of ExecutionSyncBody frame - inside some std::string stuff, but ExecutionSyncBody has quite static size not depending on any string It smells me like QMF requests & periodicProcessing caused some memory corruption that caused encodedSize to segfault.. Theoretical fix at: http://git.app.eng.bos.redhat.com/git/rh-qpid.git/commit/?h=0.34-mrg-aconway-bz1368196 Not proven. The fix will log 'warning epoll:' messages if we hit any of the fixed codepaths, so if the reproducer doesn't crash and we see those log messages, then we are probably in good shape. Bug 1368196 - Fix theoretcial race condition in EpollPoller Theoretical fix for apparent race condition that allows AsynchIOHandler connection resources to be deleted while still in use. The theory: To safely delete a PollerHandle it must be: 1. not concurrently be in use by any worker thread. 2. un-registered from epoll. 3. not eligible to be re-activated in epoll. DeletionManager only enforces 1. There is no enforcement of 2 or 3 in EpollPoller. There are assert() macros showing that EpollPoller.cpp *assumes* that 2 & 3 hold, but these have no effect in a release build if they are incorrect. This patch replaces the asserts with if statements. If the assert would be true, the behavior is unchanged, otherwise a safe behavior is substituted. |
Description of problem: Under unknown scenario, but having a 0.34-17 broker: - in HA cluster with some backup nodes (the problematic is the primary broker) - under some stress test - processing a lot of QMF schema queries (>5 "get list of bindings" queries run in parallel) - doing periodicProcessing - sending execution.sync on a session to a backup broker Then calculating the encodedSize of ExecutionSyncBody segfaults with backtrace: #5 0x00000030c27940c2 in qpid::framing::AMQFrame::encodedSize (this=0x7f89a7ffd1a0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/framing/AMQFrame.cpp:46 (full BT below) Version-Release number of selected component (if applicable): qpid-cpp-server-* 0.34-17 How reproducible: ??? Steps to Reproduce: 1. ??? Actual results: Expected results: Additional info: Backtrace: Thread 1 (Thread 0x7f89a7fff700 (LWP 28118)): #0 0x0000003d6129d28c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6 #1 0x00000030c2778d40 in _Construct<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__first=<value optimized out>, __last="", __result=0x30c2a72630) at /usr/include/c++/4.4.7/bits/stl_construct.h:80 #2 uninitialized_copy<__gnu_cxx::__normal_iterator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*> ( __first=<value optimized out>, __last="", __result=0x30c2a72630) at /usr/include/c++/4.4.7/bits/stl_uninitialized.h:74 #3 uninitialized_copy<__gnu_cxx::__normal_iterator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*> ( __first=<value optimized out>, __last="", __result=0x30c2a72630) at /usr/include/c++/4.4.7/bits/stl_uninitialized.h:116 #4 std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__first=<value optimized out>, __last="", __result=0x30c2a72630) at /usr/include/c++/4.4.7/bits/stl_uninitialized.h:256 #5 0x00000030c27940c2 in qpid::framing::AMQFrame::encodedSize (this=0x7f89a7ffd1a0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/framing/AMQFrame.cpp:46 #6 0x00000030c2784bbc in qpid::SessionState::senderRecord (this=0x2925120, f=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/SessionState.cpp:123 #7 0x00000030c2792b9d in qpid::amqp_0_10::SessionHandler::handleOut (this=0x2810870, f=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/amqp_0_10/SessionHandler.cpp:138 #8 0x00000030c27a7b7a in qpid::framing::Proxy::send (this=0x2810940, b=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/framing/Proxy.cpp:37 #9 0x00000030c26d7d09 in qpid::framing::AMQP_ClientProxy::Execution::sync (this=0x2810940) at /usr/src/debug/qpid-cpp-0.34/src/qpid/framing/AMQP_ClientProxy.cpp:178 #10 0x00000030c2eeb532 in qpid::broker::SessionState::deliver (this=<value optimized out>, message=..., destination=<value optimized out>, isRedelivered=false, ttl=0, acceptMode=<value optimized out>, acquireMode=qpid::framing::message::ACQUIRE_MODE_NOT_ACQUIRED, annotations=std::map with 0 elements, sync=true) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SessionState.cpp:325 #11 0x00000030c2ecf368 in qpid::broker::SemanticStateConsumerImpl::deliver (this=0x2833a20, cursor=..., msg=..., consumer=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SemanticState.cpp:396 #12 0x00007f89af4af5dc in qpid::ha::ReplicatingSubscription::sendEvent (this=0x2833a20, event=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/ha/ReplicatingSubscription.cpp:306 #13 0x00007f89af4af81f in qpid::ha::ReplicatingSubscription::sendIdEvent (this=<value optimized out>, pos=<value optimized out>, l=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/ha/ReplicatingSubscription.cpp:297 #14 0x00007f89af4b101b in qpid::ha::ReplicatingSubscription::deliver (this=0x2833a20, c=..., m=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/ha/ReplicatingSubscription.cpp:212 #15 0x00000030c2e32557 in qpid::broker::Queue::dispatch (this=<value optimized out>, c=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/Queue.cpp:507 #16 0x00000030c2ed25bb in qpid::broker::SemanticStateConsumerImpl::doDispatch (this=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SemanticState.cpp:663 #17 0x00007f89af4afc14 in qpid::ha::ReplicatingSubscription::doDispatch (this=0x2833a20) at /usr/src/debug/qpid-cpp-0.34/src/qpid/ha/ReplicatingSubscription.cpp:317 #18 0x00000030c2ecc221 in qpid::broker::SemanticStateConsumerImpl::doOutput (this=0x2833a20) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SemanticState.cpp:729 #19 0x00000030c27b7221 in qpid::sys::AggregateOutput::doOutput (this=0x7f896c091768) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/AggregateOutput.cpp:54 #20 0x00000030c2e58069 in qpid::broker::amqp_0_10::Connection::doOutput (this=0x7f896c091720) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/amqp_0_10/Connection.cpp:400 #21 0x00000030c2de5791 in qpid::amqp_0_10::Connection::encode (this=0x7f896c005310, buffer=<value optimized out>, size=65536) at /usr/src/debug/qpid-cpp-0.34/src/qpid/amqp_0_10/Connection.cpp:101 #22 0x00000030c27b9f54 in qpid::sys::AsynchIOHandler::idle (this=0x28f1e10) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/AsynchIOHandler.cpp:221 #23 0x00000030c2737968 in operator() (this=0x29269c0, h=...) at /usr/include/boost/function/function_template.hpp:1013 #24 qpid::sys::posix::AsynchIO::writeable (this=0x29269c0, h=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/posix/AsynchIO.cpp:582 #25 0x00000030c27bea83 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013 #26 0x00000030c27bd3ee in qpid::sys::DispatchHandle::processEvent (this=0x29269c8, type=qpid::sys::Poller::WRITABLE) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/DispatchHandle.cpp:287 #27 0x00000030c275dfad in process (this=0x266b9a0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/Poller.h:131 #28 qpid::sys::Poller::run (this=0x266b9a0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/epoll/EpollPoller.cpp:522 #29 0x00000030c275216a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/posix/Thread.cpp:35 #30 0x0000003d5f607a51 in start_thread (arg=0x7f89a7fff700) at pthread_create.c:301 #31 0x0000003d5f2e896d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115