Bug 568831
| Summary: | clusters_test fails sporadically with core in client::Connector code. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Alan Conway <aconway> |
| Component: | qpid-cpp | Assignee: | Andrew Stitcher <astitcher> |
| Status: | CLOSED DUPLICATE | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | Development | CC: | astitcher, freznice, gsim |
| Target Milestone: | 1.3 | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2010-04-15 22:02:44 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This is a regression At r901548 (before thread per connection change) I got 305 cluster_test runs with no errors At r901550 (thread per connection fix), I can rarely get even one clean cluster_test run, I get the writebuff stack trace right away. Current trunk is better so it appears some races were improved but there's still one left. A faster reproducer: make -k cluster_test 'cluster_test_OBJECTS=unit_test.o ClusterFixture.o ForkedBroker.o ClusterFailover.o' Then run cluster_test in a loop. It fails in under a minute for me and I've seen it produce both stack traces. To allow cluster testing to advance I comitted a workaround in r923414, a 100ms sleep at line 166 of cluster/UpdateClient.cpp between calling Connection::close() and ~Connection. This does not address the problem but the reproducer of comment 2 will no longer work unless you remove the sleep. As to the underlying problem, it appears that the TCPConnector object is deleted by another thread between lines 284 and 285 of DispatchHandle.cpp The stack trace shows -> DispatchHandle::processEvent state=DELETING type=READ_WRITABLE, in writeableCallback on line 285 ... -> TCPConnector::writebuff - invalid vptr, object has been deleted. I have been able to reproduce this by simply re-starting a 3 member cluster in a loop but its not as reliable as described as comment 2. I have not been able to reproduce with a separate client. I think the problem stems from a change to TCPConnector::closeInternal in r901550. The previous code stopped the connections poller and joined its thread, thereby guaranteeing that it is safe to delete the Connector once close() returns. The new code has no such wait, and a poller thread is calling writebuff after it has been deleted. From Andrew Stitcher: The AsynchIO::queueForDeletion() in closeInternal should ensure that no more calls ever come up from that aio, it sounds like you've found a race condition, which stops that working properly. *** Bug 580138 has been marked as a duplicate of this bug. *** *** This bug has been marked as a duplicate of bug 580824 *** |
cluster_test fails sporadically when run in a loop. I generally see a failure in about 20 iterations, but have seen up to 70 repetitions before a core. The cores are in client connection code run by a broker sending an update to another broker. The crash does not appear to be associated with a particular test, crashes have been seen in at least these tests: - testReconnectSameSessionName, - testDequeueWaitingSubscription - testAcl - testMessageTimeToLive So far I have seen 2 different stack traces (below.) The first is very clearly in UpdateClient as it is closing a connection. The second is in a Poller thread, it occurs when no other thread is in UpdateClient code, which suggests that the UpdateClient thread has just terminated when the crash occurs (there is no other code that could be causing client::Connector code to run.) I don't know that the two traces are related but it seems likely. *** backtrace 1 #0 0x0000003da3430265 in raise () from /lib64/libc.so.6 #1 0x0000003da3431d10 in abort () from /lib64/libc.so.6 #2 0x0000003da34296e6 in __assert_fail () from /lib64/libc.so.6 #3 0x00002ad609b8bb92 in qpid::sys::Mutex::unlock (this=0x591ad80) at ../../cpp/include/qpid/sys/posix/Mutex.h:120 #4 0x00002ad609b8bbac in ~ScopedLock (this=0x477e5030) at ../../cpp/include/qpid/sys/Mutex.h:34 #5 0x00002ad60a27c009 in qpid::sys::DispatchHandle::rewatchWrite (this=0x591ac78) at ../../cpp/src/qpid/sys/DispatchHandle.cpp:118 #6 0x00002ad60a18c4ca in qpid::sys::posix::AsynchIO::notifyPendingWrite (this=0x591ac70) at ../../cpp/src/qpid/sys/posix/AsynchIO.cpp:346 #7 0x00002ad60acfa202 in qpid::client::TCPConnector::send (this=0x5919f70, frame=@0x477e52a0) at ../../cpp/src/qpid/client/TCPConnector.cpp:203 #8 0x00002ad60acbcefd in boost::_mfi::mf1<void, qpid::client::Connector, qpid::framing::AMQFrame&>::call<boost::scoped_ptr<qpid::client::Connector>, qpid::framing::AMQFrame> ( this=0x5919600, u=@0x5919d68, b1=@0x477e52a0) at /usr/include/boost/bind/mem_fn_template.hpp:140 #9 0x00002ad60acbcf29 in boost::_mfi::mf1<void, qpid::client::Connector, qpid::framing::AMQFrame&>::operator()<boost::scoped_ptr<qpid::client::Connector> > (this=0x5919600, u=@0x5919d68, a1=@0x477e52a0) at /usr/include/boost/bind/mem_fn_template.hpp:154 #10 0x00002ad60acbcf85 in boost::_bi::list2<boost::reference_wrapper<boost::scoped_ptr<qpid::client::Connector> >, boost::arg<1> >::operator()<boost::_mfi::mf1<void, qpid::client::Co\ nnector, qpid::framing::AMQFrame&>, boost::_bi::list1<qpid::framing::AMQFrame&> > (this=0x5919610, f=@0x5919600, a=@0x477e51c0) at /usr/include/boost/bind.hpp:287 #11 0x00002ad60acbcfcc in boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::client::Connector, qpid::framing::AMQFrame&>, boost::_bi::list2<boost::reference_wrapper<boost::scoped\ _ptr<qpid::client::Connector> >, boost::arg<1> > >::operator()<qpid::framing::AMQFrame> (this=0x5919600, a1=@0x477e52a0) at /usr/include/boost/bind/bind_template.hpp:32 #12 0x00002ad60acbcff3 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::client::Connector, qpid::framing::AMQFrame&>, boos\ t::_bi::list2<boost::reference_wrapper<boost::scoped_ptr<qpid::client::Connector> >, boost::arg<1> > >, void, qpid::framing::AMQFrame&>::invoke (function_obj_ptr= {obj_ptr = 0x5919600, const_obj_ptr = 0x5919600, func_ptr = 0x5919600, data = ""}, a0=@0x477e52a0) at /usr/include/boost/function/function_template.hpp:136 #13 0x00002ad60acaf49a in boost::function1<void, qpid::framing::AMQFrame&, std::allocator<void> >::operator() (this=0x5919bc0, a0=@0x477e52a0) at /usr/include/boost/function/function_template.hpp:576 #14 0x00002ad60acaf4ca in qpid::client::ConnectionHandler::Adapter::handle (this=0x5919c50, f=@0x477e52a0) at ../../cpp/src/qpid/client/ConnectionHandler.h:60 #15 0x00002ad60a26a4ed in qpid::framing::Proxy::send (this=0x5919c68, b=@0x477e5300) at ../../cpp/src/qpid/framing/Proxy.cpp:37 #16 0x00002ad60a1b1672 in qpid::framing::AMQP_ServerProxy::Connection::close (this=0x5919c68, replyCode=200, replyText=@0x2ad60afeba48) at qpid/framing/AMQP_ServerProxy.cpp:144 #17 0x00002ad60aca95f3 in qpid::client::ConnectionHandler::close (this=0x5919ad8) at ../../cpp/src/qpid/client/ConnectionHandler.cpp:151 #18 0x00002ad60acb5cde in qpid::client::ConnectionImpl::close (this=0x59199f0) at ../../cpp/src/qpid/client/ConnectionImpl.cpp:287 #19 0x00002ad60aca4ee1 in qpid::client::Connection::close (this=0x5713bf8) at ../../cpp/src/qpid/client/Connection.cpp:152 #20 0x00002ad60a6b8ba8 in qpid::cluster::UpdateClient::updateConnection (this=0x5713a80, updateConnection=@0x5713e80) at ../../cpp/src/qpid/cluster/UpdateClient.cpp:358 #21 0x00002ad60a6bc33c in boost::_mfi::mf1<void, qpid::cluster::UpdateClient, boost::intrusive_ptr<qpid::cluster::Connection> const&>::operator() (this=0x477e5a60, p=0x5713a80, a1=@0x5713e80) at /usr/include/boost/bind/mem_fn_template.hpp:149 #22 0x00002ad60a6bd25d in boost::_bi::list2<boost::_bi::value<qpid::cluster::UpdateClient*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, qpid::cluster::UpdateClient, boost::in\ trusive_ptr<qpid::cluster::Connection> const&>, boost::_bi::list1<boost::intrusive_ptr<qpid::cluster::Connection>&> > (this=0x477e5a70, f=@0x477e5a60, a=@0x477e5a10) at /usr/include/boost/bind.hpp:287 #23 0x00002ad60a6bd2a4 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::cluster::UpdateClient, boost::intrusive_ptr<qpid::cluster::Connection> const&>, boost::_bi::list2<boos\ t::_bi::value<qpid::cluster::UpdateClient*>, boost::arg<1> > >::operator()<boost::intrusive_ptr<qpid::cluster::Connection> > (this=0x477e5a60, a1=@0x5713e80) at /usr/include/boost/bind/bind_template.hpp:32 #24 0x00002ad60a6bd2d1 in std::for_each<__gnu_cxx::__normal_iterator<boost::intrusive_ptr<qpid::cluster::Connection>*, std::vector<boost::intrusive_ptr<qpid::cluster::Connection>, st\ d::allocator<boost::intrusive_ptr<qpid::cluster::Connection> > > >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::cluster::UpdateClient, boost::intrusive_ptr<qpid::cluster::C\ onnection> const&>, boost::_bi::list2<boost::_bi::value<qpid::cluster::UpdateClient*>, boost::arg<1> > > > (__first={_M_current = 0x5713e80}, __last={_M_current = 0x5713e88}, __f= {f_ = {f_ = 0x2ad60a6b849c <qpid::cluster::UpdateClient::updateConnection(boost::intrusive_ptr<qpid::cluster::Connection> const&)>}, l_ = {a1_ = {t_ = 0x5713a80}, a2_ = {<No \ data fields>}}}) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algo.h:159 #25 0x00002ad60a6b751a in qpid::cluster::UpdateClient::update (this=0x5713a80) at ../../cpp/src/qpid/cluster/UpdateClient.cpp:142 #26 0x00002ad60a6b7a92 in qpid::cluster::UpdateClient::run (this=0x5713a80) at ../../cpp/src/qpid/cluster/UpdateClient.cpp:120 #27 0x00002ad60a1951bc in runRunnable (p=0x5713a80) at ../../cpp/src/qpid/sys/posix/Thread.cpp:35 #28 0x0000003da4006617 in start_thread () from /lib64/libpthread.so.0 #29 0x0000003da34d3c2d in clone () from /lib64/libc.so.6 *** backtrace 2 #0 0x00002b0c3aef9eae in qpid::client::TCPConnector::writebuff (this=0x2aaaac0422b0) at ../../cpp/src/qpid/client/TCPConnector.cpp:218 #1 0x00002b0c3aeff12a in boost::_mfi::mf1<void, qpid::client::TCPConnector, qpid::sys::AsynchIO&>::operator() (this=0x1961fe40, p=0x2aaaac0422b0, a1=@0x1961fa20) at /usr/include/boost/bind/mem_fn_template.hpp:149 #2 0x00002b0c3aeff30f in boost::_bi::list2<boost::_bi::value<qpid::client::TCPConnector*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, qpid::client::TCPConnector, qpid::sys::\ AsynchIO&>, boost::_bi::list1<qpid::sys::AsynchIO&> > (this=0x1961fe50, f=@0x1961fe40, a=@0x47fd3930) at /usr/include/boost/bind.hpp:287 #3 0x00002b0c3aeff356 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::client::TCPConnector, qpid::sys::AsynchIO&>, boost::_bi::list2<boost::_bi::value<qpid::client::TCPConn\ ector*>, boost::arg<1> > >::operator()<qpid::sys::AsynchIO> (this=0x1961fe40, a1=@0x1961fa20) at /usr/include/boost/bind/bind_template.hpp:32 #4 0x00002b0c3aeff37d in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::client::TCPConnector, qpid::sys::AsynchIO&>, boost\ ::_bi::list2<boost::_bi::value<qpid::client::TCPConnector*>, boost::arg<1> > >, void, qpid::sys::AsynchIO&>::invoke (function_obj_ptr= {obj_ptr = 0x1961fe40, const_obj_ptr = 0x1961fe40, func_ptr = 0x1961fe40, data = "@"}, a0=@0x1961fa20) at /usr/include/boost/function/function_template.hpp:136 #5 0x00002b0c3a391d04 in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() (this=0x1961fbd8, a0=@0x1961fa20) at /usr/include/boost/function/function_template.hpp:576 #6 0x00002b0c3a38c8bd in qpid::sys::posix::AsynchIO::writeable (this=0x1961fa20, h=@0x1961fa28) at ../../cpp/src/qpid/sys/posix/AsynchIO.cpp:536 #7 0x00002b0c3a390e7a in boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, qpid::sys::DispatchHandle&>::operator() (this=0x1961fd60, p=0x1961fa20, a1=@0x1961fa28) at /usr/include/boost/bind/mem_fn_template.hpp:149 #8 0x00002b0c3a3910f9 in boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, qpid::sys::\ DispatchHandle&>, boost::_bi::list1<qpid::sys::DispatchHandle&> > (this=0x1961fd70, f=@0x1961fd60, a=@0x47fd3c90) at /usr/include/boost/bind.hpp:287 #9 0x00002b0c3a391140 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, qpid::sys::DispatchHandle&>, boost::_bi::list2<boost::_bi::value<qpid::sys::posi\ x::AsynchIO*>, boost::arg<1> > >::operator()<qpid::sys::DispatchHandle> ( this=0x1961fd60, a1=@0x1961fa28) at /usr/include/boost/bind/bind_template.hpp:32 #10 0x00002b0c3a391167 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, qpid::sys::DispatchHandle&>,\ boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>, boost::arg<1> > >, void, qpid::sys::DispatchHandle&>::invoke (function_obj_ptr= {obj_ptr = 0x1961fd60, const_obj_ptr = 0x1961fd60, func_ptr = 0x1961fd60, data = "`"}, a0=@0x1961fa28) at /usr/include/boost/function/function_template.hpp:136 #11 0x00002b0c3a4815e2 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() (this=0x1961fa50, a0=@0x1961fa28) at /usr/include/boost/function/function_template.hpp:576 #12 0x00002b0c3a47b176 in qpid::sys::DispatchHandle::processEvent (this=0x1961fa28, type=qpid::sys::Poller::READ_WRITABLE) at ../../cpp/src/qpid/sys/DispatchHandle.cpp:285 #13 0x00002b0c3a39cc49 in qpid::sys::Poller::Event::process (this=0x47fd3ed0) at ../../cpp/src/qpid/sys/Poller.h:123 #14 0x00002b0c3a39c36a in qpid::sys::Poller::run (this=0x2aaaac041ae0) at ../../cpp/src/qpid/sys/epoll/EpollPoller.cpp:483 #15 0x00002b0c3a394ff8 in runRunnable (p=0x2aaaac041ae0) at ../../cpp/src/qpid/sys/posix/Thread.cpp:35 #16 0x0000003da4006617 in start_thread () from /lib64/libpthread.so.0 #17 0x0000003da34d3c2d in clone () from /lib64/libc.so.6