(gdb) thread apply all bt Thread 6 (Thread 27593): #0 0x00000034de4d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x00000034df52bb81 in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.3 #2 0x00000034df52c617 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.3 #3 0x00000034e190ed62 in qpid::broker::Broker::run() () from /usr/lib64/libqpidbroker.so.3 #4 0x00000000004093f4 in ?? () #5 0x00000034e1930d9e in qpid::broker::Daemon::fork() () from /usr/lib64/libqpidbroker.so.3 #6 0x0000000000406c65 in ?? () #7 0x00000000004055af in std::ios_base::Init::~Init() () #8 0x00000034de41d994 in __libc_start_main () from /lib64/libc.so.6 #9 0x0000000000405019 in std::ios_base::Init::~Init() () #10 0x00007fffddbc45e8 in ?? () #11 0x0000000000000000 in ?? () Thread 5 (Thread 27594): #0 0x00000034dec0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000034df5fafdf in qpid::sys::Timer::run() () from /usr/lib64/libqpidcommon.so.3 #2 0x00000034df52286a in ?? () from /usr/lib64/libqpidcommon.so.3 #3 0x00000034dec0673d in start_thread () from /lib64/libpthread.so.0 #4 0x00000034de4d3d1d in clone () from /lib64/libc.so.6 Thread 4 (Thread 27596): #0 0x00000034dec0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000034df5fafdf in qpid::sys::Timer::run() () from /usr/lib64/libqpidcommon.so.3 #2 0x00000034df52286a in ?? () from /usr/lib64/libqpidcommon.so.3 #3 0x00000034dec0673d in start_thread () from /lib64/libpthread.so.0 #4 0x00000034de4d3d1d in clone () from /lib64/libc.so.6 Thread 3 (Thread 27680): #0 0x00000034dec0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b9a1e8ba52d in qpid::sys::Waitable::wait() () from /usr/lib64/libqpidclient.so.3 #2 0x00002b9a1e8b5719 in qpid::client::SessionImpl::open(unsigned int) () from /usr/lib64/libqpidclient.so.3 #3 0x00002b9a1e87f1d8 in qpid::client::ConnectionImpl::newSession(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned short) () from /usr/lib64/libqpidclient.so.3 #4 0x00002b9a1e8732c9 in qpid::client::Connection::newSession(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int) () from /usr/lib64/libqpidclient.so.3 #5 0x00002b9a1e1bf593 in qpid::cluster::UpdateClient::run() () from /usr/lib64/qpid/daemon/cluster.so #6 0x00000034df52286a in ?? () from /usr/lib64/libqpidcommon.so.3 #7 0x00000034dec0673d in start_thread () from /lib64/libpthread.so.0 #8 0x00000034de4d3d1d in clone () from /lib64/libc.so.6 Thread 2 (Thread 27681): #0 0x00000034de4c60ab in write () from /lib64/libc.so.6 #1 0x00000034df51443f in qpid::sys::Socket::write(void const*, unsigned long) const () from /usr/lib64/libqpidcommon.so.3 #2 0x00000034df51e3eb in qpid::sys::posix::AsynchIO::writeable(qpid::sys::DispatchHandle&) () from /usr/lib64/libqpidcommon.so.3 #3 0x00000034df5f4b27 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator()(qpid::sys::DispatchHandle&) const () from /usr/lib64/libqpidcommon.so.3 #4 0x00000034df5f4114 in qpid::sys::DispatchHandle::processEvent(qpid::sys::Poller::EventType) () from /usr/lib64/libqpidcommon.so.3 #5 0x00000034df52c644 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.3 #6 0x00000034df52286a in ?? () from /usr/lib64/libqpidcommon.so.3 #7 0x00000034dec0673d in start_thread () from /lib64/libpthread.so.0 #8 0x00000034de4d3d1d in clone () from /lib64/libc.so.6 Thread 1 (Thread 27597): #0 0x00000034de430265 in raise () from /lib64/libc.so.6 #1 0x00000034de431d10 in abort () from /lib64/libc.so.6 #2 0x00000034df5f13d4 in qpid::sys::assertClusterSafe() () from /usr/lib64/libqpidcommon.so.3 #3 0x00000034e19bce87 in qpid::broker::SemanticState::attached() () from /usr/lib64/libqpidbroker.so.3 #4 0x00000034e19dc7a9 in qpid::broker::SessionState::readyToSend() () from /usr/lib64/libqpidbroker.so.3 #5 0x00000034df5be504 in qpid::amqp_0_10::SessionHandler::attach(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /usr/lib64/libqpidcommon.so.3 #6 0x00000034df57b50d in qpid::framing::AMQP_AllOperations::SessionHandler::Invoker::visit(qpid::framing::SessionAttachBody const&) () from /usr/lib64/libqpidcommon.so.3 #7 0x00000034df5be9fe in qpid::framing::Invoker::Result qpid::framing::invoke<qpid::amqp_0_10::SessionHandler>(qpid::amqp_0_10::SessionHandler&, qpid::framing::AMQMethodBody const&) () from /usr/lib64/libqpidcommon.so.3 #8 0x00000034df5b9d83 in qpid::amqp_0_10::SessionHandler::invoke(qpid::framing::AMQMethodBody const&) () from /usr/lib64/libqpidcommon.so.3 #9 0x00000034df5bcdfa in qpid::amqp_0_10::SessionHandler::handleIn(qpid::framing::AMQFrame&) () from /usr/lib64/libqpidcommon.so.3 #10 0x00000034e1924e29 in qpid::broker::Connection::received(qpid::framing::AMQFrame&) () from /usr/lib64/libqpidbroker.so.3 #11 0x00002b9a1e1adfdc in qpid::cluster::Connection::received(qpid::framing::AMQFrame&) () from /usr/lib64/qpid/daemon/cluster.so #12 0x00002b9a1e1ac021 in qpid::cluster::Connection::decode(char const*, unsigned long) () from /usr/lib64/qpid/daemon/cluster.so #13 0x00000034df5ef5f2 in qpid::sys::AsynchIOHandler::readbuff(qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*) () from /usr/lib64/libqpidcommon.so.3 #14 0x00000034df5202aa in boost::function2<void, qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*, std::allocator<boost::function_base> >::operator()(qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*) const () from /usr/lib64/libqpidcommon.so.3 #15 0x00000034df51da13 in qpid::sys::posix::AsynchIO::readable(qpid::sys::DispatchHandle&) () from /usr/lib64/libqpidcommon.so.3 #16 0x00000034df5f4b27 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator()(qpid::sys::DispatchHandle&) const () from /usr/lib64/libqpidcommon.so.3 #17 0x00000034df5f4300 in qpid::sys::DispatchHandle::processEvent(qpid::sys::Poller::EventType) () from /usr/lib64/libqpidcommon.so.3 #18 0x00000034df52c644 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.3 #19 0x00000034df52286a in ?? () from /usr/lib64/libqpidcommon.so.3 #20 0x00000034dec0673d in start_thread () from /lib64/libpthread.so.0 #21 0x00000034de4d3d1d in clone () from /lib64/libc.so.6 (gdb)
Reproducer not yet known, stack traces reported as above however against first beta packages.
The stack trace shows that the broker is connecting to give an update to a new broker, while at the same it is receiving an update connection. I don't know how this could come about. Are there any log files available or a description of what was happening at the time of the core dump? It would make sense for a broker to refuse the erroneous update connection in this situation, if there's no more information available. I can make that change.
Created attachment 440424 [details] broker log file 1
Created attachment 440425 [details] broker log file 2
Created attachment 440426 [details] client log file 1
Created attachment 440427 [details] client log file 2
The reproducer is as attached here: https://bugzilla.redhat.com/attachment.cgi?id=436252 Additionally the following information was given by the user: "This client run was using the 'failover_exchange' cluster failover method. Just to confirm, the sequence of events for these logs was as follows: - start broker 1 - start broker 2 - start client app, confirm messages are being sent/received (the 'Sent:/got message' logging) - stop broker 1 - broker 2 becomes primary, app successfully fails over to broker 2 and continues sending/receiving - start broker 1. Note I did not clean broker 1's state files before starting - I don't believe this should be necessary though. At this point broker 2 crashes with the attached backtrace, and the client comes to a halt. I took a number of thread dumps at different points in the lifecyle, which are all in the client log file (pubsubtest.out.0.tgz). Eventually the client gives up and exits. Broker 1 appears to start, but is unusable: - tried to re-run the test client app. It is unable to connect to broker 1. Client logs inc. stack traces attached. (pubsubtest.out.1.tgz) - manually stop broker 1. " The brokers are I believe running in VMs, not sure if that is relevant.
It appears that the broker's cluster-url has been specified incorrectly. From the log: 2010-08-23 12:11:31 notice cluster(10.34.22.65:8356 UPDATER) sending update to 10.34.22.64:16813 at amqp:tcp:10.34.22.65:5678,tcp:10.34.22.64:5678 Note that the URL includes _both_ brokers addresses. This is causing the updater broker to connect to itself rather than the new broker, causing the crash. Each brokers cluster-url should specify only the address(es) of that one broker. The brokers collaborate to provide the clients with the URLs of all brokers in the cluster. I will change the broker to refuse the erroneous catch-up connection attempt with a sensible error message rather than crash, and check if the documentation for cluster-url needs clarification. I've moved the bug to 1.4, I don't think its critical for the 1.3 release.
Additional error checking comitted ontrunk r988312 Check for and abort invalid catchup connections. Detect attempt to make a catch-up connection while we are not expecting an update.
*** Bug 634168 has been marked as a duplicate of this bug. ***