Using the FailoverManager, the client often segfaults when a broker in a cluster is killed.

** Preparation

- Configure and start openais
- Start 4 brokers, which form a cluster:

  $ rm -rf /var/lib/qpidd/qpid1/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd1.log --data-dir /var/lib/qpidd/qpid1 --port 5672 --auth 0
  $ rm -rf /var/lib/qpidd/qpid2/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd2.log --data-dir /var/lib/qpidd/qpid2 --port 9102 --auth 0
  $ rm -rf /var/lib/qpidd/qpid3/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd3.log --data-dir /var/lib/qpidd/qpid3 --port 9103 --auth 0
  $ rm -rf /var/lib/qpidd/qpid4/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd4.log --data-dir /var/lib/qpidd/qpid4 --port 9104 --auth 0

- Start the fom_timed_sender client (will be attached soon)
- Kill qpid process 1 (qpid1)
- Do this loop until the client crashes:
  - Find out which broker the client reconnected to and kill that one too
  - Find out which broker the client reconnected to and kill that one too
  - Start up 2 of the killed qpid processes, using the startup lines above

On successful reproduction of this failure you will get this backtrace from the client:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x530c6940 (LWP 15568)]
0x0000003919e7b475 in memcpy () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003919e7b475 in memcpy () from /lib64/libc.so.6
#1  0x000000303af5956c in qpid::sys::Socket::connect () from /usr/lib64/libqpidcommon.so.0
#2  0x000000303b46657c in qpid::client::TCPConnector::connect () from /usr/lib64/libqpidclient.so.0
#3  0x000000303b45b5e9 in qpid::client::ConnectionImpl::open () from /usr/lib64/libqpidclient.so.0
#4  0x000000303b44d745 in qpid::client::Connection::open () from /usr/lib64/libqpidclient.so.0
#5  0x000000303b47d082 in qpid::client::FailoverManager::attempt () from /usr/lib64/libqpidclient.so.0
#6  0x000000303b47da7d in qpid::client::FailoverManager::attempt () from /usr/lib64/libqpidclient.so.0
#7  0x000000303b47dce4 in qpid::client::FailoverManager::connect () from /usr/lib64/libqpidclient.so.0
#8  0x000000303b47e5f8 in qpid::client::FailoverManager::execute () from /usr/lib64/libqpidclient.so.0
#9  0x000000000040c558 in child_process (thrargs=<value optimized out>) at sender.cpp:160
#10 0x000000391a606367 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003919ed30ad in clone () from /lib64/libc.so.6

At the moment, it is unclear whether the error is due to a bug in the fom_timed_sender.cpp client or an issue with the FailoverManager interface.

The intention of the fom_timed_sender.cpp client is to start X threads (controlled by an argument) which continuously send messages containing just a time stamp to random queues (the number of available queues is also controlled by an argument). The future consumer will do almost the same, but will pull down all available messages and track the time taken for each message delivery.

This tester, once ready, will be used to test and hopefully confirm bug #478874.
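Since the four broker startup lines differ only in the broker number and port, restarting killed brokers during the loop can be made less error-prone with a small helper. The following is only a sketch: the function names (broker_port, broker_cmd, start_broker) and the RUN switch are made up for illustration, and the command line itself is copied from the startup lines in this report. By default it only prints the commands; nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: print (or run) the startup command for broker N of the "rocks" cluster.
# Ports come from the report: broker 1 uses 5672, brokers 2-4 use 9102-9104.
# RUN is a hypothetical switch introduced here; by default nothing is executed.

broker_port() {
    case "$1" in
        1) echo 5672 ;;
        2|3|4) echo "910$1" ;;
        *) echo "unknown broker: $1" >&2; return 1 ;;
    esac
}

# Build the exact startup line from the report for broker number $1.
broker_cmd() {
    n="$1"
    port=$(broker_port "$n") || return 1
    echo "rm -rf /var/lib/qpidd/qpid$n/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd$n.log --data-dir /var/lib/qpidd/qpid$n --port $port --auth 0"
}

# Print the command (dry run) or execute it when RUN=1.
start_broker() {
    cmd=$(broker_cmd "$1") || return 1
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$cmd"
    else
        echo "$cmd"
    fi
}

# Dry run: list the startup lines for all four brokers.
for n in 1 2 3 4; do
    start_broker "$n"
done
```

To restart, say, brokers 1 and 3 after they have been killed, the same functions could be called as "RUN=1 start_broker 1" and "RUN=1 start_broker 3". Note that the command removes the broker's data directory first, exactly as the original startup lines do.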
*** Bug 489346 has been marked as a duplicate of this bug. ***
Created attachment 334554
Client which fails ... FailOverManager Timed Sender
I have fixed a bug (BZ485682) which I believe could have caused this problem.
As above, this appears to have been fixed by r751844.
Verified fixed in SVN r751844. After this patch I could kill brokers almost indefinitely without triggering this bug. Considered fixed.
Fixed and verified; closing.