Bug 494393

Summary: First two nodes join 'simultaneously'; no node can reach the 'ready' state.
Product: Red Hat Enterprise MRG
Reporter: Frantisek Reznicek <freznice>
Component: qpid-cpp
Assignee: Alan Conway <aconway>
Status: CLOSED ERRATA
QA Contact: Frantisek Reznicek <freznice>
Severity: high
Docs Contact:
Priority: urgent
Version: 1.1
CC: aconway, esammons, gsim, iboverma
Target Milestone: 1.3
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, it was possible for two brokers to join a cluster simultaneously. Consequent to this, none of the brokers was recognized as the first node, and both the qpidd service and clients stopped responding. With this update, one of the brokers always assumes the role of the first node, and both the qpidd service and clients now work as expected.
Last Closed: 2010-10-14 15:58:42 UTC
Bug Depends On: 592999    
Bug Blocks:    
Attachments:
  automated manual bz478874 test (run ./run.sh) (flags: none)
  bz494393 new reproducer (flags: none)

Description Frantisek Reznicek 2009-04-06 17:14:02 UTC
Created attachment 338361
automated manual bz478874 test (run ./run.sh)

Description of problem:
While developing the test for bz478874, I modified the sender and receiver C++ clients to measure failover time.

However, the observed behavior was wrong: qpidd hung during the phase in which the modified sender sends messages to the queues.

In only about 10% of cases did sending messages to the clustered broker work correctly.

All of the behavior described was seen on RHEL 5.2 x86_64.


Version-Release number of selected component (if applicable):
[freznice@dhcp-lab-200 MRG_Messaging]$ rpm -qa | egrep '(qpid|rhm|openai)' | sort -u
openais-0.80.3-22.el5_3.4
openais-devel-0.80.3-22.el5_3.4
python-qpid-0.5.752581-1.el5
qpidc-0.5.752581-3.el5
qpidc-debuginfo-0.5.752581-3.el5
qpidc-devel-0.5.752581-3.el5
qpidc-perftest-0.5.752581-3.el5
qpidc-rdma-0.5.752581-3.el5
qpidc-ssl-0.5.752581-3.el5
qpidd-0.5.752581-3.el5
qpidd-acl-0.5.752581-3.el5
qpidd-cluster-0.5.752581-3.el5
qpidd-devel-0.5.752581-3.el5
qpidd-rdma-0.5.752581-3.el5
qpidd-ssl-0.5.752581-3.el5
qpidd-xml-0.5.752581-3.el5
qpid-java-client-0.5.751061-1.el5
qpid-java-common-0.5.751061-1.el5
rhm-0.5.3206-1.el5
rhm-docs-0.5.756148-1.el5


How reproducible:
90%

Steps to Reproduce:
1. Install and set up openais-0.80.3-22.el5_3.4 (service openais start).
2. Extract the test from the attachment and run ./run.sh.
3. Check the transcript; expect the behavior shown under Additional info.
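
For readers without the attachment, here is a minimal sketch of how several brokers can be started back to back so that they try to join the cluster at nearly the same time. It is an assumption about what run.sh does, not a copy of it. The qpidd flags are the ones visible in the process listing under Additional info; the start_brokers function and the QPIDD override are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper, not taken from run.sh: start COUNT brokers in the
# background without waiting between launches, so that they try to join
# the cluster at nearly the same time.
start_brokers() {
    cluster=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # QPIDD is an assumed override for dry runs; it defaults to qpidd.
        ${QPIDD:-qpidd} -p 0 --auth no --cluster-name "$cluster" \
            --log-enable trace+ --data-dir "data_0_$i" &
        i=$((i + 1))
    done
}
```

For example, "start_brokers my_cluster 4" would launch four brokers using data directories data_0_0 to data_0_3, where my_cluster is a placeholder cluster name.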
  
Actual results:
qpidd and the C++ clients hang.

Expected results:
No hang is expected.

Additional info: (transcript & pstack)
[root@dhcp-lab-200 bz478874]# ./run.sh
client compile, ecode:0000
client[s] ready
starting brokers in the cluster:....done
broker[s] running (ports:33865 48171 36305 52824 ,#:4, pids:30035 30037 30033 30031 ,#:4)
run0-----------------------------------
.broker[s] running (ports:33865 48171 36305 52824 ,#:4, pids:30035 30037 30033 30031 ,#:4)
launching senders...done
waiting for senders...


4 running brokers
[root@dhcp-lab-200 bz478874]# gps qpidd | grep -v grep
root     30031  0.0  0.2 238260 11252 pts/13   Sl+  18:50   0:00 qpidd -p 0 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_cluster_bz478874 --log-enable trace+ --data-dir data_0_0
root     30033  0.0  0.2 238264 11252 pts/13   Sl+  18:50   0:00 qpidd -p 0 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_cluster_bz478874 --log-enable trace+ --data-dir data_0_1
root     30035  0.0  0.2 238532 11292 pts/13   Sl+  18:50   0:00 qpidd -p 0 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_cluster_bz478874 --log-enable trace+ --data-dir data_0_2
root     30037  0.0  0.2 238528 11256 pts/13   Sl+  18:50   0:00 qpidd -p 0 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_cluster_bz478874 --log-enable trace+ --data-dir data_0_3
[root@dhcp-lab-200 bz478874]# gps qpidd | grep -v grep | awk '{printf("pstack %d\n",$2) }' | sh
Thread 13 (Thread 1112500544 (LWP 30044)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 12 (Thread 1122990400 (LWP 30046)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 11 (Thread 1133480256 (LWP 30047)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 10 (Thread 1143970112 (LWP 30060)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1090156864 (LWP 30068)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1100646720 (LWP 30069)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1154459968 (LWP 30070)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1164949824 (LWP 30071)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1175439680 (LWP 30076)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1185929536 (LWP 30077)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1196419392 (LWP 30078)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1206909248 (LWP 30079)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47761338967968 (LWP 30031)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece2ccb06 in qpid::broker::Broker::run ()
#4  0x0000000000406948 in qpid::log::Options::~Options ()
#5  0x0000000000405438 in __cxa_pure_virtual ()
#6  0x000000350b01d8b4 in __libc_start_main () from /lib64/libc.so.6
#7  0x0000000000404eb9 in __cxa_pure_virtual ()
#8  0x00007fff5d0928b8 in ?? ()
#9  0x0000000000000000 in ?? ()
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
Thread 13 (Thread 1105402176 (LWP 30056)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 12 (Thread 1115892032 (LWP 30057)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 11 (Thread 1126381888 (LWP 30058)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 10 (Thread 1136871744 (LWP 30062)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1091389760 (LWP 30080)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1147361600 (LWP 30081)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1157851456 (LWP 30082)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1168341312 (LWP 30083)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1178831168 (LWP 30084)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1189321024 (LWP 30085)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1199810880 (LWP 30086)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1210300736 (LWP 30087)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 46927920687008 (LWP 30033)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece2ccb06 in qpid::broker::Broker::run ()
#4  0x0000000000406948 in qpid::log::Options::~Options ()
#5  0x0000000000405438 in __cxa_pure_virtual ()
#6  0x000000350b01d8b4 in __libc_start_main () from /lib64/libc.so.6
#7  0x0000000000404eb9 in __cxa_pure_virtual ()
#8  0x00007fff68a2d248 in ?? ()
#9  0x0000000000000000 in ?? ()
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
Thread 13 (Thread 1090844992 (LWP 30050)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 12 (Thread 1101334848 (LWP 30053)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 11 (Thread 1111824704 (LWP 30054)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 10 (Thread 1122314560 (LWP 30063)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1132804416 (LWP 30088)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1143294272 (LWP 30089)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1153784128 (LWP 30090)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1164273984 (LWP 30091)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1174763840 (LWP 30092)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1185253696 (LWP 30093)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1195743552 (LWP 30094)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1206233408 (LWP 30095)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47203341907872 (LWP 30035)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece2ccb06 in qpid::broker::Broker::run ()
#4  0x0000000000406948 in qpid::log::Options::~Options ()
#5  0x0000000000405438 in __cxa_pure_virtual ()
#6  0x000000350b01d8b4 in __libc_start_main () from /lib64/libc.so.6
#7  0x0000000000404eb9 in __cxa_pure_virtual ()
#8  0x00007fff48407c28 in ?? ()
#9  0x0000000000000000 in ?? ()
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
Thread 13 (Thread 1089345856 (LWP 30045)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 12 (Thread 1099835712 (LWP 30048)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 11 (Thread 1110325568 (LWP 30049)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 10 (Thread 1120815424 (LWP 30061)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003ece388c0f in qpid::broker::Timer::run ()
#2  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1131305280 (LWP 30064)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1141795136 (LWP 30065)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1152284992 (LWP 30066)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1162774848 (LWP 30067)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1173264704 (LWP 30072)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1183754560 (LWP 30073)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1194244416 (LWP 30074)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1204734272 (LWP 30075)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47578037073824 (LWP 30037)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece2ccb06 in qpid::broker::Broker::run ()
#4  0x0000000000406948 in qpid::log::Options::~Options ()
#5  0x0000000000405438 in __cxa_pure_virtual ()
#6  0x000000350b01d8b4 in __libc_start_main () from /lib64/libc.so.6
#7  0x0000000000404eb9 in __cxa_pure_virtual ()
#8  0x00007fff0aae0308 in ?? ()
#9  0x0000000000000000 in ?? ()
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
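
The four broker dumps above are long but repetitive. Below is a sketch of a filter that condenses pstack output to a count of threads per innermost qpid:: frame. summarize_pstack is a hypothetical helper introduced here, not part of the attached test, and it assumes the frame layout shown above ("#N  address in symbol ()").

```shell
# Hypothetical helper: read pstack output on stdin and print, for each
# innermost qpid:: frame, how many threads are currently in it.
summarize_pstack() {
    awk '
        # A "Thread N (...)" header starts a new per-thread stack.
        /^Thread [0-9]+ / { in_thread = 1; seen = 0; next }
        # The first qpid:: frame below the header is the innermost one;
        # field 4 is the symbol in "#1  0x... in qpid::sys::Poller::wait ()".
        in_thread && !seen && /qpid::/ { count[$4]++; seen = 1 }
        END { for (f in count) printf "%d %s\n", count[f], f }
    ' | sort -k1,1nr -k2
}
```

Usage would be "pstack 30031 | summarize_pstack". Applied to the first broker dump above, this should report 9 threads in qpid::sys::Poller::wait and 4 in qpid::broker::Timer::run.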

[root@dhcp-lab-200 bz478874]# gps ./sender_mod | grep -v grep
root     30149  0.0  0.1  98464  5336 pts/13   Sl+  18:51   0:00 ./sender_mod -p 33865 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0000 --log-enable trace+
root     30171  0.0  0.1  98464  5336 pts/13   Sl+  18:51   0:00 ./sender_mod -p 48171 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0001 --log-enable trace+
root     30193  0.0  0.1  98460  5332 pts/13   Sl+  18:51   0:00 ./sender_mod -p 36305 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0002 --log-enable trace+
root     30215  0.0  0.1  98464  5336 pts/13   Sl+  18:51   0:00 ./sender_mod -p 52824 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0003 --log-enable trace+
root     30237  0.0  0.1  98464  5336 pts/13   Sl+  18:51   0:00 ./sender_mod -p 33865 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0004 --log-enable trace+
root     30259  0.0  0.1  98464  5336 pts/13   Sl+  18:51   0:00 ./sender_mod -p 48171 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0005 --log-enable trace+
root     30281  0.0  0.1  98464  5332 pts/13   Sl+  18:51   0:00 ./sender_mod -p 36305 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0006 --log-enable trace+
root     30303  0.0  0.1  98460  5332 pts/13   Sl+  18:51   0:00 ./sender_mod -p 52824 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0007 --log-enable trace+
root     30325  0.0  0.1  98464  5332 pts/13   Sl+  18:51   0:00 ./sender_mod -p 33865 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0008 --log-enable trace+
root     30347  0.0  0.1  98464  5336 pts/13   Sl+  18:51   0:00 ./sender_mod -p 48171 --queue-cnt 10 --send-eos 9 --routing-key test-queue-0000-0009 --log-enable trace+


[root@dhcp-lab-200 bz478874]# gps ./sender_mod | grep -v grep | awk '{printf("pstack %d\n",$2) }' | sh
Thread 2 (Thread 1093114176 (LWP 30156)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47308996413376 (LWP 30149)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1114958144 (LWP 30178)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47283390744512 (LWP 30171)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1095194944 (LWP 30200)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47291830020032 (LWP 30193)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1108162880 (LWP 30222)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47780071659456 (LWP 30215)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1116428608 (LWP 30244)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47389895878592 (LWP 30237)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1104079168 (LWP 30266)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47101935843264 (LWP 30259)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1084668224 (LWP 30288)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47437152806848 (LWP 30281)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1104156992 (LWP 30310)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47647979538368 (LWP 30303)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1090480448 (LWP 30332)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47707430741952 (LWP 30325)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 2 (Thread 1088653632 (LWP 30353)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003ecdd72e8d in qpid::sys::Poller::wait ()
#2  0x0000003ecdd73c67 in qpid::sys::Poller::run ()
#3  0x0000003ece265feb in qpid::client::TCPConnector::run ()
#4  0x0000003ecdd6ac4a in qpid::sys::(anonymous namespace)::runRunnable ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47451936192448 (LWP 30347)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003ece2a24d6 in qpid::client::StateManager::waitFor ()
#2  0x0000003ece250e11 in qpid::client::ConnectionHandler::waitForOpen ()
#3  0x0000003ece25bf7f in qpid::client::ConnectionImpl::open ()
#4  0x0000003ece24db95 in qpid::client::Connection::open ()
#5  0x0000003ece27e111 in qpid::client::FailoverManager::attempt ()
#6  0x0000003ece27e922 in qpid::client::FailoverManager::attempt ()
#7  0x0000003ece27eb72 in qpid::client::FailoverManager::connect ()
#8  0x0000003ece27f5d9 in qpid::client::FailoverManager::execute ()
#9  0x000000000040ab9b in main ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()

Comment 2 Gordon Sim 2009-04-06 19:03:32 UTC
This appears to be caused by multicast events being suppressed as all nodes are stuck in JOINER state for some reason.

Comment 3 Gordon Sim 2009-04-07 10:09:10 UTC
Two of the four nodes appear to be treated as joining simultaneously, i.e. the first config change they each receive contains both nodes:

2009-apr-06 13:25:27 debug 10.16.64.42:24961(INIT) config change: 10.16.64.42:24961 10.16.64.42:24963

2009-apr-06 13:25:27 debug 10.16.64.42:24963(INIT) config change: 10.16.64.42:24961 10.16.64.42:24963

This means that neither considers itself the first node, so the cluster never gets into the READY state: all nodes are stuck in JOINER state and ignore update requests, and all data events are held up indefinitely.
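
The effect of the two log lines above can be illustrated with a minimal sketch. It assumes a simplified rule (a node treats itself as first only when its first config change lists itself alone); the function names and the "FIRST" label are hypothetical and are not taken from the qpidd source.

```python
def initial_state(self_id, first_config_change):
    """Simplified, assumed rule: a node is first only if it is alone
    in the first config change it receives; otherwise it is a joiner."""
    if first_config_change == [self_id]:
        return "FIRST"  # hypothetical label for the node that seeds the cluster
    return "JOINER"


def parse_config_change(line):
    """Extract (node id, member list) from a 'config change' debug line."""
    head, members = line.split(" config change: ")
    node = head.split()[-1].split("(")[0]  # drop the "(INIT)" suffix
    return node, members.split()


LOG = [
    "2009-apr-06 13:25:27 debug 10.16.64.42:24961(INIT) config change: "
    "10.16.64.42:24961 10.16.64.42:24963",
    "2009-apr-06 13:25:27 debug 10.16.64.42:24963(INIT) config change: "
    "10.16.64.42:24961 10.16.64.42:24963",
]

states = {}
for line in LOG:
    node, members = parse_config_change(line)
    states[node] = initial_state(node, members)

print(states)  # both nodes end up as JOINER, so nobody seeds the cluster
```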

Comment 4 Gordon Sim 2009-04-07 11:56:42 UTC
I have verified that a short delay between starting each node (e.g. 1 sec) avoids this problem, and consequently I am lowering the priority and targeting 1.2.
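
A staggered start of this kind could be scripted roughly as follows. This is only a sketch: the helper name is hypothetical, the broker command lines are left to the caller, and the 1 second default simply mirrors the delay mentioned above.

```python
import subprocess
import time


def start_staggered(commands, delay_seconds=1.0,
                    launch=subprocess.Popen, sleep=time.sleep):
    """Launch each command in order, pausing between consecutive launches
    so that no two nodes start at the same moment."""
    started = []
    for index, command in enumerate(commands):
        if index > 0:
            sleep(delay_seconds)  # wait before every launch except the first
        started.append(launch(command))
    return started
```

The `launch` and `sleep` parameters exist only so the helper can be exercised without starting real processes.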

Comment 5 Alan Conway 2009-07-02 20:20:57 UTC
*** Bug 509439 has been marked as a duplicate of this bug. ***

Comment 6 Alan Conway 2009-07-02 20:42:03 UTC
qpidd makes the incorrect assumption that the first CPG config-change always
contains a single member. It is possible for the first config-change to have
multiple members if they join concurrently. If this happens, all members come
up as "JOINER" with nobody taking on the role of first member and the cluster
hangs.

We need an additional protocol on joining to handle the case of multiple members in the first config change, probably an extension of the existing request-update/offer-update protocol.
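
One way to picture such a protocol is a status exchange followed by a deterministic tie-break. The sketch below is an assumption for illustration only and is not the change that was eventually committed: every member of the first config change reports whether it already holds cluster state, and if none does, all members pick the same node because they all see the same member list.

```python
def elect_first(members, already_active):
    """Return the member that should act as first node, or None when an
    established member exists and newcomers should request an update.

    members        -- ids listed in the first config change (same on every node)
    already_active -- hypothetical map of id -> True if that member reported
                      that it already holds cluster state
    """
    if any(already_active.get(member, False) for member in members):
        return None  # an established member exists; new nodes stay joiners
    # Every node evaluates min() over the same list, so all agree on the result.
    return min(members)


members = ["10.16.64.42:24963", "10.16.64.42:24961"]
print(elect_first(members, {}))
```

The choice of `min` is arbitrary; any rule that every member evaluates identically would serve the same purpose.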

Comment 7 Alan Conway 2009-07-10 15:10:59 UTC
The reproducer described in bug 510504 is also quite good at reproducing this issue.

Comment 9 Frantisek Reznicek 2009-10-30 10:45:35 UTC
Created attachment 366783 [details]
bz494393 new reproducer

Retested (only briefly so far) and found that the issue is much less frequent; in fact, I was not able to get stuck (trigger the issue) on RHEL 5.4 with the latest qpidc and openais.

[root@mrg-qe-02 bz494393]# rpm -q openais
openais-0.80.6-8.el5
openais-0.80.6-8.el5
[root@mrg-qe-02 bz494393]# rpm -q qpidd
qpidd-0.5.752581-30.el5

There are two scripts:
'./run.sh' launches one test run with 4 nodes
'./looper.sh' keeps calling ./run.sh until a stuck run or a failure is found

I believe this might help in reproducing the issue.
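
For readers without the attachment, the looping idea can be sketched as below. This is not the content of looper.sh: it assumes that ./run.sh exits non-zero on failure and that a hang shows up as a timeout; the 600 second limit and the function names are arbitrary choices for illustration.

```python
import subprocess


def run_once(script="./run.sh", timeout_seconds=600):
    """Run the test script once and classify the outcome."""
    try:
        result = subprocess.run([script], timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return "stuck"  # the run did not finish in time
    return "ok" if result.returncode == 0 else "failed"


def loop_until_problem(run=run_once, max_runs=1000):
    """Repeat the run until it gets stuck or fails, or max_runs is reached.
    Returns (number of runs performed, last outcome)."""
    for attempt in range(1, max_runs + 1):
        outcome = run()
        if outcome != "ok":
            return attempt, outcome
    return max_runs, "ok"
```

Calling `loop_until_problem()` from the directory containing run.sh would start the loop; nothing is executed merely by defining the functions.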

Comment 10 Alan Conway 2009-11-17 22:00:50 UTC
This should be fixed by changes in r881423, but since it's not on latest qpid/openais it's hard to confirm.

Comment 11 Frantisek Reznicek 2010-05-17 15:23:09 UTC
Current testing shows that the persistent broker cluster has issues with startup
again; see the issue tracked as bug 592999.

Bug 592999 marked as blocker.

Comment 12 Frantisek Reznicek 2010-06-02 14:13:35 UTC
The issue has been fixed and verified in a long test run (a few hundred cluster
restarts, qpidd min/max logging, various cluster widths) on RHEL 5.5 i386 /
x86_64 using packages:
openais-0.80.6-16.el5_5.1
openais-debuginfo-0.80.6-16.el5_5.1
openais-devel-0.80.6-16.el5_5.1
python-qpid-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-2.el5
qpid-cpp-client-devel-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-2.el5
qpid-cpp-client-ssl-0.7.946106-2.el5
qpid-cpp-mrg-debuginfo-0.7.946106-2.el5
qpid-cpp-server-0.7.946106-2.el5
qpid-cpp-server-cluster-0.7.946106-2.el5
qpid-cpp-server-devel-0.7.946106-2.el5
qpid-cpp-server-ssl-0.7.946106-2.el5
qpid-cpp-server-store-0.7.946106-2.el5
qpid-cpp-server-xml-0.7.946106-2.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tests-0.7.946106-1.el5
qpid-tools-0.7.946106-4.el5
ruby-qpid-0.7.946106-1.el5

-> VERIFIED

Comment 13 Jaromir Hradilek 2010-10-07 15:17:25 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, it was possible for two brokers to join a cluster simultaneously. Consequent to this, none of the brokers was recognized as the first node, and both the qpidd service and clients stopped responding. With this update, one of the brokers always assumes the role of the first node, and both the qpidd service and clients now work as expected.

Comment 15 errata-xmlrpc 2010-10-14 15:58:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html