Description of problem:
While verifying bug 510475, I was able to trigger a new issue. When C++ subscriber clients are started before the qpidd cluster is up and a 4-node cluster is then started, all running clients crash in:

Thread 1 (Thread 15867):
#0  0x0000003929860e73 in qpid::client::(anonymous namespace)::HeartbeatTask::fire (this=0x2e9c9d0) at qpid/client/ConnectionImpl.cpp:152
#1  0x00000039293faf60 in qpid::sys::Timer::run (this=0x3929ae0e40) at qpid/sys/Timer.cpp:119
#2  0x00000039293238da in qpid::sys::(anonymous namespace)::runRunnable (p=0x2ea8188) at qpid/sys/posix/Thread.cpp:35
#3  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

More details on scenario:
- launch a couple of subscribe clients
- launch a 4-node cluster (on one machine, on different ports)
- wait for broker status / client status

The issue is seen on RHEL 5.5 x86_64; the machines used for the test run are VMs.

Version-Release number of selected component (if applicable):
qpid-tools-0.7.946106-4.el5
qpid-cpp-server-ssl-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-2.el5
qpid-cpp-client-0.7.946106-2.el5
qpid-cpp-server-0.7.946106-2.el5
qpid-cpp-client-devel-0.7.946106-2.el5
python-qpid-0.7.946106-1.el5
qpid-java-common-0.7.946106-3.el5
qpid-tests-0.7.946106-1.el5
qpid-cpp-server-cluster-0.7.946106-2.el5
qpid-cpp-server-devel-0.7.946106-2.el5
qpid-java-client-0.7.946106-3.el5
qpid-cpp-client-ssl-0.7.946106-2.el5
qpid-cpp-server-xml-0.7.946106-2.el5
qpid-cpp-server-store-0.7.946106-2.el5
qpid-cpp-mrg-debuginfo-0.7.946106-2.el5

How reproducible:
~80%

Steps to Reproduce:
1. ./run.sh 4 10 (on a less powerful machine, for instance a VM)

Actual results:
The qpid C++ clients crash.

Expected results:
The qpid C++ clients should not crash.

Additional info:
[root@dhcp-30-90 bz506758]# cat dump_core.15870
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/bz/bz506758/subscribe...done.
[New Thread 15873]
[New Thread 15870]
Reading symbols from /usr/lib64/libqpidclient.so.2...Reading symbols from /usr/lib/debug/usr/lib64/libqpidclient.so.2.0.0.debug...done.
done.
...
Loaded symbols for /usr/lib64/sasl2/libplain.so.2
Core was generated by `./subscribe --queue queue-003 --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003929860e73 in qpid::client::(anonymous namespace)::HeartbeatTask::fire (this=0x1424f9d0) at qpid/client/ConnectionImpl.cpp:152
152         timeout.idleIn();
(gdb)
rax            0x11b595ca67a56a70   1276090765984230000
rbx            0x39293f8d50         245505166672
rcx            0x0                  0
rdx            0x35cf8ed2e0         231115510496
rsi            0x0                  0
rdi            0x1425b188           338014600
rbp            0x432aaee0           0x432aaee0
rsp            0x432aa2f0           0x432aa2f0
r8             0x35cf8ed2e0         231115510496
r9             0x0                  0
r10            0x35cc9529e0         231065594336
r11            0x35cd20a020         231074734112
r12            0x1424fa00           337967616
r13            0x14251fb0           337977264
r14            0x1424f680           337966720
r15            0x3929ae0ea0         245512408736
rip            0x3929860e73         0x3929860e73 <qpid::client::(anonymous namespace)::HeartbeatTask::fire()+435>
eflags         0x10246              [ PF ZF IF RF ]
cs             0x33                 51
ss             0x2b                 43
ds             0x0                  0
es             0x0                  0
fs             0x63                 99
gs             0x0                  0
st0            0                    (raw 0x00000000000000000000)
st1            0                    (raw 0x00000000000000000000)
st2            0                    (raw 0x00000000000000000000)
st3            0                    (raw 0x00000000000000000000)
st4            0                    (raw 0x00000000000000000000)
st5            0                    (raw 0x00000000000000000000)
st6            0                    (raw 0x00000000000000000000)
st7            0                    (raw 0x00000000000000000000)
fctrl          0x37f                895
fstat          0x0                  0
ftag           0xffff               65535
fiseg          0x0                  0
fioff          0x0                  0
foseg          0x0                  0
fooff          0x0                  0
fop            0x0                  0
(gdb) Using memory regions provided by the target.
There are no memory regions defined.
(gdb)
From                To                  Syms Read   Shared Object Library
0x000000392983fef0  0x00000039298b8a18  Yes (*)     /usr/lib64/libqpidclient.so.2
0x000000392a416a40  0x000000392a441018  Yes (*)     /usr/lib64/libqmfconsole.so.2
0x00000035cf64f430  0x00000035cf6c3058  Yes (*)     /usr/lib64/libstdc++.so.6
0x00000035cce03e60  0x00000035cce43e38  Yes (*)     /lib64/libm.so.6
0x00000035cf201e50  0x00000035cf20b018  Yes (*)     /lib64/libgcc_s.so.1
0x00000035cc61d780  0x00000035cc709ff8  Yes (*)     /lib64/libc.so.6
0x000000392930ee00  0x000000392940f4f8  Yes (*)     /usr/lib64/libqpidcommon.so.2
0x0000003456610aa0  0x000000345662dae8  Yes (*)     /usr/lib64/libboost_program_options.so.2
0x00000035d1601500  0x00000035d1602918  Yes (*)     /lib64/libuuid.so.1
0x00000035cc200a70  0x00000035cc21671e  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x0000003457004810  0x000000345700cff8  Yes (*)     /usr/lib64/libboost_filesystem.so.2
0x00000035cca00e10  0x00000035cca01a08  Yes (*)     /lib64/libdl.so.2
0x00000035cda02220  0x00000035cda05cc8  Yes (*)     /lib64/librt.so.1
0x00000035cfa046e0  0x00000035cfa13be8  Yes (*)     /usr/lib64/libsasl2.so.2
0x00000035cd2051f0  0x00000035cd210258  Yes (*)     /lib64/libpthread.so.0
0x00000035d06032a0  0x00000035d060e2d8  Yes (*)     /lib64/libresolv.so.2
0x00000035ce6009f0  0x00000035ce606918  Yes (*)     /lib64/libcrypt.so.1
0x00002ad22e4449f0  0x00002ad22e4512f8  Yes (*)     /usr/lib64/qpid/client/sslconnector.so
0x00002ad22e672640  0x00002ad22e6890a8  Yes (*)     /usr/lib64/libsslcommon.so.2
0x0000003000e183b0  0x0000003000ef6f48  Yes (*)     /usr/lib64/libnss3.so
0x0000003001a085e0  0x0000003001a2b638  Yes (*)     /usr/lib64/libssl3.so
0x00000035d020cf30  0x00000035d022b738  Yes (*)     /usr/lib64/libnspr4.so
0x0000003001208340  0x0000003001212c38  Yes (*)     /usr/lib64/libnssutil3.so
0x00000035cee01370  0x00000035cee02978  Yes (*)     /usr/lib64/libplc4.so
0x00002ad22e890e10  0x00002ad22e891c08  Yes (*)     /usr/lib64/libplds4.so
0x00000035cd601fd0  0x00000035cd60cac8  Yes (*)     /usr/lib64/libz.so.1
0x00002aaaaaaadfb0  0x00002aaaaaaafbc8  Yes (*)     /usr/lib64/sasl2/libanonymous.so.2
0x00002aaaaaccde60  0x00002aaaaad6b388  Yes (*)     /usr/lib64/sasl2/libsasldb.so.2
0x00002aaaaaf8bfa0  0x00002aaaaaf8dd08  Yes (*)     /usr/lib64/sasl2/liblogin.so.2
0x00002aaaab18ffb0  0x00002aaaab191d58  Yes (*)     /usr/lib64/sasl2/libplain.so.2
(*): Shared library is missing debugging information.
(gdb)
  3 Thread 15870  0x00000035cc69a1a1 in nanosleep () from /lib64/libc.so.6
  2 Thread 15873  0x00000035cc6d4108 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 15874  0x0000003929860e73 in qpid::client::(anonymous namespace)::HeartbeatTask::fire (this=0x1424f9d0) at qpid/client/ConnectionImpl.cpp:152
(gdb)
Thread 3 (Thread 15870):
#0  0x00000035cc69a1a1 in nanosleep () from /lib64/libc.so.6
#1  0x00000035cc699fc4 in sleep () from /lib64/libc.so.6
#2  0x0000000000408f03 in main (argc=7, argv=0x7fffacab41c8) at subscribe.cpp:97

Thread 2 (Thread 15873):
#0  0x00000035cc6d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x000000392932c33f in qpid::sys::Poller::wait (this=0x1424ec20, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
#2  0x000000392932cd22 in qpid::sys::Poller::run (this=0x1424ec20) at qpid/sys/epoll/EpollPoller.cpp:479
#3  0x00000039293238da in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35
#4  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 15874):
#0  0x0000003929860e73 in qpid::client::(anonymous namespace)::HeartbeatTask::fire (this=0x1424f9d0) at qpid/client/ConnectionImpl.cpp:152
#1  0x00000039293faf60 in qpid::sys::Timer::run (this=0x3929ae0e40) at qpid/sys/Timer.cpp:119
#2  0x00000039293238da in qpid::sys::(anonymous namespace)::runRunnable (p=0x1425b188) at qpid/sys/posix/Thread.cpp:35
#3  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6
(gdb) quit

Created attachment 422560
The bz602268 reproducer

Please run the reproducer the following way (transcript from a RHEL 5.5 x86_64 KVM guest):

[root@dhcp-30-90 bz506758]# ./run.sh 4 10
aisexec (pid 15288) is running...
Stopping OpenAIS daemon (aisexec): [ OK ]
Starting OpenAIS daemon (aisexec): [ OK ]
client (subscribe) compile, ecode:0
client[s] ready
Client[s] compile .ready
Launching the subscribers (before brokers):..........done
starting brokers in the cluster:....done
tcp        0      0 0.0.0.0:5672       0.0.0.0:*       LISTEN      15911/qpidd
tcp        0      0 0.0.0.0:10001      0.0.0.0:*       LISTEN      15922/qpidd
tcp        0      0 0.0.0.0:10002      0.0.0.0:*       LISTEN      15932/qpidd
tcp        0      0 0.0.0.0:10003      0.0.0.0:*       LISTEN      15942/qpidd
broker[s] running (pids:15911 15922 15932 15942 , ports:5672 10001 10002 10003 )
Waiting for clients... 10.10../run.sh: line 319: 15849 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15865 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15856 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15870 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15877 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15884 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15895 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15898 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15905 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
./run.sh: line 319: 15914 Segmentation fault (core dumped) ./${client_app} --queue "queue-${i_fmt}" --retry-interval 1 --url amqp:tcp:127.0.0.1:5672,tcp:127.0.0.1:10001,tcp:127.0.0.1:10002,tcp:127.0.0.1:10003,tcp:127.0.0.1:10000 > ${client_app}.${i_fmt}.log 2>&1
test broker[s]:15911 15922 15932 15942 done .PASSED
stopping brokers... 2010-06-09 15:39:16
stopping clients... 2010-06-09 15:39:16
subscribe: no process killed
OK
ERROR: Client[s] failed! (ecodes:139139139139139139139139139139)
ERROR: Following core files found:
core.15849: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15856: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15865: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15870: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15877: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15884: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15895: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15898: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15905: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
core.15914: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'subscribe'
Broker status:
qpidd_0.ecode:0
qpidd_1.ecode:0
qpidd_2.ecode:0
qpidd_3.ecode:0
qpidd_0.ecode: 2010-06-09 15:39:16.000000000 +0200
qpidd_1.ecode: 2010-06-09 15:39:16.000000000 +0200
qpidd_2.ecode: 2010-06-09 15:39:16.000000000 +0200
qpidd_3.ecode: 2010-06-09 15:39:16.000000000 +0200
package qpidd is not installed
package qpidd-cluster is not installed
openais-0.80.6-16.el5_5.1
Test Summary: TEST FAILED !!! (2 failures) dur: 50 secs
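
A note on reading the client exit codes in the transcript: the ecodes value appears to be the clients' exit statuses written back to back, which would make it ten statuses of 139, one per core file. A shell reports 128 + N for a process killed by signal N, so 139 corresponds to signal 11 (SIGSEGV), consistent with the "Program terminated with signal 11" line in the gdb output. The sketch below decodes such a string. The helper name decode_ecodes and the fixed three-digit width are assumptions made for illustration; they are not taken from run.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of run.sh): split a string of concatenated
# exit statuses into 3-digit chunks and name the signal behind each one.
# Assumes every status was printed with exactly three digits.
decode_ecodes() {
    codes=$1
    while [ -n "$codes" ]; do
        code=$(printf '%s' "$codes" | cut -c1-3)    # first three digits
        codes=$(printf '%s' "$codes" | cut -c4-)    # remainder of the string
        if [ "$code" -gt 128 ]; then
            sig=$((code - 128))                     # shell convention: 128 + signal number
            echo "$code -> signal $sig ($(kill -l "$sig"))"
        else
            echo "$code -> exited with status $code"
        fi
    done
}

decode_ecodes 139139139139139139139139139139
```

Each 139 in the string decodes to signal 11, so every client in this run died from a segmentation fault rather than exiting with an application error.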
*** This bug has been marked as a duplicate of bug 591292 ***