Running cluster_test in a loop with the store enabled (through the store run_cluster_test script), I noticed the following error occurs occasionally (about once in 50 runs):

fork1: 2009-07-01 14:50:58 critical 10.16.16.49:25327(UPDATEE) catch-up connection closed prematurely 10.16.16.49:25327-1(local,catchup)

and the test fails. Core file attached.

My trunk revision: 790164

Backtrace:
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:509
#1 0x00000030ad0a3075 in std::char_traits<char>::copy () at /usr/src/debug/gcc-4.3.2-20081105/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/char_traits.h:274
#2 std::string::_M_copy (__n=<value optimized out>, __s=<value optimized out>, __d=<value optimized out>) at /usr/src/debug/gcc-4.3.2-20081105/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:344
#3 std::string::append (this=0x7ffff78a9830, __str=@0x7f6db7836b88) at /usr/src/debug/gcc-4.3.2-20081105/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:331
#4 0x00007f6db7590aa0 in operator+<char, std::char_traits<char>, std::allocator<char> > () at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/basic_string.tcc:677
#5 qpid::management::ManagementAgent::raiseEvent (this=0x7f6db449a010, event=<value optimized out>, severity=<value optimized out>) at qpid/management/ManagementAgent.cpp:211
#6 0x00007f6db74fdade in ~Connection (this=0x17b1808) at qpid/broker/Connection.cpp:127
#7 0x00007f6db68a858f in ~Connection (this=0x17b1710) at qpid/cluster/Connection.cpp:111
#8 0x00007f6db689419c in qpid::RefCounted::release () at ./qpid/RefCounted.h:42
#9 intrusive_ptr_release (p=<value optimized out>) at ./qpid/RefCounted.h:57
#10 ~intrusive_ptr () at /usr/include/boost/intrusive_ptr.hpp:83
#11 ~pair () at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_pair.h:73
#12 __gnu_cxx::new_allocator<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > >::destroy () at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/ext/new_allocator.h:118
#13 std::_Rb_tree<qpid::cluster::ConnectionId, std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> >, std::_Select1st<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > >, std::less<qpid::cluster::ConnectionId>, std::allocator<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > > >::_M_destroy_node () at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_tree.h:390
#14 std::_Rb_tree<qpid::cluster::ConnectionId, std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> >, std::_Select1st<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > >, std::less<qpid::cluster::ConnectionId>, std::allocator<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > > >::_M_erase (this=0x1760350, __x=0x176e730) at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_tree.h:943
#15 0x00007f6db688bdc9 in ~_Rb_tree () at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_tree.h:585
#16 ~map () at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_map.h:92
#17 ~Cluster (this=0x175fbd0) at qpid/cluster/Cluster.cpp:219
#18 0x00007f6db6889e25 in qpid::cluster::Cluster::brokerShutdown (this=0x175fbd0) at qpid/cluster/Cluster.cpp:573
#19 0x00007f6db711ff37 in boost::function0<void, std::allocator<void> >::operator() (this=<value optimized out>) at /usr/include/boost/function/function_template.hpp:692
#20 0x00007f6db711fa3a in for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> () at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_algo.h:3791
#21 qpid::Plugin::Target::finalize (this=0x175dfa8) at qpid/Plugin.cpp:45
#22 0x00007f6db74d28c4 in ~Broker (this=0x175dfa0) at qpid/broker/Broker.cpp:338
#23 0x000000309e636960 in __cxa_finalize (d=0x7f6db7825880) at cxa_finalize.c:56
#24 0x00007f6db749ba86 in __do_global_dtors_aux () from /home/kpvdr/mrg/qpid/cpp/src/.libs/libqpidbroker.so.0
#25 0x0000000000405a20 in std::basic_streambuf<char, std::char_traits<char> >::~basic_streambuf ()
#26 0x00007ffff78aa0e0 in ?? ()
#27 0x00007f6db759c281 in _fini () from /home/kpvdr/mrg/qpid/cpp/src/.libs/libqpidbroker.so.0
#28 0x0000000000000019 in ?? ()
#29 0x0000000001751230 in ?? ()
#30 0x0000000001752a50 in ?? ()
#31 0x0000000001754dc0 in ?? ()
#32 0x00007f6db7839778 in ?? ()
#33 0x00007f6db7402000 in ?? ()
#34 0x00007f6db7402508 in ?? ()
#35 0x00007f6db7402a08 in ?? ()
#36 0x00007f6db6b33000 in ?? ()
#37 0x00007f6db6b32000 in ?? ()
#38 0x00007f6db6b334c8 in ?? ()
#39 0x00007f6db6b33990 in ?? ()
#40 0x0000000001755370 in ?? ()
#41 0x00007f6db6b324d8 in ?? ()
#42 0x00007f6db6b329a8 in ?? ()
#43 0x00007f6db6b31000 in ?? ()
#44 0x00007f6db6b31990 in ?? ()
#45 0x00007f6db6b30128 in ?? ()
#46 0x00007f6db6b305f0 in ?? ()
#47 0x0000000001752110 in ?? ()
#48 0x00000000017525b0 in ?? ()
#49 0x00007f6db6b314c8 in ?? ()
#50 0x000000309d4204e8 in _rtld_local () from /lib64/ld-2.9.so
#51 0x0000000001755840 in ?? ()
#52 0x0000000000000000 in ?? ()
The core file was too big and did not attach successfully.
Further testing has shown that the above trace may NOT be connected with this failure; I have managed to run the test several times without a test failure, but several core files have been left behind with the same trace.
The backtrace looks like a seg fault in ManagementAgent; reassigning to tross. The cluster error reported is consistent with a cluster member crashing due to a seg fault.
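For readers following the trace: frames #5 to #7 show ~Connection calling ManagementAgent::raiseEvent while frames #22 to #24 show the broker already being torn down from __cxa_finalize. The sketch below illustrates that general hazard and one defensive pattern (dropping events once shutdown has begun). It is only an illustration under assumptions: EventSink, Conn, beginShutdown and demo are hypothetical names, not qpid classes, and this is not necessarily how the actual bug was fixed.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a management agent; NOT qpid's ManagementAgent.
class EventSink {
public:
    // Records an event unless shutdown has begun; returns true if recorded.
    bool raise(const std::string& name) {
        if (shuttingDown_) return false;      // drop events once teardown starts
        events_.push_back("event:" + name);   // string concatenation, as in frame #4
        return true;
    }
    void beginShutdown() { shuttingDown_ = true; }
    const std::vector<std::string>& events() const { return events_; }

private:
    bool shuttingDown_ = false;
    std::vector<std::string> events_;
};

// Hypothetical connection that reports its own closure from its destructor,
// mirroring the ~Connection -> raiseEvent call chain in the trace.
class Conn {
public:
    Conn(EventSink& sink, std::string id) : sink_(sink), id_(std::move(id)) {}
    ~Conn() { sink_.raise("closed:" + id_); }

private:
    EventSink& sink_;
    std::string id_;
};

// Returns the number of events recorded: one connection is destroyed before
// shutdown begins and one after it.
std::size_t demo() {
    EventSink sink;
    { Conn a(sink, "a"); }   // destroyed while the sink is live: recorded
    sink.beginShutdown();
    { Conn b(sink, "b"); }   // destroyed after shutdown began: dropped
    return sink.events().size();
}
```

In this sketch only the first connection's closure is recorded. A flag like this only helps while the sink object itself is still alive; it does not protect against the sink's own storage having been destroyed first.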
This backtrace has been transferred to Bug 509436. I have not yet isolated a core file for this bug, even after ~500 runs.
There was a cluster-shutdown bug (BZ490855) that looks like it might be causing this. BZ490855 was fixed upstream in revision 823258. Is this still occurring, or can it be marked as a duplicate? -Ted
*** This bug has been marked as a duplicate of bug 490855 ***