Description of problem:
During my testing of condor with the qmf plugins I found several core files from qpidd, caused by a segmentation fault.

Version-Release number of selected component (if applicable):
qpid-cpp-client-devel-0.10-6.el6.x86_64
qpid-cpp-server-devel-0.10-6.el6.x86_64
qpid-cpp-server-store-0.10-6.el6.x86_64
qpid-cpp-server-0.10-6.el6.x86_64
qpid-cpp-client-rdma-0.10-6.el6.x86_64
qpid-cpp-server-rdma-0.10-6.el6.x86_64
rh-qpid-cpp-tests-0.10-6.el6.x86_64
qpid-cpp-server-xml-0.10-6.el6.x86_64
qpid-cpp-client-devel-docs-0.10-6.el6.noarch
qpid-cpp-debuginfo-0.10-6.el6.x86_64
qpid-cpp-client-0.10-6.el6.x86_64
qpid-cpp-client-ssl-0.10-6.el6.x86_64
qpid-cpp-server-ssl-0.10-6.el6.x86_64
qpid-cpp-server-cluster-0.10-6.el6.x86_64
condor-7.6.1-0.10.el6.x86_64
condor-qmf-7.6.1-0.10.el6.x86_64
qpid-qmf-0.10-10.el6.x86_64
ruby-qpid-qmf-0.10-10.el6.x86_64
python-condorutils-1.5-3.el6.noarch
condor-wallaby-tools-4.0-6.el6.noarch
condor-classads-7.6.1-0.10.el6.x86_64
condor-aviary-7.6.1-0.10.el6.x86_64
condor-kbdd-7.6.1-0.10.el6.x86_64
condor-debuginfo-7.6.1-0.10.el6.x86_64
python-qpid-qmf-0.10-10.el6.x86_64
condor-wallaby-base-db-1.13-1.el6.noarch
condor-wallaby-client-4.0-6.el6.noarch
condor-vm-gahp-7.6.1-0.10.el6.x86_64

How reproducible:
About 20% of restarts, while condor is going down.

Steps to Reproduce:
1. Set up condor with qmf (I'm not sure whether the problem depends on it).
2. Restart condor and qpidd repeatedly.
3. Take a look at /var/lib/qpidd/.qpidd/core*

-rw-------. 1 qpidd qpidd 61415424 Jun 8 18:42 /var/lib/qpidd/.qpidd/core.22486
-rw-------. 1 qpidd qpidd 55955456 Jun 8 18:10 /var/lib/qpidd/.qpidd/core.24074
-rw-------. 1 qpidd qpidd 75616256 Jun 8 18:14 /var/lib/qpidd/.qpidd/core.27705
-rw-------. 1 qpidd qpidd 54710272 Jun 8 18:16 /var/lib/qpidd/.qpidd/core.30336
-rw-------. 1 qpidd qpidd 68251648 Jun 8 18:56 /var/lib/qpidd/.qpidd/core.6791
-rw-------. 1 qpidd qpidd 81121280 Jun 8 18:28 /var/lib/qpidd/.qpidd/core.9356

Actual results:
qpidd hits a segmentation fault after stop.

Expected results:
No segfault.

Additional info:
(gdb) info threads
* 1 Thread 0x7fb416aa17a0 (LWP 22486) 0x00007fb4145ff76e in memcpy () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fb4145ff76e in memcpy () from /lib64/libc.so.6
#1 0x00007fb414e3f1e6 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib64/libstdc++.so.6
#2 0x00007fb414e3f28c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#3 0x00007fb4165f6a60 in ObjectId (this=0x1854e60, __in_chrg=<value optimized out>) at ../include/qpid/management/ManagementObject.h:51
#4 getObjectId (this=0x1854e60, __in_chrg=<value optimized out>) at ../include/qpid/management/ManagementObject.h:199
#5 qpid::management::ManagementAgent::RemoteAgent::~RemoteAgent (this=0x1854e60, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:113
#6 0x00007fb4165f6c29 in qpid::management::ManagementAgent::RemoteAgent::~RemoteAgent (this=0x1854e60, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:115
#7 0x00007fb4165133c9 in release (this=<value optimized out>, __in_chrg=<value optimized out>) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
#8 boost::detail::shared_count::~shared_count (this=<value optimized out>, __in_chrg=<value optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:217
#9 0x00007fb416610a7e in std::_Rb_tree<qpid::management::ObjectId, std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> >, std::_Select1st<std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> > >, std::less<qpid::management::ObjectId>, std::allocator<std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> > > >::_M_erase(std::_Rb_tree_node<std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> > >*) () from /usr/lib64/libqpidbroker.so.5.0.0
#10 0x00007fb416602750 in ~_Rb_tree (this=0x7fb416a66010, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.5/bits/stl_tree.h:614
#11 ~map (this=0x7fb416a66010, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.5/bits/stl_map.h:87
#12 qpid::management::ManagementAgent::~ManagementAgent (this=0x7fb416a66010, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:158
#13 0x00007fb416602939 in qpid::management::ManagementAgent::~ManagementAgent (this=0x7fb416a66010, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:158
#14 0x00007fb416520103 in ~auto_ptr (this=0x170d220, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.5/backward/auto_ptr.h:168
#15 qpid::broker::Broker::~Broker (this=0x170d220, __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:405
#16 0x00007fb4165206b9 in qpid::broker::Broker::~Broker (this=0x170d220, __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:405
#17 0x000000000040ee5b in QpiddDaemon::child() ()
#18 0x00007fb41653be43 in qpid::broker::Daemon::fork (this=0x7fffff138020) at qpid/broker/Daemon.cpp:91
#19 0x000000000040ddfd in QpiddBroker::execute (this=<value optimized out>, options=<value optimized out>) at posix/QpiddBroker.cpp:179
#20 0x000000000040a1f2 in main (argc=4, argv=0x7fffff1385e8) at qpidd.cpp:80

Note: So far I have only finished testing on RHEL6/x86_64; tests on other platforms are still pending. I will post a comment with additional info from other platforms.
I tested on a KVM virtual guest: x86_64, RHEL6.1 (Santiago), 1 core, 512MB RAM. My other tests on real systems (RHEL5/6, both platforms) with the same RHEL release and two or more CPUs were all negative. The problem appears to be harder to reproduce. In 100 restarts on the virtual system I found 7 core dumps, all at the same place in the code, while qpidd was shutting down.
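For anyone repeating the restart test, the core-dump count can be tallied with a small script instead of by eye. The following is only a sketch: the directory /var/lib/qpidd/.qpidd and the core.<pid> naming are taken from the listing in this report, while the function names count_cores and crash_rate are made up for illustration. It does not restart any services; it only counts files that already exist.

```python
import glob
import os


def count_cores(core_dir):
    """Count files named core.<something> directly inside core_dir."""
    return len(glob.glob(os.path.join(core_dir, "core.*")))


def crash_rate(cores, restarts):
    """Fraction of restarts that left a core file behind."""
    if restarts <= 0:
        raise ValueError("restarts must be a positive number")
    return cores / restarts


if __name__ == "__main__":
    # Assumption: default core directory as shown in this report.
    core_dir = "/var/lib/qpidd/.qpidd"
    restarts = 100  # number of restart cycles performed by the tester
    cores = count_cores(core_dir)
    print("%d core files in %d restarts (%.1f%%)"
          % (cores, restarts, 100.0 * crash_rate(cores, restarts)))
```

Reading /var/lib/qpidd/.qpidd normally requires running as root or as the qpidd user, since the core files are created with mode 0600.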
I can't reproduce this after a fresh installation of qpidd and about 10000 tries. If you see this again, please reopen.