Description of problem:
Restarting the qpidd service very frequently while the wallaby daemon is running causes qpidd to terminate with a segmentation fault:

#0  0x2e656863 in ?? ()
#1  0x0562a64a in qpid::management::ManagementAgent::DeletedObject::DeletedObject (this=0x852ea70, src=0xb5f57df8, v1=true, v2=true) at qpid/management/ManagementAgent.cpp:2822
#2  0x05635e59 in qpid::management::ManagementAgent::moveNewObjectsLH (this=0xb6e23008) at qpid/management/ManagementAgent.cpp:679
#3  0x056491f9 in qpid::management::ManagementAgent::~ManagementAgent (this=0xb6e23008, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:153
#4  0x05649863 in qpid::management::ManagementAgent::~ManagementAgent (this=0xb6e23008, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:162
#5  0x055338e2 in ~auto_ptr (this=0x84ed4f8, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.6/backward/auto_ptr.h:168
#6  qpid::broker::Broker::~Broker (this=0x84ed4f8, __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:426
#7  0x05533f53 in qpid::broker::Broker::~Broker (this=0x84ed4f8, __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:426
#8  0x05522e16 in qpid::RefCounted::released (this=0x84ed510) at qpid/RefCounted.h:48
#9  0x08055ec4 in release (this=0xbfabe60c) at qpid/RefCounted.h:42
#10 intrusive_ptr_release<qpid::broker::Broker> (this=0xbfabe60c) at qpid/RefCounted.h:59
#11 ~intrusive_ptr (this=0xbfabe60c) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:101
#12 QpiddDaemon::child (this=0xbfabe60c) at posix/QpiddBroker.cpp:144
#13 0x055513f0 in qpid::broker::Daemon::fork (this=0xbfabe60c) at qpid/broker/Daemon.cpp:91
#14 0x08054778 in QpiddBroker::execute (this=0xbfabe87f, options=0x84e8a20) at posix/QpiddBroker.cpp:182
#15 0x08050d79 in run_broker (argc=4, argv=0xbfabe964, hidden=false) at qpidd.cpp:83
#16 0x08054074 in main (argc=4, argv=0xbfabe964) at posix/QpiddBroker.cpp:202
Version-Release number of selected component (if applicable):
rpm -qa | grep -P '(wallaby|qpid|sesame|condor|qmf)' | sort -u
condor-7.6.5-0.12.el6.i686
condor-classads-7.6.5-0.12.el6.i686
condor-qmf-7.6.5-0.12.el6.i686
condor-wallaby-base-db-1.19-1.el6.noarch
condor-wallaby-client-4.1.2-1.el6.noarch
condor-wallaby-tools-4.1.2-1.el6.noarch
python-condorutils-1.5-4.el6.noarch
python-qpid-0.14-2.el6.noarch
python-qpid-qmf-0.14-3.el6.i686
python-wallaby-0.12.5-1.el6.noarch
python-wallabyclient-4.1.2-1.el6.noarch
qpid-cpp-client-0.14-6.el6.i686
qpid-cpp-debuginfo-0.14-6.el6.i686
qpid-cpp-server-0.14-6.el6.i686
qpid-qmf-0.14-3.el6.i686
qpid-tools-0.14-1.el6.noarch
ruby-qpid-qmf-0.14-3.el6.i686
ruby-wallaby-0.12.5-1.el6.noarch
sesame-1.0-2.el6.i686
wallaby-0.12.5-1.el6.noarch
wallaby-utils-0.12.5-1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
0. Enable core dump generation.
1. >/var/lib/qpidd/qpidd.log ; service qpidd restart; service wallaby restart; sleep 3.5; echo -ne 'Y\ny\n*\n*\nlocalhost\nn\ny\n' | condor_configure_pool -a -n `hostname` -f Master,ExecuteNode,NodeAccess; service qpidd restart
2. service qpidd restart
3. service wallaby start
4. watch 'service qpidd restart'
5. ls -lSrh /var/lib/qpidd/.qpidd/
(or see the steps taken in https://bugzilla.redhat.com/show_bug.cgi?id=756446)

Actual results:
A high proportion of 'service qpidd restart' invocations produce core dumps.

Expected results:
No core dumps.

Additional info:
Ted, please do a brief assessment.
*** Bug 845368 has been marked as a duplicate of this bug. ***
This is a failure on shutdown. There is a fix, but it's too disruptive to include in 2.3 (it involves using a new pointer abstraction). As a result, I'm bumping this to the next release.
It's possible this is resolved. Ken, do the 2.3 qmf fixes have any effect on this?
Mmm... I can't say that they do: the locking pattern in the 0.18 code (which has the latest fix) is very different from the 0.14 code. As far as I can tell, the 0.14 code does the locking correctly and didn't have the bug we fixed in 0.18. So I'd err on the side of caution and say "no". But the code has changed enough in 0.18 that trying to reproduce against the 0.18 release would be wise.
Just a follow-up: I can reproduce this easily using the above packages on RHEL 6.4, with the same crash dump signature. But I had no luck reproducing it with 0.18 or with the upstream 0.22 prerelease on the same system. Given the extent of the code changes in that path, it's likely the defect no longer exists in the later releases. So I think it is no longer an issue, but I'd like QE to verify that it no longer presents from 0.18 onward, just to be sure.
Verified RHEL 6.4 i686:
rpm -qa | grep -P '(qpid|condor|qmf|wallaby|sesame)' | sort -n
condor-7.8.8-0.4.1.el6.i686
condor-classads-7.8.8-0.4.1.el6.i686
condor-qmf-7.8.8-0.4.1.el6.i686
condor-wallaby-base-db-1.25-1.el6_3.noarch
condor-wallaby-client-5.0.5-2.el6.noarch
condor-wallaby-tools-5.0.5-2.el6.noarch
python-condorutils-1.5-6.el6.noarch
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-15.el6.i686
python-wallaby-0.16.3-1.el6.noarch
python-wallabyclient-5.0.5-2.el6.noarch
qpid-cpp-client-0.18-14.el6.i686
qpid-cpp-client-devel-docs-0.22-9.el6.noarch
qpid-cpp-server-0.18-14.el6.i686
qpid-qmf-0.18-15.el6.i686
qpid-tools-0.18-8.el6.noarch
ruby-condor-wallaby-5.0.5-2.el6.noarch
ruby-qpid-qmf-0.18-15.el6.i686
ruby-wallaby-0.16.3-1.el6.noarch
sesame-1.0-8.el6.i686
wallaby-0.16.3-1.el6.noarch
wallaby-utils-0.16.3-1.el6.noarch

Verified RHEL 6.4 x86_64:
condor-7.8.8-0.4.1.el6.x86_64
condor-classads-7.8.8-0.4.1.el6.x86_64
condor-wallaby-base-db-1.25-1.el6_3.noarch
condor-wallaby-tools-5.0.5-2.el6.noarch
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-15.el6.x86_64
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-client-devel-0.18-14.el6.x86_64
qpid-cpp-client-devel-docs-0.18-14.el6.noarch
qpid-cpp-client-rdma-0.18-14.el6.x86_64
qpid-cpp-client-ssl-0.18-14.el6.x86_64
qpid-cpp-debuginfo-0.14-22.el6_3.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
qpid-cpp-server-cluster-0.18-14.el6.x86_64
qpid-cpp-server-devel-0.18-14.el6.x86_64
qpid-cpp-server-rdma-0.18-14.el6.x86_64
qpid-cpp-server-ssl-0.18-14.el6.x86_64
qpid-cpp-server-store-0.18-14.el6.x86_64
qpid-cpp-server-xml-0.18-14.el6.x86_64
qpid-java-client-0.18-7.el6.noarch
qpid-java-common-0.18-7.el6.noarch
qpid-java-example-0.18-7.el6.noarch
qpid-jca-0.18-8.el6.noarch
qpid-jca-xarecovery-0.18-8.el6.noarch
qpid-proton-c-0.4-2.2.el6.x86_64
qpid-proton-c-devel-0.4-2.2.el6.x86_64
qpid-qmf-0.18-15.el6.x86_64
qpid-qmf-debuginfo-0.14-14.el6_3.x86_64
qpid-qmf-devel-0.18-15.el6.x86_64
qpid-tests-0.18-2.el6.noarch
qpid-tools-0.18-8.el6.noarch
ruby-condor-wallaby-5.0.5-2.el6.noarch
ruby-qpid-qmf-0.18-15.el6.x86_64
ruby-wallaby-0.16.3-1.el6.noarch
sesame-1.0-8.el6.x86_64
wallaby-0.16.3-1.el6.noarch
wallaby-utils-0.16.3-1.el6.noarch

Reproduced RHEL 6.4 i686:
condor-7.6.5-0.22.el6.i686
condor-classads-7.6.5-0.22.el6.i686
condor-qmf-7.6.5-0.22.el6.i686
condor-wallaby-base-db-1.23-1.el6.noarch
condor-wallaby-client-4.1.3-1.el6.noarch
condor-wallaby-tools-4.1.3-1.el6.noarch
python-condorutils-1.5-4.el6.noarch
python-qpid-0.14-11.el6_3.noarch
python-qpid-qmf-0.14-14.el6_3.i686
python-wallaby-0.12.5-1.el6.noarch
python-wallabyclient-4.1.3-1.el6.noarch
qpid-cpp-client-0.14-22.el6_3.i686
qpid-cpp-server-0.14-22.el6_3.i686
qpid-qmf-0.14-14.el6_3.i686
ruby-qpid-qmf-0.14-14.el6_3.i686
ruby-wallaby-0.12.5-1.el6.noarch
sesame-1.0-6.el6.i686
wallaby-0.12.5-1.el6.noarch
wallaby-utils-0.12.5-1.el6.noarch

Verification/reproduction method:
- enable cores for daemons
- install condor, wallaby, sesame, qpid
- set auth=no in qpidd.conf
- service qpidd start; service wallaby start
- condor_configure_store -a -f Master,ExecuteNode,NodeAccess -n `hostname`
- condor_configure_pool -a -n `hostname` -f Master,ExecuteNode,NodeAccess
- service qpidd restart
- service wallaby restart
- watch 'service qpidd restart'
- look for cores after 10 minutes

Fail: cores present. Success: no cores.
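The last two steps of the method above (restart qpidd repeatedly, then look for cores after 10 minutes) can be scripted instead of watched by hand. This is a minimal POSIX shell sketch, not part of the original verification. The function name check_qpidd_restarts, its argument order, the 2-second interval (watch's default), and the assumption that cores land directly in /var/lib/qpidd/.qpidd with names starting with "core" are all assumptions; adjust them to the system's core_pattern.

```shell
# Hypothetical helper, not from the original report.
# Usage: check_qpidd_restarts CORE_DIR DURATION_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
check_qpidd_restarts() {
    core_dir=$1
    duration=$2
    interval=$3
    shift 3                      # the remaining arguments form the restart command

    # Re-run the restart command until DURATION_SECONDS have elapsed,
    # roughly what `watch 'service qpidd restart'` does interactively.
    end=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$end" ]; do
        "$@" >/dev/null 2>&1     # restart failures are ignored; only cores matter
        sleep "$interval"
    done

    # Count regular files named core* directly inside CORE_DIR
    # (assumed core location and naming; depends on core_pattern).
    cores=$(find "$core_dir" -maxdepth 1 -type f -name 'core*' 2>/dev/null | wc -l | tr -d ' ')
    if [ "$cores" -gt 0 ]; then
        echo "FAIL: $cores core file(s) in $core_dir"
        return 1
    fi
    echo "PASS: no core files in $core_dir"
}
```

For the 10-minute run described above, this would be invoked as: check_qpidd_restarts /var/lib/qpidd/.qpidd 600 2 service qpidd restart. Passing the restart command as separate arguments (rather than one string) avoids an eval and keeps the helper usable with any command.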
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1296.html