Bug 711833

Summary: qpidd segfaults (signal 11) while testing condor with QMF (case: qpidd stopped, then condor daemons with QMF plugins stopped)
Product: Red Hat Enterprise MRG
Reporter: Tomas Rusnak <trusnak>
Component: qpid-cpp
Assignee: messaging-bugs <messaging-bugs>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high
Priority: high
Version: Development
CC: freznice, jross, matt, tross
Target Milestone: 2.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-06-20 12:14:58 UTC

Description Tomas Rusnak 2011-06-08 17:06:35 UTC
Description of problem:
During my testing of condor with QMF plugins, I found several qpidd core files caused by segmentation faults.

Version-Release number of selected component (if applicable):
qpid-cpp-client-devel-0.10-6.el6.x86_64
qpid-cpp-server-devel-0.10-6.el6.x86_64
qpid-cpp-server-store-0.10-6.el6.x86_64
qpid-cpp-server-0.10-6.el6.x86_64
qpid-cpp-client-rdma-0.10-6.el6.x86_64
qpid-cpp-server-rdma-0.10-6.el6.x86_64
rh-qpid-cpp-tests-0.10-6.el6.x86_64
qpid-cpp-server-xml-0.10-6.el6.x86_64
qpid-cpp-client-devel-docs-0.10-6.el6.noarch
qpid-cpp-debuginfo-0.10-6.el6.x86_64
qpid-cpp-client-0.10-6.el6.x86_64
qpid-cpp-client-ssl-0.10-6.el6.x86_64
qpid-cpp-server-ssl-0.10-6.el6.x86_64
qpid-cpp-server-cluster-0.10-6.el6.x86_64
condor-7.6.1-0.10.el6.x86_64
condor-qmf-7.6.1-0.10.el6.x86_64
qpid-qmf-0.10-10.el6.x86_64
ruby-qpid-qmf-0.10-10.el6.x86_64
python-condorutils-1.5-3.el6.noarch
condor-wallaby-tools-4.0-6.el6.noarch
condor-classads-7.6.1-0.10.el6.x86_64
condor-aviary-7.6.1-0.10.el6.x86_64
condor-kbdd-7.6.1-0.10.el6.x86_64
condor-debuginfo-7.6.1-0.10.el6.x86_64
python-qpid-qmf-0.10-10.el6.x86_64
condor-wallaby-base-db-1.13-1.el6.noarch
condor-wallaby-client-4.0-6.el6.noarch
condor-vm-gahp-7.6.1-0.10.el6.x86_64

How reproducible:
About 20% of restarts, while condor is shutting down.

Steps to Reproduce:
1. Set up condor with QMF plugins (I am not sure whether the crash depends on this).
2. Repeatedly restart condor and qpidd.
3. Check for core files: /var/lib/qpidd/.qpidd/core*

-rw-------. 1 qpidd qpidd 61415424 Jun  8 18:42 /var/lib/qpidd/.qpidd/core.22486
-rw-------. 1 qpidd qpidd 55955456 Jun  8 18:10 /var/lib/qpidd/.qpidd/core.24074
-rw-------. 1 qpidd qpidd 75616256 Jun  8 18:14 /var/lib/qpidd/.qpidd/core.27705
-rw-------. 1 qpidd qpidd 54710272 Jun  8 18:16 /var/lib/qpidd/.qpidd/core.30336
-rw-------. 1 qpidd qpidd 68251648 Jun  8 18:56 /var/lib/qpidd/.qpidd/core.6791
-rw-------. 1 qpidd qpidd 81121280 Jun  8 18:28 /var/lib/qpidd/.qpidd/core.9356
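Steps 2 and 3 can be automated along the following lines. This is a minimal C++17 sketch, not the procedure used in this report: it assumes the RHEL 6 `service` wrapper with init scripts named `qpidd` and `condor`, follows the stop order from the summary (qpidd first, then condor), and counts `core.*` files in the qpidd data directory before and after a number of cycles. The function names `count_core_files`, `restart_cycle` and `new_cores_after` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Counts regular files named "core.<something>" in dir, like the listing
// above. A missing or unreadable directory counts as zero.
int count_core_files(const fs::path& dir) {
    int n = 0;
    std::error_code ec;
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        const std::string name = entry.path().filename().string();
        if (name.rfind("core.", 0) == 0 && entry.is_regular_file(ec)) ++n;
    }
    return n;
}

// One stop/start cycle (assumed service names). Returns how many commands
// reported failure; a crashing qpidd can make "service qpidd stop" fail,
// so failures are counted rather than treated as fatal.
int restart_cycle() {
    const char* cmds[] = {"service qpidd stop", "service condor stop",
                          "service qpidd start", "service condor start"};
    int failures = 0;
    for (const char* cmd : cmds) {
        if (std::system(cmd) != 0) ++failures;
    }
    return failures;
}

// Runs `cycles` cycles and returns the number of new core files in core_dir.
int new_cores_after(int cycles, const fs::path& core_dir) {
    assert(cycles >= 0);
    const int before = count_core_files(core_dir);
    for (int i = 0; i < cycles; ++i) restart_cycle();
    return count_core_files(core_dir) - before;
}
```

Run as root, `new_cores_after(100, "/var/lib/qpidd/.qpidd")` gives a crash count per 100 cycles that can be compared across machines.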
  
Actual results:
qpidd crashes with a segmentation fault after being stopped.

Expected results:
No segfault

Additional info:
(gdb) info threads
* 1 Thread 0x7fb416aa17a0 (LWP 22486)  0x00007fb4145ff76e in memcpy () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fb4145ff76e in memcpy () from /lib64/libc.so.6
#1  0x00007fb414e3f1e6 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib64/libstdc++.so.6
#2  0x00007fb414e3f28c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#3  0x00007fb4165f6a60 in ObjectId (this=0x1854e60, __in_chrg=<value optimized out>) at ../include/qpid/management/ManagementObject.h:51
#4  getObjectId (this=0x1854e60, __in_chrg=<value optimized out>) at ../include/qpid/management/ManagementObject.h:199
#5  qpid::management::ManagementAgent::RemoteAgent::~RemoteAgent (this=0x1854e60, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:113
#6  0x00007fb4165f6c29 in qpid::management::ManagementAgent::RemoteAgent::~RemoteAgent (this=0x1854e60, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:115
#7  0x00007fb4165133c9 in release (this=<value optimized out>, __in_chrg=<value optimized out>) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
#8  boost::detail::shared_count::~shared_count (this=<value optimized out>, __in_chrg=<value optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:217
#9  0x00007fb416610a7e in std::_Rb_tree<qpid::management::ObjectId, std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> >, std::_Select1st<std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> > >, std::less<qpid::management::ObjectId>, std::allocator<std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> > > >::_M_erase(std::_Rb_tree_node<std::pair<qpid::management::ObjectId const, boost::shared_ptr<qpid::management::ManagementAgent::RemoteAgent> > >*) () from /usr/lib64/libqpidbroker.so.5.0.0
#10 0x00007fb416602750 in ~_Rb_tree (this=0x7fb416a66010, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.5/bits/stl_tree.h:614
#11 ~map (this=0x7fb416a66010, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.5/bits/stl_map.h:87
#12 qpid::management::ManagementAgent::~ManagementAgent (this=0x7fb416a66010, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:158
#13 0x00007fb416602939 in qpid::management::ManagementAgent::~ManagementAgent (this=0x7fb416a66010, __in_chrg=<value optimized out>) at qpid/management/ManagementAgent.cpp:158
#14 0x00007fb416520103 in ~auto_ptr (this=0x170d220, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.5/backward/auto_ptr.h:168
#15 qpid::broker::Broker::~Broker (this=0x170d220, __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:405
#16 0x00007fb4165206b9 in qpid::broker::Broker::~Broker (this=0x170d220, __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:405
#17 0x000000000040ee5b in QpiddDaemon::child() ()
#18 0x00007fb41653be43 in qpid::broker::Daemon::fork (this=0x7fffff138020) at qpid/broker/Daemon.cpp:91
#19 0x000000000040ddfd in QpiddBroker::execute (this=<value optimized out>, options=<value optimized out>) at posix/QpiddBroker.cpp:179
#20 0x000000000040a1f2 in main (argc=4, argv=0x7fffff1385e8) at qpidd.cpp:80
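Read bottom-up, the backtrace shows the crash happening during normal broker teardown: QpiddDaemon::child() destroys the Broker (#15-#16), ~Broker destroys its auto_ptr<ManagementAgent> (#14), ~ManagementAgent destroys its map of shared_ptr<RemoteAgent> (#9-#12), releasing the last reference (#7-#8), and ~RemoteAgent at ManagementAgent.cpp:113 calls getObjectId() (#4-#5), whose ObjectId copy faults in memcpy while copying a std::string (#0-#3). The C++ sketch below models only that destruction order with simplified, hypothetical types; it is not the qpid source. It assumes RemoteAgent reads its id through a pointer to an object it does not own. A crash of this shape is consistent with that object already being freed, but this report does not establish the cause, and the model does not reproduce the fault (that would require undefined behaviour).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Records the order in which the (hypothetical) destructors run.
std::vector<std::string> destruction_log;

struct ObjectId {
    std::string key;  // copying a string like this faulted in frames #0-#2
    bool operator<(const ObjectId& o) const { return key < o.key; }
};

struct ManagementObject {
    ObjectId id;
    ObjectId getObjectId() const { return id; }  // returns a copy (frame #4)
};

struct RemoteAgent {
    ManagementObject* mgmtObject = nullptr;  // assumption: not owned, must outlive us
    ~RemoteAgent() {
        assert(mgmtObject != nullptr);
        // Frame #5: if mgmtObject were already freed, this copy would read freed memory.
        destruction_log.push_back("~RemoteAgent reads " + mgmtObject->getObjectId().key);
    }
};

struct ManagementAgent {
    std::map<ObjectId, std::shared_ptr<RemoteAgent>> remoteAgents;
    // The body runs first; the map (and each RemoteAgent) is destroyed after it.
    ~ManagementAgent() { destruction_log.push_back("~ManagementAgent body"); }
};

struct Broker {
    std::unique_ptr<ManagementAgent> managementAgent;  // stands in for auto_ptr (frame #14)
    ~Broker() { destruction_log.push_back("~Broker body"); }
};

std::vector<std::string> simulate_shutdown() {
    destruction_log.clear();
    ManagementObject obj{ObjectId{"agent-1"}};  // outlives the broker in this model
    {
        Broker broker;
        broker.managementAgent = std::make_unique<ManagementAgent>();
        auto agent = std::make_shared<RemoteAgent>();
        agent->mgmtObject = &obj;
        broker.managementAgent->remoteAgents[obj.getObjectId()] = agent;
    }  // agent released first; then ~Broker -> ~ManagementAgent -> ~map -> ~RemoteAgent
    return destruction_log;
}
```

Here `simulate_shutdown()` logs "~Broker body", "~ManagementAgent body", then "~RemoteAgent reads agent-1", matching the frame order above. In this model the read is safe only because `obj` is declared outside the broker's scope.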

Note:
So far I have only finished testing on RHEL6/x86_64; results for other platforms are still pending. I will post a comment with additional information from the other platforms.

Comment 1 Tomas Rusnak 2011-06-09 09:55:15 UTC
I tested on a KVM virtual guest (x86_64, RHEL 6.1 (Santiago), 1 core, 512 MB RAM). My other tests on physical systems (RHEL5/6, both platforms, the same RHEL releases, 2 or more CPUs) were all negative.
It appears to be harder to reproduce. On the virtual system, 100 restarts produced 7 core dumps, all in the same code while qpidd was shutting down.

Comment 2 Tomas Rusnak 2011-06-20 12:14:58 UTC
I can no longer reproduce this after a fresh installation of qpidd and about 10,000 attempts. If you see this again, please reopen.