Bug 790390 - Qpid segfault while wallaby is connected
Summary: Qpid segfault while wallaby is connected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-qmf
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: 3.0
Assignee: Ken Giusti
QA Contact: Ernie
URL:
Whiteboard:
Duplicates: 845368
Depends On:
Blocks: 756446
 
Reported: 2012-02-14 12:17 UTC by ppecka
Modified: 2014-09-24 15:03 UTC
CC List: 5 users

Fixed In Version: qpid-0.18
Doc Type: Bug Fix
Doc Text:
When the broker performed a shutdown, it cleaned up all resources it maintained for each connected client. If a client issued a management request to the broker during shutdown processing, a race condition existed which caused request processing to conflict with the shutdown process. This caused the internal resources of the broker to become corrupted, which could have resulted in a crash. Locking was added to prevent the request processing thread from accessing resources that were being deleted by the shutdown process. The internal resources remain consistent during the clean up process, and no corruption occurs.
Clone Of:
Environment:
Last Closed: 2014-09-24 15:03:59 UTC
Target Upstream Version:
Embargoed:
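
The Doc Text above describes the fix at a high level. As a rough illustration, here is a hypothetical, simplified C++ sketch of that locking pattern; the class and member names (AgentSketch, managedObjects, handleManagementRequest) are illustrative only and are not taken from the qpid source tree:

// Hypothetical sketch of the locking described in the Doc Text above;
// names are illustrative, not from the actual qpid code.
#include <list>
#include <mutex>
#include <string>

class AgentSketch {
    std::mutex objectLock;                  // guards managedObjects
    std::list<std::string> managedObjects;  // stand-in for QMF object records

public:
    // Request-processing thread: takes the lock before reading the list.
    void handleManagementRequest() {
        std::lock_guard<std::mutex> guard(objectLock);
        for (std::list<std::string>::const_iterator i = managedObjects.begin();
             i != managedObjects.end(); ++i) {
            // ... build a reply from *i; safe while the lock is held
        }
    }

    // Shutdown path (compare ~ManagementAgent in the backtrace below):
    // deletion happens under the same lock, so a concurrent request can
    // never observe a half-destroyed list.
    ~AgentSketch() {
        std::lock_guard<std::mutex> guard(objectLock);
        managedObjects.clear();
    }
};

int main() {
    AgentSketch agent;
    agent.handleManagementRequest();  // no race: both paths serialize on objectLock
    return 0;
}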


Attachments: none

Links:
Red Hat Product Errata RHEA-2014:1296 (priority normal, SHIPPED_LIVE): Red Hat Enterprise MRG Messaging 3.0 Release, last updated 2014-09-24 19:00:06 UTC

Description ppecka 2012-02-14 12:17:05 UTC
Description of problem:
"Very frequent" restarting qpidd service daemon while wallaby daemon is running terminates
qpidd with segmentation fault.


#0  0x2e656863 in ?? ()
#1  0x0562a64a in
qpid::management::ManagementAgent::DeletedObject::DeletedObject
(this=0x852ea70, src=0xb5f57df8, v1=true, v2=true)
    at qpid/management/ManagementAgent.cpp:2822
#2  0x05635e59 in qpid::management::ManagementAgent::moveNewObjectsLH (
    this=0xb6e23008) at qpid/management/ManagementAgent.cpp:679
#3  0x056491f9 in qpid::management::ManagementAgent::~ManagementAgent (
    this=0xb6e23008, __in_chrg=<value optimized out>)
    at qpid/management/ManagementAgent.cpp:153
#4  0x05649863 in qpid::management::ManagementAgent::~ManagementAgent (
    this=0xb6e23008, __in_chrg=<value optimized out>)
    at qpid/management/ManagementAgent.cpp:162
#5  0x055338e2 in ~auto_ptr (this=0x84ed4f8, __in_chrg=<value optimized out>)
    at /usr/include/c++/4.4.6/backward/auto_ptr.h:168
#6  qpid::broker::Broker::~Broker (this=0x84ed4f8, 
    __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:426
#7  0x05533f53 in qpid::broker::Broker::~Broker (this=0x84ed4f8, 
    __in_chrg=<value optimized out>) at qpid/broker/Broker.cpp:426
#8  0x05522e16 in qpid::RefCounted::released (this=0x84ed510)
    at qpid/RefCounted.h:48
#9  0x08055ec4 in release (this=0xbfabe60c) at qpid/RefCounted.h:42
#10 intrusive_ptr_release<qpid::broker::Broker> (this=0xbfabe60c)
    at qpid/RefCounted.h:59
#11 ~intrusive_ptr (this=0xbfabe60c)
    at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:101
#12 QpiddDaemon::child (this=0xbfabe60c) at posix/QpiddBroker.cpp:144
#13 0x055513f0 in qpid::broker::Daemon::fork (this=0xbfabe60c)
    at qpid/broker/Daemon.cpp:91
#14 0x08054778 in QpiddBroker::execute (this=0xbfabe87f, options=0x84e8a20)
    at posix/QpiddBroker.cpp:182
#15 0x08050d79 in run_broker (argc=4, argv=0xbfabe964, hidden=false)
    at qpidd.cpp:83
#16 0x08054074 in main (argc=4, argv=0xbfabe964) at posix/QpiddBroker.cpp:202




Version-Release number of selected component (if applicable):
rpm -qa | grep -P '(wallaby|qpid|sesame|condor|qmf)' | sort -u
condor-7.6.5-0.12.el6.i686
condor-classads-7.6.5-0.12.el6.i686
condor-qmf-7.6.5-0.12.el6.i686
condor-wallaby-base-db-1.19-1.el6.noarch
condor-wallaby-client-4.1.2-1.el6.noarch
condor-wallaby-tools-4.1.2-1.el6.noarch
python-condorutils-1.5-4.el6.noarch
python-qpid-0.14-2.el6.noarch
python-qpid-qmf-0.14-3.el6.i686
python-wallaby-0.12.5-1.el6.noarch
python-wallabyclient-4.1.2-1.el6.noarch
qpid-cpp-client-0.14-6.el6.i686
qpid-cpp-debuginfo-0.14-6.el6.i686
qpid-cpp-server-0.14-6.el6.i686
qpid-qmf-0.14-3.el6.i686
qpid-tools-0.14-1.el6.noarch
ruby-qpid-qmf-0.14-3.el6.i686
ruby-wallaby-0.12.5-1.el6.noarch
sesame-1.0-2.el6.i686
wallaby-0.12.5-1.el6.noarch
wallaby-utils-0.12.5-1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
0. Enable coredump generation.
1. >/var/lib/qpidd/qpidd.log ; service qpidd restart ; service wallaby restart ; \
   sleep 3.5 ; \
   echo -ne 'Y\ny\n*\n*\nlocalhost\nn\ny\n' | \
   condor_configure_pool -a -n `hostname` -f Master,ExecuteNode,NodeAccess ; \
   service qpidd restart
2. service qpidd restart
3. service wallaby start
4. watch 'service qpidd restart'
5. ls -lSrh /var/lib/qpidd/.qpidd/ 

(or see steps taken here https://bugzilla.redhat.com/show_bug.cgi?id=756446 )

Actual results:
A high rate of qpidd service restarts produces coredumps.

Expected results:
no cores


Additional info:

Comment 2 Justin Ross 2012-02-22 16:30:46 UTC
Ted, please do a brief assessment.

Comment 6 Ken Giusti 2012-10-18 16:24:55 UTC
*** Bug 845368 has been marked as a duplicate of this bug. ***

Comment 7 Justin Ross 2012-10-24 19:58:48 UTC
This is a failure on shutdown.  There is a fix, but it's too disruptive to include in 2.3 (it involves using a new pointer abstraction).  As a result, I'm bumping this to the next release.

Comment 8 Justin Ross 2013-02-22 18:41:34 UTC
It's possible this is resolved.  Ken, do the 2.3 qmf fixes have any effect on this?

Comment 9 Ken Giusti 2013-02-22 19:40:39 UTC
MMmmmmm... I can't say that it does: the locking pattern in the 0.18 code (which has the latest fix) is very different from the 0.14 stuff.  As far as I can tell, the 0.14 stuff does the locking correctly and didn't have the bug we fixed in 0.18.

So I'd err in favor of caution and say "no".   But the code has changed enough under 0.18 that trying to repro against the 0.18 release would be wise.

Comment 11 Ken Giusti 2013-04-15 14:46:29 UTC
Just a followup - 

I can repro this easily using the above packages on RHEL6.4 - same crash dump signature.

But, no luck with 0.18, or upstream 0.22 prerelease - same system.  Given the extent of the code changes to that path, it's likely the defect no longer exists in the later releases.

So I think it is no longer an issue - but I'd like QE to verify it no longer appears from 0.18 onward, just to be sure.

Comment 12 Ernie 2013-08-12 20:23:09 UTC
Verified rhel 6.4 i686
rpm -qa | grep -P '(qpid|condor|qmf|wallaby|sesame)' | sort -n
condor-7.8.8-0.4.1.el6.i686
condor-classads-7.8.8-0.4.1.el6.i686
condor-qmf-7.8.8-0.4.1.el6.i686
condor-wallaby-base-db-1.25-1.el6_3.noarch
condor-wallaby-client-5.0.5-2.el6.noarch
condor-wallaby-tools-5.0.5-2.el6.noarch
python-condorutils-1.5-6.el6.noarch
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-15.el6.i686
python-wallaby-0.16.3-1.el6.noarch
python-wallabyclient-5.0.5-2.el6.noarch
qpid-cpp-client-0.18-14.el6.i686
qpid-cpp-client-devel-docs-0.22-9.el6.noarch
qpid-cpp-server-0.18-14.el6.i686
qpid-qmf-0.18-15.el6.i686
qpid-tools-0.18-8.el6.noarch
ruby-condor-wallaby-5.0.5-2.el6.noarch
ruby-qpid-qmf-0.18-15.el6.i686
ruby-wallaby-0.16.3-1.el6.noarch
sesame-1.0-8.el6.i686
wallaby-0.16.3-1.el6.noarch
wallaby-utils-0.16.3-1.el6.noarch

Verified rhel 6.4 x86_64
condor-7.8.8-0.4.1.el6.x86_64
condor-classads-7.8.8-0.4.1.el6.x86_64
condor-wallaby-base-db-1.25-1.el6_3.noarch
condor-wallaby-tools-5.0.5-2.el6.noarch
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-15.el6.x86_64
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-client-devel-0.18-14.el6.x86_64
qpid-cpp-client-devel-docs-0.18-14.el6.noarch
qpid-cpp-client-rdma-0.18-14.el6.x86_64
qpid-cpp-client-ssl-0.18-14.el6.x86_64
qpid-cpp-debuginfo-0.14-22.el6_3.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
qpid-cpp-server-cluster-0.18-14.el6.x86_64
qpid-cpp-server-devel-0.18-14.el6.x86_64
qpid-cpp-server-rdma-0.18-14.el6.x86_64
qpid-cpp-server-ssl-0.18-14.el6.x86_64
qpid-cpp-server-store-0.18-14.el6.x86_64
qpid-cpp-server-xml-0.18-14.el6.x86_64
qpid-java-client-0.18-7.el6.noarch
qpid-java-common-0.18-7.el6.noarch
qpid-java-example-0.18-7.el6.noarch
qpid-jca-0.18-8.el6.noarch
qpid-jca-xarecovery-0.18-8.el6.noarch
qpid-proton-c-0.4-2.2.el6.x86_64
qpid-proton-c-devel-0.4-2.2.el6.x86_64
qpid-qmf-0.18-15.el6.x86_64
qpid-qmf-debuginfo-0.14-14.el6_3.x86_64
qpid-qmf-devel-0.18-15.el6.x86_64
qpid-tests-0.18-2.el6.noarch
qpid-tools-0.18-8.el6.noarch
ruby-condor-wallaby-5.0.5-2.el6.noarch
ruby-qpid-qmf-0.18-15.el6.x86_64
ruby-wallaby-0.16.3-1.el6.noarch
sesame-1.0-8.el6.x86_64
wallaby-0.16.3-1.el6.noarch
wallaby-utils-0.16.3-1.el6.noarch


Reproduced rhel 6.4 i686
condor-7.6.5-0.22.el6.i686
condor-classads-7.6.5-0.22.el6.i686
condor-qmf-7.6.5-0.22.el6.i686
condor-wallaby-base-db-1.23-1.el6.noarch
condor-wallaby-client-4.1.3-1.el6.noarch
condor-wallaby-tools-4.1.3-1.el6.noarch
python-condorutils-1.5-4.el6.noarch
python-qpid-0.14-11.el6_3.noarch
python-qpid-qmf-0.14-14.el6_3.i686
python-wallaby-0.12.5-1.el6.noarch
python-wallabyclient-4.1.3-1.el6.noarch
qpid-cpp-client-0.14-22.el6_3.i686
qpid-cpp-server-0.14-22.el6_3.i686
qpid-qmf-0.14-14.el6_3.i686
ruby-qpid-qmf-0.14-14.el6_3.i686
ruby-wallaby-0.12.5-1.el6.noarch
sesame-1.0-6.el6.i686
wallaby-0.12.5-1.el6.noarch
wallaby-utils-0.12.5-1.el6.noarch

Verification/reproduction method:
- enable cores for daemons
- install condor, wallaby, sesame, qpid
- set qpidd.conf to auth=no
- service qpidd start; service wallaby start
- condor_configure_store -a -f Master,ExecuteNode,NodeAccess -n `hostname`
- condor_configure_pool -a -n `hostname` -f Master,ExecuteNode,NodeAccess
- service qpidd restart
- service wallaby restart
- watch 'service qpidd restart'
- look for cores after 10 minutes
Fail->cores
Success->no cores

Comment 15 errata-xmlrpc 2014-09-24 15:03:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1296.html

