Bug 748455

Summary: Client crashes periodically while receiving messages
Product: Red Hat Enterprise MRG
Reporter: Jason Dillaman <jdillama>
Component: qpid-cpp
Assignee: Gordon Sim <gsim>
Status: CLOSED CURRENTRELEASE
QA Contact: Leonid Zhaldybin <lzhaldyb>
Severity: high
Priority: high
Version: 1.3
CC: esammons, gsim, iboverma, jross, lzhaldyb
Target Milestone: 2.1.2
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qpid-cpp-0.14
Doc Type: Bug Fix
Last Closed: 2012-11-14 20:11:14 UTC
Bug Blocks: 698367, 803771    
Attachments:
Description Flags
Backtrace in crash thread none

Description Jason Dillaman 2011-10-24 14:04:38 UTC
Description of problem:
I have encountered several crashes within AcceptTracker::delivered() while adding a message to the aggregateState unaccepted message set. It appears that IncomingMessages::retrieve() can call AcceptTracker::delivered() without holding a lock.

Version-Release number of selected component (if applicable):
qpid-cpp-client-0.7.946106-32_ptc_hotfix_8.el5

How reproducible:
Rare

Steps to Reproduce:
1. Retrieve messages in thread 1
2. Accept previously retrieved messages in thread 2
  
Actual results:
Client application crashes when AcceptTracker's aggregateState is corrupted.

Expected results:
Client application does not crash.

Additional info:

Comment 1 Jason Dillaman 2011-10-24 14:06:57 UTC
Created attachment 529877 [details]
Backtrace in crash thread

Comment 2 Gordon Sim 2011-10-31 10:04:13 UTC
I was unable to reproduce a crash despite trying a fair bit. However, I have fixed the locking to prevent concurrent modification of the accept tracker structures, which is indeed the likely cause of this.

Fixed upstream by http://svn.apache.org/viewvc?view=rev&rev=1195385
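
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of locking described above. It is not the qpid-cpp code or the upstream patch: TrackerSketch, runTwoThreadSketch and all member names are hypothetical, and std::mutex stands in for whatever lock the client actually uses. It only shows one thread recording deliveries while a second thread accepts them, with every access to the shared set made under the same lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical stand-in for an accept tracker; not the qpid-cpp class.
class TrackerSketch {
public:
    // Receiving thread: record a message that was handed to the application.
    void delivered(unsigned id) {
        std::lock_guard<std::mutex> guard(lock_);
        unaccepted_.insert(id);
    }

    // Accepting thread: returns true if the id was pending and is now removed.
    bool accept(unsigned id) {
        std::lock_guard<std::mutex> guard(lock_);
        return unaccepted_.erase(id) == 1;
    }

    // Number of delivered but not yet accepted messages.
    std::size_t pending() const {
        std::lock_guard<std::mutex> guard(lock_);
        return unaccepted_.size();
    }

private:
    mutable std::mutex lock_;        // guards unaccepted_
    std::set<unsigned> unaccepted_;  // ids delivered but not yet accepted
};

// Thread 1 records deliveries while thread 2 accepts them.
// Returns the number of ids still pending after both threads finish.
std::size_t runTwoThreadSketch(unsigned count) {
    TrackerSketch tracker;
    std::thread receiver([&tracker, count] {
        for (unsigned id = 0; id < count; ++id) tracker.delivered(id);
    });
    std::thread acceptor([&tracker, count] {
        for (unsigned id = 0; id < count; ++id) {
            // Wait until the receiver has delivered this id, then accept it.
            while (!tracker.accept(id)) std::this_thread::yield();
        }
    });
    receiver.join();
    acceptor.join();
    // Every delivered id was accepted exactly once, so nothing is left.
    assert(tracker.pending() == 0);
    return tracker.pending();
}
```

Calling runTwoThreadSketch(1000) from a test program exercises both paths concurrently. If the lock_guard lines are removed, the two threads modify the std::set at the same time, which is undefined behavior and matches the kind of corruption reported in the description.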

Comment 6 Leonid Zhaldybin 2012-05-31 13:35:03 UTC
I tested this on RHEL5.8 and RHEL6.2 (i386 and x86_64). The reproducer was running for a few days without a single crash on these platforms.

Packages used for testing:
RHEL6.2
qpid-cpp-client-0.14-14.el6_2
qpid-cpp-client-devel-0.14-14.el6_2
qpid-cpp-client-devel-docs-0.14-14.el6_2
qpid-cpp-server-0.14-14.el6_2
qpid-cpp-server-cluster-0.14-14.el6_2
qpid-cpp-server-devel-0.14-14.el6_2
qpid-cpp-server-store-0.14-14.el6_2
qpid-cpp-server-xml-0.14-14.el6_2
qpid-java-client-0.14-3.el6
qpid-java-common-0.14-3.el6
qpid-java-example-0.14-3.el6
qpid-qmf-0.14-7.el6_2
qpid-tools-0.14-2.el6_2
RHEL5.8
qpid-cpp-client-0.14-14.el5
qpid-cpp-client-devel-0.14-14.el5
qpid-cpp-client-devel-docs-0.14-14.el5
qpid-cpp-client-ssl-0.14-14.el5
qpid-cpp-server-0.14-14.el5
qpid-cpp-server-cluster-0.14-14.el5
qpid-cpp-server-devel-0.14-14.el5
qpid-cpp-server-ssl-0.14-14.el5
qpid-cpp-server-store-0.14-14.el5
qpid-cpp-server-xml-0.14-14.el5
qpid-dotnet-0.10-2.el5
qpid-java-client-0.14-3.el5
qpid-java-common-0.14-3.el5
qpid-java-example-0.14-3.el5
qpid-qmf-0.14-9.el5
qpid-tools-0.14-2.el5

-> VERIFIED