Bug 695716

Summary: C++ broker crash in DTX handling under stress test.
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: Development
Status: CLOSED ERRATA
Severity: unspecified
Priority: high
Reporter: Ken Giusti <kgiusti>
Assignee: Gordon Sim <gsim>
QA Contact: Petr Matousek <pematous>
CC: gsim, iboverma, jneedle, pematous, tross
Target Milestone: 2.0
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qpid-cpp-mrg-0.10-4
Doc Type: Bug Fix
Clones: 703839
Last Closed: 2011-06-23 15:45:23 UTC
Attachments: gdb dump of active threads at crash. (flags: none)

Description Ken Giusti 2011-04-12 14:02:25 UTC
Created attachment 491491
gdb dump of active threads at crash.

Description of problem:

While running the stress test to reproduce

https://bugzilla.redhat.com/show_bug.cgi?id=695263

a crash occurred that appears to be unrelated to BZ695263.

Version-Release number of selected component (if applicable):

Qpid trunk

How reproducible:
Very hard - test was running for over 2 hours.

Steps to Reproduce:
1. See BZ695263

  
Actual results:


Expected results:


Additional info:

Running on mrg10, svn release info:

URL: https://svn.apache.org/repos/asf/qpid/trunk/qpid
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1091125


[root@mrg10 ~]# uname -a
Linux mrg10.lab.bos.redhat.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:22:44 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@mrg10 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

Comment 1 Gordon Sim 2011-04-12 15:07:42 UTC
See https://issues.apache.org/jira/browse/QPID-3201, now fixed upstream as http://svn.apache.org/viewvc?rev=1091443&view=rev

This bug only affects cases that use dtx with durable queues and durable messages when no store is loaded. It is not a regression. That said, I think it is also low risk if we want to pull it in for the 2.0 MRG release.

Comment 2 Gordon Sim 2011-04-12 15:09:49 UTC
To test, running:

 while ./src/tests/qpid-txtest --queues 5 --dtx yes --messages-per-tx 20 --tx-count 10 --total-messages 5000; do true; done

against a broker with no store loaded crashed fairly easily for me. This is essentially what the stress test is doing (along with many other things). Increasing the concurrency might make it even faster to reproduce.
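
For long runs it can help to know how many iterations passed before the loop stopped. The following is a minimal sketch of the same loop with a counter; count_loops is a hypothetical helper name, not part of qpid, and it only reports that the command exited non-zero, so the broker still has to be checked separately to confirm a crash.

```shell
#!/bin/sh
# Sketch only: count_loops is a hypothetical helper, not a qpid tool.
# It reruns the command given as its arguments until that command exits
# non-zero, then reports how many runs succeeded before the failure.
count_loops() {
    loops=0
    while "$@"; do
        loops=$((loops + 1))
    done
    echo "command failed after $loops successful loops"
}
```

Assuming the same working directory and options as the loop above, it would be invoked as:

 count_loops ./src/tests/qpid-txtest --queues 5 --dtx yes --messages-per-tx 20 --tx-count 10 --total-messages 5000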

Comment 5 Petr Matousek 2011-05-10 12:45:18 UTC
Another broker crash occurred while testing this bug on RHEL6. This BZ cannot be verified until BZ703466 is resolved. Dependency created. Please see bug 703466.

Comment 6 Petr Matousek 2011-05-11 12:15:04 UTC
This issue has been fixed in qpid-cpp-mrg-0.10-4 for RHEL5, but the fix is not yet
available in any RHEL6 package.

The bug was cloned for RHEL6: please see bug 703839.

Verified on RHEL5.6 architectures: i386, x86_64

Tested on mrg4.lab.bos.redhat.com and mrg5.lab.bos.redhat.com according to comment 2.
Test loops: 180000
Test duration: over 50 hours.
RHEL5 i386 test performed on VM.

packages installed:
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-4.el5
qpid-cpp-client-rdma-0.10-4.el5
qpid-cpp-client-ssl-0.10-4.el5
qpid-cpp-mrg-debuginfo-0.10-4.el5
qpid-cpp-server-0.10-4.el5
qpid-cpp-server-cluster-0.10-4.el5
qpid-cpp-server-rdma-0.10-4.el5
qpid-cpp-server-ssl-0.10-4.el5
qpid-cpp-server-store-0.10-4.el5
qpid-cpp-server-xml-0.10-4.el5
qpid-java-client-0.10-4.el5
qpid-java-common-0.10-4.el5
qpid-java-example-0.10-4.el5
qpid-java-jca-0.10-4.el5
qpid-qmf-0.10-6.el5
qpid-qmf-debuginfo-0.10-6.el5
qpid-tests-0.10-1.el5
qpid-tools-0.10-4.el5
rh-qpid-cpp-tests-0.10-4.el5

-> VERIFIED

Comment 7 errata-xmlrpc 2011-06-23 15:45:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html