Bug 800028
Summary: | clustered qpidd aborts in qpid::broker::ConnectionHandler::handle() -> operator () -> __cxa_pure_virtual () -> terminate () ...
---|---|---|---
Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice>
Component: | qpid-cpp | Assignee: | Ken Giusti <kgiusti>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Petr Matousek <pematous>
Severity: | urgent | Docs Contact: |
Priority: | high
Version: | Development | CC: | esammons, iboverma, jross, pematous
Target Milestone: | 2.1.2
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | qpid-cpp-mrg-0.14-14 | Doc Type: | Bug Fix
Last Closed: | 2012-12-07 17:43:22 UTC
Bug Depends On: | 804001
Bug Blocks: | 806279
Description
Frantisek Reznicek
2012-03-05 16:07:38 UTC
Created attachment 567681: The full backtrace and unit test code
The attachment consists of:
- unit test code (test_queue_nonexclusive_timed_autodelete_w_msgs_on_sclose.py)
- full backtrace of two detected aborts (bz800028.txt)
Created attachment 568653: clustered brokers logs
Reproduced on RHEL5.8 x86_64 with the latest 0.14-10 packages; attaching broker logs.
The broker logs indicate that both clustered brokers aborted at the same time with the following exception:
2012-03-08 09:31:21 error Unexpected exception: Task already exists with name DelayedAutoDeletion (qpid/cluster/ClusterTimer.cpp:71)
# stat -c %x%y%z core.3671 core.3672
2012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -0500
2012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -0500
The root cause of this crash appears to be in ClusterTimer::add(), which throws an exception if two timers are created with the same name. An upstream JIRA has been created, with a workaround patch that "fixes" the auto-delete queue crash: https://issues.apache.org/jira/browse/QPID-3896
Created attachment 570656: clustered brokers logs
Retested on RHEL6 i686/x86_64: the broker crash is fixed, but running the test causes all cluster nodes except one (which remains running) to leave the cluster with the following error:
critical Error delivering frames: Cluster timer wakeup non-existent task DelayedAutoDeletion:642922c3-6cb2-4c6b-b332-d2929ab95c34 (qpid/cluster/ClusterTimer.cpp:112)
Waiting for qpid-qmf rebuild for RHEL5 retest.
The issue mentioned above is probably the root cause of this new crash; moving back to ASSIGNED. -> ASSIGNED
Updated the upstream JIRA with a new patch that should actually work in a cluster. See https://issues.apache.org/jira/secure/attachment/12518923/qpid-3896.patch
Retested on RHEL6 i686 on qpid-0.14-12 with the patch from Comment 10 applied; the issue appears to be fixed. The test from Comment 1 passes with clustered brokers, and no broker failure was detected. Waiting for packages for an overall retest.
Fix posted to upstream trunk: http://svn.apache.org/viewvc?view=rev&rev=1303035
This issue has been fixed for RHEL5 and verified on RHEL5.8 i386 and x86_64 with qpid-cpp-mrg-0.14-14.el5. Waiting for RHEL6 packages for retest.
Issue fixed, tested on RHEL5.8 / 6.2 i/x with packages:
qpid-cpp-mrg-0.14-14.el5
qpid-cpp-0.14-14.el6_2
-> VERIFIED