| Summary: | clustered qpidd aborts in qpid::broker::ConnectionHandler::handle() -> operator () -> __cxa_pure_virtual () -> terminate () ... | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> | ||||||||
| Component: | qpid-cpp | Assignee: | Ken Giusti <kgiusti> | ||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Petr Matousek <pematous> | ||||||||
| Severity: | urgent | Docs Contact: | |||||||||
| Priority: | high | ||||||||||
| Version: | Development | CC: | esammons, iboverma, jross, pematous | ||||||||
| Target Milestone: | 2.1.2 | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | Unspecified | ||||||||||
| OS: | Unspecified | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | qpid-cpp-mrg-0.14-14 | Doc Type: | Bug Fix | ||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2012-12-07 17:43:22 UTC | Type: | --- | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Bug Depends On: | 804001 | ||||||||||
| Bug Blocks: | 806279 | ||||||||||
| Attachments: |
|
||||||||||
Created attachment 567681 [details] The full backtrace and unit test code Attachment consist of: - unit test code (test_queue_nonexclusive_timed_autodelete_w_msgs_on_sclose.py) - full backtrace of two detected aborts (bz800028.txt) Created attachment 568653 [details]
clustered brokers logs
Reproduced on RHEL5.8 x86_64 on the latest 0.14-10 packages, attaching broker logs.
broker logs indicates that both the clustered brokers are aborted with following exception at the same time:
2012-03-08 09:31:21 error Unexpected exception: Task already exists with name DelayedAutoDeletion (qpid/cluster/ClusterTimer.cpp:71)
# stat -c %x%y%z core.3671 core.3672
2012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -0500
2012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -0500
Root cause for this crash appears to be in ClusterTimer::add(), which throws an exception should two timers be created with the same name. Upstream JIRA created, with a workaround patch that "fixes" the auto delete queue crash: https://issues.apache.org/jira/browse/QPID-3896 Created attachment 570656 [details]
clustered brokers logs
retested on RHEL6 i686/x86_64, broker crash is fixed, but running the test causes all cluster nodes except one (that one remain standing) to exit the cluster with the following cause:
critical Error delivering frames: Cluster timer wakeup non-existent task DelayedAutoDeletion:642922c3-6cb2-4c6b-b332-d2929ab95c34 (qpid/cluster/ClusterTimer.cpp:112)
Waiting for qpid-qmf rebuild for RHEL5 retest.
The above mentioned issue is probably the root cause of this new crash, moving back to assigned. -> ASSIGNED Updated the upstream with a new patch that should actually work in a cluster. See https://issues.apache.org/jira/secure/attachment/12518923/qpid-3896.patch Retested on RHEL6 i686 on qpid-0.14-12 with the patch from Comment 10 applied, the issue seems to be fixed. Test from Comment 1 is passing with clustered brokers, no broker failure detected. Waiting for packages for overall retest. Fix posted upstream trunk: http://svn.apache.org/viewvc?view=rev&rev=1303035 This issue has been fixed for rhel5, verified on rhel5.8 i386 & x86_64 - qpid-cpp-mrg-0.14-14.el5. Waiting for rhel6 packages for retest. Issue fixed, tested on rhel5.8 / 6.2 i/x on packages: qpid-cpp-mrg-0.14-14.el5 qpid-cpp-0.14-14.el6_2 -> VERIFIED |
Description of problem: Custom unit test custom_py_units.queue.LLAPITests. \ test_queue_nonexclusive_timed_autodelete_w_msgs_on_sclose triggers clustered broker abort[s] in qpid::broker::ConnectionHandler::handle() -> operator () -> __cxa_pure_virtual () -> terminate () ... Thread 1 (Thread 0x43b28940 (LWP 16777)): #0 0x000000312b230265 in raise () from /lib64/libc.so.6 #1 0x000000312b231d10 in abort () from /lib64/libc.so.6 #2 0x000000312fabed14 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #3 0x000000312fabce16 in ?? () from /usr/lib64/libstdc++.so.6 #4 0x000000312fabce43 in std::terminate() () from /usr/lib64/libstdc++.so.6 #5 0x000000312fabd34f in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6 #6 0x000000312e35cee1 in operator() (this=0x6bfb1e0, frame=...) at qpid/framing/Handler.h:42 #7 qpid::broker::ConnectionHandler::handle (this=0x6bfb1e0, frame=...) at qpid/broker/ConnectionHandler.cpp:87 #8 0x000000312e3520a8 in qpid::broker::Connection::received (this=0x6bfb000, frame=...) at qpid/broker/Connection.cpp:159 #9 0x00002b0543e56941 in qpid::cluster::Connection::deliveredFrame ( this=0x6bf7850, f=...) at qpid/cluster/Connection.cpp:248 #10 0x00002b0543e24ad9 in qpid::cluster::Cluster::processFrame ( this=0x2aed1b0, e=..., l=<value optimized out>) at qpid/cluster/Cluster.cpp:548 ... Unfortunately there is no log file available atm from above aborts. I believe broker ends with "pure virtual method called" indicated as: #4 0x000000312fabce43 in std::terminate() () from /usr/lib64/libstdc++.so.6 #5 0x000000312fabd34f in __cxa_pure_virtual () This abort is fatal condition and should be fixed. See unit test code attached below. Version-Release number of selected component (if applicable): python-qpid-0.14-4.el5 python-qpid-qmf-0.14-3.el5 qpid-cpp-client-0.14-7.el5 qpid-cpp-client-devel-0.14-7.el5 qpid-cpp-client-devel-docs-0.14-7.el5 qpid-cpp-client-rdma-0.14-7.el5 qpid-cpp-client-ssl-0.14-7.el5 qpid-cpp-mrg-debuginfo-0.14-7.el5 qpid-cpp-server-0.14-7.el5 qpid-cpp-server-cluster-0.14-7.el5 qpid-cpp-server-devel-0.14-7.el5 qpid-cpp-server-rdma-0.14-7.el5 qpid-cpp-server-ssl-0.14-7.el5 qpid-cpp-server-store-0.14-7.el5 qpid-cpp-server-xml-0.14-7.el5 qpid-java-client-0.14-3.el5 qpid-java-common-0.14-3.el5 qpid-java-example-0.14-3.el5 qpid-qmf-0.14-3.el5 qpid-qmf-debuginfo-0.14-3.el5 qpid-qmf-devel-0.14-3.el5 qpid-tests-0.14-1.el5 qpid-tools-0.14-1.el5 rh-qpid-cpp-tests-0.14-7.el5 ruby-qpid-qmf-0.14-3.el5 How reproducible: 80% Steps to Reproduce: 1. set-up clustered broker 2. run test_queue_nonexclusive_timed_autodelete_w_msgs_on_sclose unit Actual results: Clustered broker[s] abort. Expected results: Clustered broker[s] should survive the test. Additional info: Aborts were detected on both RHEL5.7 and 5.8