Bug 800028 - clustered qpidd aborts in qpid::broker::ConnectionHandler::handle() -> operator () -> __cxa_pure_virtual () -> terminate () ...
clustered qpidd aborts in qpid::broker::ConnectionHandler::handle() -> operat...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
Development
Unspecified Unspecified
high Severity urgent
: 2.1.2
: ---
Assigned To: Ken Giusti
Petr Matousek
:
Depends On: 804001
Blocks: 806279
  Show dependency treegraph
 
Reported: 2012-03-05 11:07 EST by Frantisek Reznicek
Modified: 2015-11-15 20:14 EST (History)
4 users (show)

See Also:
Fixed In Version: qpid-cpp-mrg-0.14-14
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-07 12:43:22 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
The full backtrace and unit test code (10.63 KB, application/x-tbz)
2012-03-05 11:13 EST, Frantisek Reznicek
no flags Details
clustered brokers logs (42.82 KB, application/x-gnome-theme-package)
2012-03-08 09:47 EST, Petr Matousek
no flags Details
clustered brokers logs (23.56 KB, application/x-gnome-theme-package)
2012-03-16 13:04 EDT, Petr Matousek
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Apache JIRA QPID-3896 None None None Never

  None (edit)
Description Frantisek Reznicek 2012-03-05 11:07:38 EST
Description of problem:

Custom unit test custom_py_units.queue.LLAPITests. \
                 test_queue_nonexclusive_timed_autodelete_w_msgs_on_sclose
triggers clustered broker abort[s] in qpid::broker::ConnectionHandler::handle() -> operator () -> __cxa_pure_virtual () -> terminate () ...

  Thread 1 (Thread 0x43b28940 (LWP 16777)):
  #0  0x000000312b230265 in raise () from /lib64/libc.so.6
  #1  0x000000312b231d10 in abort () from /lib64/libc.so.6
  #2  0x000000312fabed14 in __gnu_cxx::__verbose_terminate_handler() ()
     from /usr/lib64/libstdc++.so.6
  #3  0x000000312fabce16 in ?? () from /usr/lib64/libstdc++.so.6
  #4  0x000000312fabce43 in std::terminate() () from /usr/lib64/libstdc++.so.6
  #5  0x000000312fabd34f in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6
  #6  0x000000312e35cee1 in operator() (this=0x6bfb1e0, frame=...)
      at qpid/framing/Handler.h:42
  #7  qpid::broker::ConnectionHandler::handle (this=0x6bfb1e0, frame=...)
      at qpid/broker/ConnectionHandler.cpp:87
  #8  0x000000312e3520a8 in qpid::broker::Connection::received (this=0x6bfb000, 
      frame=...) at qpid/broker/Connection.cpp:159
  #9  0x00002b0543e56941 in qpid::cluster::Connection::deliveredFrame (
      this=0x6bf7850, f=...) at qpid/cluster/Connection.cpp:248
  #10 0x00002b0543e24ad9 in qpid::cluster::Cluster::processFrame (
      this=0x2aed1b0, e=..., l=<value optimized out>)
      at qpid/cluster/Cluster.cpp:548
  ...

Unfortunately there is no log file available atm from above aborts.

I believe broker ends with "pure virtual method called" indicated as:
  #4  0x000000312fabce43 in std::terminate() () from /usr/lib64/libstdc++.so.6
  #5  0x000000312fabd34f in __cxa_pure_virtual () 

This abort is fatal condition and should be fixed.
See unit test code attached below.

Version-Release number of selected component (if applicable):
  python-qpid-0.14-4.el5
  python-qpid-qmf-0.14-3.el5
  qpid-cpp-client-0.14-7.el5
  qpid-cpp-client-devel-0.14-7.el5
  qpid-cpp-client-devel-docs-0.14-7.el5
  qpid-cpp-client-rdma-0.14-7.el5
  qpid-cpp-client-ssl-0.14-7.el5
  qpid-cpp-mrg-debuginfo-0.14-7.el5
  qpid-cpp-server-0.14-7.el5
  qpid-cpp-server-cluster-0.14-7.el5
  qpid-cpp-server-devel-0.14-7.el5
  qpid-cpp-server-rdma-0.14-7.el5
  qpid-cpp-server-ssl-0.14-7.el5
  qpid-cpp-server-store-0.14-7.el5
  qpid-cpp-server-xml-0.14-7.el5
  qpid-java-client-0.14-3.el5
  qpid-java-common-0.14-3.el5
  qpid-java-example-0.14-3.el5
  qpid-qmf-0.14-3.el5
  qpid-qmf-debuginfo-0.14-3.el5
  qpid-qmf-devel-0.14-3.el5
  qpid-tests-0.14-1.el5
  qpid-tools-0.14-1.el5
  rh-qpid-cpp-tests-0.14-7.el5
  ruby-qpid-qmf-0.14-3.el5


How reproducible:
80%

Steps to Reproduce:
1. set-up clustered broker
2. run test_queue_nonexclusive_timed_autodelete_w_msgs_on_sclose unit
  

Actual results:
  Clustered broker[s] abort.

Expected results:
  Clustered broker[s] should survive the test.


Additional info:

  Aborts were detected on both RHEL5.7 and 5.8
Comment 1 Frantisek Reznicek 2012-03-05 11:13:49 EST
Created attachment 567681 [details]
The full backtrace and unit test code

Attachment consist of:
- unit test code
  (test_queue_nonexclusive_timed_autodelete_w_msgs_on_sclose.py)
- full backtrace of two detected aborts (bz800028.txt)
Comment 5 Petr Matousek 2012-03-08 09:47:22 EST
Created attachment 568653 [details]
clustered brokers logs

Reproduced on RHEL5.8 x86_64 on the latest 0.14-10 packages, attaching broker logs.

broker logs indicates that both the clustered brokers are aborted with following exception at the same time:
2012-03-08 09:31:21 error Unexpected exception: Task already exists with name DelayedAutoDeletion (qpid/cluster/ClusterTimer.cpp:71)

# stat -c %x%y%z core.3671 core.3672
2012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -0500
2012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -05002012-03-08 09:31:21.000000000 -0500
Comment 6 Ken Giusti 2012-03-13 11:38:45 EDT
Root cause for this crash appears to be in ClusterTimer::add(), which throws an exception should two timers be created with the same name.

Upstream JIRA created, with a workaround patch that "fixes" the auto delete queue crash:

https://issues.apache.org/jira/browse/QPID-3896
Comment 8 Petr Matousek 2012-03-16 13:04:37 EDT
Created attachment 570656 [details]
clustered brokers logs

retested on RHEL6 i686/x86_64, broker crash is fixed, but running the test causes all cluster nodes except one (that one remain standing) to exit the cluster with the following cause:

critical Error delivering frames: Cluster timer wakeup non-existent task DelayedAutoDeletion:642922c3-6cb2-4c6b-b332-d2929ab95c34 (qpid/cluster/ClusterTimer.cpp:112)
  
Waiting for qpid-qmf rebuild for RHEL5 retest.
Comment 9 Petr Matousek 2012-03-16 13:29:12 EDT
The above mentioned issue is probably the root cause of this new crash, moving back to assigned.

-> ASSIGNED
Comment 10 Ken Giusti 2012-03-19 15:06:08 EDT
Updated the upstream with a new patch that should actually work in a cluster.

See https://issues.apache.org/jira/secure/attachment/12518923/qpid-3896.patch
Comment 11 Petr Matousek 2012-03-20 12:47:33 EDT
Retested on RHEL6 i686 on qpid-0.14-12 with the patch from Comment 10 applied, the issue seems to be  fixed. Test from Comment 1 is passing with clustered brokers, no broker failure detected.

Waiting for packages for overall retest.
Comment 12 Ken Giusti 2012-03-20 13:48:12 EDT
Fix posted upstream trunk:

http://svn.apache.org/viewvc?view=rev&rev=1303035
Comment 14 Petr Matousek 2012-03-26 05:16:04 EDT
This issue has been fixed for rhel5, verified on rhel5.8 i386 & x86_64 - qpid-cpp-mrg-0.14-14.el5.

Waiting for rhel6 packages for retest.
Comment 16 Petr Matousek 2012-04-03 05:00:21 EDT
Issue fixed, tested on rhel5.8 / 6.2 i/x on packages:
qpid-cpp-mrg-0.14-14.el5
qpid-cpp-0.14-14.el6_2

-> VERIFIED

Note You need to log in before you can comment on or make changes to this bug.