Bug 795352

Summary: qpid high level API client (qpid-send / qpid-receive) sometimes terminates / aborts during reconnection / failover with 'pure virtual method called' from qpid::sys::DispatchHandle::processEvent()
Product: Red Hat Enterprise MRG Reporter: Frantisek Reznicek <freznice>
Component: qpid-cpp Assignee: Andrew Stitcher <astitcher>
Status: CLOSED DUPLICATE QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high Docs Contact:
Priority: high    
Version: Development CC: esammons, jross
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-04-29 15:19:03 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Frantisek Reznicek 2012-02-20 10:19:20 UTC
Description of problem:

The qpid high-level API client (a modified qpid-send, see the attachment) very rarely terminates during reconnection/failover when multiple sender clients are reconnecting/failing over at the same moment.
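
For reference, a minimal sender using the same messaging API and the same connection options as the failing command lines might look like the sketch below (broker URL, address and message count are placeholders copied from the test log; this is illustrative only, not the attached modified qpid-send):

  #include <qpid/messaging/Connection.h>
  #include <qpid/messaging/Session.h>
  #include <qpid/messaging/Sender.h>
  #include <qpid/messaging/Message.h>
  #include <qpid/messaging/FailoverUpdates.h>
  #include <cstdio>
  #include <iostream>

  int main() {
      using namespace qpid::messaging;
      // Same connection options as the failing command lines below.
      Connection connection("10.34.37.124:5672",
          "{reconnect: True, reconnect_limit: 10, reconnect_timeout: 10,"
          " username: guest, password: guest}");
      try {
          connection.open();
          FailoverUpdates updates(connection);      // --failover-updates
          Session session = connection.createSession();
          Sender sender = session.createSender(
              "test_cluster_failover_NtoN_rs_mq_0_0; "
              "{create: sender, delete: receiver, node: {durable: False}}");
          for (int i = 0; i < 58609; ++i) {
              char content[16];
              snprintf(content, sizeof(content), "msg_%06d", i);
              // Reconnect/failover is handled inside send(); the crash under
              // investigation happens while that reconnect is in progress.
              sender.send(Message(content));
          }
          session.sync();
          connection.close();
          return 0;
      } catch (const std::exception& error) {
          std::cerr << error.what() << std::endl;
          connection.close();
          return 1;
      }
  }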

A single case was observed in which one of the qpid-send clients exited during reconnection in the following way:

ERROR:10.34.37.124:'~/mrg_cluster_tests/qpid-send -b "10.34.37.124:5672" --failover-updates --content-string "msg_%06d" --content-index-offset 0 --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_0; {create: sender, delete:receiver, node:{ durable: False}}"  --durable no -m 58609', result:?, dur.:1.90
  [11:07:25] DEBUG:stdout:
  
  [11:07:25] DEBUG:stderr:
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36179-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36183-10.34.37.124:5672] closed
    pure virtual method called
    terminate called without an active exception


The 'pure virtual method called' message indicates that the client dispatched a call to a pure virtual method. That typically happens either while the object is still being constructed or is already being destroyed, or through a dangling pointer to an object whose destructor has already run; in either case the C++ runtime aborts the process.
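
For illustration only (this is not qpid code), the classic minimal C++ case that produces exactly this abort is a virtual call dispatched while the object is being destroyed:

  #include <iostream>

  // Classic minimal case: ~TcpHandler() has already run, so while ~Handler()
  // executes the dynamic type is Handler and the virtual dispatch lands on
  // the pure virtual slot; libstdc++ then prints "pure virtual method called",
  // "terminate called without an active exception" and aborts.
  struct Handler {
      virtual ~Handler() { notifyAll(); }
      void notifyAll() { onEvent(); }       // virtual dispatch during destruction
      virtual void onEvent() = 0;           // resolves to __cxa_pure_virtual here
  };

  struct TcpHandler : Handler {
      virtual void onEvent() { std::cout << "event" << std::endl; }
  };

  int main() {
      TcpHandler handler;
      return 0;                             // destruction of 'handler' aborts
  }

The same abort also occurs when a virtual method is invoked through a stale pointer to an object whose destructor has already completed.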

I believe the probability of hitting the case described above is significant only when many clients reconnect concurrently.

No interesting entries in broker logs.


Version-Release number of selected component (if applicable):
  cman-2.0.115-96.el5
  cman-devel-2.0.115-96.el5
  cman-devel-2.0.115-96.el5
  openais-0.80.6-36.el5
  openais-debuginfo-0.80.6-36.el5
  openais-devel-0.80.6-36.el5
  python-qpid-0.14-1.el5
  python-qpid-qmf-0.14-2.el5
  python-saslwrapper-0.10-4.el5
  qpid-cpp-client-0.14-5.el5
  qpid-cpp-client-devel-0.14-5.el5
  qpid-cpp-client-devel-docs-0.14-5.el5
  qpid-cpp-client-rdma-0.14-5.el5
  qpid-cpp-client-ssl-0.14-5.el5
  qpid-cpp-mrg-debuginfo-0.14-5.el5
  qpid-cpp-server-0.14-5.el5
  qpid-cpp-server-cluster-0.14-5.el5
  qpid-cpp-server-devel-0.14-5.el5
  qpid-cpp-server-rdma-0.14-5.el5
  qpid-cpp-server-ssl-0.14-5.el5
  qpid-cpp-server-store-0.14-5.el5
  qpid-cpp-server-xml-0.14-5.el5
  qpid-java-client-0.14-1.el5
  qpid-java-common-0.14-1.el5
  qpid-java-example-0.14-1.el5
  qpid-qmf-0.14-2.el5
  qpid-qmf-debuginfo-0.14-2.el5
  qpid-qmf-devel-0.14-2.el5
  qpid-tests-0.14-1.el5
  qpid-tools-0.14-1.el5
  rgmanager-2.0.52-28.el5
  rh-qpid-cpp-tests-0.14-5.el5
  ruby-qpid-qmf-0.14-2.el5
  ruby-saslwrapper-0.10-4.el5
  saslwrapper-0.10-4.el5
  saslwrapper-debuginfo-0.10-4.el5
  saslwrapper-devel-0.10-4.el5
  sesame-1.0-2.el5
  sesame-debuginfo-1.0-2.el5


How reproducible:
< 1 %

Steps to Reproduce:
1. Run the cluster_test suite (ctests.py) on a 4-node RHEL 5.7 x86_64 cluster
  
2. Error found in test_cluster_failover_NtoN_rs

  
Actual results:
The sending client stopped reconnecting due to an unknown error.

Expected results:
Sending client should always reconnect.

Additional info:

Test log:

  [11:07:25] DEBUG:test_cluster_failover_NtoN_rs(): all (4) sender[s] finished (0.0 sec[s])
  [11:07:25] ERROR:10.34.37.124:'~/mrg_cluster_tests/qpid-send -b "10.34.37.124:5672" --failover-updates --content-string "msg_%06d" --content-index-offset 0 --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_0; {create: sender, delete:receiver, node:{ durable: False}}"  --durable no -m 58609', result:-1, dur.:1.90
  [11:07:25] DEBUG:stdout:
  
  [11:07:25] DEBUG:stderr:
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36179-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36183-10.34.37.124:5672] closed
    pure virtual method called
    terminate called without an active exception
  [11:07:25] WARNING:10.34.37.124:'~/mrg_cluster_tests/qpid-send -b "10.34.37.124:5672" --failover-updates --content-string "msg_%06d" --content-index-offset 0 --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_1; {create: sender, delete:receiver, node:{ durable: False}}"  --durable no -m 58609', result:0, dur.:4.81
  [11:07:25] DEBUG:stderr:
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36181-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36186-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connect failed: Connection refused
    2012-02-18 05:07:16 warning Connection  closed
    2012-02-18 05:07:16 warning Connect failed: Connection refused
    2012-02-18 05:07:16 warning Connection  closed
  [11:07:25] WARNING:10.34.37.124:'~/mrg_cluster_tests/qpid-send -b "10.34.37.124:5672" --failover-updates --content-string "msg_%06d" --content-index-offset 0 --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_2; {create: sender, delete:receiver, node:{ durable: False}}"  --durable no -m 58609', result:0, dur.:5.71
  [11:07:25] DEBUG:stderr:
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36182-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36184-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connect failed: Connection refused
    2012-02-18 05:07:16 warning Connection  closed
    2012-02-18 05:07:16 warning Connect failed: Connection refused
    2012-02-18 05:07:16 warning Connection  closed
  [11:07:25] WARNING:10.34.37.124:'~/mrg_cluster_tests/qpid-send -b "10.34.37.124:5672" --failover-updates --content-string "msg_%06d" --content-index-offset 0 --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_3; {create: sender, delete:receiver, node:{ durable: False}}"  --durable no -m 58609', result:0, dur.:4.70
  [11:07:25] DEBUG:stderr:
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36180-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connection [10.34.37.124:36185-10.34.37.124:5672] closed
    2012-02-18 05:07:16 warning Connect failed: Connection refused
    2012-02-18 05:07:16 warning Connection  closed
    2012-02-18 05:07:16 warning Connect failed: Connection refused
    2012-02-18 05:07:16 warning Connection  closed
  [11:07:25] DEBUG:QMF obj. query corresponding to queues test_cluster_failover_NtoN_rs_mq_0_%d started
  [11:10:46] DEBUG:test_cluster_failover_NtoN_rs(): waiting for queue QMF data timeouted (to:200)
  [11:10:46] DEBUG:QMF object corresponding to queue test_cluster_failover_NtoN_rs_mq_0_0: org.apache.qpid.broker:queue[0-32-1-0-org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_0] org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_0
  [11:10:46] ERROR:localhost:'Queue test_cluster_failover_NtoN_rs_mq_0_0 has msg-depth 23338 expected >= 58609', result:False, dur.:-1.00
  [11:10:47] DEBUG:QMF object corresponding to queue test_cluster_failover_NtoN_rs_mq_0_1: org.apache.qpid.broker:queue[0-32-1-0-org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_1] org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_1
  [11:10:47] INFO:localhost:'Queue test_cluster_failover_NtoN_rs_mq_0_1 has msg-depth 58609 expected >= 58609', result:True, dur.:-1.00
  [11:10:48] DEBUG:QMF object corresponding to queue test_cluster_failover_NtoN_rs_mq_0_2: org.apache.qpid.broker:queue[0-32-1-0-org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_2] org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_2
  [11:10:48] INFO:localhost:'Queue test_cluster_failover_NtoN_rs_mq_0_2 has msg-depth 58783 expected >= 58609', result:True, dur.:-1.00
  [11:10:49] DEBUG:QMF object corresponding to queue test_cluster_failover_NtoN_rs_mq_0_3: org.apache.qpid.broker:queue[0-32-1-0-org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_3] org.apache.qpid.broker:queue:test_cluster_failover_NtoN_rs_mq_0_3
  [11:10:49] INFO:localhost:'Queue test_cluster_failover_NtoN_rs_mq_0_3 has msg-depth 58609 expected >= 58609', result:True, dur.:-1.00
  [11:10:53] DEBUG:Cluster state: 4 of 4 up and running

Comment 2 Frantisek Reznicek 2012-02-20 12:22:50 UTC
A similar case was also detected on a RHEL 6.2 x86_64 4-node cluster, where qpid-receive aborted in a comparable single sender / receiver reconnect scenario (test_cluster_failover_1to1_rs):

  [12:52:55] INFO:localhost:'Cluster is fully up (4 of 4 up)', result:True, dur.:-1.00
  [12:53:05] DEBUG:test_cluster_failover_1to1_rs(): all (1) receiver[s] finished (0.0 sec[s])
  [12:53:05] ERROR:10.34.37.181:'~/mrg_cluster_tests/qpid-receive -b "10.34.37.181:5672" --failover-updates  --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_1to1_rs_mq_0_0 ; {create: sender, delete:receiver, node:{ durable: False }}" > /tmp/qpid-receive.0.log', result:134, dur.:1.92
  [12:53:05] DEBUG:stdout:
  
  [12:53:05] DEBUG:stderr:
    2012-02-18 06:52:57 warning Connection [10.34.37.181:34773-10.34.37.181:5672] closed
    2012-02-18 06:52:57 warning Connection [10.34.37.181:34774-10.34.37.181:5672] closed
    pure virtual method called
    terminate called without an active exception
    bash: line 1: 21789 Aborted                 (core dumped) ~/mrg_cluster_tests/qpid-receive -b "10.34.37.181:5672" --failover-updates --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_1to1_rs_mq_0_0 ; {create: sender, delete:receiver, node:{ durable: False }}" > /tmp/qpid-receive.0.log
  [12:53:06] INFO:10.34.37.178:'pidof qpid-send && kill `pidof qpid-send`', result:1, dur.:0.17
  [12:53:06] INFO:10.34.37.181:'pidof qpid-receive && kill `pidof qpid-receive`', result:1, dur.:0.13
  [12:53:07] INFO:10.34.37.181:'sort -u /tmp/qpid-receive.0.log|egrep "^msg_[0-9]{6}$"|wc -l', result:0, dur.:0.23
  [12:53:07] DEBUG:stdout:
    36777
  [12:53:07] ERROR:localhost:'Receiver 0 data check (36777 exp. 98589)', result:False, dur.:-1.00

Although the above listing indicates that a core file should have been dumped, unfortunately none is available.
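
One common reason for a missing core is a zero core-size soft limit in the environment the harness launches the clients from (the shell equivalent is ulimit -c unlimited). A small hypothetical helper, not part of qpid-send/qpid-receive, that a test client could call at start-up to raise the soft limit as far as the hard limit allows:

  #include <sys/resource.h>
  #include <cstdio>

  // Hypothetical helper: raise the core file size soft limit to the hard
  // limit so that an abort() like the one above actually leaves a core dump.
  static void enableCoreDumps() {
      struct rlimit limit;
      if (getrlimit(RLIMIT_CORE, &limit) == 0) {
          limit.rlim_cur = limit.rlim_max;
          if (setrlimit(RLIMIT_CORE, &limit) != 0)
              std::perror("setrlimit(RLIMIT_CORE)");
      }
  }

  int main() {
      enableCoreDumps();
      return 0;
  }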

Using packages:
  corosync-1.4.1-4.el6.x86_64
  corosynclib-1.4.1-4.el6.x86_64
  python-qpid-0.14-3.el6.noarch
  python-qpid-qmf-0.14-4.el6.x86_64
  python-saslwrapper-0.10-2.el6.x86_64
  qpid-cpp-client-0.14-6.el6.x86_64
  qpid-cpp-client-devel-0.14-6.el6.x86_64
  qpid-cpp-client-devel-docs-0.14-6.el6.noarch
  qpid-cpp-client-rdma-0.14-6.el6.x86_64
  qpid-cpp-client-ssl-0.14-6.el6.x86_64
  qpid-cpp-debuginfo-0.14-6.el6.x86_64
  qpid-cpp-server-0.14-6.el6.x86_64
  qpid-cpp-server-cluster-0.14-6.el6.x86_64
  qpid-cpp-server-devel-0.14-6.el6.x86_64
  qpid-cpp-server-rdma-0.14-6.el6.x86_64
  qpid-cpp-server-ssl-0.14-6.el6.x86_64
  qpid-cpp-server-store-0.14-6.el6.x86_64
  qpid-cpp-server-xml-0.14-6.el6.x86_64
  qpid-java-client-0.14-1.el6.noarch
  qpid-java-common-0.14-1.el6.noarch
  qpid-java-example-0.14-1.el6.noarch
  qpid-qmf-0.14-4.el6.x86_64
  qpid-qmf-debuginfo-0.14-4.el6.x86_64
  qpid-qmf-devel-0.14-4.el6.x86_64
  qpid-tests-0.14-1.el6.noarch
  qpid-tools-0.14-1.el6.noarch
  rh-qpid-cpp-tests-0.14-6.el6.x86_64
  ruby-qpid-qmf-0.14-4.el6.x86_64
  ruby-saslwrapper-0.10-2.el6.x86_64
  saslwrapper-0.10-2.el6.x86_64
  saslwrapper-debuginfo-0.10-2.el6.x86_64
  saslwrapper-devel-0.10-2.el6.x86_64
  sesame-1.0-2.el6.x86_64
  sesame-debuginfo-1.0-2.el6.x86_64


Raising reproducibility to 10% (triggered after 19 hours of testing)

I suspect one of the latest patches introduced this behavior, as I did not see it with the previous rpm set.

Comment 3 Justin Ross 2012-02-21 15:52:41 UTC
Gordon, please do a brief assessment.

Comment 4 Gordon Sim 2012-02-21 19:12:40 UTC
Do we have a backtrace from any of the previous failures? That would help pinpoint the issue.

Comment 5 Frantisek Reznicek 2012-02-22 08:49:32 UTC
(In reply to comment #4)
> Do we have a backtrace from any of the previous failures? That would help
> pinpoint the issue.

Not yet, I'm re-running the test now to be able to provide that info.

Comment 6 Frantisek Reznicek 2012-02-22 13:48:21 UTC
The requested data is now available from cluster_test (RHEL 5.7 cluster, 4 nodes, approx. 4 hours to trigger).


Test log:
  
  ...
  [14:14:21] ERROR:10.34.37.124:'~/mrg_cluster_tests/qpid-send -b "10.34.37.124:5672" --failover-updates --content-string "msg_%06d" --content-index-offset 0 --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_3; {create: sender, delete:receiver, node:{ durable: False}}"  --durable no -m 90072', result:-1, dur.:1.09
  [14:14:21] DEBUG:stdout:
  
  [14:14:21] DEBUG:stderr:
    2012-02-22 08:14:10 warning Connection [10.34.37.124:54236-10.34.37.124:5672] closed
    2012-02-22 08:14:10 warning Connection [10.34.37.124:54237-10.34.37.124:5672] closed
    pure virtual method called
    terminate called without an active exception


Abort analysis:

  [root@dhcp-37-124 mrg_cluster_tests]# ls -lat /root/core.*
  -rw------- 1 root root 45199360 Feb 22 08:14 /root/core.17786
  [root@dhcp-37-124 mrg_cluster_tests]# gdb ./qpid-send /root/core.17786
  GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
  
  ...
  
  Core was generated by `/root/mrg_cluster_tests/qpid-send -b 10.34.37.124:5672 --failover-updates --con'.
  Program terminated with signal 6, Aborted.
  #0  0x0000003a28030265 in raise () from /lib64/libc.so.6
  (gdb) info threads
    5 Thread 0x2b2ef8330660 (LWP 17786)  0x0000003a2880af59 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    4 Thread 17800  0x0000003a28079b70 in strlen () from /lib64/libc.so.6
    3 Thread 17801  0x0000003a2880af59 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    2 Thread 17831  0x0000003a280d4971 in clone () from /lib64/libc.so.6
  * 1 Thread 0x41219940 (LWP 17827)  0x0000003a28030265 in raise () from /lib64/libc.so.6
  (gdb) thread apply all bt
  
  Thread 5 (Thread 0x2b2ef8330660 (LWP 17786)):
  #0  0x0000003a2880af59 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #1  0x0000003a2bc9dee9 in wait (this=0x2aaaac016948, desired=Traceback (most recent call last):
    File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 322, in to_string
      return '%s with %d elements' % (self.typename,
  OverflowError: __len__() should return 0 <= outcome < 2**31
  Cannot access memory at address 0x1638
  ) at ../include/qpid/sys/posix/Condition.h:63
  #2  wait (this=0x2aaaac016948, desired=Traceback (most recent call last):
    File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 322, in to_string
      return '%s with %d elements' % (self.typename,
  OverflowError: __len__() should return 0 <= outcome < 2**31
  Cannot access memory at address 0x1638
  ) at ../include/qpid/sys/Monitor.h:41
  #3  qpid::client::StateManager::waitFor (this=0x2aaaac016948, desired=Traceback (most recent call last):
    File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 322, in to_string
      return '%s with %d elements' % (self.typename,
  OverflowError: __len__() should return 0 <= outcome < 2**31
  Cannot access memory at address 0x1638
  ) at qpid/client/StateManager.cpp:51
  #4  0x0000003a2bc584c9 in qpid::client::ConnectionHandler::waitForOpen (this=0x2aaaac0168d8) at qpid/client/ConnectionHandler.cpp:145
  #5  0x0000003a2bc66a38 in qpid::client::ConnectionImpl::open (this=0x2aaaac0167f0) at qpid/client/ConnectionImpl.cpp:284
  #6  0x0000003a2bc5639b in qpid::client::Connection::open (this=0x12ce7ae0, settings=...) at qpid/client/Connection.cpp:125
  #7  0x0000003a2bc56908 in qpid::client::Connection::open (this=0x12ce7ae0, url=..., settings=...) at qpid/client/Connection.cpp:81
  #8  0x00002b2ef80e8b19 in qpid::client::amqp0_10::ConnectionImpl::tryConnect (this=0x12ce7a20) at qpid/client/amqp0_10/ConnectionImpl.cpp:280
  #9  0x00002b2ef80e94df in qpid::client::amqp0_10::ConnectionImpl::connect (this=0x12ce7a20, started=...)
      at qpid/client/amqp0_10/ConnectionImpl.cpp:250
  #10 0x00002b2ef80e98ef in qpid::client::amqp0_10::ConnectionImpl::open (this=0x12ce7a20) at qpid/client/amqp0_10/ConnectionImpl.cpp:224
  #11 0x00002b2ef80e482c in qpid::client::amqp0_10::ConnectionImpl::reopen (this=0x2aaaac01697c) at qpid/client/amqp0_10/ConnectionImpl.cpp:235
  #12 0x00002b2ef810c14e in qpid::client::amqp0_10::SessionImpl::execute<qpid::client::amqp0_10::SenderImpl::Send> (this=0x12d360c0,
      f=<value optimized out>) at qpid/client/amqp0_10/SessionImpl.h:102
  #13 0x00002b2ef8109d21 in qpid::client::amqp0_10::SenderImpl::send (this=0x12d378f0, message=<value optimized out>, sync=false)
      at qpid/client/amqp0_10/SenderImpl.cpp:44
  #14 0x000000000040fe0a in main ()
  
  Thread 4 (Thread 17800):
  #0  0x0000003a28079b70 in strlen () from /lib64/libc.so.6
  #1  0x0000003a2a89c500 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /usr/lib64/libstdc++.so.6
  #2  0x0000003a2992f3f5 in qpid::sys::strError (err=111) at qpid/sys/posix/StrError.cpp:33
  #3  0x0000003a29923384 in qpid::sys::posix::AsynchConnector::connComplete (this=0x2aaaac017460, h=...) at qpid/sys/posix/AsynchIO.cpp:217
  #4  0x0000003a29a0b13a in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() (
      this=0x3a28120eae, a0=...) at /usr/include/boost/function/function_template.hpp:576
  #5  0x0000003a29a0a438 in qpid::sys::DispatchHandle::processEvent (this=0x2aaaac017468, type=DISCONNECTED) at qpid/sys/DispatchHandle.cpp:291
  #6  0x0000003a29934ed4 in process (this=0x12ce8330) at qpid/sys/Poller.h:131
  #7  qpid::sys::Poller::run (this=0x12ce8330) at qpid/sys/epoll/EpollPoller.cpp:524
  #8  0x0000003a2992bfaa in qpid::sys::(anonymous namespace)::runRunnable (p=0x3a28120eae) at qpid/sys/posix/Thread.cpp:35
  #9  0x0000003a2880677d in start_thread () from /lib64/libpthread.so.0
  #10 0x0000003a280d49ad in clone () from /lib64/libc.so.6
  
  Thread 3 (Thread 17801):
  #0  0x0000003a2880af59 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #1  0x00002b2ef80e9697 in acquire (this=0x12ce7a20) at ../include/qpid/sys/posix/Condition.h:63
  #2  lock (this=0x12ce7a20) at qpid/sys/Semaphore.h:32
  #3  ScopedLock (this=0x12ce7a20) at ../include/qpid/sys/Mutex.h:33
  #4  qpid::client::amqp0_10::ConnectionImpl::open (this=0x12ce7a20) at qpid/client/amqp0_10/ConnectionImpl.cpp:222
  #5  0x00002b2ef80e482c in qpid::client::amqp0_10::ConnectionImpl::reopen (this=0x12ce7a7c) at qpid/client/amqp0_10/ConnectionImpl.cpp:235
  #6  0x00002b2ef80f7605 in qpid::client::amqp0_10::SessionImpl::execute<qpid::client::amqp0_10::ReceiverImpl::Fetch> (this=0x12d34cc0,
      f=<value optimized out>) at qpid/client/amqp0_10/SessionImpl.h:102
  #7  0x00002b2ef80f65ac in qpid::client::amqp0_10::ReceiverImpl::fetch (this=0x12ce8f70, message=<value optimized out>, timeout=...)
      at qpid/client/amqp0_10/ReceiverImpl.cpp:71
  #8  0x00002b2ef80ccd9d in qpid::messaging::Receiver::fetch (this=<value optimized out>, message=..., timeout=...) at qpid/messaging/Receiver.cpp:38
  #9  0x00002b2ef80ce399 in qpid::messaging::FailoverUpdatesImpl::run (this=0x12d349b0) at qpid/messaging/FailoverUpdates.cpp:63
  #10 0x0000003a2992bfaa in qpid::sys::(anonymous namespace)::runRunnable (p=0x12ce7a7c) at qpid/sys/posix/Thread.cpp:35
  #11 0x0000003a2880677d in start_thread () from /lib64/libpthread.so.0
  #12 0x0000003a280d49ad in clone () from /lib64/libc.so.6
  
  Thread 2 (Thread 17831):
  #0  0x0000003a280d4971 in clone () from /lib64/libc.so.6
  #1  0x0000003a288066a0 in ?? () from /lib64/libpthread.so.0
  #2  0x00000000433d1940 in ?? ()
  #3  0x0000000000000000 in ?? ()
  
  Thread 1 (Thread 0x41219940 (LWP 17827)):
  #0  0x0000003a28030265 in raise () from /lib64/libc.so.6
  #1  0x0000003a28031d10 in abort () from /lib64/libc.so.6
  #2  0x0000003a2a8bed14 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
  #3  0x0000003a2a8bce16 in ?? () from /usr/lib64/libstdc++.so.6
  #4  0x0000003a2a8bce43 in std::terminate() () from /usr/lib64/libstdc++.so.6
  #5  0x0000003a2a8bd34f in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6
  #6  0x0000003a29929c3a in boost::function2<void, qpid::sys::AsynchIO&, qpid::sys::Socket const&, std::allocator<boost::function_base> >::operator()
      (this=0x0, a0=..., a1=...) at /usr/include/boost/function/function_template.hpp:576
  #7  0x0000003a29a0b13a in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() (this=0x0, a0=...)
      at /usr/include/boost/function/function_template.hpp:576
  #8  0x0000003a29a0a438 in qpid::sys::DispatchHandle::processEvent (this=0x12ce9818, type=DISCONNECTED) at qpid/sys/DispatchHandle.cpp:291
  #9  0x0000003a29934ed4 in process (this=0x12ce8330) at qpid/sys/Poller.h:131
  #10 qpid::sys::Poller::run (this=0x12ce8330) at qpid/sys/epoll/EpollPoller.cpp:524
  #11 0x0000003a2992bfaa in qpid::sys::(anonymous namespace)::runRunnable (p=0x457a) at qpid/sys/posix/Thread.cpp:35
  #12 0x0000003a2880677d in start_thread () from /lib64/libpthread.so.0
  #13 0x0000003a280d49ad in clone () from /lib64/libc.so.6
  (gdb) quit

Comment 7 Gordon Sim 2012-02-22 14:07:32 UTC
Thanks Frantisek!

Justin: It's an I/O thread that aborts, and it's at a fairly low level. There may be some way of working around it in the messaging API implementation, but my guess is that this is a case we have missed so far in the correct cleanup of things at the TCPConnector/AsynchIO level.
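
To make the suspected failure mode concrete, here is a hypothetical, heavily simplified sketch (invented names; this is not the actual TCPConnector/AsynchIO code): the poller/IO thread delivers a disconnect event through a type-erased callback bound to a handler that another thread is concurrently tearing down, so the virtual dispatch can land on an already-destroyed override and hit __cxa_pure_virtual. The interleaving is timing dependent; this only shows the shape of the race, it is not a deterministic reproducer.

  #include <boost/bind.hpp>
  #include <boost/function.hpp>
  #include <boost/thread.hpp>

  struct HandlerBase {
      virtual ~HandlerBase() {}
      virtual void onDisconnected() = 0;     // the slot that ends up "pure"
  };

  struct TcpHandlerLike : HandlerBase {
      virtual void onDisconnected() { /* close socket, notify owner */ }
  };

  static void fireEvent(boost::function<void()> callback) { callback(); }
  static void destroyHandler(HandlerBase* handler) { delete handler; }

  int main() {
      HandlerBase* handler = new TcpHandlerLike();
      // The poller keeps only a type-erased callback to the handler.
      boost::function<void()> onDisconnect =
          boost::bind(&HandlerBase::onDisconnected, handler);

      // IO/poller thread still delivering the disconnect event...
      boost::thread io(boost::bind(&fireEvent, onDisconnect));
      // ...while the reconnect path destroys the handler it is bound to.
      boost::thread teardown(boost::bind(&destroyHandler, handler));

      io.join();
      teardown.join();
      return 0;
  }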

Comment 8 Frantisek Reznicek 2012-04-10 07:40:05 UTC
This issue is still seen from time to time in the cluster test on qpid-cpp*-0.14-14.el6_2 + qpid-qmf-*0.14-6.el6_2...

Test raw log:
  [11:55:44] ERROR:10.34.37.179:'~/mrg_cluster_tests/qpid-receive -b "10.34.37.179:5672" --failover-updates  --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_2 ; {create: sender, delete:receiver, node:{ durable: False }}" > /tmp/qpid-receive.2.log', result:134, dur.:2.57
  [11:55:44] DEBUG:stdout:
  
  [11:55:44] DEBUG:stderr:
    2012-04-07 05:55:33 warning Connection [10.34.37.179:38765-10.34.37.179:5672] closed
    2012-04-07 05:55:33 warning Connection [10.34.37.179:38769-10.34.37.179:5672] closed
    pure virtual method called
    terminate called without an active exception
    bash: line 1: 27904 Aborted                 (core dumped) ~/mrg_cluster_tests/qpid-receive -b "10.34.37.179:5672" --failover-updates --connection-options="{ reconnect: True,  reconnect_limit: 10, reconnect_timeout: 10, username: guest, password:guest }" --address "test_cluster_failover_NtoN_rs_mq_0_2 ; {create: sender, delete:receiver, node:{ durable: False }}" > /tmp/qpid-receive.2.log


Core file analysis:
  Core was generated by `/root/mrg_cluster_tests/qpid-receive -b 10.34.37.179:5672 --failover-updates --'.
  Program terminated with signal 6, Aborted.
  #0  0x0000003c64432885 in raise () from /lib64/libc.so.6
  Missing separate debuginfos, use: debuginfo-install boost-filesystem-1.41.0-11.el6_1.2.x86_64 boost-program-options-1.41.0-11.el6_1.2.x86_64 boost-system-1.41.0-11.el6_1.2.x86_64 cyrus-sasl-gssapi-2.1.23-13.el6.x86_64 cyrus-sasl-lib-2.1.23-13.el6.x86_64 cyrus-sasl-md5-2.1.23-13.el6.x86_64 cyrus-sasl-plain-2.1.23-13.el6.x86_64 db4-4.7.25-16.el6.x86_64 glibc-2.12-1.47.el6_2.9.x86_64 keyutils-libs-1.4-3.el6.x86_64 krb5-libs-1.9-22.el6_2.1.x86_64 libcom_err-1.41.12-11.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 libibverbs-1.1.5-3.el6.x86_64 librdmacm-1.0.14.1-3.el6.x86_64 libselinux-2.0.94-5.2.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64 libuuid-2.17.2-12.4.el6.x86_64 nspr-4.8.9-3.el6_2.x86_64 nss-3.13.1-7.el6_2.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 nss-util-3.13.1-3.el6_2.x86_64 openssl-1.0.0-20.el6_2.3.x86_64 zlib-1.2.3-27.el6.x86_64
  (gdb) info threads
    5 Thread 0x7f4d01314700 (LWP 27971)  0x0000003c644e2877 in mprotect () from /lib64/libc.so.6
    4 Thread 0x7f4d02716700 (LWP 27937)  0x0000003c64c0b3dc in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib64/libpthread.so.0
    3 Thread 0x7f4d044da700 (LWP 27928)  0x0000003c644e62c3 in epoll_wait () from /lib64/libc.so.6
    2 Thread 0x7f4d053e07a0 (LWP 27904)  0x0000003c64c0b3dc in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib64/libpthread.so.0
  * 1 Thread 0x7f4d01d15700 (LWP 27966)  0x0000003c64432885 in raise () from /lib64/libc.so.6
  (gdb) t a a bt
  
  Thread 5 (Thread 0x7f4d01314700 (LWP 27971)):
  #0  0x0000003c644e2877 in mprotect () from /lib64/libc.so.6
  #1  0x0000003c6447922f in _int_malloc () from /lib64/libc.so.6
  #2  0x0000003c644797d1 in malloc () from /lib64/libc.so.6
  #3  0x0000003c67cbd0bd in operator new(unsigned long) () from /usr/lib64/libstdc++.so.6
  #4  0x0000003c67cbd1d9 in operator new[](unsigned long) () from /usr/lib64/libstdc++.so.6
  #5  0x00000031172936f7 in Buff (this=0x7f4cf4002880, aio_=<value optimized out>) at qpid/client/TCPConnector.cpp:50
  #6  qpid::client::TCPConnector::start (this=0x7f4cf4002880, aio_=<value optimized out>)
      at qpid/client/TCPConnector.cpp:121
  #7  0x000000311729424e in qpid::client::TCPConnector::connected (this=0x7f4cf4002880)
      at qpid/client/TCPConnector.cpp:113
  #8  0x000000311693d2ff in operator() (this=0x7f4cf4002c30, h=...)
      at /usr/include/boost/function/function_template.hpp:1013
  #9  qpid::sys::posix::AsynchConnector::connComplete (this=0x7f4cf4002c30, h=...) at qpid/sys/posix/AsynchIO.cpp:200
  #10 0x0000003116a09503 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (
      this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013
  #11 0x0000003116a063ce in qpid::sys::DispatchHandle::processEvent (this=0x7f4cf4002c38,
      type=qpid::sys::Poller::WRITABLE) at qpid/sys/DispatchHandle.cpp:287
  #12 0x0000003116948d4d in process (this=0xe2cca0) at qpid/sys/Poller.h:131
  #13 qpid::sys::Poller::run (this=0xe2cca0) at qpid/sys/epoll/EpollPoller.cpp:524
  #14 0x000000311694066a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>)
      at qpid/sys/posix/Thread.cpp:35
  #15 0x0000003c64c077f1 in start_thread () from /lib64/libpthread.so.0
  #16 0x0000003c644e5ccd in clone () from /lib64/libc.so.6
  
  Thread 4 (Thread 0x7f4d02716700 (LWP 27937)):
  #0  0x0000003c64c0b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #1  0x00007f4d0543bd70 in qpid::sys::Condition::wait (this=<value optimized out>, mutex=<value optimized out>)
      at ../include/qpid/sys/posix/Condition.h:63
  #2  0x000000311728b17b in wait (this=0x7f4cf40021d8, desired=std::set with 3 elements = {...})
      at ../include/qpid/sys/Monitor.h:41
  #3  qpid::client::StateManager::waitFor (this=0x7f4cf40021d8, desired=std::set with 3 elements = {...})
      at qpid/client/StateManager.cpp:51
  #4  0x000000311725e1dc in qpid::client::ConnectionHandler::waitForOpen (this=0x7f4cf4002168)
      at qpid/client/ConnectionHandler.cpp:145
  #5  0x0000003117265e6d in qpid::client::ConnectionImpl::open (this=0x7f4cf4002080)
      at qpid/client/ConnectionImpl.cpp:284
  #6  0x000000311725c0a0 in qpid::client::Connection::open (this=0xe2dbc0, settings=<value optimized out>)
      at qpid/client/Connection.cpp:125
  #7  0x000000311725ca8e in qpid::client::Connection::open (this=0xe2dbc0, url=..., settings=...)
      at qpid/client/Connection.cpp:81
  #8  0x00007f4d05438f13 in qpid::client::amqp0_10::ConnectionImpl::tryConnect (this=0xe2db00)
      at qpid/client/amqp0_10/ConnectionImpl.cpp:280
  #9  0x00007f4d054397ed in qpid::client::amqp0_10::ConnectionImpl::connect (this=0xe2db00, started=...)
      at qpid/client/amqp0_10/ConnectionImpl.cpp:250
  #10 0x00007f4d0543a423 in qpid::client::amqp0_10::ConnectionImpl::open (this=0xe2db00)
      at qpid/client/amqp0_10/ConnectionImpl.cpp:224
  #11 0x00007f4d05436e16 in qpid::client::amqp0_10::ConnectionImpl::reopen (this=<value optimized out>)
      at qpid/client/amqp0_10/ConnectionImpl.cpp:235
  #12 0x00007f4d05443324 in qpid::client::amqp0_10::SessionImpl::execute<qpid::client::amqp0_10::ReceiverImpl::Fetch>
      (this=0xe2d1e0, f=<value optimized out>) at qpid/client/amqp0_10/SessionImpl.h:102
  #13 0x00007f4d0544206c in qpid::client::amqp0_10::ReceiverImpl::fetch (this=0xe2d900,
      message=<value optimized out>, timeout=<value optimized out>) at qpid/client/amqp0_10/ReceiverImpl.cpp:71
  #14 0x00007f4d05427649 in qpid::messaging::FailoverUpdatesImpl::run (this=0xe2ceb0)
      at qpid/messaging/FailoverUpdates.cpp:63
  #15 0x000000311694066a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>)
      at qpid/sys/posix/Thread.cpp:35
  #16 0x0000003c64c077f1 in start_thread () from /lib64/libpthread.so.0
  #17 0x0000003c644e5ccd in clone () from /lib64/libc.so.6
  
  Thread 3 (Thread 0x7f4d044da700 (LWP 27928)):
  #0  0x0000003c644e62c3 in epoll_wait () from /lib64/libc.so.6
  #1  0x000000311694867a in qpid::sys::Poller::wait (this=0xe2cca0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:568
  #2  0x0000003116948d27 in qpid::sys::Poller::run (this=0xe2cca0) at qpid/sys/epoll/EpollPoller.cpp:520
  #3  0x000000311694066a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>)
      at qpid/sys/posix/Thread.cpp:35
  #4  0x0000003c64c077f1 in start_thread () from /lib64/libpthread.so.0
  #5  0x0000003c644e5ccd in clone () from /lib64/libc.so.6
  
  Thread 2 (Thread 0x7f4d053e07a0 (LWP 27904)):
  #0  0x0000003c64c0b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #1  0x00007f4d0543bd70 in qpid::sys::Condition::wait (this=<value optimized out>, mutex=<value optimized out>)
      at ../include/qpid/sys/posix/Condition.h:63
  #2  0x00007f4d0543a3eb in wait (this=0xe2db00) at ../include/qpid/sys/Monitor.h:41
  #3  acquire (this=0xe2db00) at qpid/sys/Semaphore.h:50
  #4  lock (this=0xe2db00) at qpid/sys/Semaphore.h:32
  #5  ScopedLock (this=0xe2db00) at ../include/qpid/sys/Mutex.h:33
  #6  qpid::client::amqp0_10::ConnectionImpl::open (this=0xe2db00) at qpid/client/amqp0_10/ConnectionImpl.cpp:222
  #7  0x00007f4d05436e16 in qpid::client::amqp0_10::ConnectionImpl::reopen (this=<value optimized out>)
      at qpid/client/amqp0_10/ConnectionImpl.cpp:235
  #8  0x00007f4d05443324 in qpid::client::amqp0_10::SessionImpl::execute<qpid::client::amqp0_10::ReceiverImpl::Fetch>
      (this=0xe2e840, f=<value optimized out>) at qpid/client/amqp0_10/SessionImpl.h:102
  #9  0x00007f4d0544206c in qpid::client::amqp0_10::ReceiverImpl::fetch (this=0xe2f950,
      message=<value optimized out>, timeout=<value optimized out>) at qpid/client/amqp0_10/ReceiverImpl.cpp:71
  #10 0x000000000040d71c in main ()
  
  Thread 1 (Thread 0x7f4d01d15700 (LWP 27966)):
  #0  0x0000003c64432885 in raise () from /lib64/libc.so.6
  #1  0x0000003c64434065 in abort () from /lib64/libc.so.6
  #2  0x0000003c67cbea7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
  #3  0x0000003c67cbcc06 in ?? () from /usr/lib64/libstdc++.so.6
  #4  0x0000003c67cbcc33 in std::terminate() () from /usr/lib64/libstdc++.so.6
  #5  0x0000003c67cbd55f in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6
  #6  0x000000311693b1b6 in operator() (this=0x7f4cfc000980, h=<value optimized out>)
      at /usr/include/boost/function/function_template.hpp:1013
  #7  qpid::sys::posix::AsynchIO::close (this=0x7f4cfc000980, h=<value optimized out>)
      at qpid/sys/posix/AsynchIO.cpp:597
  #8  0x000000311693c88f in qpid::sys::posix::AsynchIO::disconnected (this=0x7f4cfc000980, h=...)
      at qpid/sys/posix/AsynchIO.cpp:587
  #9  0x0000003116a09503 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (
      this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013
  #10 0x0000003116a06385 in qpid::sys::DispatchHandle::processEvent (this=0x7f4cfc000988,
      type=qpid::sys::Poller::DISCONNECTED) at qpid/sys/DispatchHandle.cpp:291
  #11 0x0000003116948d4d in process (this=0xe2cca0) at qpid/sys/Poller.h:131
  #12 qpid::sys::Poller::run (this=0xe2cca0) at qpid/sys/epoll/EpollPoller.cpp:524
  #13 0x000000311694066a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>)
      at qpid/sys/posix/Thread.cpp:35
  #14 0x0000003c64c077f1 in start_thread () from /lib64/libpthread.so.0
  #15 0x0000003c644e5ccd in clone () from /lib64/libc.so.6

Comment 10 Andrew Stitcher 2013-04-29 15:19:03 UTC
This looks like another instance of BZ883469.

I'll close this bug as a duplicate in favour of that one, as that bug is active.

*** This bug has been marked as a duplicate of bug 883469 ***