Description of problem: broker and c++ client occasionally crahsing during perftesting: [gordon@mrg15 cpp]$ ./src/qpidd --auth no -p 5673 2009-06-01 05:53:54 notice SASL disabled: No Authentication Performed 2009-06-01 05:53:54 notice Listening on TCP port 5673 2009-06-01 05:53:54 notice Broker running 2009-06-01 06:48:39 warning CLOSING [127.0.0.1:55531] unsent data (probably due to client disconnect) lt-qpidd: qpid/sys/DispatchHandle.cpp:260: virtual void qpid::sys::DispatchHandle::processEvent(qpid::sys::Poller::EventType): Assertion `state!=CALLING' failed. Aborted (core dumped) And on the client side: Core was generated by `/home/gordon/qpid/cpp/src/tests/.libs/lt-perftest --qt 1 --summary'. Program terminated with signal 6, Aborted. [New process 31016] [New process 31002] #0 0x0000003c52e30215 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x0000003c52e30215 in raise () from /lib64/libc.so.6 #1 0x0000003c52e31cc0 in abort () from /lib64/libc.so.6 #2 0x0000003c52e29696 in __assert_fail () from /lib64/libc.so.6 #3 0x00002b30282fc331 in qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::markForDeletion (handle=0x18bafd50) at ./qpid/sys/posix/Mutex.h:116 #4 0x00002b30282f7969 in ~Poller (this=0x18bafb70) at qpid/sys/epoll/EpollPoller.cpp:189 #5 0x00002b3027efdf83 in ~TCPConnector (this=0x18baf490) at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145 #6 0x00002b3027ef308a in ~ConnectionImpl (this=0x18baec70) at /usr/include/boost/checked_delete.hpp:34 #7 0x00002b3027eff1bf in qpid::client::TCPConnector::run (this=0x18baf490) at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145 #8 0x00002b30282f0a6a in runRunnable (p=0x791a) at qpid/sys/posix/Thread.cpp:35 #9 0x0000003c53a06367 in start_thread () from /lib64/libpthread.so.0 #10 0x0000003c52ed30ad in clone () from /lib64/libc.so.6 (gdb) Version-Release number of selected component (if applicable): r779262 of qpid trunk How reproducible: Fairly rare
Also see QPID-1984 for what looks like a much more frequent occurence of the same assertion as listed above for the broker.
*** Bug 514036 has been marked as a duplicate of this bug. ***
I don't think the stack trace here makes much sense. However the stack trace in Bug 514036 makes sense. I'm treating this bug as if it had that stack trace!
I can't reproduce if tracing is enabled, but can reliably reproduce like this 2009-07-27 13:55:45 notice SASL disabled: No Authentication Performed 2009-07-27 13:55:45 notice Listening on TCP port 5672 2009-07-27 13:55:45 notice Broker running lt-qpidd: qpid/sys/DispatchHandle.cpp:259: virtual void qpid::sys::DispatchHandle::processEvent(qpid::sys::Poller::EventType): Assertion `state!=CALLING' failed. Aborted (core dumped) from trunk revision 798241 to reproduce: ./qpidd --auth no --tcp-nodelay [localhost tests]$ ./latencytest --rate 1000000 --tcp-nodelay Latency(ms): min=0.560546, max=523.206, avg=305.966 Latency(ms): min=508.735, max=1009.36, avg=782.974 Latency(ms): min=998.361, max=1501.98, avg=1299.21 ^C [localhost tests]$ ./latencytest --rate 100 --tcp-nodelay Connection refused: localhost:5672 (qpid/sys/posix/Socket.cpp:162) Note that setting tcp-nodelay on latencytest is key. setting or not setting on broker both exhibit the issue. Carl.
This bug turned out to be an issue with both the DispatchHandle and Poller implementations and so has been fixed in both of them. The PollerTest program has been updated so that it now would have picked up the original Poller bug.
The issue has been fixed, tested on RHEL 5.5 / 4.8 i386 / x86_64 on packages: qpid-tools-0.7.946106-4.el5 qpid-cpp-server-0.7.946106-2.el5 python-qpid-0.7.946106-1.el5 qpid-cpp-client-devel-0.7.946106-2.el5 qpid-cpp-client-0.7.946106-2.el5 qpid-cpp-server-cluster-0.7.946106-2.el5 -> VERIFIED