Bug 503760

Summary: qpid::sys::DispatchHandle::processEvent(qpid::sys::Poller::EventType): Assertion `state!=CALLING' failed
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-cppAssignee: Andrew Stitcher <astitcher>
Status: CLOSED ERRATA QA Contact: Frantisek Reznicek <freznice>
Severity: urgent Docs Contact:
Priority: urgent    
Version: DevelopmentCC: cctrieloff, esammons, freznice
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gordon Sim 2009-06-02 15:29:19 UTC
Description of problem:

broker and c++ client occasionally crahsing during perftesting:


[gordon@mrg15 cpp]$ ./src/qpidd --auth no -p 5673
2009-06-01 05:53:54 notice SASL disabled: No Authentication Performed
2009-06-01 05:53:54 notice Listening on TCP port 5673
2009-06-01 05:53:54 notice Broker running
2009-06-01 06:48:39 warning CLOSING [127.0.0.1:55531] unsent data (probably due to client disconnect)
lt-qpidd: qpid/sys/DispatchHandle.cpp:260: virtual void qpid::sys::DispatchHandle::processEvent(qpid::sys::Poller::EventType): Assertion `state!=CALLING' failed.
Aborted (core dumped)

And on the client side:

Core was generated by `/home/gordon/qpid/cpp/src/tests/.libs/lt-perftest --qt 1 --summary'.
Program terminated with signal 6, Aborted.
[New process 31016]
[New process 31002]
#0  0x0000003c52e30215 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003c52e30215 in raise () from /lib64/libc.so.6
#1  0x0000003c52e31cc0 in abort () from /lib64/libc.so.6
#2  0x0000003c52e29696 in __assert_fail () from /lib64/libc.so.6
#3  0x00002b30282fc331 in qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::markForDeletion (handle=0x18bafd50)
    at ./qpid/sys/posix/Mutex.h:116
#4  0x00002b30282f7969 in ~Poller (this=0x18bafb70) at qpid/sys/epoll/EpollPoller.cpp:189
#5  0x00002b3027efdf83 in ~TCPConnector (this=0x18baf490) at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
#6  0x00002b3027ef308a in ~ConnectionImpl (this=0x18baec70) at /usr/include/boost/checked_delete.hpp:34
#7  0x00002b3027eff1bf in qpid::client::TCPConnector::run (this=0x18baf490)
    at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
#8  0x00002b30282f0a6a in runRunnable (p=0x791a) at qpid/sys/posix/Thread.cpp:35
#9  0x0000003c53a06367 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003c52ed30ad in clone () from /lib64/libc.so.6
(gdb)


Version-Release number of selected component (if applicable):

r779262 of qpid trunk

How reproducible:

Fairly rare

Comment 1 Gordon Sim 2009-07-15 06:51:28 UTC
Also see QPID-1984 for what looks like a much more frequent occurence of the same assertion as listed above for the broker.

Comment 2 Andrew Stitcher 2009-07-30 15:29:55 UTC
*** Bug 514036 has been marked as a duplicate of this bug. ***

Comment 3 Andrew Stitcher 2009-07-30 16:11:25 UTC
I don't think the stack trace here makes much sense. However the stack trace in Bug 514036 makes sense. I'm treating this bug as if it had that stack trace!

Comment 4 Carl Trieloff 2009-07-31 14:45:40 UTC

I can't reproduce if tracing is enabled, but can reliably reproduce like this

2009-07-27 13:55:45 notice SASL disabled: No Authentication Performed
2009-07-27 13:55:45 notice Listening on TCP port 5672
2009-07-27 13:55:45 notice Broker running
lt-qpidd: qpid/sys/DispatchHandle.cpp:259: virtual void
qpid::sys::DispatchHandle::processEvent(qpid::sys::Poller::EventType):
Assertion `state!=CALLING' failed.
Aborted (core dumped)

from trunk revision 798241

to reproduce:
 ./qpidd --auth no --tcp-nodelay

[localhost tests]$ ./latencytest --rate 1000000 --tcp-nodelay 
Latency(ms): min=0.560546, max=523.206, avg=305.966
Latency(ms): min=508.735, max=1009.36, avg=782.974
Latency(ms): min=998.361, max=1501.98, avg=1299.21
^C
[localhost tests]$ ./latencytest --rate 100 --tcp-nodelay 
Connection refused: localhost:5672 (qpid/sys/posix/Socket.cpp:162)

Note that setting tcp-nodelay on latencytest is key. setting or not setting on broker both exhibit the issue.

Carl.

Comment 5 Andrew Stitcher 2009-08-12 18:43:40 UTC
This bug turned out to be an issue with both the DispatchHandle and Poller implementations and so has been fixed in both of them.

The PollerTest program has been updated so that it now would have picked up the original Poller bug.

Comment 7 Frantisek Reznicek 2010-06-09 15:27:46 UTC
The issue has been fixed, tested on RHEL 5.5 / 4.8 i386 / x86_64 on packages:
qpid-tools-0.7.946106-4.el5
qpid-cpp-server-0.7.946106-2.el5
python-qpid-0.7.946106-1.el5
qpid-cpp-client-devel-0.7.946106-2.el5
qpid-cpp-client-0.7.946106-2.el5
qpid-cpp-server-cluster-0.7.946106-2.el5

-> VERIFIED