Description of problem: The pattern used is for a timer task to be reset whenever there is traffic. However, as the task is still on the queue at this point, this may violate some assumptions and can cause earlier events to be blocked behind later events. This likely needs a fix similar to r785733, which was made to the broker. The severity in this case is somewhat lower, as it would simply delay failover on occasion rather than prevent heartbeats from functioning correctly. Need a test for this case.
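To illustrate the failure mode described above, here is a minimal sketch. It is a toy model, not the qpid Timer: it assumes tasks are kept in a list sorted by due time at insertion and that the dispatcher only looks at the front. `ToyTimer`, `resetInPlace`, `resetByRequeue` and `tickWhenBFires` are hypothetical names made up for this example, and the requeue variant is only one possible remedy, not necessarily what r785733 did.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical toy model, NOT the qpid Timer: tasks live in a list kept
// sorted by due time at insertion, and the dispatcher only checks the front.
struct Task {
    std::string name;
    int due;  // due time in abstract ticks
};

class ToyTimer {
public:
    using Handle = std::list<Task>::iterator;

    // Insert before the first task that is due later, so the list stays sorted.
    Handle add(const std::string& name, int due) {
        Handle it = tasks_.begin();
        while (it != tasks_.end() && it->due <= due) ++it;
        return tasks_.insert(it, Task{name, due});
    }

    // Unsafe reset: changes the due time while the task is still queued,
    // leaving it at a position that no longer matches its due time.
    void resetInPlace(Handle h, int newDue) { h->due = newDue; }

    // Safer reset: take the task off the queue, then re-add it in order.
    Handle resetByRequeue(Handle h, int newDue) {
        std::string name = h->name;
        tasks_.erase(h);
        return add(name, newDue);
    }

    // Fire front tasks that are due; report whether `name` fired at this tick.
    bool tick(int now, const std::string& name) {
        bool fired = false;
        while (!tasks_.empty() && tasks_.front().due <= now) {
            if (tasks_.front().name == name) fired = true;
            tasks_.pop_front();
        }
        return fired;
    }

private:
    std::list<Task> tasks_;
};

// Returns the tick at which task "B" (due at 3) actually fires after task
// "A" (due at 2) is reset to 10, or -1 if it never fires within 20 ticks.
int tickWhenBFires(bool requeue) {
    ToyTimer timer;
    ToyTimer::Handle a = timer.add("A", 2);
    timer.add("B", 3);
    if (requeue) timer.resetByRequeue(a, 10);
    else timer.resetInPlace(a, 10);
    for (int now = 0; now <= 20; ++now) {
        if (timer.tick(now, "B")) return now;
    }
    return -1;
}
```

In this model, `tickWhenBFires(false)` shows "B" stuck behind the reset task until tick 10, while `tickWhenBFires(true)` lets it fire at tick 3 as scheduled.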
I'm moving this to 1.2. There is a theoretical issue, but in some basic testing I have not been able to observe any problem where all connections in the process use the same heartbeat interval, so I don't think it is a 1.1.6 blocker.
This should finally have been fixed in 1.2 by svn checkin r799271.
Hello Andrew, could you please define a testing scenario for this defect? It seems clear that heartbeat has to be enabled and that the C++ client should create multiple connections. What is unclear is how the client can be made to exit due to a timeout. Moreover, should every connection open one or multiple sessions? Is it necessary to exercise failover to another clustered broker, or is this cluster independent? Raising NEEDINFO.
This is a potential broker bug in the timer code where it might wrongly disconnect a heartbeat connection late if there are other heartbeat connections at the same time. So I think to test you'd need multiple connections, all of them with a heartbeat of similar length, say 2-3s. They shouldn't actually do anything else. Then you'd SIGSTOP one of the clients for more than 2x the heartbeat and check that it was disconnected at the correct time without being delayed. [The SIGSTOP stops the client from being able to send heartbeats without disconnecting the connection.] IIRC the heartbeat disconnects when 2 heartbeats are missed, so if you set heartbeat = 2s then you'd expect the disconnect approx 2-4s after the SIGSTOP.
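The timing check in that scenario can be written down as a small helper. This is only a sketch of the arithmetic, assuming the "2 missed heartbeats" rule recalled above holds and that the disconnect therefore lands between 1x and 2x the heartbeat interval after the SIGSTOP. `Window`, `expectedDisconnectWindow`, `disconnectedOnTime` and the slack parameter are hypothetical names for this example, not part of any qpid API.

```cpp
#include <cassert>

// Expected delay, in seconds, between the SIGSTOP and the disconnect.
struct Window {
    double earliest;
    double latest;
};

// Assumption taken from the comment above: the connection is dropped once
// 2 heartbeats are missed, giving a delay of roughly 1x to 2x the interval.
Window expectedDisconnectWindow(double heartbeatSecs) {
    return Window{heartbeatSecs, 2.0 * heartbeatSecs};
}

// True if the observed delay falls inside the expected window, widened on
// both sides by slackSecs to absorb scheduling and measurement jitter.
bool disconnectedOnTime(double observedDelaySecs, double heartbeatSecs,
                        double slackSecs) {
    Window w = expectedDisconnectWindow(heartbeatSecs);
    return observedDelaySecs >= w.earliest - slackSecs &&
           observedDelaySecs <= w.latest + slackSecs;
}
```

With heartbeat = 2s and half a second of slack, an observed delay of 3.1s would pass, while 9s would be flagged as a delayed disconnect.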
Fixed in qpid-cpp-server-0.7.946106-17, daemon waits 10mins. Validated on RHEL5.5 / RHEL4.8 i386 / x86_64.
Packages:
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u
openais-0.80.6-16.el5_5.7
openais-devel-0.80.6-16.el5_5.7
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-14.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
rhm-docs-0.7.946106-5.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-53
->VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: The C++ client may not have timed out accurately when multiple connections existed in the process. This could cause earlier events to be blocked behind later events. With this update, any timeouts are handled properly and work as expected.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0773.html