Bug 506556 - c++ client may not timeout accurately where multiple connections exist in the process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assignee: Andrew Stitcher
QA Contact: Jiri Kolar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-17 18:24 UTC by Gordon Sim
Modified: 2011-08-12 16:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The C++ client may not have timed out accurately when multiple connections existed in the process. This could cause earlier events to be blocked behind later events. With this update, any timeouts are handled properly and work as expected.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:04:24 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2010:0773 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3 | 2010-10-14 15:56:44 UTC

Description Gordon Sim 2009-06-17 18:24:00 UTC
Description of problem:

The pattern used is for a timer task to be reset whenever there is traffic. However, the task is already on the timer queue at that point, so resetting it may violate some of the queue's assumptions and can cause earlier events to be blocked behind later events.
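The hazard described above can be illustrated with a small sketch. This is not qpid's timer code: `Task`, `TimerQueue`, `resetInPlace` and `resetByRequeue` are hypothetical names, and the cancel-and-requeue remedy is only one common approach, not necessarily what r785733 does. The sketch assumes a timer thread that sleeps until the due time of the task at the head of a priority queue.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <vector>

// Hypothetical task type, for illustration only (not qpid's TimerTask).
struct Task {
    long dueMs = 0;          // absolute due time in milliseconds
    bool cancelled = false;  // set when the task has been superseded
};
using TaskPtr = std::shared_ptr<Task>;

// Orders the queue so that the task with the smallest due time is on top.
struct LaterFirst {
    bool operator()(const TaskPtr& a, const TaskPtr& b) const {
        return a->dueMs > b->dueMs;
    }
};

class TimerQueue {
public:
    TaskPtr add(long dueMs) {
        assert(dueMs >= 0);
        auto t = std::make_shared<Task>();
        t->dueMs = dueMs;
        heap_.push(t);
        return t;
    }

    // Unsafe: changes the key of a task that is already queued. The queue is
    // not told, so the task keeps its old position.
    static void resetInPlace(const TaskPtr& t, long newDueMs) {
        t->dueMs = newDueMs;
    }

    // Safer: mark the queued task as cancelled and queue a fresh one.
    TaskPtr resetByRequeue(const TaskPtr& t, long newDueMs) {
        t->cancelled = true;
        return add(newDueMs);
    }

    // Time the timer thread would sleep until, or -1 if nothing is queued.
    long nextWakeupMs() {
        while (!heap_.empty() && heap_.top()->cancelled) heap_.pop();
        return heap_.empty() ? -1 : heap_.top()->dueMs;
    }

private:
    std::priority_queue<TaskPtr, std::vector<TaskPtr>, LaterFirst> heap_;
};

// Connection A (due at 2000 ms) sees traffic and is reset to 12000 ms in
// place; connection B (due at 4000 ms) is now stuck behind A at the head.
long wakeupAfterInPlaceReset() {
    TimerQueue q;
    TaskPtr a = q.add(2000);
    q.add(4000);
    TimerQueue::resetInPlace(a, 12000);
    return q.nextWakeupMs();
}

// Same scenario, but the reset cancels A and queues a replacement.
long wakeupAfterRequeue() {
    TimerQueue q;
    TaskPtr a = q.add(2000);
    q.add(4000);
    q.resetByRequeue(a, 12000);
    return q.nextWakeupMs();
}
```

In this sketch the in-place reset makes the timer sleep until 12000 ms even though B was due at 4000 ms, which is the "earlier event blocked behind a later event" symptom. With the requeue variant the next wakeup is B's 4000 ms.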

A fix similar to r785733, which was made to the broker, is likely needed. The severity in this case is somewhat lower, as it would simply delay failover on occasion rather than prevent heartbeats from functioning correctly.

Need a test for this case.

Comment 1 Gordon Sim 2009-06-24 18:05:32 UTC
I'm moving this to 1.2. There is a theoretical issue, but in some basic testing I have not been able to observe any problem where all connections in the process use the same heartbeat interval, so I don't think it is a 1.1.6 blocker.

Comment 2 Andrew Stitcher 2009-07-30 15:06:42 UTC
This should finally have been fixed in 1.2 by the r799271 svn checkin.

Comment 5 Frantisek Reznicek 2010-10-06 10:34:06 UTC
Hello Andrew,
could you please define a testing scenario for this defect?

It seems clear that heartbeat has to be enabled and that the c++ client should create multiple connections.
The unclear point is how the client can be made to exit due to a timeout.
Moreover, should every connection open one or multiple sessions? Is it necessary to exercise failover to another clustered broker, or is this cluster independent?

Raising NEEDINFO.

Comment 6 Andrew Stitcher 2010-10-06 13:41:48 UTC
This is a potential broker bug in the timer code where it might wrongly disconnect a heartbeat connection late if there are other heartbeat connections at the same time.

So I think to test you'd need multiple connections, all of them with a similar heartbeat length, say 2-3s. They shouldn't actually do anything else.

Then you'd SIGSTOP one of the clients for more than 2x the heartbeat and check that it was disconnected at the correct time without being delayed.

[The SIGSTOP stops the client from being able to send heartbeats without disconnecting the connection]
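The freezing step can be sketched with plain POSIX calls. This is only an illustration under assumptions: the idle child process below stands in for a real qpid client, all function names are hypothetical, and `waitpid` with `WUNTRACED` only works here because the frozen process is a child of the caller. In a real test the clients would be separate processes and the signal would more likely be sent from a shell.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Spawn an idle child that stands in for a heartbeat-only client.
// Returns the child's pid in the parent, or -1 if fork failed.
pid_t spawnIdleChild() {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) pause();  // child: do nothing until signalled
    }
    return pid;
}

// Freeze a child with SIGSTOP and wait until the kernel reports it stopped.
// A stopped process keeps its sockets open but cannot send heartbeats.
bool freezeChild(pid_t pid) {
    if (kill(pid, SIGSTOP) != 0) return false;
    int status = 0;
    if (waitpid(pid, &status, WUNTRACED) != pid) return false;
    return WIFSTOPPED(status);
}

// Clean up: SIGKILL is delivered even to a stopped process.
void killAndReap(pid_t pid) {
    kill(pid, SIGKILL);
    waitpid(pid, nullptr, 0);
}

// End-to-end demonstration: spawn, freeze, confirm, clean up.
bool demoFreeze() {
    pid_t child = spawnIdleChild();
    if (child < 0) return false;
    bool stopped = freezeChild(child);
    killAndReap(child);
    return stopped;
}
```

A test driver would record the time at which `freezeChild` returns and compare it with the time at which the broker drops the connection.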

IIRC the heartbeat disconnects when 2 heartbeats are missed, so if you set heartbeat = 2s you'd expect the disconnect approx 2-4s after the SIGSTOP.
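The arithmetic behind that window can be written down as a small helper. This is a sketch that takes the "2 missed heartbeats" rule from the comment above as an assumption (it is recalled from memory there, not confirmed); `expectedDisconnectWindow`, `disconnectedOnTime` and the tolerance parameter are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Window, in milliseconds after the SIGSTOP, in which the disconnect is expected.
struct Window {
    long earliestMs;
    long latestMs;
};

// The last heartbeat may have been sent anywhere between just before the
// SIGSTOP and almost one full interval before it, so the disconnect falls
// between (missed - 1) and missed intervals after the SIGSTOP.
Window expectedDisconnectWindow(long heartbeatMs, int missedAllowed = 2) {
    assert(heartbeatMs > 0 && missedAllowed >= 1);
    return Window{(missedAllowed - 1) * heartbeatMs, missedAllowed * heartbeatMs};
}

// True if an observed disconnect delay lies inside the window, allowing some
// slack for scheduling and measurement error.
bool disconnectedOnTime(long observedMs, long heartbeatMs, long toleranceMs) {
    Window w = expectedDisconnectWindow(heartbeatMs);
    return observedMs >= w.earliestMs - toleranceMs &&
           observedMs <= w.latestMs + toleranceMs;
}
```

With a 2s heartbeat this gives a window of 2000 ms to 4000 ms. A disconnect observed well after 4000 ms, beyond the chosen tolerance, would indicate the delayed timeout this bug describes.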

Comment 7 Jiri Kolar 2010-10-07 16:44:52 UTC
fixed in qpid-cpp-server-0.7.946106-17, daemon waits 10mins.

validated on RHEL5.5 / RHEL4.8  i386 / x86_64  

packages:
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.7
openais-devel-0.80.6-16.el5_5.7
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-14.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
rhm-docs-0.7.946106-5.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-53


->VERIFIED

Comment 8 Martin Prpič 2010-10-10 09:52:50 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The C++ client may not have timed out accurately when multiple connections existed in the process. This could cause earlier events to be blocked behind later events. With this update, any timeouts are handled properly and work as expected.

Comment 10 errata-xmlrpc 2010-10-14 16:04:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

