Bug 506556 - c++ client may not timeout accurately where multiple connections exist in the process
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assigned To: Andrew Stitcher
QA Contact: Jiri Kolar
Depends On:
Blocks:
Reported: 2009-06-17 14:24 EDT by Gordon Sim
Modified: 2011-08-12 12:21 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The C++ client may not have timed out accurately when multiple connections existed in the process. This could cause earlier events to be blocked behind later events. With this update, any timeouts are handled properly and work as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 12:04:24 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Gordon Sim 2009-06-17 14:24:00 EDT
Description of problem:

The pattern used is for a timer task to be reset whenever there is traffic. However, because the task is already on the queue at that point, resetting it may violate some assumptions and can cause earlier events to be blocked behind later events.

This likely needs a fix similar to r785733, which was made to the broker. The severity in this case is somewhat lower, as it would simply delay failover on occasion rather than prevent heartbeats from functioning correctly.

Need a test for this case.
Comment 1 Gordon Sim 2009-06-24 14:05:32 EDT
I'm moving this to 1.2. There is a theoretical issue, but in some basic testing where all connections in the process use the same heartbeat interval I have not been able to observe any problem, so I don't think it is a 1.1.6 blocker.
Comment 2 Andrew Stitcher 2009-07-30 11:06:42 EDT
This should finally have been fixed in 1.2 by the r799271 svn checkin.
Comment 5 Frantisek Reznicek 2010-10-06 06:34:06 EDT
Hello Andrew,
could you please define a testing scenario for this defect?

It seems clear that heartbeat has to be enabled and that the C++ client should create multiple connections.
The unclear point is how the client can be made to exit due to a timeout.
Moreover, should every connection open one or multiple sessions? Is it necessary to fail over to another clustered broker, or is this cluster independent?

Raising NEEDINFO.
Comment 6 Andrew Stitcher 2010-10-06 09:41:48 EDT
This is a potential broker bug in the timer code where it might wrongly disconnect a heartbeat connection late if there are other heartbeat connections at the same time.

So I think to test you'd need multiple connections, all of them with a similar heartbeat length, say 2-3s. They shouldn't actually do anything else.

Then you'd SIGSTOP one of the clients for more than 2x the heartbeat and check that it was disconnected at the correct time without being delayed.

[The SIGSTOP stops the client from being able to send heartbeats without disconnecting the connection]

IIRC the connection is disconnected when 2 heartbeats are missed, so if you set heartbeat = 2s you'd expect the disconnect approximately 2-4s after the SIGSTOP.
Comment 7 Jiri Kolar 2010-10-07 12:44:52 EDT
fixed in qpid-cpp-server-0.7.946106-17, daemon waits 10mins.

validated on RHEL5.5 / RHEL4.8  i386 / x86_64  

packages:
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.7
openais-devel-0.80.6-16.el5_5.7
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-14.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
rhm-docs-0.7.946106-5.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-53


->VERIFIED
Comment 8 Martin Prpič 2010-10-10 05:52:50 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The C++ client may not have timed out accurately when multiple connections existed in the process. This could cause earlier events to be blocked behind later events. With this update, any timeouts are handled properly and work as expected.
Comment 10 errata-xmlrpc 2010-10-14 12:04:24 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html
