Description of problem:
If a network address is disabled (e.g. if a cable becomes disconnected), the client will block trying to connect until the TCP implementation times it out. It would be desirable to be able to control this, either by using the existing heartbeat setting or a distinct connection timeout setting.

Version-Release number of selected component (if applicable):
1.1.6

How reproducible:
Easily

Steps to Reproduce:
1. Modify e.g. replaying_sender from the failover example to have a heartbeat of 2 seconds.
2. Run it against a remote broker (ideally with QPID_LOG_ENABLE=info+).
3. Disable that network connection (either pull the cable or shut down the interface via ifconfig down).
4. Retry step 2 and note the time taken to fail.

Actual results:
Takes 12 seconds to fail for me.

Expected results:
I had expected it to fail in 4 seconds (2 * heartbeat interval).

Additional info:
The heartbeat timer task is not actually enabled until after the connect() call returns.
Created attachment 363455
One possible fix

The attached patch is one possible fix. It simply moves the initialisation of the heartbeat timer to just before the connect call. Note that, even as the code stands without this patch, any change to the heartbeat that might happen due to negotiation would be ignored, which is not strictly correct (though I think it would only be an issue when using qpidd if the heartbeat specified by the client was greater than the broker's maximum value). A more desirable fix would probably be to connect in non-blocking mode.
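To illustrate why the ordering in the patch matters, here is a minimal C++ sketch (not qpid code). A hypothetical one-shot `Watchdog` plays the role of the heartbeat timer task, and a hypothetical `simulated_blocking_connect` stands in for a connect() that stalls until the TCP stack gives up. The 2 x heartbeat deadline follows the expectation in the description. The sketch only records that the deadline passed; it does not show how the real timer task would tear down the connection.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot timer standing in for the heartbeat timer task:
// it records that it "fired" if cancel() is not called within `timeout`.
class Watchdog {
public:
    explicit Watchdog(std::chrono::milliseconds timeout)
        : worker_([this, timeout] {
              std::unique_lock<std::mutex> lock(m_);
              // wait_for() returns false if the deadline passes while cancelled_ is still false.
              if (!cv_.wait_for(lock, timeout, [this] { return cancelled_; }))
                  fired_ = true;
          }) {}
    ~Watchdog() { cancel(); worker_.join(); }
    void cancel() {
        { std::lock_guard<std::mutex> guard(m_); cancelled_ = true; }
        cv_.notify_one();
    }
    bool fired() const { return fired_; }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool cancelled_ = false;
    std::atomic<bool> fired_{false};
    std::thread worker_;  // declared last, so it starts after the members it uses exist
};

// Expected failure time from the report: 2 x heartbeat interval.
std::chrono::milliseconds deadline_for(std::chrono::milliseconds heartbeat) {
    assert(heartbeat.count() > 0);  // the sketch assumes a positive heartbeat interval
    return 2 * heartbeat;
}

// Hypothetical stand-in for a connect() that blocks until the TCP stack gives up.
void simulated_blocking_connect(std::chrono::milliseconds stall) {
    std::this_thread::sleep_for(stall);
}

// Timer armed just before the blocking call, as in the attached patch.
bool timer_armed_before(std::chrono::milliseconds heartbeat, std::chrono::milliseconds stall) {
    Watchdog timer(deadline_for(heartbeat));
    simulated_blocking_connect(stall);
    return timer.fired();  // true if the deadline passed while the call was still blocked
}

// Timer armed only after the call returns, as in the original code.
bool timer_armed_after(std::chrono::milliseconds heartbeat, std::chrono::milliseconds stall) {
    simulated_blocking_connect(stall);
    Watchdog timer(deadline_for(heartbeat));
    timer.cancel();  // the call has already finished, so there is nothing left to time
    return timer.fired();
}
```

With a 50 ms heartbeat and a 500 ms stall, `timer_armed_before` reports that the deadline passed, while `timer_armed_after` never can, which mirrors the "timer is not enabled until after connect() returns" observation.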
Agreed, non-blocking connect + start the heartbeat timer after it (immediately) returns + poll for connection completion seems to be the right fix.
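A minimal POSIX sketch of that approach in C++, assuming an IPv4 address literal. `connect_with_timeout` and `fail` are hypothetical helpers, not qpid's API. In the client the deadline would presumably be derived from the heartbeat setting (e.g. 2 x heartbeat); here it is simply a parameter. The steps are: switch the socket to non-blocking mode, call connect() (which returns at once with EINPROGRESS), poll for writability with a timeout, then read SO_ERROR to learn whether the connection actually succeeded.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string>
#include <sys/socket.h>
#include <system_error>
#include <unistd.h>

// Hypothetical helper: close fd (if open) and throw with the given errno value.
[[noreturn]] static void fail(int fd, int err, const char* what) {
    if (fd >= 0) ::close(fd);
    throw std::system_error(err, std::generic_category(), what);
}

// Hypothetical helper (not part of qpid): connect to an IPv4 address literal,
// giving up after timeout_ms. Returns a connected, still non-blocking socket.
int connect_with_timeout(const std::string& ip, std::uint16_t port, int timeout_ms) {
    assert(timeout_ms >= 0);  // poll() treats a negative timeout as "wait forever"

    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) fail(-1, errno, "socket");

    // 1. Non-blocking mode: connect() now returns immediately.
    int flags = ::fcntl(fd, F_GETFL, 0);
    if (flags < 0 || ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) fail(fd, errno, "fcntl");

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (::inet_pton(AF_INET, ip.c_str(), &addr.sin_addr) != 1) fail(fd, EINVAL, "inet_pton");

    if (::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0)
        return fd;  // connected at once (can happen on loopback)
    if (errno != EINPROGRESS) fail(fd, errno, "connect");

    // 2. connect() has returned without blocking; the deadline below plays
    //    the role of the heartbeat timer.
    pollfd pfd{fd, POLLOUT, 0};
    int n;
    do {
        n = ::poll(&pfd, 1, timeout_ms);  // simplification: a retry restarts the full timeout
    } while (n < 0 && errno == EINTR);
    if (n == 0) fail(fd, ETIMEDOUT, "connect");
    if (n < 0) fail(fd, errno, "poll");

    // 3. Writable only means the attempt finished; SO_ERROR says whether it succeeded.
    int soerr = 0;
    socklen_t len = sizeof soerr;
    if (::getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0) fail(fd, errno, "getsockopt");
    if (soerr != 0) fail(fd, soerr, "connect");
    return fd;
}
```

Throwing std::system_error keeps the OS error text (e.g. "Connection refused" or "Connection timed out") attached to the failure instead of only in a log. An unreachable address then fails after `timeout_ms` rather than after the kernel's own TCP timeout.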
Fixed this using the existing non-blocking code. There is one change to the previous connect-failure behaviour: the exception thrown when Connection::open() fails no longer has any useful error text. The error text is now logged as a warning, though.
Added the line 'settings.heartbeat = 2;' to replaying_sender.cpp and compiled it on RHEL 4 and 5 (i386 and x86_64), once against the old qpidc-devel (0.5.752581-26.el5), where the long delay occurred, and once against the latest (0.5.752581-28.el5) versions, where everything works fine.
Created attachment 365480
The one-line patched replaying_sender which triggers the bug
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Connecting to a disabled network address is now failing quickly (526868)
Release note updated.

Diffed Contents:
@@ -1 +1,10 @@
-Connecting to a disabled network address is now failing quickly (526868)
+Messaging bug fix:
+
+C: If a network address becomes disabled, the
+client will block trying to connect until the tcp implementation times it out.
+C: The time taken for the network connection to fail was excessive.
+F: The connection behavior was adjusted
+R: Network timeouts will now fail in a much shorter time frame.
+
+If a network address becomes disabled, the
+client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.
Release note updated.

Diffed Contents:
@@ -6,5 +6,4 @@
 F: The connection behavior was adjusted
 R: Network timeouts will now fail in a much shorter time frame.
 
-If a network address becomes disabled, the
+If a network address becomes disabled, the client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.
-client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html