Bug 526868

Summary: connecting to a disabled network address takes too long to fail
Product: Red Hat Enterprise MRG
Reporter: Gordon Sim <gsim>
Component: qpid-cpp
Assignee: Andrew Stitcher <astitcher>
Status: CLOSED ERRATA
QA Contact: Jan Sarenik <jsarenik>
Severity: high
Docs Contact:
Priority: urgent
Version: 1.1.6
CC: acme, iboverma, jsarenik, lbrindle, tross
Target Milestone: 1.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Messaging bug fix:

C: If a network address becomes disabled, the client will block trying to connect until the tcp implementation times it out.
C: The time taken for the network connection to fail was excessive.
F: The connection behavior was adjusted
R: Network timeouts will now fail in a much shorter time frame.

If a network address becomes disabled, the client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-12-03 09:15:44 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 527551    
Attachments:
Description | Flags
One possible fix | none
the one-line patched replaying_sender which triggers the bug | none

Description Gordon Sim 2009-10-02 08:21:42 UTC
Description of problem:

If a network address is disabled (e.g. if a cable became disconnected), the client will block trying to connect until the TCP implementation times it out.

It would be desirable to be able to control this (either by using the existing heartbeat setting or a distinct connection timeout setting).

Version-Release number of selected component (if applicable):

1.1.6

How reproducible:

Easily

Steps to Reproduce:
1. modify e.g. replaying_sender from failover example to have a heartbeat of 2 seconds
2. run it against a remote broker (ideally with QPID_LOG_ENABLE=info+)
3. then disable that network connection (either pull the cable or shut down the interface via ifconfig down)
4. retry step 2 and note the time taken to fail
  
Actual results:

Takes 12 seconds to fail for me.

Expected results:

I had expected it to fail in 4 seconds (2 * heartbeat interval).

Additional info:

The heartbeat timer task is not actually enabled until after the connect() call returns.

Comment 1 Gordon Sim 2009-10-02 10:05:12 UTC
Created attachment 363455 [details]
One possible fix

The attached patch is one possible fix. It simply moves the initialisation of the heartbeat timer to just before the connect call.

Note that even as the code stands without this patch, any change to the heartbeat that results from negotiation is ignored, which is not strictly correct. With qpidd, I think this would only be an issue if the heartbeat specified for the client was greater than the broker's maximum value.
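
To make the negotiation point concrete, here is a minimal sketch of how the timer interval might be derived from the negotiated value rather than the requested one. The function name and parameters are hypothetical and are not part of the qpid API; it assumes only what is stated above, namely that the broker advertises a maximum heartbeat and that a larger client request has to be reduced to it.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not qpid code: names are illustrative only.
// 'requested' is the heartbeat the client asked for (in seconds) and
// 'brokerMax' is the largest heartbeat the broker is willing to accept.
// The timer should be armed with the returned value, not with 'requested'.
uint16_t negotiatedHeartbeat(uint16_t requested, uint16_t brokerMax)
{
    return std::min(requested, brokerMax);
}
```

With a requested heartbeat of 2 seconds and a broker maximum well above that, the result is unchanged, which matches the observation that the problem only appears when the client asks for more than the broker allows.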

A more desirable fix would probably be to connect in non-blocking mode.

Comment 2 Arnaldo Carvalho de Melo 2009-10-02 12:39:27 UTC
Agreed, non-blocking connect + start the heartbeat timer after it (immediately) returns + poll for connection completion seems to be the right fix.

Comment 3 Arnaldo Carvalho de Melo 2009-10-02 12:39:46 UTC
Agreed, non-blocking connect + start the heartbeat timer after it (immediately) returns + poll for connection completion seems to be the right fix.
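
For illustration, the non-blocking connect plus poll approach could look roughly like the sketch below. This is not the qpid implementation: connectWithTimeout is a hypothetical helper written against plain POSIX sockets, it handles IPv4 literals only, and it closes the socket before returning, whereas a real client would hand the connected socket to its I/O layer. The timeout argument stands in for whatever deadline the client derives (for example, from the heartbeat setting).

```cpp
#include <cassert>
#include <cerrno>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not qpid code. Returns 0 if the TCP connection was
// established within timeoutMs, otherwise an errno value (ETIMEDOUT if the
// deadline passed first). The socket is always closed before returning.
int connectWithTimeout(const char* ipv4, unsigned short port, int timeoutMs)
{
    assert(timeoutMs >= 0);

    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;

    // Non-blocking mode makes connect() return immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int err = errno;
        close(fd);
        return err;
    }

    int result = 0;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        if (errno != EINPROGRESS) {
            result = errno;  // failed straight away
        } else {
            // Connect is in progress: wait for writability or the deadline.
            // Simplification: an EINTR retry restarts the full timeout.
            pollfd pfd = {};
            pfd.fd = fd;
            pfd.events = POLLOUT;
            int n;
            do {
                n = poll(&pfd, 1, timeoutMs);
            } while (n < 0 && errno == EINTR);

            if (n == 0) {
                result = ETIMEDOUT;
            } else if (n < 0) {
                result = errno;
            } else {
                // Writable does not imply success; SO_ERROR has the outcome.
                int soErr = 0;
                socklen_t len = sizeof(soErr);
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soErr, &len) < 0)
                    result = errno;
                else
                    result = soErr;
            }
        }
    }
    close(fd);
    return result;
}
```

Because the call to connect() returns at once, the caller controls how long it waits, instead of relying on the TCP stack's own timeout.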

Comment 4 Andrew Stitcher 2009-10-06 20:55:40 UTC
Fixed this using the existing non-blocking code

There is one change to the previous connect failure behaviour:

The exception that gets thrown when Connection::open() fails no longer has any useful error text.

The error text does now get logged as a warning though.

Comment 5 Jan Sarenik 2009-10-16 09:12:19 UTC
Added line 'settings.heartbeat  = 2;' to replaying_sender.cpp
and compiled it on RHEL 4 and 5, i386 and x86_64,
once with old qpidc-devel (0.5.752581-26.el5) where
the long delay occurred.

Everything works fine on latest (0.5.752581-28.el5) versions.

Comment 6 Jan Sarenik 2009-10-21 09:48:27 UTC
Created attachment 365480 [details]
the one-line patched replaying_sender which triggers the bug

Comment 7 Irina Boverman 2009-10-22 17:49:18 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Connecting to a disabled network address is now failing quickly (526868)

Comment 8 Lana Brindley 2009-11-26 23:44:24 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,10 @@
-Connecting to a disabled network address is now failing quickly (526868)
+Messaging bug fix:
+
+C: If a network address becomes disabled, the
+client will block trying to connect until the tcp implementation times it out. 
+C: The time taken for the network connection to fail was excessive.
+F: The connection behavior was adjusted
+R: Network timeouts will now fail in a much shorter time frame.
+
+If a network address becomes disabled, the
+client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.

Comment 9 Lana Brindley 2009-11-26 23:44:59 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -6,5 +6,4 @@
 F: The connection behavior was adjusted
 R: Network timeouts will now fail in a much shorter time frame.
 
-If a network address becomes disabled, the
-client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.
+If a network address becomes disabled, the client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.

Comment 10 errata-xmlrpc 2009-12-03 09:15:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html