Bug 526868 - connecting to a disabled network address takes too long to fail
Summary: connecting to a disabled network address takes too long to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.6
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.2
: ---
Assignee: Andrew Stitcher
QA Contact: Jan Sarenik
URL:
Whiteboard:
Depends On:
Blocks: 527551
Reported: 2009-10-02 08:21 UTC by Gordon Sim
Modified: 2011-08-12 16:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Messaging bug fix:
C: If a network address becomes disabled, the client will block trying to connect until the tcp implementation times it out.
C: The time taken for the network connection to fail was excessive.
F: The connection behavior was adjusted
R: Network timeouts will now fail in a much shorter time frame.
If a network address becomes disabled, the client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.
Clone Of:
Environment:
Last Closed: 2009-12-03 09:15:44 UTC
Target Upstream Version:
Embargoed:


Attachments
One possible fix (821 bytes, patch)
2009-10-02 10:05 UTC, Gordon Sim
the one-line patched replaying_sender which triggers the bug (4.15 KB, text/plain)
2009-10-21 09:48 UTC, Jan Sarenik


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHEA-2009:1633 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise MRG Messaging and Grid Version 1.2 | 2009-12-03 09:15:33 UTC

Description Gordon Sim 2009-10-02 08:21:42 UTC
Description of problem:

If a network address is disabled (e.g. if a cable becomes disconnected), the client will block trying to connect until the TCP implementation times the attempt out.

It would be desirable to be able to control this (either by using the existing heartbeat setting or a distinct connection timeout setting).

Version-Release number of selected component (if applicable):

1.1.6

How reproducible:

Easily

Steps to Reproduce:
1. modify e.g. replaying_sender from failover example to have a heartbeat of 2 seconds
2. run it against a remote broker (ideally with QPID_LOG_ENABLE=info+)
3. then disable that network connection (either pull the cable or shutdown the driver via ifconfig down)
4. retry step 2 and note the time taken to fail
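
For step 4, the time to failure can be measured rather than judged by eye. The helper below is a minimal sketch and is not part of the qpid examples: `time_until_done` and `TimedResult` are hypothetical names, and the callable passed in is assumed to wrap whatever connect attempt is being tested and to report failure by throwing an exception derived from std::exception.

```cpp
#include <chrono>
#include <exception>
#include <utility>

// Hypothetical result type: whether the attempt threw, and how long it took.
struct TimedResult {
    bool failed;
    double seconds;
};

// Runs `attempt` once and reports how long it took to return or throw.
// Uses a monotonic clock so wall-clock adjustments do not skew the result.
template <typename F>
TimedResult time_until_done(F&& attempt) {
    const auto start = std::chrono::steady_clock::now();
    bool failed = false;
    try {
        std::forward<F>(attempt)();
    } catch (const std::exception&) {
        failed = true;  // the attempt signalled failure by throwing
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return TimedResult{failed, elapsed.count()};
}
```

With a heartbeat of 2 seconds, the report below expects `seconds` to come out near 4 for a failed attempt, while the affected build took about 12.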
  
Actual results:

Takes 12 seconds to fail for me.

Expected results:

I had expected it to fail in 4 seconds (2*heartbeat interval)

Additional info:

The heartbeat timer task is not actually enabled until after the connect() call returns.

Comment 1 Gordon Sim 2009-10-02 10:05:12 UTC
Created attachment 363455 [details]
One possible fix

The attached patch is one possible fix. It simply moves the initialisation of the heartbeat timer to just before the connect call.

Note that even as the code stands without this patch, any change to the heartbeat that might happen due to negotiation would be ignored, which is not strictly correct (though I think it would only be an issue when using qpidd if the heartbeat specified for the client was greater than the broker's maximum value).

A more desirable fix would probably be to connect in non-blocking mode.

Comment 2 Arnaldo Carvalho de Melo 2009-10-02 12:39:27 UTC
Agreed, non-blocking connect + start the heartbeat timer after it (immediately) returns + poll for connection completion seems to be the right fix.

Comment 3 Arnaldo Carvalho de Melo 2009-10-02 12:39:46 UTC
Agreed, non-blocking connect + start the heartbeat timer after it (immediately) returns + poll for connection completion seems to be the right fix.
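
The approach described here (non-blocking connect, then poll for completion) can be sketched with plain POSIX sockets. This is an illustration only, not the code that went into qpid: `connect_with_timeout` is a hypothetical helper, it handles IPv4 literals only, and it uses a single fixed deadline where the real client would presumably rely on its heartbeat timer.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not a qpid API). Connects to an IPv4 literal and
// gives up after timeout_ms milliseconds.
// Returns a connected, still non-blocking fd, or -1 with errno set
// (ETIMEDOUT when the deadline passes before the connect completes).
int connect_with_timeout(const char* ipv4, std::uint16_t port, int timeout_ms) {
    assert(timeout_ms >= 0);

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int err = 0;
    // Non-blocking mode makes connect() return immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        err = errno;
    } else if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        if (errno != EINPROGRESS) {
            err = errno;
        } else {
            // In progress: wait until the socket is writable or we time out.
            // Simplification: a retry after EINTR restarts the full timeout.
            pollfd pfd;
            pfd.fd = fd;
            pfd.events = POLLOUT;
            pfd.revents = 0;
            int n;
            do {
                n = poll(&pfd, 1, timeout_ms);
            } while (n < 0 && errno == EINTR);

            if (n == 0) {
                err = ETIMEDOUT;
            } else if (n < 0) {
                err = errno;
            } else {
                // Writable does not mean connected: read the final status.
                socklen_t len = sizeof(err);
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) err = errno;
            }
        }
    }

    if (err != 0) {
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

Because the caller regains control as soon as connect() returns, a timer can be started immediately afterwards and the wait bounded by it, instead of by the TCP stack's own connect timeout.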

Comment 4 Andrew Stitcher 2009-10-06 20:55:40 UTC
Fixed this using the existing non-blocking code.

There is one change to the previous connect failure behaviour:

The exception that gets thrown when Connection::open() fails no longer has any useful error text.

The error text does now get logged as a warning though.

Comment 5 Jan Sarenik 2009-10-16 09:12:19 UTC
Added line 'settings.heartbeat  = 2;' to replaying_sender.cpp
and compiled it on RHEL 4 and 5, i386 and x86_64,
once with old qpidc-devel (0.5.752581-26.el5) where
the long delay occurred.

Everything works fine on latest (0.5.752581-28.el5) versions.

Comment 6 Jan Sarenik 2009-10-21 09:48:27 UTC
Created attachment 365480 [details]
the one-line patched replaying_sender which triggers the bug

Comment 7 Irina Boverman 2009-10-22 17:49:18 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Connecting to a disabled network address is now failing quickly (526868)

Comment 8 Lana Brindley 2009-11-26 23:44:24 UTC
Release note updated.

Diffed Contents:
@@ -1 +1,10 @@
-Connecting to a disabled network address is now failing quickly (526868)
+Messaging bug fix:
+
+C: If a network address becomes disabled, the
+client will block trying to connect until the tcp implementation times it out. 
+C: The time taken for the network connection to fail was excessive.
+F: The connection behavior was adjusted
+R: Network timeouts will now fail in a much shorter time frame.
+
+If a network address becomes disabled, the
+client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.

Comment 9 Lana Brindley 2009-11-26 23:44:59 UTC
Release note updated.

Diffed Contents:
@@ -6,5 +6,4 @@
 F: The connection behavior was adjusted
 R: Network timeouts will now fail in a much shorter time frame.
 
-If a network address becomes disabled, the
+If a network address becomes disabled, the client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.
-client would block trying to connect until the TCP implementation timed out. The time taken for the network connection to fail was excessive. The connection behavior was adjusted so that network timeouts now fail promptly.

Comment 10 errata-xmlrpc 2009-12-03 09:15:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html

