Bug 1058887 - HotRod client keep trying recover connections to a failed cluster for a long time
Keywords:
Status: VERIFIED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: CR1
Target Release: 6.2.1
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks: 1060199 1075061 1060655
Reported: 2014-01-28 17:09 UTC by wfink
Modified: 2022-03-03 09:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If a cluster is no longer reachable for some reason, e.g. a network disconnect, the HotRod client tries to re-establish the lost connections. The client library retries a fixed number of times, calculated as the maximum number of connections in the pool (or 10) multiplied by the number of available servers. This may result in a long delay before the application can continue and react, because the client waits for the read timeout on each attempt.

This has been fixed by adding a new configuration property, infinispan.client.hotrod.max_retries, which defines the maximum number of retries in case of a recoverable error. Valid values are greater than or equal to 0 (zero); zero means no retry. The default is 10.
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker ISPN-3947 0 Critical Resolved HotRod client keep trying recover connections to a failed cluster 2019-09-05 05:30:15 UTC

Description wfink 2014-01-28 17:09:23 UTC
If a JDG cluster is no longer reachable for some reason, e.g. a network disconnect, the HotRod client tries to re-establish the lost connections.
The client library retries a fixed number of times, calculated as the maximum number of connections in the pool (or 10) multiplied by the number of available servers.
This can lead to a very long delay before the application can continue and react, because the client waits for the read or connect timeout on each attempt.

To improve this behaviour there should be a configurable limit of retries per server and/or a timeout in total.

This will give the application the chance to handle a remote-cache failure and reply to the user instead of hanging for minutes (with the default settings).
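
To make the scale of the problem concrete, the retry arithmetic described above can be sketched in a few lines of Java. This is only an illustration of the calculation as the report describes it, not the client's actual code: the class and method names, the 60-second timeout, the 4-server cluster, and the convention that a non-positive pool size means "no limit" are all assumptions.

```java
import java.time.Duration;

final class RetryDelayEstimate {

    // Retry count as described in the report:
    // (maximum pool connections, or 10) multiplied by the number of servers.
    // Assumption of this sketch: a non-positive pool size means "no pool limit".
    static int retryCount(int maxPoolConnections, int serverCount) {
        int perServer = maxPoolConnections > 0 ? maxPoolConnections : 10;
        return perServer * serverCount;
    }

    // Worst case: every attempt blocks for the full timeout before it fails.
    static Duration worstCaseDelay(int retries, Duration timeoutPerAttempt) {
        return timeoutPerAttempt.multipliedBy(retries);
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(60); // hypothetical per-attempt timeout
        int retries = retryCount(-1, 4);           // hypothetical: no pool limit, 4 servers
        System.out.println(retries + " attempts, up to "
                + worstCaseDelay(retries, timeout).toMinutes() + " minutes");
    }
}
```

With these assumed values the sketch reports 40 attempts and a worst case of 40 minutes, which is the kind of hang a per-server retry limit or a total timeout would cut short.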

Comment 2 Dan Berindei 2014-01-31 07:55:41 UTC
Pull request integrated: https://github.com/infinispan/jdg/pull/17
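
For readers who want to try the fix, the Doc Text of this bug names the property infinispan.client.hotrod.max_retries, with valid values of 0 or greater and a default of 10. The following is a minimal sketch of setting and validating that property with plain java.util.Properties. The helper class and its validation are hypothetical and are not the client's own implementation; how the properties are handed to the HotRod client is left out.

```java
import java.util.Properties;

final class HotRodRetryConfig {

    // Property name and default value as stated in the Doc Text of this bug.
    static final String MAX_RETRIES = "infinispan.client.hotrod.max_retries";
    static final int DEFAULT_MAX_RETRIES = 10;

    // Hypothetical helper: reads the property and enforces the documented range.
    // A value of 0 means "no retry"; negative values are rejected.
    static int maxRetries(Properties props) {
        String raw = props.getProperty(MAX_RETRIES);
        if (raw == null) {
            return DEFAULT_MAX_RETRIES;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(MAX_RETRIES + " must be >= 0, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_RETRIES, "2"); // fail fast: at most two retries
        System.out.println(MAX_RETRIES + " = " + maxRetries(props));
    }
}
```

A small value lets the application detect a lost cluster after a few attempts and report the failure to the user; 0 disables retries entirely.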

