Bug 1058887

Summary:	HotRod client keeps trying to recover connections to a failed cluster for a long time
Product:	[JBoss] JBoss Data Grid 6	Reporter:	wfink
Component:	Infinispan	Assignee:	Tristan Tarrant <ttarrant>
Status:	CLOSED UPSTREAM	QA Contact:	Martin Gencur <mgencur>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	6.1.0	CC:	jdg-bugs, pruivo, vjuranek
Target Milestone:	CR1
Target Release:	6.2.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	If a cluster is no longer reachable for some reason, e.g. a network disconnect, the Hot Rod client tries to re-establish the lost connections. The client library retries a fixed number of times, calculated from the maximum number of connections in the pool (or 10), multiplied by the number of available servers. This may cause a long delay before the application can continue and react, because each attempt waits for the read timeout. This has been fixed by adding a new configuration property, infinispan.client.hotrod.max_retries, which defines the maximum number of retries in case of a recoverable error. Valid values are greater than or equal to 0 (zero); zero means no retry. The default is 10.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2025-02-10 03:34:57 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1060199, 1060655, 1075061
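
The Doc Text above introduces the infinispan.client.hotrod.max_retries property. The Java sketch below only illustrates the documented rules (default 10, 0 means no retry, negative values are invalid) using a hypothetical helper, MaxRetriesConfig.effectiveMaxRetries; it is not the client's actual parsing code. In a real application the same key would be handed to the Hot Rod client configuration, for example through a hotrod-client.properties file; check the client documentation for your version.

```java
import java.util.Properties;

/**
 * Hypothetical helper mirroring the documented semantics of
 * infinispan.client.hotrod.max_retries. Not part of the Hot Rod client API.
 */
final class MaxRetriesConfig {
    static final String MAX_RETRIES = "infinispan.client.hotrod.max_retries";
    static final int DEFAULT_MAX_RETRIES = 10;

    /** Returns the retry count implied by the documented rules for these properties. */
    static int effectiveMaxRetries(Properties props) {
        String raw = props.getProperty(MAX_RETRIES);
        if (raw == null) {
            return DEFAULT_MAX_RETRIES; // property not set: documented default
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 0) {
            // Documented constraint: the value must be >= 0.
            throw new IllegalArgumentException(MAX_RETRIES + " must be >= 0, got " + value);
        }
        return value; // 0 means "do not retry"
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Fail fast: allow at most 2 retries for a recoverable error.
        props.setProperty(MAX_RETRIES, "2");
        System.out.println(effectiveMaxRetries(props)); // prints 2
    }
}
```

Setting a small value trades resilience against transient hiccups for a faster failure signal to the application.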

Description wfink 2014-01-28 17:09:23 UTC

If a JDG cluster is no longer reachable for some reason, e.g. a network disconnect, the Hot Rod client tries to re-establish the lost connections.
The client library retries a fixed number of times, calculated from the maximum number of connections in the pool (or 10), multiplied by the number of available servers.
This can lead to a very long delay before the application can continue and react, because each attempt waits for the read or connect timeout.
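
To make the scale of the problem concrete, here is a small Java calculation of the pre-fix retry budget described above. It assumes, as an illustration only, that the fallback value of 10 applies when the pool has no positive connection limit, and that every attempt blocks for a full timeout; the class LegacyRetryBudget and the example numbers (3 servers, 60 s timeout) are hypothetical, not taken from the client code.

```java
/**
 * Worst-case arithmetic for the pre-fix retry budget:
 * (pool max connections, or 10) * number of servers attempts,
 * each possibly waiting for a full timeout. Hypothetical illustration.
 */
final class LegacyRetryBudget {
    static final int FALLBACK_PER_SERVER = 10;

    /** Attempts before giving up; assumes a non-positive pool limit triggers the fallback. */
    static int attempts(int poolMaxConnections, int servers) {
        int perServer = poolMaxConnections > 0 ? poolMaxConnections : FALLBACK_PER_SERVER;
        return perServer * servers;
    }

    /** Total blocking time if every attempt runs into the timeout. */
    static long worstCaseMillis(int poolMaxConnections, int servers, long timeoutMillis) {
        return attempts(poolMaxConnections, servers) * timeoutMillis;
    }

    public static void main(String[] args) {
        // Assumed values: unbounded pool, 3 servers, 60 s timeout per attempt.
        int n = attempts(-1, 3);
        long ms = worstCaseMillis(-1, 3, 60_000L);
        System.out.println(n + " attempts, up to " + ms / 60_000 + " minutes");
        // prints "30 attempts, up to 30 minutes"
    }
}
```

Even with modest cluster sizes, the multiplication by the number of servers is what turns a single timeout into a wait of many minutes.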

To improve this behaviour, there should be a configurable limit on retries per server and/or an overall timeout.
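
A minimal Java sketch of the two limits proposed above: a maximum number of retries (applied to the whole operation here, a simplification of a per-server limit) combined with an overall deadline. BoundedRetry.call is a hypothetical helper for illustration, not the Hot Rod client's implementation.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical retry helper: bounded retry count plus an overall deadline. */
final class BoundedRetry {

    static <T> T call(Callable<T> op, int maxRetries, Duration totalTimeout) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        long deadline = System.nanoTime() + totalTimeout.toNanos();
        Exception last = null;
        // One initial attempt plus up to maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            // Compare via subtraction, as recommended for System.nanoTime values.
            if (attempt > 0 && System.nanoTime() - deadline >= 0) {
                break; // overall time budget exhausted: stop retrying
            }
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure; retry if the budget allows
            }
        }
        // Surface the failure quickly so the application can react.
        throw new IllegalStateException("Retry budget exhausted", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation that succeeds on the third attempt.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IOException("server unreachable");
            }
            return "ok";
        }, 5, Duration.ofSeconds(5));
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "ok after 3 attempts"
    }
}
```

The deadline is only checked between attempts, so a single attempt can still block for a full socket timeout; a real implementation would also retry only recoverable errors and would probably add a backoff between attempts.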

This will give the application the chance to handle a remote-cache failure and reply to the user instead of hanging for minutes, as it does with the default settings.

Comment 2 Dan Berindei 2014-01-31 07:55:41 UTC

Pull request integrated: https://github.com/infinispan/jdg/pull/17

Comment 5 Red Hat Bugzilla 2025-02-10 03:34:57 UTC

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.