Bug 1206590

Summary:	Nearcache broken after SocketTimeoutException
Product:	[JBoss] JBoss Data Grid 6	Reporter:	Vojtech Juranek <vjuranek>
Component:	Server	Assignee:	Galder Zamarreño <galder.zamarreno>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Martin Gencur <mgencur>
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	6.5.0	CC:	jdg-bugs, pzapataf, rmarwaha, slaskawi, ttarrant
Target Milestone:	ER4
Target Release:	6.5.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: When the near cache receives no remote events for some time (e.g. the cluster is idle or the client loads no new data), the connection to the remote JDG server fails with java.net.SocketTimeoutException. Consequence: The near cache can no longer receive remote events from the remote JDG server, which makes the near cache unusable. Fix: NA Result:	Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-06-23 12:25:00 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---

Description Vojtech Juranek 2015-03-27 13:26:42 UTC

Please see https://issues.jboss.org/browse/ISPN-5221

Comment 3 Vojtech Juranek 2015-05-15 14:29:58 UTC

I still see the exception (see below; it's from https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-CS/job/jdg-perf-cs-near-caching/57/console-edg-perf07/). However, a substantial part of the responses measured by RadarGun are pretty fast (about 2 us, see e.g. https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-CS/job/jdg-perf-cs-near-caching/57/artifact/results/html/histogram_near-caching-eager_BasicOperations.Get_Infinispan_HotRod_Remote_listeners_1_0_total.png), which suggests that the near cache is actually used, so maybe the error is not as unrecoverable as the message states.

09:54:09,409 ERROR [org.infinispan.client.hotrod.event.ClientListenerNotifier] (Client-Listener-9c1feb28604f4289) ISPN004043: Unrecoverable error reading event from server /172.18.1.5:11222, exiting event reader thread
org.infinispan.client.hotrod.exceptions.TransportException:: java.net.SocketTimeoutException
	at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.readByte(TcpTransport.java:184)
	at org.infinispan.client.hotrod.impl.protocol.Codec20.readMagic(Codec20.java:282)
	at org.infinispan.client.hotrod.impl.protocol.Codec20.readEvent(Codec20.java:126)
	at org.infinispan.client.hotrod.event.ClientListenerNotifier$EventDispatcher.run(ClientListenerNotifier.java:236)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException
	at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
	at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.readByte(TcpTransport.java:179)
	... 8 more

Comment 4 Vojtech Juranek 2015-05-15 15:13:18 UTC

> which suggests, that the near cache is actually used, so maybe the error is 
> actually not unrecoverable as it states.

Well, I realized that the timeout is not due to the warm-up phase but to the fact that all the data is loaded into the near cache, so all reads are served from the near cache and the connection to the server therefore times out (i.e. the exception is probably really unrecoverable).

Comment 5 Tristan Tarrant 2015-05-15 17:36:16 UTC

Looking at the code, a SocketTimeoutException is to be expected: the ClientListenerNotifier blocks on the connection waiting for data to appear. If no events arrive within the configured timeout, the socket timeout triggers. This is pretty normal behaviour, and the client should just ignore the exception and retry. I don't know why the error is being logged, though.
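
For illustration only, here is a minimal sketch of the "retry ignoring the exception" idea using plain java.net sockets. It is not the Hot Rod client code: the class IdleTolerantReader, its Result type, the methods readByteRetrying and demo, the 0x2A payload byte and all timing values are hypothetical names and numbers chosen for the example. It relies only on the documented behaviour that a Socket stays valid after a read times out under SO_TIMEOUT.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class IdleTolerantReader {

    /** The byte that was read and how many idle timeouts preceded it. */
    public static final class Result {
        public final int value;
        public final int timeouts;

        Result(int value, int timeouts) {
            this.value = value;
            this.timeouts = timeouts;
        }
    }

    /**
     * Blocks until one byte arrives. A SocketTimeoutException only means
     * that nothing arrived within SO_TIMEOUT, so the read is retried;
     * maxTimeouts is a hypothetical safety cap for this sketch.
     */
    public static Result readByteRetrying(Socket socket, int maxTimeouts) throws IOException {
        InputStream in = socket.getInputStream();
        int timeouts = 0;
        while (true) {
            try {
                int b = in.read();
                if (b < 0) {
                    // End of stream is a real failure, unlike an idle timeout.
                    throw new IOException("connection closed by peer");
                }
                return new Result(b, timeouts);
            } catch (SocketTimeoutException e) {
                // Idle connection: the socket is still usable, so retry.
                if (++timeouts > maxTimeouts) {
                    throw e;
                }
            }
        }
    }

    /** Loopback demo: the peer stays silent longer than the client's read timeout. */
    public static Result demo() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread quietPeer = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    Thread.sleep(350);                  // no events for a while
                    peer.getOutputStream().write(0x2A); // then a single byte
                    peer.getOutputStream().flush();
                } catch (Exception ignored) {
                    // demo only; the client side reports any failure
                }
            });
            quietPeer.start();
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(100); // shorter than the peer's silence
                Result r = readByteRetrying(client, 50);
                quietPeer.join();
                return r;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Result r = demo();
        System.out.println("byte=0x" + Integer.toHexString(r.value) + " timeouts=" + r.timeouts);
    }
}
```

In this sketch the reader loop survives several timeouts while the peer is quiet and still delivers the byte once it arrives. Only end of stream, other IOExceptions, or exceeding the cap end the loop.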

Comment 6 Vojtech Juranek 2015-05-15 19:24:04 UTC

My fault: I used JDG ER4 for the server but forgot to rebuild RG with ER4, so RG was still using the old HR client. Thanks, Alan, for pointing this out! Moving back to ON_QA. Sorry for that!

Comment 7 Vojtech Juranek 2015-05-15 21:11:09 UTC

Verified. Sorry for the rush once again.