Bug 1206590

Summary: Nearcache broken after SocketTimeoutException
Product: [JBoss] JBoss Data Grid 6
Reporter: Vojtech Juranek <vjuranek>
Component: Server
Assignee: Galder Zamarreño <galder.zamarreno>
Status: CLOSED CURRENTRELEASE
QA Contact: Martin Gencur <mgencur>
Severity: unspecified
Docs Contact:
Priority: high
Version: 6.5.0
CC: jdg-bugs, pzapataf, rmarwaha, slaskawi, ttarrant
Target Milestone: ER4
Target Release: 6.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When the near cache receives no remote events for some time (e.g. the cluster is inactive or the client does not load any new data), the connection to the remote JDG server fails with java.net.SocketTimeoutException. Consequence: The near cache can no longer receive new remote events from the remote JDG server, which makes the near cache unusable. Fix: NA Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-06-23 12:25:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vojtech Juranek 2015-03-27 13:26:42 UTC
Please see https://issues.jboss.org/browse/ISPN-5221

Comment 3 Vojtech Juranek 2015-05-15 14:29:58 UTC
I still see the exception (see below; it's from https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-CS/job/jdg-perf-cs-near-caching/57/console-edg-perf07/). However, a substantial part of the responses measured by RadarGun are pretty fast (about 2 us, see e.g. https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/PERF-CS/job/jdg-perf-cs-near-caching/57/artifact/results/html/histogram_near-caching-eager_BasicOperations.Get_Infinispan_HotRod_Remote_listeners_1_0_total.png), which suggests that the near cache is actually used, so maybe the error is actually not unrecoverable as it states.

09:54:09,409 ERROR [org.infinispan.client.hotrod.event.ClientListenerNotifier] (Client-Listener-9c1feb28604f4289) ISPN004043: Unrecoverable error reading event from server /172.18.1.5:11222, exiting event reader thread
org.infinispan.client.hotrod.exceptions.TransportException:: java.net.SocketTimeoutException
	at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.readByte(TcpTransport.java:184)
	at org.infinispan.client.hotrod.impl.protocol.Codec20.readMagic(Codec20.java:282)
	at org.infinispan.client.hotrod.impl.protocol.Codec20.readEvent(Codec20.java:126)
	at org.infinispan.client.hotrod.event.ClientListenerNotifier$EventDispatcher.run(ClientListenerNotifier.java:236)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException
	at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
	at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.readByte(TcpTransport.java:179)
	... 8 more

Comment 4 Vojtech Juranek 2015-05-15 15:13:18 UTC
> which suggests, that the near cache is actually used, so maybe the error is 
> actually not unrecoverable as it states.

Well, I realized that the timeout is not due to the warm-up phase, but due to the fact that all the data is loaded into the near cache, so all reads are served from the near cache and thus the connection to the server times out (i.e. the exception is probably really unrecoverable).

Comment 5 Tristan Tarrant 2015-05-15 17:36:16 UTC
Looking at the code, a SocketTimeoutException is to be expected: the ClientListenerNotifier blocks on the connection waiting for data to appear. If no events happen within the configured timeout, the socket timeout will trigger. This is pretty normal behaviour and the client should just ignore the exception and retry. I don't know why the error is being logged though.
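
To illustrate the "ignore the timeout and retry" behaviour described above, here is a minimal sketch using plain java.net sockets. It is not the Hot Rod client code: the class IdleTolerantReader, the methods readByteRetrying and demo, and the byte value 42 are all hypothetical names and values chosen for this example. It only shows that a read timeout on an idle connection can be treated as "no event yet" rather than as a fatal error, since the socket remains valid after a SocketTimeoutException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not taken from the Infinispan code base.
class IdleTolerantReader {

    /**
     * Waits for one byte, treating read timeouts as "no data yet".
     * Returns the byte read, or -1 if the peer closed the connection.
     * timeoutsSeen[0] is incremented once per timeout that was retried.
     */
    static int readByteRetrying(Socket socket, int[] timeoutsSeen) throws IOException {
        InputStream in = socket.getInputStream();
        while (true) {
            try {
                return in.read();
            } catch (SocketTimeoutException idle) {
                // An idle connection is not a failure: count it and keep waiting.
                timeoutsSeen[0]++;
            }
        }
    }

    /** A local server stays silent for delayMillis, then sends a single byte. */
    static int demo(int soTimeoutMillis, long delayMillis, int[] timeoutsSeen) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread sender = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    Thread.sleep(delayMillis);      // simulate a period with no events
                    OutputStream out = peer.getOutputStream();
                    out.write(42);                  // arbitrary payload byte
                    out.flush();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            sender.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis); // reads now time out when idle
                int b = readByteRetrying(client, timeoutsSeen);
                sender.join();
                return b;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] timeouts = new int[1];
        int b = demo(100, 350, timeouts);
        System.out.println("read " + b + " after " + timeouts[0] + " timeout(s)");
    }
}
```

With a 100 ms socket timeout and a 350 ms silent period, the reader hits at least one timeout before the byte arrives and still returns it. A real event reader would presumably also need a way to stop the loop on shutdown and to tell idle timeouts apart from a genuinely dead connection.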

Comment 6 Vojtech Juranek 2015-05-15 19:24:04 UTC
My fault: I used JDG ER4 for the server but forgot to rebuild RG with ER4, so RG was using the old HR client. Thanks Alan for pointing this out! Moving back to ON_QA. Sorry for that!

Comment 7 Vojtech Juranek 2015-05-15 21:11:09 UTC
Verified. Sorry for the rush once again.